Design of Power-Efficient Optical Transceivers and Design of High-Linearity Wireless Wideband Receivers by Zhang, Yudong
Design of Power-Efficient Optical Transceivers and Design of
High-Linearity Wireless Wideband Receivers
Yudong Zhang
Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
under the Executive Committee







The combination of silicon photonics and advanced heterogeneous integration is promising
for next-generation disaggregated data centers that demand large scale, high throughput, and low
power. In this dissertation, we discuss the design and theory of power-efficient optical transceivers
with System-in-Package (SiP) 2.5D integration. Combining prior art with the proposed circuit
techniques, a receiver chip and a transmitter chip including two 10 Gb/s data channels and one 2.5 GHz
clocking channel are designed and implemented in 28 nm CMOS technology. An innovative
transimpedance amplifier (TIA) and a single-ended to differential (S2D) converter are proposed and
analyzed for a low-voltage high-sensitivity receiver; a four-to-one serializer, programmable output
drivers, AC coupling units, and custom pads are implemented in a low-power transmitter; an
improved quadrature locked loop (QLL) is employed to generate accurate quadrature clocks. In
addition, we present an analysis of the inverter-based shunt-feedback TIA that explicitly depicts the
trade-off among sensitivity, data rate, and power consumption. Finally, research on CDR-based clocking
schemes for optical links is discussed. We review prior art and propose a power-efficient
clocking scheme based on an injection-locked phase rotator.
Next, we analyze injection-locked ring oscillators (ILROs) that have been widely used for
quadrature clock generators (QCGs) in multi-lane optical or wireline transceivers due to their low
power, low area, and technology scalability. The asymmetrical or partial injection locking from 2
phases to 4 phases results in imbalances in amplitude and phase. We propose a modified frequency-
domain analysis to provide intuitive insight into the performance design trade-offs. The analysis is
validated by comparing analytical predictions with simulations for an ILRO-based QCG in 28 nm
CMOS technology.
This dissertation also discusses the design of high-linearity wireless wideband receivers. An
out-of-band (OB) IM3 cancellation technique is proposed and analyzed. By exploiting a baseband
auxiliary path (AP) with a high-pass feature, the in-band (IB) desired signal and the out-of-band
interferers are split. OB third-order intermodulation (IM3) products are reconstructed in the AP and
cancelled in the baseband (BB). A 0.5-2.5 GHz frequency-translational noise-cancelling (FTNC)
receiver is implemented in 65 nm CMOS to demonstrate the proposed approach. Without cancellation,
it consumes 36 mW at 1 GHz LO frequency from a 1.2 V supply, and it achieves 8.8 MHz
baseband bandwidth, 40 dB gain, 3.3 dB NF, 5 dBm OB IIP3, and −6.5 dBm OB B1dB. After
IM3 cancellation, the effective OB IIP3 increases to 32.5 dBm for narrowband interferers (two tones)
at the cost of an extra 34 mW. For wideband interferers, 18.8 dB cancellation is demonstrated over
10 MHz with two −15 dBm modulated interferers. The local oscillator (LO) leakage is −92 dBm
and −88 dBm at 1 GHz and 2 GHz LO frequency, respectively. In summary, this technique achieves
both high OB linearity and good LO isolation.
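As a rough illustration of what the reported IIP3 figures imply, the short sketch below applies the standard two-tone, small-signal relation between input power, IIP3, and input-referred IM3 power. It is not taken from the dissertation: the function names are hypothetical, the −15 dBm per-tone input level is borrowed from the modulated-interferer experiment purely as an example operating point, and an "effective" IIP3 after cancellation is generally valid only near the power level at which it is measured.

```python
def im3_input_referred_dbm(p_in_dbm: float, iip3_dbm: float) -> float:
    """Input-referred IM3 power (dBm) for two equal-power tones.

    Uses the textbook third-order relation IIP3 = P_in + (P_in - P_IM3) / 2,
    rearranged to P_IM3 = 3 * P_in - 2 * IIP3. Assumes weakly nonlinear
    operation (input well below compression).
    """
    return 3.0 * p_in_dbm - 2.0 * iip3_dbm


def iip3_from_im3_dbm(p_in_dbm: float, im3_in_dbm: float) -> float:
    """Inverse relation: IIP3 (dBm) from per-tone input power and IM3 power."""
    return p_in_dbm + (p_in_dbm - im3_in_dbm) / 2.0


# Assumed example operating point: -15 dBm per tone (illustrative only).
p_in = -15.0
im3_before = im3_input_referred_dbm(p_in, 5.0)    # OB IIP3 without cancellation
im3_after = im3_input_referred_dbm(p_in, 32.5)    # effective OB IIP3 with cancellation
im3_reduction_db = im3_before - im3_after

print(f"IM3 before cancellation: {im3_before:.1f} dBm")
print(f"IM3 after cancellation:  {im3_after:.1f} dBm")
print(f"Implied IM3 reduction:   {im3_reduction_db:.1f} dB")
```

Under these assumptions, every 1 dB of IIP3 improvement corresponds to 2 dB less IM3 power at a fixed input level, so the 27.5 dB increase in effective OB IIP3 corresponds to a 55 dB reduction of the two-tone IM3 product.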
Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1: Introduction
1.1 Optical Transceivers with Heterogeneous 2.5D Integration for Data Centers
1.1.1 Silicon photonics technologies in data centers
1.1.2 Heterogeneous integration in HPC and data centers
1.1.3 Transceiver design, theory, and research for optical links
1.2 High-linearity Wideband Wireless Receivers for Wireless Communications
1.2.1 Applications of wideband wireless receivers
1.2.2 Evolution of wideband wireless receivers
Chapter 2: A Power-Efficient Transceiver in 28 nm CMOS for An Optical Link with 2.5D SiP Integration
2.1 Link Architecture Overview
2.2 Design of High-Sensitivity Low-Power Receiver
2.2.1 TIA
2.2.2 Analysis of the proposed TIA
2.2.3 S2D converter
2.2.4 Analysis of the proposed S2D converter
2.2.5 VGA and level shifter
2.2.6 Summer and 1-bit DFE
2.2.7 Mismatch calibration for the receiver
2.3 Design of Low-Power Transmitter
2.3.1 Four-to-one serializer
2.3.2 Output driver, AC coupling unit and custom pads
2.4 Clocking Scheme
2.4.1 Quadrature locked loop
2.4.2 Delay line
2.5 Simulation Results
2.6 The Tradeoff Between Noise, Data Rate, and Power Consumption of Transimpedance Amplifiers for Optical Receivers
2.6.1 Proposed analytical solutions
2.6.2 Results in 65 nm CMOS
2.6.3 Quality factor, channel length, and input parasitic capacitance
2.6.4 Conclusions
2.6.5 Appendix
2.7 Research on CDR-Based Clocking Scheme
2.7.1 CDR-based clocking scheme in optical links
2.7.2 Previous CDR-based clocking schemes
2.7.3 Proposed power-efficient clocking scheme
Chapter 3: Analysis of Injection-Locked Ring Oscillators for Quadrature Clock Generation in Wireline or Optical Transceivers
3.1 Introduction
3.2 Existing Analysis Techniques for ILROs
3.2.1 Frequency-domain analysis
3.2.2 Time-domain analysis
3.2.3 Phase-domain-response analysis
3.3 Modified Frequency-Domain Analysis for ILRO-based QCG
3.3.1 Basis of frequency-domain analysis for RO
3.3.2 Modified frequency-domain analysis for quadrature RO
3.3.3 Modified frequency-domain analysis for ILRO-based QCG
3.4 Simulation Results and Design Trade-offs
3.5 Conclusions
Chapter 4: An Out-of-Band IM3 Cancellation Technique for Wideband Wireless Receivers
4.1 Introduction
4.2 IM3 Cancellation Using A Baseband Auxiliary Path
4.2.1 Concept
4.2.2 Design considerations for the auxiliary path
4.2.3 Design of the current buffer in the AP
4.2.4 Cancellation of IM3 products
4.2.5 NF after IM3 product cancellation
4.3 Implementation of An FTNC Receiver with IM3 Cancellation
4.3.1 Schematic of the proposed receiver
4.3.2 Circuit implementation of the building blocks
4.3.3 Implementation of the DSP
4.4 Experimental Results
4.4.1 Performance without cancellation
4.4.2 Cancellation of IM3 of two-tone interferers
4.4.3 Cancellation of IM3 of modulated interferers
4.4.4 Discussion and comparison to the state of the art
4.5 Conclusions
Chapter 5: Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
List of Tables
2.1 Targeted design parameters proposed by the link designer in this project.
2.2 Post-layout simulated mismatch between VthFwd and VthFB over process and
temperature variation
2.3 Summary of the post-layout simulated results in Fig. 2.10
2.4 Comparison table of different S2D architectures
2.5 Comparison between the proposed transceiver and prior art
2.6 Comparison between previous methods and the proposed method for the
calculation of the optimal TIA noise
4.1 Comparison to the state-of-the-art high linearity receivers.
List of Figures
1.1 (a) Architecture of traditional server (left) and disaggregated resource pools (right)
in data centers; (b) connections of disaggregated resource pools with bandwidth
steering using photonic switches
1.2 Prior implementations combining a silicon photonic integrated circuit (PIC)
and an electrical integrated circuit (EIC): (a) monolithic integration; (b) 2D integration;
(c) 3D integration (EIC on the top); (d) 3D integration (PIC on the top); (e)
3D integration with a ceramic interposer.
1.3 Prior heterogeneous integration from: (a) AMD Fury [13]; (b) Intel EMIB [14]; (c)
NVIDIA Tesla [15].
1.4 (a) System in Package (SiP) with advanced 2.5D technology; (b) the state-of-the-art
architecture combining SiP with photonics; (c) the concept of the implementation
combining SiP with photonics in this thesis.
1.5 Evolution of wireless wideband receivers with: (a) wideband LNA; (b) wideband
noise-cancelling LNA; (c) frequency translational noise cancelling (FTNC); (d)
N-path filters.
2.1 Schematic of a unidirectional silicon photonic link [25]. (1) Clock generation and
serialization. (2) Electrical transmitter (Tx). (3) Optical modulator and thermal
tuning. (4) Optical loss along the link. (5) Key assumptions for optical devices.
(6) An example for electrical receiver sensitivity. (7) Model of electrical receiver
sensitivity vs. data rate. (8) Microring demultiplexing array with thermal tuning
followed by photodetectors. (9) Maximum available optical power budget. (10)
Electrical receiver. (11) Deserialization of the electrical data.
2.2 Block diagram of the proposed link shown in the top view and side view, including
an Rx chip, a Tx chip, a photonic chip, an interposer, a PCB, and 2.5D SiP
packaging.
2.3 Schematic for the proposed receiver chip.
2.4 Topologies for shunt feedback TIA with (a) an inverter amplifier; (b) a cascode
inverter amplifier; (c) multi-stage inverter-based amplifier; (d) an amplifier with
PMOS cascoding.
2.5 Schematic for the proposed receiver TIA.
2.6 Block diagram of the offset calibration for TIAs
2.7 Post-layout simulated fine tuning and coarse tuning for the VthFB
2.8 Model of the feedback DC offset calibration of the proposed TIA
2.9 Topologies for S2D converters: (a) a unity-gain inverter without a dummy TIA;
(b) a fully-differential pair with CMFB; (c) cross-coupled inverters; (d) proposed
inverter-based Cherry-Hooper amplifier with embedded CMFB.
2.10 Post-layout simulated differential gain (GD), common-mode gain (GC), and CMRR
of the proposed S2D over process variation.
2.11 Small-signal model for the differential gain of the proposed S2D converter
2.12 Small-signal model for the common-mode gain of the proposed S2D converter
2.13 Schematic for the VGA.
2.14 Schematic for the level shifter.
2.15 Schematic for one slice in 1-bit DFE.
2.16 Schematic for the proposed transmitter chip.
2.17 Schematic for the proposed four-to-one serializer.
2.18 Schematic for the proposed output driver, AC coupling unit, and custom pads.
2.19 Schematic for QLL.
2.20 Schematic for the proposed operational transconductance amplifier in the QLL.
2.21 Post-layout simulated IQ mismatches of proposed quadrature clock generator: (a)
over temperature variation; (b) over voltage variation. 0 − 90, 90 − 180, 180 −
270, 270 − 0 represent the IQ mismatches from different phases, respectively.
2.22 Schematic for the delay line.
2.23 Die photos of the proposed (a) Rx chip and (b) Tx chip.
2.24 Designed PCB.
2.25 Post-layout simulated frequency response of the analog circuits in RxD.
2.26 Post-layout simulated pulse response of the analog circuits in RxD.
2.27 Post-layout simulated eye diagrams of the analog circuits in RxD before/after
1-bit DFE calibration.
2.28 Post-layout simulated input referred noise spectral density.
2.29 Post-layout simulated phase noise performance of the oscillator and the proposed
QLL.
2.30 Post-layout simulated eye diagram of TxD.
2.31 Detailed power consumption in each (a) RxD and (b) TxD.
2.32 Schematic of TIA with an inverter-based amplifier.
2.33 Small-signal model of TIA with an inverter-based amplifier.
2.34 Numerical solutions of (2.25): the relationship among data rate, TIA power
consumption, and (a) sensitivity (noise); (b) feedback resistance RF; (c) load
capacitance Co.
2.35 (a) The 2D plot for the tradeoff among noise (sensitivity), data rate, and power
consumption from Fig. 2.34(a); (b) The fitting functions for the minimum
sensitivities and their power, and minimum power and their sensitivities, with the
X-axis fDR (GHz).
2.36 The ratio of CI/CD of each input data rate (Gb/s) at the best sensitivity.
2.37 (a) Comparing the calculated minimum sensitivity and simulated minimum
sensitivity; (b) comparing the calculated and simulated noise from feedback resistance
RF and the FET channel at the minimum sensitivity.
2.38 Calculated results with Q = 0.5, Q = 0.577, and Q = 0.707 at 10 Gb/s and
20 Gb/s: (a) the relationship between sensitivity and power consumption; (b) and
(c) the relationship between power consumption and the corresponding RF and Co.
2.39 Calculated results with L = 60 nm, L = 70 nm, and L = 80 nm at 10 Gb/s and
20 Gb/s: (a) the relationship between sensitivity and power consumption; (b) and
(c) the relationship between power consumption and the corresponding RF and Co.
2.40 Calculated results with CD = 60 fF, CD = 80 fF, and CD = 100 fF at 10 Gb/s and
20 Gb/s: (a) the relationship between sensitivity and power consumption; (b) and
(c) the relationship between power consumption and the corresponding RF and Co.
2.41 Schematic for the calculation of input-referred noise.
2.42 Block diagram of a receive side of a multi-lane wireline or optical transceiver
device including a phase adjuster.
2.43 The classical clocking scheme for multi-lane wireline or optical transceivers as
Prior Art 1
2.44 Building blocks in Prior Art 1: (a) schematic of a frequency divider; (b) block
diagram of a duty cycle correction; (c) schematic of a classical phase interpolator
(PI); (d) phase constellation of the classical PI
2.45 The classical clocking scheme based on an injection-locked multiphase generator
as Prior Art 2
2.46 Building blocks in Prior Art 2: (a) schematic of an injection-locked multiphase
generator; (b) schematic of a phase interpolator (PI) with 8-phase inputs; (c)
phase constellation of this PI
2.47 The classical clocking scheme with an injection-locked phase rotator as Prior Art 3
2.48 The proposed power-efficient clocking scheme
2.49 Schematic of the proposed phase adjuster
2.50 Schematic of the proposed 64-phase fully-differential oscillator, which is
implemented as 16 four-phase fully differential cross coupled sub-oscillators.
2.51 A schematic showing internal connectivity of the proposed injection-locked phase
rotator.
2.52 Two methods for the implementation of injection locking: (a) injection locking
with a single-ended signal by shorting the differential stages through NMOS; (b)
injection locking with a pair of differential signals through two inverters.
2.53 Issues of injection locking with a single-ended signal by shorting the differential
stages through NMOS.
3.1 (a) Quadrature clock generator (QCG) in a multi-lane wireline or optical receiver
with a clock data recovery (CDR) based clocking scheme, (b) with a feedforward
clocking scheme; (c) QCG in a multi-lane wireline or optical transmitter.
3.2 Two architectures for QCG: (a) based on a frequency divider (FD); (b) based on
an injection-locked ring oscillator.
3.3 Design trade-offs in QCGs with injection-locked LC oscillators [68]: (a) the
relationship between f0 and jitter performance; (b) the relationship between f0 and
IQ mismatch; design trade-offs in ILRO-based QCGs: (c) the relationship between
f0 and jitter performance; (d) the relationship between f0 and IQ mismatch.
3.4 Model of the LC oscillator used in traditional frequency-domain analysis.
3.5 (a) Model for an odd N-stage ILRO; (b) a simplified model of (a) with an ideal
inverter in the loop [73].
3.6 (a) Amplitude-frequency and phase-frequency response of the impedance ZL of
RC tank; (b) current phasor diagram
3.7 (a) Schematic of the differential quadrature RO; (b) frequency-domain model
of the differential quadrature RO; (c) current phasor diagram of the differential
quadrature RO.
3.8 (a) Schematic of the ILRO-based QCG; (b) frequency-domain model of the
ILRO-based QCG; (c) current phasor diagram of the ILRO-based QCG.
3.9 Solutions of (3.10) with initial conditions and approximation (3.11), when k =
0.3, f_inj = 2.5 GHz, and β = 0.1.
3.10 Relationship between intrinsic frequency f0 and IQ mismatch in the proposed
analysis.
3.11 Relationship between θ and φ′s1 and its derivative dφ′s1/dθ in the proposed analysis.
3.12 Schematic of the testbench.
3.13 Calculated and simulated results at injection frequency f_inj = 2.5 GHz and
injection strength β = 0.1. (a) the relationship between amplitude ratio a1, a2,
injection included angle θ, and intrinsic frequency f0; (b) the relationship between IQ
mismatch ε, jitter at injection stages, jitter at non-injection stages, and f0.
3.14 Calculated and simulated results at injection frequency f_inj = 2.5 GHz and
injection strength β = 0.2. (a) the relationship between amplitude ratio a1, a2,
injection included angle θ, and intrinsic frequency f0; (b) the relationship between IQ
mismatch ε, jitter at injection stages, jitter at non-injection stages, and f0.
3.15 Calculated and simulated results at injection frequency f_inj = 7 GHz and
injection strength β = 0.1. (a) the relationship between amplitude ratio a1, a2,
injection included angle θ, and intrinsic frequency f0; (b) the relationship between IQ
mismatch ε, jitter at injection stages, jitter at non-injection stages, and f0.
3.16 Calculated and simulated results at injection frequency f_inj = 7 GHz and
injection strength β = 0.2. (a) the relationship between amplitude ratio a1, a2,
injection included angle θ, and intrinsic frequency f0; (b) the relationship between IQ
mismatch ε, jitter at injection stages, jitter at non-injection stages, and f0.
4.1 Wideband receiver desensitization mechanisms.
4.2 Existing approaches to achieve high OB linearity in wideband receivers: (a)
mixer-first receiver; (b) a receiver preceded by an on-chip, N-path band-pass, RF filter;
(c) frequency-translational noise-cancelling (FTNC) receiver; (d) a receiver with
frequency-selective feedback; (e) a receiver with an RF, IM3-cancellation path.
4.3 Concept of the proposed IM3 cancellation approach using a baseband auxiliary
path for LNTA-based receivers.
4.4 Impact of the auxiliary path (AP) on the equivalent input baseband impedance
(ZBB) presented to the passive mixer by the baseband TIA: (a) without the
auxiliary path; (b) with the auxiliary path.
4.5 Design of the current buffer (CB) in the auxiliary path: (a) the connections for
the differential CB; (b) the comparison of the equivalent input impedance from
three possible CB implementations; (c) CB with passive components; (d)
operational amplifier based CB; (e) the proposed low-input impedance CB with
programmable gain
4.6 (a) Model for the transmitter; (b) mathematical model for the proposed receiver
for IM3 products cancellation.
4.7 Two cases for strong OB interferers; Case 1: two OB interferers located at one
side of the LO signal. Case 2: two OB interferers located at different sides of the
LO signal.
4.8 The schematic of the proposed system including chip, PCBs, and DSP.
4.9 The schematic of CS LNTA, CG LNTA, cubing circuits and TIA.
4.10 The block diagram for the DSP in Matlab.
4.11 The measurement setup.
4.12 (a) Die photo; (b) measured and simulated S11; (c) measured conversion gain of
the MP (from node 1 to node 3 in Fig. 3) and AP (from node 1 to node 4 in Fig.
3); (d) measured IIP3 and B1dB vs. blocker offset frequency; (e) measured NF
vs. BB offset frequency at 1 GHz LO frequency; (f) measured IIP3 and B1dB vs.
LO frequency.
4.13 The measured LO leakage power vs. LO frequency.
4.14 Measured two-tone cancellation including the IM3 products and noise before/after
cancellation: (a) input interferers located at 1.1 GHz and 1.197 GHz with 1 GHz
LO frequency, (b) input interferers located at 2.1 GHz and 2.197 GHz with 2 GHz
LO frequency.
4.15 Measured two-tone cancellation with 1 GHz LO frequency (a) IM3 product
cancellation with recalibration (calibration at each input power) and without
recalibration (calibration once at single input power −15 dBm); (b) IM3 products and
noise before/after cancellation vs. offset frequency of interferers at −15 dBm
input power.
4.16 Measured single-tone interferer and one 10 MHz QPSK modulated interferer
cancellation at −15 dBm input power for each interferer: (a) single-tone interferer
located at 1.1 GHz and the center frequency of modulated interferer located at
1.2 GHz with 1 GHz LO frequency; (b) single-tone interferer located at 2.1 GHz
and the center frequency of modulated interferer located at 2.2 GHz with 2 GHz
LO frequency; (c) measured cancellation at 1 GHz LO frequency and calculated
cancellation with constant group delay (GD); (d) measured cancellation at 1 GHz
LO frequency vs. interferer offset frequency.
4.17 Measured two 5 MHz QPSK modulated interferers cancellation: (a) the center
frequencies of these two interferers are located at 1.1 GHz and 1.2 GHz with
1 GHz LO frequency; (b) the center frequencies of these two interferers are
located at 2.1 GHz and 2.2 GHz with 2 GHz LO frequency.
Acknowledgements
First and foremost, I would like to thank my Ph.D. advisor, Professor Peter R. Kinget. This
dissertation would not have been possible without your generous support. I still clearly remember
the day I received the offer letter from you in March 2015, the day I stepped into the CISL Lab
for the first time, the day I passed my thesis proposal, and the days I slept in our lab. Thank you
for your suggestions and patience. After five years, I have finally become a competent engineer and
a reliable man.
I would like to express my sincere gratitude to my colleagues in CISL Lab: Zhaowen Wang,
Tanbir Haque, Jianxun Zhu, Yang Xu, Matt Bajor, Michael Unanian, Guoxiang Han, Yu Chen,
Ning Guo, Rabia Tugce Yazicigil, Sarthak Kalani, Daniel de Godoy Peixoto, Shravan Nagam,
Vivek Mangal, Scott Newton, Subhajit Ray, Petar Barac, and Yuka Onizuka. Thank you for the
valuable technical discussions and for sharing your life experience with me.
I would like to thank Prof. Keren Bergman and Nathan Casey Abrams. Thank you for working
with me on the Photonic Integrated Networked Energy-efficient Datacenters (PINE) project during
the last three years.
My appreciation also goes to the engineers from Cisco Systems, where I studied, worked, and
grew during my internship: Dr. Kadaba R. (Kumar) Lakshmikumar, Dr. Pavan Kumar Hanumolu,
Dr. Romesh Kumar Nandwana, Alex Kurylak, Dr. Manohar Nagaraju, Bibhu Das, Dr. Abhishek
Bhat, and Mike Brubaker.
I am deeply grateful to all the members of the dissertation committee: Prof. Charles A. Zukowski,
Prof. Mingoo Seok, Prof. Peter R. Kinget, Dr. Romesh Kumar Nandwana, and Prof. Timothy Dick-
son. Thank you for the timely review of the thesis and evaluation of the defense.
I also would like to thank Columbia University for its administrative and housing support.
I thank Elsa Sanchez, Laura M Castillo, Yoel Rio, and Dennis Scott-Torbet for administrative
support. I thank Prof. Timothy Dickson for the opportunity to be a teaching assistant.
The work presented in this thesis was supported by the Advanced Research Projects Agency-
Energy (ARPA-E) under Energy-efficient Light-wave Integrated Technology Enabling Networks
that Enhance Data processing (ENLITENED) Grant DE-AR0000843, and in part by the Defense
Advanced Research Projects Agency (DARPA) RF-FPGA and National Science Foundation (NSF)
ECCS-1343282. I thank them for their support.
Finally, I would like to dedicate this dissertation to my parents: Haixiang Zhang and Fang He. I
could never be where I am now without your unconditional love and support. I would like to share
my accomplishments with you.
Chapter 1: Introduction
1.1 Optical Transceivers with Heterogeneous 2.5D Integration for Data Centers
1.1.1 Silicon photonics technologies in data centers
In the era of Big Data, Internet Protocol (IP) traffic for emerging technologies, such as video
services, artificial intelligence (AI), cloud computing, and machine-learning applications, has
seen explosive growth. Therefore, mega data centers, which contain thousands of servers that
may be hundreds of meters apart, are needed to store, compute, and transmit data that does not
fit on a single server. Silicon photonics technologies, which use silicon as the medium for photonic
applications, have been introduced to improve the performance of data centers.
Bandwidth steering for flexible resource utilization is one of the most influential trends [1].
Disaggregation (Fig. 1.1a), a concept that separates resources, such as the central processing unit (CPU),
graphics processing unit (GPU), memory, and other interfaces, into separate pools, has been introduced
and employed, enabling various resources to be replaced and upgraded independently. As shown
in Fig. 1.1b, low- to medium-radix photonic switches implemented in silicon photonics chips
reconfigure or reconstruct the network of resources to match the unpredictable traffic pattern.
Silicon photonics technologies also break the trade-off between high bandwidth and low power
consumption of each resource unit, especially for mega data centers. Electrical transmission
lines in backplanes and cables introduce large transmission loss and latency in
high-frequency and long-distance (beyond centimeters) applications. As a result, power-hungry
building blocks such as decision feedback equalizers (DFEs) and feedforward equalizers (FFEs) are
required in electrical transceivers. However, since the loss and latency in optical fiber are extremely
low, the combination of silicon photonics and electrical transceivers supports power-efficient




Figure 1.1: (a) Architecture of a traditional server (left) and disaggregated resource pools (right) in
data centers; (b) connections of disaggregated resource pools with bandwidth steering using
photonic switches.
Fig. 1.2 shows recent implementations that combine a silicon photonic integrated circuit (PIC)
with an electrical integrated circuit (EIC) [2] [3]. Monolithic integration, shown in
Fig. 1.2a, adds the least parasitics to electrical transceivers by developing photonic devices based
on an existing complementary metal-oxide-semiconductor (CMOS) silicon-on-insulator (SOI) pro-
cess [4]. However, this integration suffers from the limitation of technology nodes (45 nm and 32
nm), high waveguide loss, low photodiode responsivity, and low photodiode bandwidth. Fig. 1.2b
shows the 2D integration where both the EIC and PIC sit on a printed circuit board (PCB) through
ball grid array (BGA) connection, whose pitch is usually above 150 um [5] [6]. The EIC and PIC in
2D integration are generally connected through wire bonding that introduces significant parasitics.




Figure 1.2: Prior implementations combining a silicon photonic integrated circuit (PIC) and an
electrical integrated circuit (EIC): (a) monolithic integration; (b) 2D integration; (c) 3D
integration (EIC on the top); (d) 3D integration (PIC on the top); (e) 3D integration with a
ceramic interposer.
EIC or PIC is connected to PCB through wire bonding, while the other one is on the top of the
integration and packaged by micro-solder bumps or copper pillars, whose pitch can be as fine as
40 um. Though 3D integration increases the connection density and reduces the parasitics, thermal
isolation between the EIC and PIC can be an issue. Thermally sensitive photonic devices,
such as microrings, are vulnerable to the heat from the EIC. Fig. 1.2e employs a ceramic interposer
to connect the PIC, which is located in the trench of the interposer, to PCB through BGA packages
[10] [11]. Only part of the EIC is 3D integrated on the top of the PIC through micro-solder bumps
or copper pillar connections to enhance thermal isolation. However, a ceramic interposer generally
only contains one or two layers, limiting the density of the integration.
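To put the pitch figures quoted above into perspective, the following minimal sketch estimates the connection density of a full area array from its pitch. The square-grid assumption and the function name bumps_per_mm2 are illustrative choices and are not taken from the cited works.

```python
def bumps_per_mm2(pitch_um: float) -> float:
    """Connections per square millimetre for a square grid with the given pitch."""
    per_mm = 1000.0 / pitch_um  # connections along one millimetre
    return per_mm ** 2

# Pitches quoted in the text: BGA above 150 um, micro-bumps or copper pillars down to 40 um.
bga_density = bumps_per_mm2(150.0)
ubump_density = bumps_per_mm2(40.0)
density_gain = ubump_density / bga_density

print(f"BGA: {bga_density:.1f} /mm^2, micro-bump: {ubump_density:.1f} /mm^2, "
      f"gain: {density_gain:.1f}x")
```

Under this idealized model, moving from a 150 um pitch to a 40 um pitch raises the achievable connection density by roughly an order of magnitude.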
1.1.2 Heterogeneous integration in HPC and data centers
Heterogeneous integration is defined as follows: "Heterogeneous Integration (HI) refers
to the integration of separately manufactured components into a higher-level assembly (System
in Package – SiP) that, in the aggregate, provides enhanced functionality and improved operating
characteristics." [12] In this thesis, we focus on HI in high-performance computing (HPC) and data
Figure 1.3: Prior heterogeneous integration from: (a) AMD Fury [13]; (b) Intel EMIB [14]; (c)
NVIDIA Tesla [15].
centers, especially 2.5D integration with interposers. The applications of HI also include aerospace
and defense, autonomous automotive, mobile, health and wearables, and internet of things (IoT),
which are beyond the scope of this thesis.
AMD and ASE were the first to integrate an ASIC with a high-bandwidth memory (HBM) stack using
a 2.5D silicon interposer (Fig. 1.3a) [13]. Memory access latency and performance have always
restricted how much of a processor’s performance potential can be exploited. An interposer
relieves the I/O pin limitation by increasing the interconnection density and reducing the
latency. Fig. 1.3b shows the Embedded Multi-die Interconnect Bridge (EMIB) developed by Intel as a
2.5D package integration [14]. EMIB uses a tiny silicon bridge die, embedded as a part of Intel’s
substrate fabrication process, with multiple routing layers, but without through-silicon vias (TSVs).
Fig. 1.3c shows another example from NVIDIA Tesla [15]. Chip-on-wafer-on-substrate (CoWoS)
technology tightly integrates GPUs and memory on the same package. NVIDIA NVSwitch, which
plays a role similar to the photonic switch described in Section 1.1.1, boosts the total bandwidth of NVLink. In
summary, the implementations of HI in electrical data centers are shown in Fig. 1.4a. Since both
the logic and memory chips are connected to a silicon interposer through flip chip, the electrical
routing distances are significantly reduced, especially for high-speed signals. The interposer
and micro solder bumps or copper pillars also provide a finer pitch and a denser fan-out solution.
Though HI dramatically enhances the bandwidth, the communication distance is limited to
the millimeter or centimeter scale due to the considerable electrical loss. Combining SiP
with photonics is a promising way to overcome this limitation [16]. In Fig. 1.4b, the logic pool and
memory pool, physically located meters or kilometers apart, are connected through an optical fiber,
which offers hundreds of GB/s bandwidth and only several dB loss. Multiple logic/memory chips
are packaged with a monolithic optical I/O chip [4] consisting of electrical transceivers (TRx) and
photonic devices in each unit. Microring modulators and wavelength division multiplexing (WDM)
are employed to minimize the area and maximize the total throughput, respectively.
1.1.3 Transceiver design, theory, and research for optical links
Fig. 1.4c shows the implementation in this dissertation. The monolithic optical I/O chip in
Fig. 1.4b is replaced by two separate chips: one electrical TRx chip and one photonic chip. The
electrical part can therefore use advanced CMOS or FinFET technologies, while the photonic part can
use high-performance photonic devices in the PIC, improving the performance of both. Furthermore, this
separation dramatically reduces the cost and increases the flexibility for large-volume production. We focus
on the analysis and the implementation of electrical TRx shown in Fig. 1.4c, including multi-lane
receivers, multi-lane transmitters, and the clocking scheme for transceivers.
In Chapter 2, we introduce our design of a power-efficient optical transceiver as a prototype




Figure 1.4: (a) System in Package (SiP) with advanced 2.5D technology; (b) the state-of-the-art
architecture combining SiP with photonics; (c) the concept of the implementation combining SiP
with photonics in this thesis.
data channels and one 2.5 GHz clocking channel, are designed and implemented in 28 nm CMOS
technology. The proposed transceiver, which combines new circuit techniques and prior arts, is
co-designed with the PIC and the interposer. In addition, a relationship among noise (sensitivity), data
rate, and power consumption for inverter-based optical TIA is discussed in Section 2.6, and the
research on CDR-based clocking schemes for optical links is discussed in Section 2.7.
Injection-locked ring oscillators (ILROs) are widely used for quadrature clock generators (QCG)
in multi-lane wireline or optical transceivers because of their low power, low area, and technology
scalability. However, locking a four-stage I/Q oscillator with a two-phase injection signal generates
imbalances in amplitude and phase, resulting in I/Q phase errors and degraded jitter performance.
Chapter 3 proposes a modified frequency-domain analysis to provide intuitive insight into the per-
formance design trade-offs. The analysis is validated by comparing analytical predictions with
simulations for an ILRO-based QCG in a 28 nm CMOS technology.
1.2 High-linearity Wideband Wireless Receivers for Wireless Communications
1.2.1 Applications of wideband wireless receivers
The past two decades have witnessed a dramatic growth in the applications of wireless commu-
nications. Wireless technologies, e.g., mobile technologies including 3G, 4G, and 5G networking,
Wi-Fi, Bluetooth, and television and radio broadcasting, have profoundly changed people’s lives.
The requirements on receivers, which are among the essential electronic devices, differ from one
application to another. In this dissertation, we discuss two major applications of wideband wireless
receivers: software-defined radio (SDR) and TV white space (TVWS) communications.
An SDR is a radio whose communication functions are implemented by software
programs running on hardware instead of being implemented directly in hardware. Therefore, an SDR is suitable
for multiband and multistandard systems due to its reconfigurability. A cognitive radio (CR) is an
SDR that senses its environment, tracks changes, and achieves dynamic spectrum access, which
is realized through spectrum sensing [17] [18] or geolocation database. In contrast to narrow-
band receivers, e.g., the receivers with inductor-degenerated low noise amplifier (LNA), wideband
receivers are required for SDR systems to capture multiband, multichannel, multistandard radio
frequency (RF) signals.
TVWS frequencies are the unused spectrum in the TV bands between 470 MHz and 790 MHz, which
unlicensed devices are allowed to use. Wireless standards, including IEEE 802.22,
802.11af, and 802.15.4m, have been developed for TVWS communications. Low-cost wideband





Figure 1.5: Evolution of wireless wideband receivers with: (a) wideband LNA; (b) wideband
noise-cancelling LNA; (c) frequency translational noise cancelling (FTNC); (d) N-path filters.
1.2.2 Evolution of wideband wireless receivers
In contrast to narrowband wireless receivers, wideband receivers require broadband 50 Ω impedance
matching. Though this wideband matching can be implemented simply with a 50 Ω resistor
or a common-gate (CG) amplifier (Fig. 1.5a), the noise figure (NF) is much worse than that
of narrowband receivers, because the matching network directly adds 3 dB to the NF. A noise-cancelling
CMOS LNA (Fig. 1.5b) breaks the fundamental tradeoff between NF and source impedance
matching [19] [20]. Both a common-gate amplifier and a common-source (CS) amplifier
are employed in this architecture. The CG amplifier provides source impedance matching. The
CS amplifier senses the noise of the CG amplifier at the input, amplifies it with a proper ratio, and
cancels it at the differential outputs of the LNA.
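The 3 dB penalty of resistive matching can be checked with a short calculation. The sketch below assumes an ideal noiseless amplifier behind a shunt matching resistor, for which the noise factor is F = 1 + R_s/R_p; this simplified model and the function name shunt_match_nf_db are assumptions made for illustration only.

```python
import math

def shunt_match_nf_db(r_source_ohm: float, r_shunt_ohm: float) -> float:
    """Noise figure (dB) of a source terminated by a shunt resistor.

    Assumes the only added noise is the thermal noise of the shunt resistor,
    so the noise factor is F = 1 + R_source / R_shunt.
    """
    noise_factor = 1.0 + r_source_ohm / r_shunt_ohm
    return 10.0 * math.log10(noise_factor)

nf_matched = shunt_match_nf_db(50.0, 50.0)     # perfect 50-ohm match
nf_mismatched = shunt_match_nf_db(50.0, 500.0)  # lower NF, but no longer matched

print(f"matched: {nf_matched:.2f} dB, 500-ohm shunt: {nf_mismatched:.2f} dB")
```

A larger shunt resistor lowers the NF but destroys the match, which is the tradeoff that noise cancelling is meant to relax.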
A wideband LNA amplifies not only the wanted in-band signal but also the out-of-band (OB)
blockers. Thus, the amplified out-of-band blockers generate 3rd order intermodulation (IM3) prod-
ucts and deteriorate the linearity (defined as OB linearity) of the receiver. A frequency translational
noise-cancelling (FTNC) receiver is proposed (Fig. 1.5c). By replacing the LNA with a low noise
transconductance amplifier (LNTA), an FTNC receiver significantly relaxes the trade-off between
noise, out-of-band linearity, and wideband operation.
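How two out-of-band blockers corrupt a wanted channel can be sketched with the standard two-tone relations. The blocker frequencies, power levels, and IIP3 values below are hypothetical numbers chosen for illustration, and the power formula is the usual small-signal approximation for two equal-power tones.

```python
def im3_frequencies(f1_hz: float, f2_hz: float) -> tuple:
    """Third-order intermodulation products of two tones: 2*f1 - f2 and 2*f2 - f1."""
    return (2.0 * f1_hz - f2_hz, 2.0 * f2_hz - f1_hz)

def im3_input_referred_dbm(p_blocker_dbm: float, iip3_dbm: float) -> float:
    """Input-referred IM3 power for two equal-power tones, well below compression."""
    return 3.0 * p_blocker_dbm - 2.0 * iip3_dbm

# Hypothetical scenario: blockers at 700 MHz and 800 MHz, wanted channel at 600 MHz.
low_product, high_product = im3_frequencies(700e6, 800e6)

# Hypothetical blocker power and two assumed out-of-band IIP3 values.
im3_at_10dbm_iip3 = im3_input_referred_dbm(-20.0, 10.0)
im3_at_20dbm_iip3 = im3_input_referred_dbm(-20.0, 20.0)

print(low_product / 1e6, high_product / 1e6, im3_at_10dbm_iip3, im3_at_20dbm_iip3)
```

In this example the lower product lands exactly on the wanted channel, and every 1 dB of additional IIP3 lowers the input-referred distortion by 2 dB.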
However, in the absence of SAW filters, an LNTA still introduces IM3 products with strong
input OB blockers. Mixer-first receivers based on N-path filters achieve high linearity by remov-
ing the LNA or LNTA [21] (Fig. 1.5d). Instead of providing a wideband RF source impedance
matching, N-path filters translate the baseband impedance to the RF band through the switch-R-C
circuits. The center frequency of the translated RF band is programmable by tuning the sampling
frequency of the switches. Combining N-path band-pass filters with other filtering techniques, such
as infinite impulse response (IIR) filtering [22], N-path notch filters [23], and feedback circuits
[24], a superb OB linearity is achieved. Nevertheless, the absence of an LNA or LNTA leads to an
isolation issue between the local oscillator (LO) and the RF receiver input, which is not acceptable
in practical applications.
Chapter 4.1 of this thesis reviews the advantages and disadvantages of prior wireless wideband
receivers and proposes a new architecture using an OB IM3 cancellation technique at the baseband.
The proposed receiver is demonstrated in 65 nm CMOS technology and achieves both high OB
linearity and good LO isolation.
Chapter 2: A Power-Efficient Transceiver in 28 nm CMOS for An Optical
Link with 2.5D SiP Integration
In this chapter, we focus on the implementation of transceivers (TRx) for optical links. The
proposed TRx is designed and implemented along with a custom PIC and a custom interposer. We
begin in Section 2.1 by introducing the overview of our link architecture. Next, in Section 2.2, we
illustrate the details in the Rx chip. The design of the Tx chip is elaborated in Section 2.3. Since
both Tx and Rx share a similar clocking scheme, it is discussed in Section 2.4. Section 2.5 shows
the simulation results, and Section 2.6 introduces the proposed TIA model. Finally, the research on
CDR-based clocking schemes is introduced in Section 2.7.
This work was completed in collaboration with Zhaowen Wang and Michael Unanian.
I developed the system-level architecture of the TRx, designed the circuits in the receivers and part
of the circuits in the transmitter, proposed the circuits in the clocking scheme in the TRx, proposed
a TIA model, and completed 50% of the layout of the Rx chip and 30% of the layout of the Tx chip. Zhaowen
Wang designed and implemented the quadrature clock generator in the TRx and completed 50% of the layout
of the Rx chip and 10% of the layout of the Tx chip. Michael Unanian designed and implemented the
serializer and the drivers in the Tx chip and completed 60% of the layout of the Tx chip.
Testing the proposed Rx and Tx chips requires integration with a photonic chip, a silicon
interposer, and a printed circuit board (PCB). The proposed TRx chips were taped out in 28 nm
CMOS technology in December 2018 and shipped back in February 2019. The PCB was designed
by Nathan Casey Abrams from Professor Keren Bergman's group at Columbia University (60%)
and me (40%), taped out in September 2019, and shipped back in October 2019. The photonic
chip and interposer were designed by Nathan Casey Abrams. The photonic chip was taped out in
November 2018 and shipped back in April 2019. The interposer was taped out in November 2018
and revised in August 2019. The first interposer wafer failed the bond-and-grind step in March 2020, and
the second one failed in September 2020. Unfortunately, the project had no funds for a
new wafer iteration. As a result, our TRx chips could not be tested.
2.1 Link Architecture Overview
The elements and design considerations in an optical link are shown in Fig. 2.1 [25]. A comb
laser source provides multiple optical channels to support the wavelength division multiplexed
(WDM) transmitters. Each transmitter sends the high-speed electrical signals from the serializer,
which requires the clock generation and distribution, to the microring modulator. Similarly, the
receivers receive the high-speed electrical signals from the optical devices in each channel and
deserialize the data with clock references. The optical loss in this link includes the coupler loss
from the laser to the transmitter silicon waveguides, modulator array penalty in the transmitter
silicon waveguides, coupler loss from silicon waveguides to optical fibers, and demux array penalty
in the receiver silicon waveguides. For a given total link throughput (such as 400 Gb/s), the basic
design tradeoff is between the number of channels and the data rate per channel. A higher data rate
requires fewer channels but degrades the receiver sensitivity and increases the optical loss. This tradeoff depends
on the performance of the comb laser, optical devices, and electrical TRx.
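The channel-count side of this tradeoff, together with a simple optical budget check, can be sketched as follows. Only the 400 Gb/s total and the -17.5 dBm sensitivity target come from this chapter; the candidate data rates, the laser power, and the individual loss values are hypothetical placeholders.

```python
import math

def channels_needed(total_gbps: float, rate_per_channel_gbps: float) -> int:
    """Smallest number of WDM channels that reaches the total throughput."""
    return math.ceil(total_gbps / rate_per_channel_gbps)

def link_margin_db(laser_dbm: float, losses_db: list, sensitivity_dbm: float) -> float:
    """Optical margin: power arriving at the receiver minus the power it needs."""
    received_dbm = laser_dbm - sum(losses_db)
    return received_dbm - sensitivity_dbm

TOTAL_GBPS = 400.0  # example total throughput quoted in the text
channel_plan = {rate: channels_needed(TOTAL_GBPS, rate) for rate in (10.0, 16.0, 25.0, 50.0)}

# Hypothetical per-channel numbers, chosen only to exercise the function:
# laser coupler, modulator array penalty, fiber coupler, demux array penalty.
example_losses_db = [3.0, 2.0, 3.0, 2.0]
example_margin_db = link_margin_db(0.0, example_losses_db, -17.5)

print(channel_plan, example_margin_db)
```

In a real design the sensitivity and the loss terms themselves change with the data rate and the channel count, so the margin would have to be re-evaluated for every candidate plan.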
Table 2.1 shows the targeted design parameters proposed by the link designer. Because this is the
first 28 nm project at Columbia University and resources are limited, we implement only two
data channels at 10 Gb/s and one forwarded-clock channel at 2.5 GHz.
Fig. 2.2 shows an overview of our design. Both the electrical TRx and the photonic chip are
packaged to a 4-layer custom silicon interposer through flip chip. The electrical system includes
one Rx chip and one Tx chip fabricated in 28 nm CMOS technology. The photonic chip that con-
tains photonic devices for both electrical transmitter and receiver is produced through AIM silicon
photonics technology. Overall, one Tx SiP unit and one Rx SiP unit are represented. The logic
or memory functions in the Tx SiP unit are substituted with the pseudorandom binary sequence




































Figure 2.1: Schematic of a unidirectional silicon photonic link [25]. (1) Clock generation and
serialization. (2) Electrical transmitter (Tx). (3) Optical modulator and thermal tuning. (4) Optical
loss along the link. (5) Key assumptions for optical devices. (6) An example for electrical
receiver sensitivity. (7) Model of electrical receiver sensitivity vs. data rate. (8) Microring
demultiplexing array with thermal tuning followed by photodetectors. (9) Maximum available
optical power budget. (10) Electrical receiver. (11) Deserialization of the electrical data.
off-chip. As shown in the side view in Fig. 2.2, the high-speed signals between electrical TRx and
PIC are connected through the interposer; the low-speed signals (such as supplies, ground, and
scan chain signals) are routed out through through-silicon vias (TSVs) in the interposer and a printed
circuit board (PCB).
There are two signal channels (TxD) at 10 Gb/s each and one clock channel (TxC) at 2.5 GHz
in the Tx chip. Each signal channel is fed by quadrature 2.5 Gb/s inputs from a parallel PRBS
generator. The three channels drive three microring modulators in the PIC at different wavelengths.
Each microring modulator has 20 GHz bandwidth and requires a drive voltage of 1.6 Vpp biased
at −0.5 V (or a reversed signal). On the receiver side, two signal channels (RxD) and one clock
channel (RxC) in the Rx chip receive the signals from microring drop filters and PDs in the PIC.
Finally, the 10 Gb/s signal is demultiplexed into four 2.5 Gb/s ones. A bit error ratio tester (BERT)
Table 2.1: Targeted design parameters proposed by the link designer in this project.

Targeted PIC performance
Parameters                           Values
PD responsivity                      0.9 ~ 1 A/W
PD parasitic capacitance             < 15 fF
PD contact pads capacitance          < 15 fF
Extinction ratio (ER)                6 dB

Targeted Rx performance
Parameters                           Values
Technology                           28 nm
Supply                               1 V
Data channel number                  2
Clock channel number                 1
Input data rate                      10 Gb/s
Output data rate                     2.5 Gb/s
Input clock frequency                2.5 GHz
Input capacitance (Pad and ESD)      30 fF
Input capacitance (transistors)      20 fF
Sensitivity (Pavg)                   -17.5 dBm
Sensitivity (uApp)                   20 ~ 25 uA
Input range                          -20 ~ -11 dBm
Power efficiency                     0.5 pJ/b

Targeted Tx performance
Parameters                           Values
Technology                           28 nm
Supply                               0.9 ~ 1.1 V
Data channel number                  2
Clock channel number                 1
Input data rate                      2.5 Gb/s
Output data rate                     10 Gb/s
Input/output clock frequency         2.5 GHz
Output capacitance (Pad and ESD)     40 fF
Output capacitance (transistors)     30 fF
Output swing (Vpp)                   1.6 V
Power efficiency                     0.5 pJ/b
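Several entries in Table 2.1 can be related to one another with a back-of-the-envelope calculation. The sketch below converts the average-power sensitivity into a peak-to-peak photocurrent using the listed extinction ratio and responsivity, and converts the power-efficiency target into a per-channel power budget. It assumes NRZ data with equal numbers of ones and zeros; the function names are illustrative, and this is not necessarily how the link designer derived the targets.

```python
def dbm_to_uw(p_dbm: float) -> float:
    """Convert optical power from dBm to microwatts."""
    return 10.0 ** (p_dbm / 10.0) * 1000.0

def oma_uw(p_avg_uw: float, er_db: float) -> float:
    """Optical modulation amplitude P1 - P0 for balanced NRZ data."""
    er = 10.0 ** (er_db / 10.0)  # linear extinction ratio P1 / P0
    return 2.0 * p_avg_uw * (er - 1.0) / (er + 1.0)

p_avg = dbm_to_uw(-17.5)   # Sensitivity (Pavg) from Table 2.1
oma = oma_uw(p_avg, 6.0)   # Extinction ratio from Table 2.1

# Responsivity range from Table 2.1; uW times A/W gives uA.
i_pp_low_ua = 0.9 * oma
i_pp_high_ua = 1.0 * oma

# 0.5 pJ/b at 10 Gb/s, expressed in milliwatts per data channel.
power_budget_mw = 0.5e-12 * 10e9 * 1e3

print(f"Pavg = {p_avg:.1f} uW, OMA = {oma:.1f} uW, "
      f"Ipp = {i_pp_low_ua:.1f} to {i_pp_high_ua:.1f} uA, budget = {power_budget_mw:.1f} mW")
```

With these assumptions the estimate comes out at roughly 19 to 21 uA peak to peak, close to the lower end of the 20 ~ 25 uA entry, and the efficiency target corresponds to about 5 mW per 10 Gb/s channel.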
Figure 2.2: Block diagram of the proposed link shown in the top view and side view, including an
Rx chip, a Tx chip, a photonic chip, an interposer, a PCB, and 2.5D SiP packaging.
is employed to verify the received signals. The TRx combines proposed and previous circuit tech-
niques to realize a high-sensitivity low-power receiver, a low-power transmitter, and an accurate
quadrature clocking scheme. The details are introduced in the sections below.
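The data path described above, with four quarter-rate PRBS lanes serialized 4:1, deserialized 1:4, and checked by a BERT, can be modeled at the bit level. The sketch below is a behavioral illustration only: the choice of PRBS7, the lane ordering, and all function names are assumptions, since the text does not specify the pattern length or the interleaving order.

```python
def prbs7(n_bits: int, seed: int = 0x7F) -> list:
    """PRBS7 bits from a 7-bit LFSR (polynomial x^7 + x^6 + 1); seed must be nonzero."""
    state = seed & 0x7F
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits

def split_into_lanes(bits: list, n_lanes: int = 4) -> list:
    """1:N deserializer: lane k takes bits k, k+N, k+2N, ..."""
    return [bits[k::n_lanes] for k in range(n_lanes)]

def serialize(lanes: list) -> list:
    """N:1 serializer: interleave one bit from each lane in turn."""
    line = []
    for group in zip(*lanes):
        line.extend(group)
    return line

def bit_error_ratio(sent: list, received: list) -> float:
    """Fraction of positions where the received bit differs from the sent bit."""
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)

reference = prbs7(4 * 127)                # full-rate reference pattern
tx_lanes = split_into_lanes(reference)    # four quarter-rate Tx inputs
line_bits = serialize(tx_lanes)           # full-rate stream on the link
rx_lanes = split_into_lanes(line_bits)    # four quarter-rate Rx outputs
clean_ber = bit_error_ratio(reference, serialize(rx_lanes))

corrupted = list(line_bits)
corrupted[10] ^= 1                        # inject a single bit error on the link
single_error_ber = bit_error_ratio(reference, corrupted)

print(clean_ber, single_error_ber)
```

An ideal link reproduces the reference exactly, while a single flipped bit shows up as one error in the 508 transmitted bits.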
The parasitic parameters between the electrical TRx and PIC are crucial for our design. As
shown in Fig. 2.2, we employ the model of the photonic devices from our vendor; for the parasitics
from the interposer and flip chip, we generate an equivalent hybrid-pi model from electromagnetic
simulation; for the parasitics in the CMOS chip, the custom pads and ESD protection are designed
and evaluated through parasitic extraction and post-layout simulation.
Figure 2.3: Schematic for the proposed receiver chip.
2.2 Design of High-Sensitivity Low-Power Receiver
Fig. 2.3 depicts the architecture of the proposed receiver chip. Each data channel (RxD) consists
of transimpedance amplifiers (TIAs), a single-ended to differential (S2D) converter, a variable
gain amplifier (VGA), a level shifter (LS), 1-bit decision feedback equalizers (DFEs), and a quadrature
clock generator for the DFEs. The clock channel (RxC) includes TIAs, an S2D converter, and
amplifiers (AMP). The building blocks drawn in blue are all designed for testability: the input
driver and PD emulator can replace the optical input signal with an electrical one for both RxD and
RxC; a test switch vcmtest is used for the two-step offset calibration described in the section below;
switches and a 50 Ω output buffer are used to fan out both the 2.5 Gb/s signals and the 2.5 GHz I/Q clocks.
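The role of the DFE in this chain can be illustrated with a behavioral model. The sketch below interprets the "1-bit" DFE as a single feedback tap, which is an assumption, and uses a hypothetical channel with one post-cursor of weight 0.6; it does not model the quarter-rate quadrature implementation of the actual circuit.

```python
def channel_with_postcursor(symbols: list, h1: float) -> list:
    """Received samples y[n] = x[n] + h1 * x[n-1] for symbols in {-1, +1}."""
    samples = []
    previous = 0.0
    for s in symbols:
        samples.append(s + h1 * previous)
        previous = s
    return samples

def dfe_one_tap(samples: list, h1: float) -> tuple:
    """Subtract h1 times the previous decision before slicing each sample."""
    decisions = []
    margins = []
    previous_decision = 0
    for y in samples:
        corrected = y - h1 * previous_decision
        decision = 1 if corrected >= 0 else -1
        decisions.append(decision)
        margins.append(abs(corrected))  # distance from the slicer threshold
        previous_decision = decision
    return decisions, margins

H1 = 0.6  # hypothetical post-cursor weight relative to a main cursor of 1
tx_symbols = [1, 1, -1, 1, -1, -1, -1, 1, 1, -1]  # arbitrary test pattern
rx_samples = channel_with_postcursor(tx_symbols, H1)

raw_margin = min(abs(y) for y in rx_samples[1:])  # worst case without equalization
dfe_decisions, dfe_margins = dfe_one_tap(rx_samples, H1)
dfe_margin = min(dfe_margins)

print(raw_margin, dfe_margin, dfe_decisions == tx_symbols)
```

In this noiseless example the worst-case distance to the threshold grows from about 0.4 to about 1.0 once the post-cursor is cancelled, which is the vertical eye opening that the feedback tap buys.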
2.2.1 TIA
The design of the TIAs, in terms of thermal noise, power supply ripple rejection (PSRR) for supply
noise, and DC offset calibration, is crucial for a high-sensitivity optical receiver. The detailed
design tradeoffs can be found in [26] [27]. Despite the variety of TIA architectures, such as the regulated
cascode (RGC) TIA [28] and the double-sampling receiver [29], the "shunt-shunt" feedback TIA
with a push-pull (inverter-based) amplifier is very popular, especially for advanced nodes.
(a) (b) (c) (d)
Figure 2.4: Topologies for shunt feedback TIA with (a) an inverter amplifier; (b) a cascode
inverter amplifier; (c) multi-stage inverter-based amplifier; (d) an amplifier with PMOS
cascoding.
The inverter-based topology enhances the current efficiency and enables low-voltage design and
scalability, but it is vulnerable to PVT variations. A large voltage gain from the feedforward amplifier
in the TIA permits a larger feedback resistor and thus reduces the thermal noise; however,
its variation over PVT becomes worse, thus degrading the stability. In Fig. 2.4, we compare the
four implementations for shunt-feedback TIA to discuss the design tradeoff between performance
and stability. Fig. 2.4 (a) shows the simplest shunt-feedback TIA [8] [9] [30] with a limited feed-
forward voltage gain. In 28 nm CMOS technology, the typical inverter gain is about 7.5 with 35%
process variation and 3% temperature variation (from 0◦C to 100◦C) based on our simulation. The
cascode inverter in Fig. 2.4 (b) significantly improves the voltage gain to about 20, but deteriorates
process and temperature variations to 73% and 40%. The voltage gain further increases to 25 by
cascading three stages [31] in Fig. 2.4 (c), but the process and temperature variations become 88%
and 75%, respectively. We employ the topology in Fig. 2.4 (d) to achieve a balanced design with
only cascoding either PMOS or NMOS [32], and finally, about 12 voltage gain, 45% process vari-
ation, 19% temperature variation are achieved. Due to the 0.9 V supply voltage in our design, the
cascode PMOS is directly connected to the ground as the bias.
Fig. 2.5 depicts the schematic of the proposed main TIA and dummy TIA. Though the self-
referenced TIA [33] also achieves good PSRR without a dummy TIA, its reliability depends on
Figure 2.5: Schematic for the proposed receiver TIA.
Figure 2.6: Block diagram of the offset calibration for TIAs
the parasitics from the pads. In our design, the main TIA and the dummy TIA are both based
on the topology in Fig. 2.4 (d). Both TIAs use the same variable feedback resistor, which is
controlled by six binary bits to compensate for process variation, and a one-fourth size inverter is
used in the dummy TIA to reduce the power consumption at the cost of a small increase in the total
noise. A programmable capacitor mimics the parasitics at the input of the main TIA to maximize
the PSRR of the pseudo-differential pair.
A feedback offset calibration for the input optical signal is required for the TIA design, but
it raises two potential issues: the extra noise from the feedback loop is added directly to the
sensitive input of the main TIA, and a large RC time constant, as high as 100 us, is needed in the
feedback loop. Fig. 2.6 shows the block diagram of the offset calibration for the TIAs. RT stands for
the DC transimpedance from the input current Iin to the output voltage Vo, and F stands for the
feedback DC transconductance from Vo to the feedback current Ifb. With a first-order low-pass filter
of time constant RC in the feedback loop, the closed-loop transfer function is

$$\frac{V_o}{I_{in}} = \frac{R_T\,(1 + sRC)}{1 + R_T F + sRC} \tag{2.1}$$

A high loop gain H = 1 + RTF is required for an effective calibration, so the high-pass corner
frequency in (2.1) is boosted H times at the same time. If a 50 kHz low cutoff frequency is designed
for a 10 Gb/s data rate, the required RC constant is τ = 10/(50k) = 200 us for H = 10.
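As a rough numerical check of (2.1), the sketch below evaluates the RC constant needed for a given high-pass corner, assuming a single-pole loop filter. It prints both the expression used in the text (τ = H/f, with no 2π factor) and the variant in which the 50 kHz corner is interpreted in hertz; the function name and the `use_two_pi` switch are illustrative and not part of the original design flow.

```python
import math

def required_rc(loop_gain_h, f_cut, use_two_pi=True):
    """RC needed to place the closed-loop high-pass corner at f_cut.

    From (2.1) the closed-loop pole sits at s = -(1 + RT*F)/(RC) = -H/(RC).
    """
    omega = 2 * math.pi * f_cut if use_two_pi else f_cut
    return loop_gain_h / omega

tau_text = required_rc(10, 50e3, use_two_pi=False)  # expression used in the text
tau_hz = required_rc(10, 50e3, use_two_pi=True)     # corner interpreted in Hz

print(f"tau, text expression: {tau_text * 1e6:.1f} us")
print(f"tau, with 2*pi:       {tau_hz * 1e6:.1f} us")
```

Either way, the required time constant scales linearly with the loop gain H, which is the point made above.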
In our design, an NMOS bias is used to cancel the offset current so as to minimize the added noise;
a long-channel cascode inverter replaces the large passive resistor to save area. However, based
on the analysis in Section 2.2.2, any mismatch between the threshold voltage of the feedforward
inverter (Fwd inv) VthFwd and that of the feedback inverter (FB inv) VthFB translates directly into
a voltage offset at the outputs of the TIAs. Therefore, we introduce a coarse tuning and a fine tuning
to calibrate this offset through the PMOS/NMOS ratio and the body bias of the PMOS, respectively;
both are adjusted manually. Table 2.2 shows the post-layout simulation results of the mismatch
between VthFwd and VthFB over process and temperature variation. The process variation is
between 9.82 mV and 12.66 mV, while the temperature variation is between 15.25 mV and 18.93
mV. Including the 3σ random mismatch, the maximum requirement for the
process-temperature variation tuning range is 38.48 mV. Fig. 2.7 shows the post-layout simulated
fine and coarse tuning ranges for VthFB: about 25 mV for fine tuning and over 80 mV
overall.
2.2.2 Analysis of the proposed TIA
Fig. 2.8 shows the model of the feedback DC offset calibration of the proposed TIA, defining
Vth1 and Vth2 as the threshold voltages of Fwd inv and FB inv, respectively. The input and output
Table 2.2: Post-layout simulated mismatch between VthFwd and VthFB over process and temperature
variation (all values in mV)

Quantity                   Stat.      TT       SS       FF       SF       FS   Process variation
VthFwd @ 27 °C             Mean   499.99      449   450.23   466.05   434.62
VthFwd @ 27 °C             3σ       4.16     4.19     4.16     4.16     4.16
VthFB @ 27 °C              Mean   499.06   454.04   445.55      460   440.26
VthFB @ 27 °C              3σ       5.25     5.25     5.24     5.26     5.25
Mean difference @ 27 °C             0.93    -5.04     4.68     6.05    -5.64   11.69
VthFwd @ 0 °C              Mean   446.41    446.5   446.85   462.82   430.58
VthFwd @ 0 °C              3σ        4.2     4.23     4.19      4.2     4.19
VthFB @ 0 °C               Mean   451.18   454.79   447.04   461.05   441.47
VthFB @ 0 °C               3σ       5.23     5.23     5.23     5.23     5.23
Mean difference @ 0 °C             -4.77    -8.29    -0.19     1.77   -10.89   12.66
VthFwd @ 100 °C            Mean   458.08   458.16   457.54   473.21   443.95
VthFwd @ 100 °C            3σ       4.09      4.1     4.09     4.09      4.1
VthFB @ 100 °C             Mean   445.96   450.96   440.58   456.19   435.91
VthFB @ 100 °C             3σ       5.31     5.32     5.29     5.32     5.31
Mean difference @ 100 °C           12.12      7.2    16.96    17.02     8.04   9.82
Temperature variation              16.89    15.49    17.15    15.25    18.93
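The "Process variation" and "Temperature variation" entries of Table 2.2 appear to be the spread (maximum minus minimum) of the mean difference across corners and across temperatures, respectively. The short sketch below recomputes them from the tabulated mean differences under that assumption; the variable names are illustrative.

```python
corners = ["TT", "SS", "FF", "SF", "FS"]

# Mean (VthFwd - VthFB) per corner in mV, copied from Table 2.2.
mean_diff = {
    0:   [-4.77, -8.29, -0.19, 1.77, -10.89],
    27:  [0.93, -5.04, 4.68, 6.05, -5.64],
    100: [12.12, 7.20, 16.96, 17.02, 8.04],
}

# Spread across corners at a fixed temperature.
process_var = {t: round(max(v) - min(v), 2) for t, v in mean_diff.items()}

# Spread across temperatures at a fixed corner.
temp_var = {
    c: round(max(mean_diff[t][i] for t in mean_diff)
             - min(mean_diff[t][i] for t in mean_diff), 2)
    for i, c in enumerate(corners)
}

print("process variation (mV):", process_var)
print("temperature variation (mV):", temp_var)
```

Under this reading, the process spread ranges from 9.82 mV (100 °C) to 12.66 mV (0 °C), and the temperature spread from 15.25 mV (SF) to 18.93 mV (FS).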
Figure 2.7: Post-layout simulated fine tuning and coarse tuning for the VthFB
of an inverter are identical at the threshold voltage. Based on Kirchhoff's circuit laws, we derive (2.2).
Figure 2.8: Model of the feedback DC offset calibration of the proposed TIA
Vo − Vth1 = (V1 − Vth1)(−A1) (2.2a)
V3 − Vth2 = (Vo − Vth2)(−A2) (2.2b)




Assuming Vth2 = Vth1 + ∆V, we get (2.3) based on (2.2).

$$V_o = V_{th1} + \frac{-A_1 R_f I_{dc}}{g_{m3} R_f A_1 A_2 + A_1 + 1} + \frac{-g_{m3} R_f A_1 (V_{th3} - V_{th1})}{g_{m3} R_f A_1 A_2 + A_1 + 1} + \Delta V\,\frac{g_{m3} R_f A_1 A_2}{g_{m3} R_f A_1 A_2 + A_1 + 1} \tag{2.3}$$
Given that A2 ≫ A1 ≫ 1 and that Idc is tens of microamperes, we can conclude: the DC offset
current Idc and the threshold mismatch between Vth1 and Vth3 have little impact on the output offset;
any mismatch between the threshold of Fwd inv Vth1 and the threshold of FB inv Vth2 is almost
entirely transferred to the output as an offset.
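To get a feel for the relative weight of the three terms in (2.3), the sketch below evaluates their coefficients numerically. Only A1 = 12 is taken from Section 2.2.1; the values of A2, gm3, Rf, and the 50 uA offset current are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def offset_terms(a1, a2, gm3, rf):
    """Coefficients of Idc, (Vth3 - Vth1) and dV in (2.3)."""
    den = gm3 * rf * a1 * a2 + a1 + 1
    k_idc = -a1 * rf / den             # volts of output offset per ampere of Idc
    k_vth3 = -gm3 * rf * a1 / den      # V/V gain from (Vth3 - Vth1)
    k_dv = gm3 * rf * a1 * a2 / den    # V/V gain from dV = Vth2 - Vth1
    return k_idc, k_vth3, k_dv

# Illustrative values only: A2, gm3 and Rf are assumed, not taken from the design.
k_idc, k_vth3, k_dv = offset_terms(a1=12, a2=100, gm3=1e-3, rf=2e3)

print(f"offset from 50 uA of Idc: {k_idc * 50e-6 * 1e3:.3f} mV")
print(f"gain from (Vth3 - Vth1):  {k_vth3:.4f}")
print(f"gain from dV:             {k_dv:.4f}")
```

With a large A2 the first two coefficients shrink roughly as 1/A2, while the coefficient of ∆V approaches unity.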





Figure 2.9: Topologies for S2D converters: (a) a unity-gain inverter without a dummy TIA; (b) a
fully-differential pair with CMFB; (c) cross-coupled inverters; (d) proposed inverter-based
Cherry-Hooper amplifier with embedded CMFB.





































































Figure 2.10: Post-layout simulated differential gain (GD), common-mode gain (GC), and CMRR
of the proposed S2D over process variation.
Table 2.3: Summary of the post-layout simulated results in Fig. 2.10

Corner           Global TT    Global SS    Global FF    Global SF    Global FS
GDM              3.008 dB     2.720 dB     2.399 dB     2.926 dB     3.076 dB
BW for GDM       16.59 GHz    13.52 GHz    20.49 GHz    16.48 GHz    16.70 GHz
CMRR @ 10 MHz    53.13 dB     19.57 dB     25.37 dB     39.88 dB     33.17 dB
CMRR @ 10 GHz    15.65 dB     13.33 dB     16.13 dB     15.11 dB     15.93 dB
BW for CMRR      0.1340 GHz   6.074 GHz    3.686 GHz    0.5861 GHz   1.426 GHz
Supply           0.9 V        0.9 V        0.9 V        0.9 V        0.9 V
Power            1.476 mW     1.025 mW     2.086 mW     1.501 mW     1.472 mW

















































An S2D converter is needed in an optical receiver because its input is single-ended. It generally
follows the TIAs directly, before any distortion is generated. In this section, we propose a
Cherry-Hooper based S2D converter with embedded common-mode feedback (CMFB) circuits in Fig. 2.9d
and compare it with the previous implementations in Fig. 2.9a, Fig. 2.9b, and Fig. 2.9c.
The complementary signal can be generated simply by a unity-gain inverter without a dummy
TIA [30] (Fig. 2.9a), but this cannot suppress the supply noise from the TIA. A differential pair
(Fig. 2.9b) is the most common structure for an S2D converter, but it is not suitable for
low-voltage design. Since the output DC voltage of the TIAs is VDD/2, a DC voltage shifter, as
shown in [33], is generally required, especially when the supply voltage is less than 0.9 V and the
threshold voltage of the MOSFET is higher than 0.4 V in 28 nm CMOS technology or beyond.
Cross-coupled inverters (Fig. 2.9c) address the DC-shift problem, but their performance is limited
by the tradeoff between common-mode rejection ratio (CMRR) and power consumption (shown in
Section 2.2.4).
The proposed S2D converter consists of two inverter-based Cherry-Hooper amplifiers. A PMOS
(MP3p/MP3n) and an NMOS (MN3p/MN3n) are stacked in the second stage of each amplifier. Their
gates are connected to the output of the amplifier, and their drains are linked together, respectively.
As a result, the connected drains with red lines are virtual grounds for differential signals but not
Table 2.4: Comparison table of different S2D architectures

            (a)     (b)                 (c)                       (d) proposed
Input DC    Vdd/2   Vth + 2Vov          Vdd/2                     Vdd/2
Output DC   Vdd/2   CMFB needed         Vdd/2                     Vdd/2
PSRR        No      Yes                 Yes                       Yes
DM gain     N/A     gm1RL               -gm1/(gm2 - gm3)          (gm2Rf - 1) gm1 / gm2
CM gain     N/A     gm1RL/(1 + gm1Rs)   -gm1/(gm2 + gm3)          (gds3Rf - 1) gm1 / (gm3 + gds3)
CMRR        N/A     1 + gm1Rs           (gm2 + gm3)/(gm2 - gm3)   [(gm2Rf - 1)(gm3 + gds3)] / [(gds3Rf - 1) gm2]
Power       Low     Moderate            High                      Low
Figure 2.11: Small-signal model for the differential gain of the proposed S2D converter
for common-mode signals. The detailed calculation for the proposed structure is given in Section
2.2.4. The embedded CMFB provides a high CMRR for the low-voltage design. Fig. 2.10a and
Fig. 2.10b show the post-layout simulated differential gain (GD), common-mode gain (GC), and
CMRR of the proposed S2D converter over process variation. The detailed summary is shown in
Table 2.3. The designed CMRR is as high as 53.1 dB at 10 MHz and 15.7 dB at 10 GHz
at the typical corner. A wideband CMRR of about 20 dB is achieved even in the worst case.
Table 2.4 compares the previous and the proposed S2D converters. The proposed
architecture achieves high CMRR, low power consumption, and a wide input range, especially for
advanced technologies where the headroom is limited.
2.2.4 Analysis of the proposed S2D converter
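The differential gain (GD) of the proposed S2D in Fig. 2.9d:
Assuming that the equivalent transconductance and output resistance of the sum of both the
NMOS and PMOS are marked as gm1, gm2, gm3, ro1, ro2, and ro3, respectively (e.g., gm1 = gmp1 +
gmn1, 1/ro1 = 1/rop1 + 1/ron1), the small-signal model of the proposed S2D converter for the
differential gain calculation is shown in Fig. 2.11. The differential gain is

$$G_{DM} = \frac{(g_{m2} R_f - 1)\, Z_x Z_L\, g_{m1}}{Z_x + Z_L + R_f + Z_L Z_x g_{m2}} \tag{2.5}$$

where Zx and ZL are the impedances at node x and at the output node:

$$Z_x = \frac{r_{o1}}{1 + s C_x r_{o1}}, \qquad Z_L = \frac{r_{o2}}{1 + s C_L r_{o2}} \tag{2.6}$$

Substituting (2.6) into (2.5) gives

$$G_{DM} = \frac{(g_{m2} R_f - 1)\, r_{o1} r_{o2}\, g_{m1}}{s^2 C_x C_L r_{o1} r_{o2} R_f + s\,[C_x r_{o1}(r_{o2} + R_f) + C_L r_{o2}(r_{o1} + R_f)] + r_{o1} r_{o2} g_{m2} + r_{o1} + r_{o2} + R_f} \tag{2.7}$$

At DC, the solution of GDM is shown in (2.8):

$$G_{DM} = \frac{(g_{m2} R_f - 1)\, g_{m1} r_{o1} r_{o2}}{g_{m2} r_{o1} r_{o2} + r_{o1} + r_{o2} + R_f} \approx \frac{(g_{m2} R_f - 1)\, g_{m1}}{g_{m2}} \tag{2.8}$$
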
Figure 2.12: Small-signal model for the common-mode gain of the proposed S2D converter
Now let us apply (2.7) to a numerical example. With the conditions below:
ro = ro1 = ro2 (2.9a)
Co = Cx = CL (2.9b)
gm = gm1 = gm2 (2.9c)
gmro = 6.5 (2.9d)
gmRf = 3.25 (2.9e)
the transfer function of the differential gain takes the form of a second-order Butterworth
filter. Its gain and 3 dB bandwidth are calculated as:

$$G_{DM}(0) = 1.625, \qquad \omega_{3dB} = \frac{3\sqrt{2}}{r_o C_o} \approx \frac{4.24}{r_o C_o} \tag{2.10}$$

The bandwidth is therefore enlarged to 4.24 times 1/(roCo), as in a typical Cherry-Hooper amplifier.
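The Butterworth claim can be checked numerically. The sketch below normalizes the denominator of (2.7) under the conditions (2.9a) to (2.9c) and extracts the DC gain, the natural frequency (in units of 1/(roCo)), and the quality factor; the function and variable names are illustrative.

```python
import math

def s2d_dm_second_order(gm_ro, gm_rf):
    """DC gain, normalized w0 and Q of (2.7) under conditions (2.9a)-(2.9c).

    Frequencies are normalized to 1/(ro*Co), i.e. x = s*ro*Co.
    """
    a = gm_ro                # intrinsic gain gm*ro
    r = gm_rf / gm_ro        # ratio Rf/ro
    # Denominator of (2.7) divided by ro: r*x^2 + 2*(1 + r)*x + (a + 2 + r)
    c2, c1, c0 = r, 2 * (1 + r), a + 2 + r
    dc_gain = (gm_rf - 1) * a / c0
    w0 = math.sqrt(c0 / c2)
    q = math.sqrt(c0 * c2) / c1
    return dc_gain, w0, q

dc_gain, w0, q = s2d_dm_second_order(gm_ro=6.5, gm_rf=3.25)
print(f"DC gain = {dc_gain:.3f} ({20 * math.log10(dc_gain):.2f} dB)")
print(f"w0 = {w0:.2f} / (ro*Co), Q = {q:.4f}")
```

A quality factor of 1/√2 is the defining property of a second-order Butterworth response, for which the 3 dB bandwidth equals the natural frequency.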
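The common-mode gain (GC) of the proposed S2D in Fig. 2.9d:
The small-signal model of the proposed S2D converter for the common-mode gain calculation is
shown in Fig. 2.12. First, let us calculate the relationship between Vx and Vo, where gf = 1/Rf.

$$sC_L V_o + g_{m3} V_o + g_{ds3} V_a + g_f V_o - g_f V_x = 0 \tag{2.11a}$$

$$g_{ds2} V_o - g_{ds2} V_a + g_{m2} V_x - g_{m2} V_a - g_{ds3} V_a - g_{m3} V_o = 0 \tag{2.11b}$$

The solution of (2.11) is shown in (2.12), and its approximation is in (2.13), supposing gm2 and
gm3 are much larger than gds2 and gds3.

$$\frac{V_o}{V_x} = \frac{(g_{m2} + g_{ds2} + g_{ds3})\, g_f - g_{m2} g_{ds3}}{(g_{m2} + g_{ds2} + g_{ds3})(g_f + g_{m3} + sC_L) + g_{ds3}(g_{ds2} - g_{m3})} \tag{2.12}$$

$$\frac{V_o}{V_x} \approx \frac{g_f - g_{ds3}}{g_f + g_{m3} + sC_L} \tag{2.13}$$

The relationship between Vin and Vx follows from the first stage:

$$g_{m1} V_{in} + \frac{V_x}{Z_x} + g_f (V_x - V_o) = 0 \tag{2.14}$$

Combining (2.14) and (2.13), we get (2.15) and its approximation (2.16) at DC.

$$G_{CM} = \frac{(g_{ds3} R_f - 1)\, r_{o1} r_{o2}\, g_{m1}}{s^2 C_x C_L r_{o1} r_{o2} R_f + s\, r_{o2}\,[C_x r_{o1}(1 + g_{m3} R_f) + C_L (r_{o1} + R_f)] + (g_{m3} + g_{ds3})\, r_{o1} r_{o2} + (1 + R_f g_{m3})\, r_{o2}} \tag{2.15}$$

$$G_{CM} = \frac{(g_{ds3} R_f - 1)\, g_{m1} r_{o1} r_{o2}}{(g_{m3} + g_{ds3})\, r_{o1} r_{o2} + (1 + R_f g_{m3})\, r_{o2}} \approx \frac{(g_{ds3} R_f - 1)\, g_{m1}}{g_{m3} + g_{ds3}} \tag{2.16}$$

Under the conditions (2.9a)–(2.9c), with gm3 = gm ≫ gds3, A = gmro, and k = ro/Rf, the CMRR is

$$\mathrm{CMRR} = \frac{G_{DM}}{G_{CM}} = \frac{g_{m2} R_f - 1}{g_{ds3} R_f - 1} \cdot \frac{s^2 C_o^2 r_o^2 + (2k + A + 1)\, s C_o r_o + Ak + A + k}{s^2 C_o^2 r_o^2 + (2k + 2)\, s C_o r_o + Ak + 2k + 1} \tag{2.17}$$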
Gain of the S2D converter in Fig. 2.9c:
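The differential gain and common-mode gain of the S2D converter in Fig. 2.9c are derived as
below.
−gm2Vout− = gm1Vin+ + gm3Vout+ (2.18a)
−gm2Vout+ = gm1Vin− + gm3Vout− (2.18b)
For the differential gain, we substitute Vin+ = Vid/2 and Vin− = −Vid/2 into (2.18); for the
common-mode gain, we substitute Vin+ = Vin− = Vic. The resulting gain magnitudes and CMRR are

$$|G_{DM}| = \frac{g_{m1}}{g_{m2} - g_{m3}}, \qquad |G_{CM}| = \frac{g_{m1}}{g_{m2} + g_{m3}}, \qquad \mathrm{CMRR} = \frac{g_{m2} + g_{m3}}{g_{m2} - g_{m3}} \tag{2.19}$$

Since its differential output conductance is gm2 − gm3, a high CMRR is only achieved by increasing
gm2 + gm3 for high-speed application, resulting in a significant increase of its power consumption.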
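The CMRR versus power tradeoff of the cross-coupled S2D can be illustrated with the gain expressions listed for topology (c) in Table 2.4. The sketch below holds the differential conductance gm2 − gm3 fixed (so that the differential gain stays the same) and computes the total transconductance gm2 + gm3 needed for a target CMRR. All numerical values and the function name are assumptions for illustration only.

```python
def cross_coupled_s2d(gm1, gm2, gm3):
    """Magnitudes of DM gain, CM gain and CMRR for the S2D of Fig. 2.9c."""
    if gm2 <= gm3:
        # Illustrative guard: the expressions below need gm2 - gm3 > 0.
        raise ValueError("gm2 must exceed gm3")
    g_dm = gm1 / (gm2 - gm3)
    g_cm = gm1 / (gm2 + gm3)
    return g_dm, g_cm, g_dm / g_cm

diff = 1e-3  # gm2 - gm3 held at 1 mS (assumed value)
results = []
for cmrr_target in (2, 5, 10):
    total = cmrr_target * diff                       # gm2 + gm3 required
    gm2, gm3 = (total + diff) / 2, (total - diff) / 2
    g_dm, g_cm, cmrr = cross_coupled_s2d(1e-3, gm2, gm3)
    results.append((cmrr_target, total, cmrr))
    print(f"CMRR target {cmrr_target:>2}: gm2 + gm3 = {total * 1e3:.0f} mS, "
          f"CMRR = {cmrr:.1f}")
```

Because transconductance scales roughly with bias current, a tenfold CMRR (20 dB) costs about ten times the gm2 + gm3 of the baseline in this simplified picture.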
2.2.5 VGA and level shifter
A pair of inverter-based Cherry-Hooper amplifiers (Fig. 2.5 (a)) is employed as the VGA
(Fig. 2.13). Their voltage gains are programmed by changing the transconductance (gm) in the
first stage and the feedback resistance in the second stage. In this design, only one bit is used in the
first stage to provide a 6 dB gain programmability for process variation.
Because the next stage is a differential pair, a level shifter (LS) is designed to
match its input common-mode requirement (Fig. 2.14). Similar to the S2D converter and VGA,
our LS is based on the inverter-based Cherry-Hooper structure, but the DC voltage
shift, which is from 0.45 V (VDD/2) to 0.65 V in our application, is realized by a DC current from
the NMOS controlled by CMFB circuits and the feedback resistance in the second stages. Though the
Figure 2.13: Schematic for the VGA.
Figure 2.14: Schematic for the level shifter.
inputs of the CMFB circuits can be taken directly from the outputs of the LS, a larger RC component
would then be needed in the feedback loop. Instead, an auxiliary LS, a 1/8-size replica of the main LS,
is introduced to feed the CMFB circuits. Notably, the gains of the S2D converter, VGA, and LS can all
be made programmable if needed, owing to their similar architecture.
2.2.6 Summer and 1-bit DFE
Equalizers break the bandwidth limitation for analog front end (AFE) and reduce the total ther-
mal noise. Compared with other equalizers such as continuous-time linear equalizer (CTLE) and
feedforward equalizer, decision feedback equalizer (DFE) cancels the inter-symbol interference
Figure 2.15: Schematic for one slice in 1-bit DFE.
(ISI) by the regenerated signals without introducing extra noise. The designs of DFE-based optical
receivers are discussed in [34] [35]. Considering performance and complexity, 1-bit DFE is em-
ployed in the proposed receiver, and the 3 dB bandwidth in AFE (including TIA, S2D converter,
VGA, LS, and summer) is designed at 0.3 times data rate, that is 3 GHz in our receiver.
As shown in Fig. 2.3, the 1-bit DFE contains four slices, each driven by one of the 2.5 GHz
quadrature clocks. Fig. 2.15 shows the schematic of one slice, including a summer and a slicer.
The slicer is composed of a double-tail dynamic comparator [36] and a symmetric slave latch [37].
The summer consists of three NMOS differential input pairs, which are connected to the outputs of
the LS (inp and inn), the outputs of the next DFE slice as feedback signals (fbp and fbn), and
the offset calibration signals (offp and offn), respectively. The bias currents of these three branches
can be adapted for the best ISI cancellation and offset compensation. In our design, the bias
currents of the main branch and the feedback branch are manually controlled; the bias current of
the offset branch is adjusted by a 5-bit digital signal with 1 uA resolution.
2.2.7 Mismatch calibration for the receiver
DC offset imposes a power penalty that reduces the sensitivity of the receiver. Since the
transistors are generally small in a high-speed design, the mismatch in each stage can accumulate
into a considerable offset for the whole receiver. Therefore, a mismatch calibration scheme is
required.
A dual-step calibration scheme is adopted in the proposed receiver. In step one, we switch
the inputs of the LS to a common-mode bias vcmtest and compensate the mismatch of the LS,
summers, and slicers by adjusting the offset current in the summer. In step two, we connect the
inputs of the LS to the outputs of the VGA and compensate the mismatch of the TIAs, the S2D
converter, and the VGA by calibrating the threshold of the feedback inverter (FB inv) in the main TIA.
Both calibrations can be observed at the final outputs of the slicers: the point at which the
polarity of the differential signals switches indicates the optimal calibration setting. In addition,
interdigitated layouts are carefully implemented in all stages of the analog front-end to minimize
potential mismatches.
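The polarity-switch criterion lends itself to a simple sweep. The following is a minimal sketch of step one, assuming a hypothetical `read_slicer(code)` hook that applies a 5-bit offset code and returns the sign of the slicer output; the toy model standing in for the hardware (a −7.3 uA residual offset and 1 uA per LSB) is invented for illustration. In the actual design the codes are adjusted manually.

```python
def find_polarity_flip(read_slicer, codes):
    """Return the first code at which the slicer output changes polarity.

    read_slicer(code) is a hypothetical hook returning +1 or -1.
    Returns None if the polarity never flips inside the tuning range.
    """
    previous = read_slicer(codes[0])
    for code in codes[1:]:
        current = read_slicer(code)
        if current != previous:
            return code
        previous = current
    return None

def toy_slicer(code, residual_ua=-7.3, lsb_ua=1.0):
    """Toy stand-in for the receiver: sign of residual offset plus correction."""
    return 1 if residual_ua + code * lsb_ua > 0 else -1

best_code = find_polarity_flip(toy_slicer, list(range(32)))  # 5-bit code range
print("offset code at polarity flip:", best_code)
```

The same search could be reused for step two by sweeping the FB inv threshold tuning instead of the summer offset current.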
2.3 Design of Low-Power Transmitter
Fig. 2.16 depicts the architecture of the proposed transmitter chip. Each data channel (TxD)
consists of a serializer that multiplexes four 2.5 Gb/s input signals into one 10 Gb/s output signal,
an inverter cell and a transmission gate cell that generate the differential signal, a pair of output
drivers and AC coupling units that deliver the differential signal, and an anode custom pad and
a cathode custom pad, each with its own ESD protection. The four 2.5 Gb/s input signals are
fed by an on-chip parallel PRBS generator, whose length is selectable between 7 bits and 15 bits.
Similar to the receiver chip, a QLL and a delay line for phase rotation are employed to generate
accurate four-phase clocks for the serializer in each TxD. In the clock channel (TxC), we design the
output drivers, AC coupling units, custom pads, and their ESD protection. A pair of differential
external inputs can bypass the serializer and directly drive the output drivers for testing purposes.
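The data path from the PRBS generator through the four-to-one serializer can be mimicked behaviorally. The sketch below assumes the common PRBS-7 polynomial x^7 + x^6 + 1 (the text does not state which polynomial is used) and groups the serial sequence into four-bit words to imitate the four 2.5 Gb/s lanes; the actual chip uses a parallel generator, so this is only a functional model with illustrative names.

```python
def prbs7(seed=0x7F):
    """Yield PRBS-7 bits; polynomial x^7 + x^6 + 1 is an assumption."""
    state = seed & 0x7F
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit

def parallel_lanes(bits, n_words, lanes=4):
    """Group a serial stream into n_words words of `lanes` bits each."""
    return [[next(bits) for _ in range(lanes)] for _ in range(n_words)]

def serialize(words):
    """4:1 multiplexing: emit lane 0..3 of each word in turn."""
    return [bit for word in words for bit in word]

words = parallel_lanes(prbs7(), n_words=127)   # models 4 lanes at 2.5 Gb/s
stream = serialize(words)                      # models one 10 Gb/s stream

reference = prbs7()
matches = stream == [next(reference) for _ in range(len(stream))]
print("serialized stream matches serial PRBS-7:", matches)
```

A PRBS-15 variant would follow the same pattern with a 15-bit state and its own feedback taps.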
Figure 2.16: Schematic for the proposed transmitter chip.
Figure 2.17: Schematic for the proposed four-to-one serializer.
2.3.1 Four-to-one serializer
The schematic of the four-to-one serializer is shown in Fig. 2.17. The input data are connected
to both NMOS and PMOS to generate symmetrical charging and discharging in the serializer,
controlled by four 25% duty cycle 2.5 GHz clocks. The post-layout simulation shows only a 2.4%
duty-cycle deviation over process variations.
Figure 2.18: Schematic for the proposed output driver, AC coupling unit, and custom pads.
2.3.2 Output driver, AC coupling unit and custom pads
Fig. 2.18 shows the details of the output driver, AC coupling unit, and custom pads. As ana-
lyzed in [6], a programmable output driver accommodates the output resistance and prevents an
overdamped or underdamped output waveform. The output driver, based on inverters, is divided
into a pre-driver and a variable driver. The variable driver consists of a fixed inverter and variable
inverters controlled by a 5-bit digital signal.
An on-chip 1.54 MΩ resistor and an on-chip 3.04 pF capacitor compose an AC coupling unit,
forming a high-pass filter with a 34 kHz 3 dB corner frequency. Since a minimum peak-to-peak drive
voltage of 1.6 V biased at 0.5 V is required at the inputs of the modulator in the PIC, we use a 0.8 V
peak-to-peak single-ended swing and bias the anode and cathode at 0.9 V and 0.1 V, respectively.
As a result, the output voltage can be higher than the 0.9 V supply at the anode and lower than
the 0 V ground at the cathode. Therefore, we design custom pads and ESD protection, as shown in
Fig. 2.18. A cascaded P-type ESD diode is added to protect the anode, while a cascaded N-type
ESD diode in a deep N-well protects the cathode.
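The numbers in this subsection can be reproduced with a few lines. The sketch below computes the corner frequency of the AC coupling unit from the stated R and C, and the pad voltage excursions under the assumption that each single-ended swing is symmetric around its bias; the helper name `pad_range` is illustrative.

```python
import math

R_AC = 1.54e6     # ohm, from the text
C_AC = 3.04e-12   # farad, from the text

f_hp = 1 / (2 * math.pi * R_AC * C_AC)
print(f"high-pass corner: {f_hp / 1e3:.1f} kHz")

def pad_range(bias_v, swing_pp_v):
    """Min/max pad voltage, assuming a swing symmetric around the bias."""
    return bias_v - swing_pp_v / 2, bias_v + swing_pp_v / 2

anode = pad_range(0.9, 0.8)
cathode = pad_range(0.1, 0.8)
drive_pp = anode[1] - cathode[0]

print(f"anode:   {anode[0]:.1f} V to {anode[1]:.1f} V")
print(f"cathode: {cathode[0]:.1f} V to {cathode[1]:.1f} V")
print(f"peak differential drive: {drive_pp:.1f} V")
```

Under these assumptions the anode exceeds the 0.9 V supply and the cathode drops below ground, which is why standard pads and ESD clamps cannot be used.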
Figure 2.19: Schematic for QLL.
Figure 2.20: Schematic for the proposed operational transconductance amplifier in the QLL.
2.4 Clocking Scheme
A 2.5 GHz, feedforward clocking scheme from the Rx chip to the Tx chip is employed in this
design. As mentioned, both the DFE in each RxD and the serializer in each TxD require accurate
quadrature clocks. In this section, we introduce the QLL design for four-phase generation and the
delay line for phase rotation.
2.4.1 Quadrature locked loop
The QLL (shown in Fig. 2.19) was proposed in [38]. It is composed of a four-phase cross-coupled
oscillator, a quadrature phase detector, an integrating unit (including a transconductance cell (gm)
and a capacitor), and a level shifter. The oscillator is injected with a pair of differential signals INJP
and INJN from an external reference or the feedforward clock. Note that only two phases of
the oscillator are injected, while the other two are connected to ground through dummies. The
quadrature phase detector consists of an XOR gate and an XNOR gate. Its four inputs are evenly
loaded, and its outputs are connected to an operational transconductance amplifier along with a
capacitor, forming an integrator that acts like the charge pump in a PLL. Finally, a level shifter
receives the voltage signal of the integrator and feeds its output back to the oscillator. The principle
of the QLL is analyzed in [38]. Intuitively, we can understand it as the combination of injection
locking and a delay-locked loop (DLL): after the oscillator synchronizes with the input frequency
through injection locking, the quadrature-phase error is detected and corrected through the DLL.
The mismatch in the feedback loop is critical and transforms directly into quadrature-phase
error. Although the random mismatches are minimized through interdigitated layout and large
transistor sizes, the systematic mismatch from the operational transconductance amplifier cannot be
easily removed. As shown in Fig. 2.20, Vint and Vg are directly connected in the previous design
[38]. Since Vctrl is determined by the loop calibration and could be any value within the injection
locking range, there is a voltage mismatch between Vint and Vctrl. The previous solution, which
manually adjusts the body bias of the input differential pair, cannot cope with temperature and
voltage variation, resulting in a quadrature mismatch as high as several degrees in the worst
case. Therefore, a local feedback loop is designed in this work. The internal signal Vint is compared
with Vctrl in an additional operational amplifier to eliminate their voltage difference. In addition,
we also design a manually controlled coarse calibration and fine calibration for any process
mismatch. As shown in Fig. 2.21, the IQ mismatches of the proposed quadrature clock generator are
from −0.2 to +0.25 degrees across temperature variation and from −0.3 to +0.45 degrees across
voltage variation (from 0.85 V to 0.95 V), respectively.










































Figure 2.21: Post-layout simulated IQ mismatches of proposed quadrature clock generator: (a)
over temperature variation; (b) over voltage variation. 0 − 90, 90 − 180, 180 − 270, 270 − 0
represent the IQ mismatches from different phases, respectively.
Figure 2.22: Schematic for the delay line.
2.4.2 Delay line
A manually controlled differential delay line is placed before each QLL to deskew the phase
of the quadrature clocks. Fig. 2.22 shows the schematic of the inverter-based delay line, which
consists of a coarse delay line and a fine delay line. The coarse delay line includes 7 delay units,
which are selected by a 3-bit binary digital signal SEL[2:0]. We turn the inverter in each delay
unit on or off to control the total delay. A pair of cross-coupled inverters in each delay unit calibrates any


























Figure 2.23: Die photos of the proposed (a) Rx chip and (b) Tx chip.
Figure 2.24: Designed PCB.
potential mismatch between the complementary signals.
A fine delay unit is regulated by an external voltage signal Vfine to realize a continuous,
high-resolution calibration. The control voltage is applied to an NMOS transistor in series with a
capacitor, which together form the load of each inverter cell.














Figure 2.25: Post-layout simulated frequency response of the analog circuits in RxD.



































Figure 2.26: Post-layout simulated pulse response of the analog circuits in RxD.
2.5 Simulation Results
The prototype Rx chip and Tx chip are fabricated in 28 nm CMOS technology, and their die
photos are shown in Fig. 2.23. The areas of the Rx chip and Tx chip are 1.08 × 0.9 mm² and 0.9
× 0.9 mm², while the active areas of each RxD and TxD are only 0.023 mm² and 0.029 mm²,
respectively. Unfortunately, because the fabrication of the interposer failed, measurement
results of our chips are not available.
Figure 2.27: Post-layout simulated eye diagrams of the analog circuits in RxD before/after 1-bit
DFE calibration.
As depicted in Fig. 2.2, our post-layout simulation combines the parasitics from the electrical
transceiver, the interposer, and the PIC. Fig. 2.25 shows the post-layout simulated frequency
response of the analog circuits in RxD, including the AFE and the summer in one slice of the DFE.
The lower and upper 3 dB frequencies are 22.12 kHz and 3.035 GHz, respectively.
Fig. 2.26 shows the post-layout simulated pulse response of the analog circuits in RxD. The dashed
curve depicts a single-bit current pulse at the input of RxD, and the solid curve depicts the voltage
waveform at the output of the analog circuits in RxD. The post-layout simulated eye diagrams of
the analog circuits (one of the four paths) in RxD before/after 1-bit DFE calibration are shown in
Fig. 2.27. The maximum vertical eye opening of the output voltage improves from 168 mV to 218 mV,
which is equivalent to 2.26 dB in sensitivity, when a 20 uApp 7-bit PRBS current signal is applied at
the input of RxD.
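The 2.26 dB figure can be reproduced from the two eye openings, assuming it is computed as 20·log10 of the voltage ratio (the text does not state which convention is used). The sketch below also prints the 10·log10 value for comparison; the helper name is illustrative.

```python
import math

def eye_improvement_db(before_mv, after_mv, factor=20):
    """Ratio of two eye openings in dB (factor=20 for a voltage ratio)."""
    return factor * math.log10(after_mv / before_mv)

gain_db_voltage = eye_improvement_db(168, 218, factor=20)
gain_db_power = eye_improvement_db(168, 218, factor=10)

print(f"20*log10 convention: {gain_db_voltage:.2f} dB")
print(f"10*log10 convention: {gain_db_power:.2f} dB")
```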
Based on the simulation results from the eye diagram, we calculate that the transimpedance gain
of the analog circuits in RxD after the 1-bit DFE is 10.5 kΩ. The corresponding input-referred noise
spectral density is shown in Fig. 2.28, and the total input-referred noise current is 1.26 uArms,
resulting in 17.6 uApp optical sensitivity, which is equal to −18.1 dBm with 0.94 A/W PD responsivity and















Figure 2.28: Post-layout simulated input referred noise spectral density.

















Figure 2.29: Post-layout simulated phase noise performance of the oscillator and the proposed
QLL.
6 dB extinction ratio.
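The chain from noise current to optical sensitivity can be sketched as follows. Two assumptions are made that the text does not state explicitly: the 1.26 uA noise figure is treated as an rms value, and the target is Q = 7 (a bit error rate of about 1e-12) with Gaussian noise. The function name and argument names are illustrative.

```python
import math

def sensitivity_dbm(i_noise_rms, q_factor, responsivity, er_db):
    """Peak-to-peak current and average optical power (dBm) for a target Q."""
    i_pp = 2 * q_factor * i_noise_rms               # required current swing (A)
    oma_w = i_pp / responsivity                     # optical modulation amplitude (W)
    er = 10 ** (er_db / 10)                         # linear extinction ratio
    p_avg_w = oma_w / 2 * (er + 1) / (er - 1)       # average power incl. ER penalty
    return i_pp, 10 * math.log10(p_avg_w / 1e-3)

i_pp, p_dbm = sensitivity_dbm(i_noise_rms=1.26e-6, q_factor=7.0,
                              responsivity=0.94, er_db=6.0)
print(f"required swing: {i_pp * 1e6:.1f} uApp")
print(f"sensitivity:    {p_dbm:.1f} dBm")
```

Under these assumptions the result lands within a few hundredths of a dB of the quoted −18.1 dBm.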
Fig. 2.29 shows the phase noise performance of the improved QLL. The free-running oscillator
consumes 1.8 mW and achieves −85.5 dBc/Hz at 1 MHz offset frequency.
With a clean 2.5 GHz off-chip electrical reference, the injection-locked QLL achieves
−119.9 dBc/Hz phase noise at 1 MHz offset and 391 fs jitter (from 100 kHz to 1 GHz). Extra noise
from the TIAs is added if the 2.5 GHz reference comes from the optical signal. With 20 uApp and
40 uApp optical modulation amplitude (OMA), respectively, the phase noise of the injection-locked


































































Figure 2.31: Detailed power consumption in each (a) RxD and (b) TxD.
QLL becomes −117.8 dBc/Hz and −117.9 dBc/Hz at 1 MHz offset frequency. The corresponding total
jitters (from 100 kHz to 1 GHz) are 6.23 ps and 3.86 ps, indicating that a larger optical input
power helps suppress the thermal noise from the TIAs.
Fig. 2.30 shows the post-layout simulated eye diagram of TxD. With a 0.9 V supply for the PRBS
generator and four-to-one serializer and a 1.0 V supply for the output drivers, the vertical eye
opening is 1.62 V.
The power consumptions of RxD, RxC, TxD, and TxC are 11.8 mW, 3.01 mW, 8.13 mW,
and 3.19 mW, respectively. The detailed power breakdown of each RxD and TxD is shown in
















Integration                 Monolithic   3D hybrid     N/A           2D hybrid           2D hybrid      3D hybrid     2.5D SiP
Bond type                   N/A          Flip chip     Bond wire     Bond wire           Bond wire      Flip chip     Flip chip
Electrical TRx node         65 nm (TRx)  40 nm (TRx)   65 nm (Rx)    14 nm FinFET (TRx)  65 nm (Tx)     28 nm (TRx)   28 nm (TRx)
PIC node                    65 nm        130 nm SOI    N/A           N/A                 130 nm SOI     N/A           65 nm
Modulator type or VCSEL     Microring    Microring     Mach-Zehnder  VCSEL               Microring      Mach-Zehnder  Microring
PD responsivity (0 V bias)  0.1 A/W      0.7 A/W       0.75 A/W      0.52 A/W            N/A            0.8 A/W       0.94 A/W
Rx input/output data rate   10/10 Gb/s   10/10 Gb/s    10/5 Gb/s     32/8 Gb/s           N/A            25/25 Gb/s    10/2.5 Gb/s
Tx input/output data rate   7/7 Gb/s     10/10 Gb/s    N/A           1/32 Gb/s           3.125/25 Gb/s  25/25 Gb/s    2.5/10 Gb/s
Number of channels          10           16            1             1                   4              16            2
Rx sensitivity (ER = 6 dB)  N/A          -15 dBm       -17.6 dBm     -11.7 dBm           N/A            -9.8 dBm      -18.1 dBm
Tx output swing             1.5 V        2 V           N/A           1 V                 4.4 V          N/A           1.62 V
Rx energy/bit               0.5 pJ/bit   0.395 pJ/bit  2.3 pJ/bit    3.28 pJ/bit         N/A            0.92 pJ/bit   1.18 pJ/bit
Tx energy/bit               0.1 pJ/bit   0.135 pJ/bit  N/A           1.41 pJ/bit         4.54 pJ/bit    4.99 pJ/bit   0.81 pJ/bit
Total energy/bit            0.6 pJ/bit   0.53 pJ/bit   N/A           4.69 pJ/bit         N/A            5.91 pJ/bit   1.99 pJ/bit
Fig. 2.31.
Table 2.5 compares this work with prior art [6] [9] [11] [31] [39] [40]. Our work combines the
electrical TRx and the PIC through 2.5D SiP integration with a silicon interposer. Compared with the
monolithic optical I/O chip in [39], our integration supports a much higher PD responsivity. Also,
advanced technologies improve the electrical TRx performance. With the proposed power-efficient
transceiver and circuit-level innovations, including the TIA, S2D, and improved QLL, we achieve
−18.1 dBm Rx sensitivity, 1.62 V Tx output swing, and 1.99 pJ/bit energy efficiency (including
4-to-1 serializers and 1-to-4 deserializers).
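The energy-efficiency figures follow directly from the per-channel power and the 10 Gb/s line rate. The sketch below reproduces them; it uses only the data-channel powers (RxD and TxD), which is an assumption about how the quoted figures were obtained, since the clock-channel power is not included in this calculation.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: mW divided by Gb/s equals pJ/bit."""
    return power_mw / data_rate_gbps

rx_pj = energy_per_bit_pj(11.8, 10.0)   # RxD power from the text
tx_pj = energy_per_bit_pj(8.13, 10.0)   # TxD power from the text
total_pj = rx_pj + tx_pj

print(f"Rx: {rx_pj:.2f} pJ/bit, Tx: {tx_pj:.2f} pJ/bit, total: {total_pj:.2f} pJ/bit")
```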
2.6 The Tradeoff Between Noise, Data Rate, and Power Consumption of Transimpedance
Amplifiers for Optical Receivers
Although various TIA topologies exist, such as the regulated cascode (RGC) TIA [28] [41] and the
double-sampling receiver [29] [42], the "shunt-shunt" feedback TIA topology with a push-pull
(inverter-based) amplifier in Fig. 2.32 (also called the TIA with a FET front-end in [43]) has been
widely used in recent link-optimization studies [44] [45] [46] [25] [47] and optical-receiver
implementations [5] [9] [31] [48], especially for advanced SOI and FinFET technologies [30] [33].
This topology choice is
Figure 2.32: Schematic of TIA with an inverter-based amplifier.
driven by three main reasons: A) the inverter-based structure is amenable to scaling and
low-voltage design; B) its push-pull feature enhances the current efficiency; C) for SOI and FinFET
technology, the design offers the greatest tolerance to process variation [44].
In Fig. 2.32, the design target is to find the optimal noise by choosing the size of the
inverter-based amplifier, the feedback resistor RF, and the added load capacitance Co.L for a given
data rate. CD is a fixed parameter and represents the sum of the photodetector capacitance and the
parasitic capacitances. Fig. 2.33 shows the small-signal model. CI and CF are design
parameters set by the inverter-based amplifier. It is well known that the minimum noise occurs at
CI/CD = 1 [49] [50] [51] [52] if we neglect the noise from the feedback resistor RF. However, this
assumption does not hold for the broadband TIAs in modern optical receivers. The design considerations
are further analyzed and developed in [43] under three different constraints, but they are all based
on CI/CD.
This section presents an analysis to calculate the size of the inverter-based amplifier, the
feedback resistance RF, and the load capacitance Co for the optimal noise. We show the proposed
analytical solutions and apply our method in a 65 nm CMOS technology. Furthermore, we show the impacts
of the quality factor, transistor channel length, and input parasitic capacitance on TIA design.
Figure 2.33: Small-signal model of TIA with an inverter-based amplifier.
2.6.1 Proposed analytical solutions
Based on previous publications [26] [27] [43] [51] [53], we summarize the calculation of the
optimal noise in three steps. Since our focus is broadband optical receivers, the gate shot noise and
flicker noise are ignored [43].
In Step 1, we calculate the TIA's transimpedance function ZT(s) = Vo(s)/Iin(s), which is the
ratio between the output voltage Vo and the input current Iin in the s-domain. This function is used in
Step 2 and Step 3.
In Step 2, we calculate the total input-referred current noise by: A) deriving the output voltage
noise density; B) using the output voltage noise density and the transimpedance function from
Step 1 to derive the input-referred current noise density; C) calculating the total input-referred
current noise by integration (or noise bandwidth). Combining A) and B), the relationship between
the noise currents and the input-referred noise current density is called the input-referral function [26].
In Step 3, with assumptions and approximations, we optimize the equations from Step 2 to find
the optimal noise.
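As introduced above, we illustrate our derivations in each step. For the transfer-function calculation
in Step 1, Co.TIA and Co.L stand for the amplifier and load capacitances in Fig. 2.33. We obtain
\[
Z_T(s) = -\frac{A_0R_F - R_o - sC_FR_oR_F}{(A_0+1) + s\left\{[C_T+(A_0+1)C_F]R_F + (C_T+C_o)R_o\right\} + s^2(C_TC_o+C_TC_F+C_FC_o)R_FR_o} \approx -\frac{R_T}{1 + \dfrac{s}{\omega_0 Q} + \dfrac{s^2}{\omega_0^2}} \tag{2.20a}
\]
\[
\omega_0 = \sqrt{\frac{A_0+1}{(C_TC_o+C_TC_F+C_FC_o)R_FR_o}} \tag{2.20b}
\]
\[
Q = \frac{\sqrt{(A_0+1)(C_TC_o+C_TC_F+C_FC_o)R_FR_o}}{[C_T+(A_0+1)C_F]R_F + (C_T+C_o)R_o} \tag{2.20c}
\]
where A0 = gmRo and RT = (A0RF − Ro)/(A0 + 1).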
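In Step 2, the noise currents are shown in Fig. 2.33. As detailed in Section 2.6.5, we derive the
input-referred noise density as:
\[
d\overline{i_{n.TIA}^2} = \frac{R_o^2}{\left|A_0R_F - R_o - j2\pi f C_FR_FR_o\right|^2}\left\{\frac{4kT}{R_F}\left[(g_mR_F)^2 + (2\pi C_TR_F)^2 f^2\right] + 4kT\Gamma g_m + 4kT\Gamma g_m\left[2\pi(C_T+C_F)R_F\right]^2 f^2\right\} df \tag{2.21a}
\]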

























where the first two terms in the bracket come from the RF noise and the last two terms come from
the FET-channel noise, and fn.BW0 and fn.BW2 represent the white noise bandwidth and colored
noise ( f 2 term) bandwidth, respectively [51]. If we set CF and Ro to zero, we get exactly the same
equation as in [26]. Also, we draw the same conclusion that the Miller term CF has almost no
influence on the input-referred noise current. Interestingly, output resistance Ro impacts the total
input-referred noise current, and this term cannot be ignored, especially in high-speed applications
when RF is comparable with Ro.
In Step 3 of our proposed optimization strategy, the variables in the above equations (including
gm, Ro, CF, CT, fn.BW0, fn.BW2, RF, and Co) are represented by technology constants, data rate
( fDR), power consumption (PTIA), and the quality factor (Q). As a result, we quantitatively calculate
the size of the inverter-based amplifier (PTIA), the feedback resistor RF, and the load capacitance
Co for the optimal noise.
The required technology constants are defined as



































































where (2.24a)–(2.24d) come from (2.23), (2.24e)–(2.24g) come from [43], and (2.24h) is an em-
pirical equation that is widely used in TIA design [26] [27] [51].
Table 2.6: Comparison between previous methods and the proposed method for the calculation of
the optimal TIA noise
Solution | Method 1 | Method 2 | Method 3 | Method 4 | Proposed Method
Derivation in Steps 1, 2 | Neglecting CF | Neglecting CF | Neglecting CF | Neglecting CF | Adding CF; introducing Co.L and Co.TIA
Design conclusion | CI = CD | CI = CD | CI/CD = constant < 1 | CI/CD = constant > 1 | Finding exact values for RF, PTIA, and Co by solving equations
Table 2: Summary of the technology constants' numerical values in 65 nm CMOS.
Channel length L | 60 nm | 70 nm | 80 nm
Wp/Wn | 2.1 | 2.1 | 2.1
DC gain A0 | 6.3 | 8.4 | 10.5
Intrinsic frequency fT.N | 247.9 GHz | 189.5 GHz | 150.4 GHz
Intrinsic frequency fT.P | 120.6 GHz | 91.3 GHz | 71.6 GHz
Noise coefficient ΓN | 0.95 | 0.93 | 0.92
Noise coefficient ΓP | 0.66 | 0.64 | 0.64
(gmN + gmP)/Id | 12.1 | 12.8 | 13.0
NMOS Cgd/Cgs | 0.30 | 0.26 | 0.22
PMOS Cgd/Cgs | 0.26 | 0.28 | 0.24
Supply voltage | 1.2 V | 1.2 V | 1.2 V
CD | 80 fF | 80 fF | 80 fF
Extinction ratio (ER) | 6 dB | 6 dB | 6 dB
PD responsivity (ρ) | 0.75 A/W | 0.75 A/W | 0.75 A/W




\left(1 - \frac{1}{2Q^2}\right)








\[
Q = \frac{\sqrt{(A_0+1)(C_TC_o+C_TC_F+C_FC_o)R_FR_o}}{[C_T+(A_0+1)C_F]R_F + (C_T+C_o)R_o} \tag{2.25b}
\]
With (2.24), (2.25), and (2.22), we represent the noise by fDR, PTIA, and Q.
To find the optimal noise, a proper value for Q is needed. In general, Q is no more than 1/√2
for a second-order response; otherwise the system will suffer from intersymbol interference (ISI)
caused by peaking in the frequency domain. In this work, we adopt Q = 1/√2, the Butterworth
response. In fact, based on our numerical analysis shown in the next section, we find that the best
sensitivity appears at the Butterworth response. An intuitive explanation of this result is that fn.BW0
and fn.BW2 decrease but RF increases with a higher Q. As a result, the total input-referred noise
is smaller according to (2.22). Table 2.6 shows the comparison between previous methods and the
proposed method. Method 1 is from [49] [50] [51] [52]; Methods 2, 3, and 4 are introduced in
[43]. Though previous methods provide concise conclusions for designers by introducing certain
approximations and assumptions, the exact size of RF and PTIA cannot be calculated or estimated
from technology parameters directly, e.g., the values of CT and fA are unknown in (22) in [43]. In
Step 1 and Step 2, (2.20) and (2.21) are not original but match the results in [26] [54]. However,
we introduce the Miller capacitance for the calculation of the optimal noise and analyze its impact.
In addition, the proposed method shows the relationship between the TIA and the next stages by
quantifying Co. Previous methods, by contrast, cannot provide an explicit tradeoff among noise,
data rate, and power consumption for optical TIAs, although this tradeoff is desired by optical link
designers [25]. In Method 1 and Method 2, the optimal noise is always at CI = CD and independent of
data rate and power consumption. In Method 3 and Method 4, qualitative conclusions rather than
quantitative conclusions are given. With the proposed method, this tradeoff is derived in Section
2.6.2 and Section 2.6.3.
2.6.2 Results in 65 nm CMOS
The proposed analysis is applied to 65 nm CMOS technology. Based on the analysis above,
CMOS technology constants are required for the calculation. Some of them are related to the
CMOS channel length.
• Vdd: supply voltage
• Wp/Wn: ratio of PMOS width to NMOS width in an inverter
• gmro: inverter amplification factor
• fT.P: PMOS intrinsic frequency
• fT.N: NMOS intrinsic frequency
• Cgd.P/Cgs.P: ratio of PMOS gate-drain capacitance to gate-source capacitance
• Cgd.N/Cgs.N: ratio of NMOS gate-drain capacitance to gate-source capacitance
• ΓP: PMOS noise coefficient
• ΓN: NMOS noise coefficient

































































































Figure 2.34: Numerical solutions of (2.25): the relationship among data rate, TIA power
consumption, and (a) sensitivity (noise); (b) feedback resistance RF; (c) load capacitance Co.
In this section, we choose the minimum channel length; the channel length's impact is analyzed
in Section 2.6.3.
We use the sensitivity Ps to represent the total TIA input-referred current noise in optical
receivers.
\[
P_s = 10\log_{10}\left[\frac{i_{n.rms}(ER+1)}{2(ER-1)\rho \times 10^{-3}}\right] \tag{2.26}
\]
where ER and ρ are extinction ratio and photodetector (PD) responsivity, and their typical values
are 6 dB and 0.75 A/W.
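As a small worked example of (2.26), the sketch below converts an rms input-referred noise current into a sensitivity in dBm. It assumes that the extinction ratio quoted in dB is a power ratio that must be converted to a linear value before it enters the formula. The function name and the 1 µA example value are illustrative only and are not results from this work.

```python
import math


def sensitivity_dbm(i_n_rms, er_db=6.0, responsivity=0.75):
    """Evaluate (2.26): sensitivity in dBm from the rms noise current in amperes.

    Assumption: er_db is a power ratio in dB, converted here to a linear ratio.
    responsivity is the photodetector responsivity in A/W.
    """
    er = 10.0 ** (er_db / 10.0)
    # Bracketed term of (2.26); the 1e-3 factor references the power to 1 mW.
    ratio = i_n_rms * (er + 1.0) / (2.0 * (er - 1.0) * responsivity * 1e-3)
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    # Hypothetical noise current of 1 uA rms with the typical ER and responsivity.
    print(round(sensitivity_dbm(1e-6), 2))
```

Because (2.26) is logarithmic in the noise current, a tenfold increase in the rms noise shifts the sensitivity by exactly 10 dB.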
In this sub-section, we show the results based on the proposed solutions with a channel length
L = 60 nm and a second-order Butterworth response (Q = 1/√2). Fig. 2.34a shows the 3D plot for

















































































Figure 2.35: (a) The 2D plot for the tradeoff among noise (sensitivity), data rate, and power
consumption from Fig. 2.34(a); (b) The fitting functions for the minimum sensitivities and their
power, and minimum power and their sensitivities, with the X-axis fDR (GHz).
the tradeoff among noise (sensitivity), data rate, and power consumption. Fig. 2.34b and Fig. 2.34c
show the corresponding feedback resistor RF and output capacitance Co at different data rates
and power consumptions, obtained by solving (2.25). For easier reading, Fig. 2.35a shows
the 2D plot of Fig. 2.34a.
We draw the following conclusions. For a given data rate, there is a theoretical
minimum power consumption. As shown in Fig. 2.35a, if the power consumption is less
than that of the square marker on each line, (2.25) has no real solutions for RF and
Co. The minimum power consumption increases with the data rate. For a given data rate, there is
also a theoretical minimum sensitivity. Interestingly, increasing power consumption does not always
improve the sensitivity. Though greater power consumption does help raise gm, it also increases
the input capacitance, CI, and the Miller capacitance, CF, which lowers RF. Consequently, the total
input-referred noise is a nonlinear function of power consumption, as shown in (2.22). However,
greater power consumption increases the load capacitance that the TIA can drive, as shown in
Fig. 2.34c. As a result, the noise of the next stage could be reduced because that stage can be made
larger. In practice, a sufficient capacitance margin is required when PVT variations are
taken into consideration. Fig. 2.35b shows the second-order fitting functions for the minimum
sensitivity points and their power, and the minimum power points and their sensitivity. Its X-axis
is the data rate fDR with the unit GHz, while its left and right Y-axes are sensitivity (dBm)














Figure 2.36: The ratio of CI/CD of each input data rate (Gb/s) at the best sensitivity.
and power consumption (mW), respectively. (2.27) depicts the detailed fitting equations for the















































Fig. 2.36 shows the ratio CI/CD when the power at the minimum sensitivity is applied for each
input data rate. The ratio increases nonlinearly with the input data rate.
Fig. 2.37 demonstrates that the calculated results agree well with the simulation results. The















































Figure 2.37: (a) Comparing the calculated minimum sensitivity and simulated minimum
sensitivity; (b) comparing the calculated and simulated noise from feedback resistance RF and the
FET channel at the minimum sensitivity.
comparisons between the calculated minimum sensitivities (the red line with circle markers) and
the simulated minimum sensitivities (the yellow line with triangle markers) are shown in Fig. 2.37a.
The difference at each input data rate is less than 0.1 dB. Fig. 2.37b details both the
calculated and simulated noise from the feedback resistor RF and the FET channel at the minimum
sensitivity. Clearly, these two sources produce comparable noise.
2.6.3 Quality factor, channel length, and input parasitic capacitance
To find the impact of Q in the second-order TIA model, we keep the transistor channel length,
L = 60 nm, and input parasitic capacitance, CD = 80 fF, constant. With different values for Q, we
recalculate sensitivity, feedback resistance RF, and output capacitance Co. The results are shown
in Fig. 2.38a, Fig. 2.38b, and Fig. 2.38c, respectively. The values of Q are Q = 0.707 for the
Butterworth response, Q = 0.577 for the Bessel response, and Q = 0.5 for the critical damping.
For the sake of clarity, we only plot the results at 10 Gb/s and 20 Gb/s. Several valuable conclusions
are drawn from the plots in Fig. 2.38. As Q becomes lower, the sensitivity degrades. The best
sensitivity for Q = 0.5 is about 1.3 dB worse than that for Q = 0.707 at both data rates. At the same
















































































Figure 2.38: Calculated results with Q = 0.5, Q = 0.577, and Q = 0.707 at 10 Gb/s and 20 Gb/s:
(a) the relationship between sensitivity and power consumption; (b) and (c) the relationship
between power consumption and the corresponding RF and Co.
time, the power for the best sensitivity point is higher. The theoretical minimum power decreases for
lower Q, and the sensitivity at that point degrades. Interestingly, the highest value of feedback
resistance, RF, appears at Q = 0.577, and so does the largest transimpedance. However, the overall
change of RF across the different Q values is limited. The maximum load capacitance varies significantly,
dropping about 43% from Q = 0.707 to Q = 0.577 and 34% from Q = 0.577 to Q = 0.5 at both data rates.
This is another reason a higher Q should be employed in circuit design.
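For reference, the three Q values compared here can be related to the shape of a generic second-order low-pass response 1/(1 + s/(ω0Q) + s²/ω0²). The sketch below uses only textbook second-order relations, not the chapter's full optimization, to compute the −3 dB bandwidth relative to the natural frequency and the frequency-domain peaking. The function names are illustrative assumptions.

```python
import math


def f3db_over_f0(q):
    """-3 dB bandwidth of 1/(1 + s/(w0*Q) + s^2/w0^2), normalized to f0."""
    x = 1.0 - 1.0 / (2.0 * q * q)
    return math.sqrt(x + math.sqrt(x * x + 1.0))


def peaking_db(q):
    """Peak of the magnitude response above its DC value, in dB.

    A second-order low-pass response peaks only when Q > 1/sqrt(2).
    """
    if q <= 1.0 / math.sqrt(2.0):
        return 0.0
    peak = q / math.sqrt(1.0 - 1.0 / (4.0 * q * q))
    return 20.0 * math.log10(peak)


if __name__ == "__main__":
    for q in (0.5, 0.577, 0.707, 1.0):
        print(q, round(f3db_over_f0(q), 3), round(peaking_db(q), 2))
```

For a fixed natural frequency, the bandwidth shrinks as Q drops from 0.707 to 0.5, so a lower-Q design needs a higher natural frequency to support the same data rate. A response with Q above 0.707 shows peaking, which is the ISI concern raised in Section 2.6.1.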
For a given technology, the TIA’s channel length impacts its sensitivity in two opposing ways.
On the one hand, a larger L leads to a larger input capacitance, CI, and Miller capacitance, CF,
degrading sensitivity. On the other hand, increasing L raises the forward gain of the core amplifier, A0. The
















































































Figure 2.39: Calculated results with L = 60 nm, L = 70 nm, and L = 80 nm at 10 Gb/s and
20 Gb/s: (a) the relationship between sensitivity and power consumption; (b) and (c) the
relationship between power consumption and the corresponding RF and Co.
technology constants for different L values are summarized in Table ??. In this section, we keep
Q = 0.707 and the input parasitic capacitance CD = 80 fF. By changing L, we recalculate sensitivity,
feedback resistance RF, and output capacitance Co; the results are shown in Fig. 2.39a, Fig. 2.39b, and
Fig. 2.39c, respectively. As before, we plot the results at only 10 Gb/s and 20 Gb/s.
Based on the plots in Fig. 2.39, we summarize the conclusions as follows. When we increase L,
the sensitivity may improve slightly at low power consumption due to a higher A0. However, as
power consumption grows, both CI and CF grow faster for larger L, resulting in a
net sensitivity degradation. For the same reason, the feedback resistance RF for larger L increases at low
power consumption but decreases at high power consumption. A larger L could slightly improve the
minimum power but reduces the load capacitance tolerance. From L = 60 nm to L = 80 nm, the
reduction is about 16% for both data rates.
As is well known, lower parasitic input capacitance improves sensitivity. In this section, we
keep Q = 0.707 and the minimum transistor channel length L = 60 nm and vary the parasitic
input capacitance (CD), which includes the capacitance of the optical PD, the pads, and the
ESD protection, by ±25%. We then recalculate sensitivity, feedback resistance RF, and output capacitance Co
and present the results in Fig. 2.40a, Fig. 2.40b, and Fig. 2.40c, respectively.
We summarize the conclusions as follows. If we decrease CD by 25% from 80 fF to 60 fF,
the best sensitivity improves by around 0.6 dB; if we increase CD by 25% from 80 fF to 100 fF, the
best sensitivity degrades by around 0.5 dB. However, at higher power consumption, the improvement in
sensitivity is more limited because the capacitance from the amplifier (CI) gradually dominates,
as does RF. The lower CD also reduces the theoretical minimum power consumption. Surprisingly,
the variations of the load capacitance tolerance are quite limited, as shown in Fig. 2.40c. They are
only around 5% for both data rates.
2.6.4 Conclusions
This section offers a practical method to calculate the accurate size of the inverter-based amplifier,
the feedback resistance RF, and the load capacitance Co for optimal noise performance. We consider
the Miller capacitance in the optimal noise calculation. Furthermore, we obtain explicit relationships
among noise (sensitivity), data rate, and power consumption in the broadband optical TIA. Based
on our analysis, we find that: for a given data rate, there is a theoretical minimum power consumption
that increases with the data rate; for a given data rate, there is a theoretical minimum
noise; the ratio CD/CI for an optimal-noise design changes with the data rate; and increasing power
consumption does not always reduce the TIA’s noise but does increase the maximum capacitive load.
2.6.5 Appendix
Derivation of transfer function:


















































































Figure 2.40: Calculated results with CD = 60 fF, CD = 80 fF, and CD = 100 fF at 10 Gb/s and
20 Gb/s: (a) the relationship between sensitivity and power consumption; (b) and (c) the
relationship between power consumption and the corresponding RF and Co.
In Fig. 2.33, we use Kirchhoff’s laws to derive
iin = sCTvin+(sCF+gF)(vin−vo) (2.29a)
(sCF+gF)(vin−vo) = gmvin + (sCo + go)vo (2.29b)
where gF = 1/RF and go = 1/Ro. (2.20) follows from (2.29).
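As a numerical sanity check of this derivation, the sketch below solves (2.29a) and (2.29b) directly for vo/iin and compares the result with a second-order form whose quality factor is the expression labeled (2.20c). The function names and all component values are illustrative assumptions, not values from this design.

```python
import math


def zt_from_kcl(s, gm, RF, Ro, CT, CF, Co):
    """Solve (2.29a)-(2.29b) for the trans-impedance vo/iin at complex frequency s."""
    gF, go = 1.0 / RF, 1.0 / Ro
    yF = s * CF + gF
    # (2.29b): yF*(vin - vo) = gm*vin + (s*Co + go)*vo, which gives vin = k*vo.
    k = (yF + s * Co + go) / (yF - gm)
    # (2.29a): iin = s*CT*vin + yF*(vin - vo), expressed per unit vo.
    iin_per_vo = s * CT * k + yF * (k - 1.0)
    return 1.0 / iin_per_vo


def zt_second_order(s, gm, RF, Ro, CT, CF, Co):
    """Closed form: DC gain, one zero, and a pole pair described by w0 and Q."""
    A0 = gm * Ro
    a = (CT * Co + CT * CF + CF * Co) * RF * Ro
    b = (CT + (A0 + 1.0) * CF) * RF + (CT + Co) * Ro
    w0 = math.sqrt((A0 + 1.0) / a)
    Q = math.sqrt((A0 + 1.0) * a) / b        # same expression as (2.20c)
    RT = (A0 * RF - Ro) / (A0 + 1.0)         # DC trans-impedance magnitude
    wz = (A0 * RF - Ro) / (CF * RF * Ro)     # zero created by the Miller path
    zt = -RT * (1.0 - s / wz) / (1.0 + s / (w0 * Q) + (s / w0) ** 2)
    return zt, w0, Q


if __name__ == "__main__":
    # Hypothetical example values only.
    p = dict(gm=20e-3, RF=500.0, Ro=300.0, CT=150e-15, CF=10e-15, Co=30e-15)
    s = 2j * math.pi * 5e9
    print(abs(zt_from_kcl(s, **p)), abs(zt_second_order(s, **p)[0]))
```

The two functions agree to numerical precision, which indicates that the closed form follows from (2.29) with no approximation other than what is applied to its numerator afterwards.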
We next explain the approximation of the first equation in (2.20). First, it is easy to understand

















which is even higher than the CMOS transit frequency, fT. As a result, the approximation is com-
pletely reasonable.
Derivation of input-referred noise current:
First, we calculate the input-referred noise current density and derive (2.21). Second, we derive
the total input-referred noise current by integration in (2.22).
To calculate the input-referred noise current density, we need to refer the noise sources
i2n.res and i2n.D to the input, which we do through the output noise. As shown in Fig. 2.41, we
calculate the output voltages vo.a, vo.b, and vo.c, respectively. The admittance YF = 1/ZF is the
parallel combination of RF and CF. The admittance Yo = 1/Zo is the parallel combination of Ro and Co.
From Fig. 2.41a, Kirchhoff’s laws yield
\[ -i_{n.TIA} = sC_Tv_{in.a} + Y_F(v_{in.a} - v_{o.a}) \tag{2.31a} \]
\[ 0 = Y_F(v_{o.a} - v_{in.a}) + g_mv_{in.a} + Y_ov_{o.a} \tag{2.31b} \]
Solving (2.31), we get
\[ i_{n.TIA} = \frac{sC_T(Y_o + Y_F) + (Y_o + g_m)Y_F}{g_m - Y_F}\,v_{o.a} = H_a(s)\,v_{o.a} \tag{2.32} \]
Similarly, from Fig. 2.41b, Kirchhoff’s laws yield
\[ -i_{n.res} = sC_Tv_{in.b} + Y_F(v_{in.b} - v_{o.b}) \tag{2.33a} \]
\[ i_{n.res} = Y_F(v_{o.b} - v_{in.b}) + g_mv_{in.b} + Y_ov_{o.b} \tag{2.33b} \]
and the solution is
\[ i_{n.res} = \frac{sC_T(Y_o + Y_F) + (Y_o + g_m)Y_F}{g_m + sC_T}\,v_{o.b} = H_b(s)\,v_{o.b} \tag{2.34} \]
From Fig. 2.41c, Kirchhoff’s laws yield
\[ 0 = sC_Tv_{in.c} + Y_F(v_{in.c} - v_{o.c}) \tag{2.35a} \]
\[ i_{n.D} = Y_F(v_{o.c} - v_{in.c}) + g_mv_{in.c} + Y_ov_{o.c} \tag{2.35b} \]
and the solution is
\[ i_{n.D} = \frac{sC_T(Y_o + Y_F) + (Y_o + g_m)Y_F}{Y_F + sC_T}\,v_{o.c} = H_c(s)\,v_{o.c} \tag{2.36} \]








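Referring the output noise of each source back to the input through Ha(s), we obtain
\[
\begin{aligned}
d\overline{i_{n.TIA}^2} &= \frac{|H_a(s)|^2}{|H_b(s)|^2}\,d\overline{i_{n.res}^2} + \frac{|H_a(s)|^2}{|H_c(s)|^2}\,d\overline{i_{n.D}^2} \\
&= \frac{|g_m + sC_T|^2}{|g_m - Y_F|^2}\,d\overline{i_{n.res}^2} + \frac{|Y_F + sC_T|^2}{|g_m - Y_F|^2}\,d\overline{i_{n.D}^2} \\
&= \frac{|A_0R_F + sC_TR_FR_o|^2}{|A_0R_F - R_o - sC_FR_FR_o|^2}\,d\overline{i_{n.res}^2} + \frac{|R_o + s(C_T + C_F)R_FR_o|^2}{|A_0R_F - R_o - sC_FR_FR_o|^2}\,d\overline{i_{n.D}^2}
\end{aligned} \tag{2.37}
\]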
To better understand the result, if we apply CF = 0 and RF ≫ Ro to (2.37), we reach the same
equation as in [26].
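The two weighting factors in (2.37) can also be evaluated numerically. The sketch below assumes the usual thermal-noise densities, 4kT/RF for the feedback resistor and 4kTΓgm for the FET channel. These two expressions, the function names, and the example values are assumptions made for illustration. It then compares the full expression with the limiting form obtained for CF = 0 and A0RF ≫ Ro.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def noise_density_full(f, gm, RF, Ro, CT, CF, gamma, temp=300.0):
    """Input-referred noise current density (A^2/Hz) using the weights of (2.37)."""
    s = 2j * math.pi * f
    A0 = gm * Ro
    den = abs(A0 * RF - Ro - s * CF * RF * Ro) ** 2
    w_res = abs(A0 * RF + s * CT * RF * Ro) ** 2 / den
    w_fet = abs(Ro + s * (CT + CF) * RF * Ro) ** 2 / den
    psd_res = 4.0 * K_B * temp / RF          # assumed density of the RF noise source
    psd_fet = 4.0 * K_B * temp * gamma * gm  # assumed density of the channel noise source
    return w_res * psd_res + w_fet * psd_fet


def noise_density_limit(f, gm, RF, CT, gamma, temp=300.0):
    """Limiting form of the expression above for CF = 0 and A0*RF >> Ro."""
    w = 2.0 * math.pi * f
    w_res = 1.0 + (w * CT / gm) ** 2
    w_fet = (1.0 + (w * CT * RF) ** 2) / (gm * RF) ** 2
    return 4.0 * K_B * temp * (w_res / RF + gamma * gm * w_fet)


if __name__ == "__main__":
    # Hypothetical example: RF = 5 kOhm is much larger than Ro = 50 Ohm.
    full = noise_density_full(5e9, 20e-3, 5000.0, 50.0, 150e-15, 0.0, 1.0)
    limit = noise_density_limit(5e9, 20e-3, 5000.0, 150e-15, 1.0)
    print(full / limit)
```

With these example values the two results differ by about 2%, which reflects the term dropped from the denominator. Note also that Ro cancels algebraically in the two weights of (2.37), so the density computed here does not change when only Ro is varied.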






Using the first equation in (2.20) without any approximation, we get (2.22).
2.7 Research on CDR-Based Clocking Scheme
This section introduces our research on the clock and data recovery (CDR) based clocking scheme.
Though a forward clocking scheme is implemented in our design as shown in Section 2.4, the
study of CDR-based clocking schemes is an important part of our background investigation and
helps in understanding TRx design tradeoffs. Based on this research, I proposed and patented an
innovative power-efficient clocking scheme based on an injection-locked phase rotator during my
internship at Cisco Systems, Inc. Section 2.7.1 discusses the applications of CDR-based clocking
schemes in optical links. Prior art is reviewed in Section 2.7.2. Finally, the proposed
power-efficient clocking scheme is introduced in Section 2.7.3.
2.7.1 CDR-based clocking scheme in optical links
For the communication between different racks and servers in a data center and the communication
between data centers, a forwarded clock is expensive due to the physical distance. Therefore,
the multi-lane receivers (or transmitters) in Fig. 2.42 require a local clock generator, which may
be a frequency synthesizer or a crystal oscillator, and a clock distribution network. Due to the
slight frequency difference between the input data and the local clock, a phase adjuster is employed
in each lane not only for phase adjustment but also for frequency compensation.
In each lane, an appropriate analog front end (AFE), e.g., a trans-impedance amplifier, a continuous-time
linear equalizer (CTLE), or a variable gain amplifier, receives the incoming signal and generates
a corresponding electrical signal that can be further processed by the respective sub-analog-to-digital
converters (sub-ADCs). The sub-ADCs receive the phase-aligned multiphase clock signals
CLKI, CLKIB, CLKQ, CLKQB, respectively. The ADC supplies its output to an equalizer, such as
a feedforward equalizer (FFE), a decision feedback equalizer (DFE), a combination of both FFE and
DFE, or no equalizer. The output of the ADC and the output of the equalizer are supplied to digital
clock-data recovery logic that supplies a control signal or control word to the phase adjuster.
Figure 2.42: Block diagram of a receive side of a multi-lane wireline or optical transceiver device
including a phase adjuster.
Figure 2.43: The classical clocking scheme for multi-lane wireline or optical transceivers as Prior
Art 1
2.7.2 Previous CDR-based clocking schemes
Prior art 1: a classical solution
Fig. 2.43 shows the classical clocking scheme for multi-lane wireline or optical transceivers




Figure 2.44: Building blocks in Prior Art 1: (a) schematic of a frequency divider; (b) block
diagram of a duty cycle correction; (c) schematic of a classical phase interpolator (PI); (d) phase
constellation of the classical PI
is twice the operating frequency of the sub-ADCs, to feed the phase adjuster required in each receiver
lane. A frequency divider (Fig. 2.44a) is employed to generate 4-phase clocks at frequency f0 from
reference clocks at frequency 2f0 for the subsequent phase interpolators (PIs). As is known, any
duty-cycle deviation from 50% leads to an IQ mismatch in the frequency divider. Since it is hard
to generate and maintain a 50% duty cycle for global reference clocks, a duty cycle correction block
is desired before the frequency divider (Fig. 2.44b).
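A short worked example shows why the duty cycle matters. The sketch below assumes an edge-triggered divide-by-2 in which the I output toggles on rising edges and the Q output toggles on falling edges of the 2f0 clock. This divider model and the function names are assumptions for illustration, since the exact circuit of Fig. 2.44a is not reproduced here. Under that assumption, the I-to-Q delay equals the high time of the input clock, so the quadrature phase is 180° times the duty cycle.

```python
def iq_phase_deg(duty_cycle):
    """I-to-Q phase in degrees at f0 for the assumed edge-triggered divide-by-2.

    The I-to-Q delay is duty_cycle * T_in and the output period is 2 * T_in,
    so the phase is 360 * duty_cycle / 2.
    """
    if not 0.0 < duty_cycle < 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return 360.0 * duty_cycle / 2.0


def iq_phase_error_deg(duty_cycle):
    """Deviation of the I-to-Q phase from the ideal 90 degrees."""
    return iq_phase_deg(duty_cycle) - 90.0


if __name__ == "__main__":
    # Hypothetical 52% duty cycle on the 2*f0 reference clock.
    print(round(iq_phase_error_deg(0.52), 3))
```

In this model, each 1% of duty-cycle error on the 2f0 clock produces 1.8° of quadrature error at f0.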
Two PIs are used to provide differential I-phase clocks and differential Q-phase clocks. As
shown in Fig. 2.44c, a traditional PI consists of four differential pairs as inputs, two for the positive
and negative I-phase clocks (CLKI and CLKIB) and two for the positive and negative Q-phase clocks
(CLKQ and CLKQB). One of the I-phase clocks and one of the Q-phase clocks are selected and added
Figure 2.45: The classical clocking scheme based on an injection-locked multiphase generator as
Prior Art 2
together each time in (2.38), supposing CLKI and CLKQ are chosen without loss of generality.
Iout = (1 − α)ICLK,I + αICLK,Q (2.38)
where α ranging from 0 to 1 indicates the phase shift from 0◦ to 90◦ relative to the phase of ICLK,I .
As a result, the four possible combinations of I-phase and Q-phase clocks, altogether, provide 360◦
phase rotation.
However, since the sum between I-phase and Q-phase clocks is linear, the phase constellation
of the rotator is diamond-shaped, as shown in Fig. 2.44d. Therefore, the traditional PI suffers from
inherent nonlinearity, resulting in a limited phase resolution within 6 bits, especially for high-
frequency applications.
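The nonlinearity of the linear sum in (2.38) can be illustrated numerically. The sketch below assumes ideal, equal-amplitude sinusoidal I and Q clocks, so that the interpolated output is the phasor (1 − α) + jα. This idealization and the function names are assumptions for illustration only.

```python
import math


def pi_phase_deg(alpha):
    """Phase of the phasor (1 - alpha) + j*alpha, in degrees."""
    return math.degrees(math.atan2(alpha, 1.0 - alpha))


def pi_amplitude(alpha):
    """Amplitude of the same phasor; it traces one edge of the diamond constellation."""
    return math.hypot(1.0 - alpha, alpha)


def worst_case_phase_error_deg(steps):
    """Largest deviation from the ideal linear phase 90*alpha over uniformly spaced codes."""
    return max(
        abs(pi_phase_deg(k / steps) - 90.0 * k / steps) for k in range(steps + 1)
    )


if __name__ == "__main__":
    print(round(pi_amplitude(0.5), 4), round(worst_case_phase_error_deg(16), 2))
```

Under these assumptions the phase follows atan(α/(1 − α)) instead of 90°·α, and the amplitude dips at mid-code, which corresponds to the diamond-shaped constellation described above. Real interpolators with non-sinusoidal clocks behave differently in detail.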
In a nutshell, this clocking scheme requires 2f0 clock generation and global distribution,
demands two power-hungry PIs and a duty-cycle correction circuit, and suffers from limited phase
resolution due to the limitation of the traditional PI.
Prior art 2: a solution with injection-locked multiphase generator
Fig. 2.45 shows the clocking scheme based on an injection-locked multiphase generator [57]
[58]. Compared with the classical solution in Prior Art 1, the employment of an injection-locked
multiphase generator brings the following advantages: (1) enabling the generation and distribution




Figure 2.46: Building blocks in Prior Art 2: (a) schematic of an injection-locked multiphase
generator; (b) schematic of a phase interpolator (PI) with 8-phase inputs; (c) phase constellation
of this PI
phase correction loop (Fig. 2.46a) to replace the duty cycle correction block; (3) generating 8-phase
clocks for PIs to improve phase resolution to 7 bits.
Fig. 2.46a shows the schematic of an injection-locked multiphase generator. This idea originates
from [38] and is known as a quadrature locked loop, which is essentially the combination
of injection locking and a delay-locked loop. An injection-locked 8-stage (4-differential-stage) ring
oscillator generates 8-phase output clocks from the global input clocks. An internal phase correction
loop detects the phase offsets of the 8-phase output clocks and then eliminates them by controlling the
free-running frequency of the ring oscillator. Fig. 2.46b shows the schematic of the 8-phase-input PI.
Figure 2.47: The classical clocking scheme with an injection-locked phase rotator as Prior Art 3
Its operating principle is similar to that of the traditional PI. However, by upgrading the diamond-shaped
phase constellation of the rotator to an octagon-shaped one (Fig. 2.46c), the resolution of phase
rotation improves at the cost of a higher required driving capability for the output clocks and more
complex control signals.
In summary, this clocking scheme reduces the power of the generation and distribution of
global reference clocks and enhances the phase rotation resolution compared with Prior Art 1.
Nevertheless, two power-hungry PIs with 8-phase inputs, as well as their complicated control signals,
are required for quadrature phases. In addition, a high-performance injection-locked multiphase
generator increases design complexity.
Prior art 3: a solution with injection-locked phase rotator
Fig. 2.47 shows the clocking scheme with an injection-locked phase rotator. Similar to Prior Art 1,
global reference clocks at frequency 2f0 are generated and distributed in transceivers, and a duty
cycle correction block is employed locally in each channel. An injection-locked phase rotator,
which includes a 64-phase ring oscillator, replaces the two power-hungry PIs to rotate the phases
of the quadrature clocks, which improves the power efficiency dramatically and avoids the inherent
nonlinearity of the classical PI. Compared to the traditional PI, which rotates the phase by linearly adding
IQ signals, an injection-locked phase rotator achieves a higher phase rotation resolution by shifting
the injection locking positions on the ring oscillator; the detailed principle is introduced in the
Figure 2.48: The proposed power-efficient clocking scheme
next section.
In short, this clocking scheme increases the phase rotator’s resolution and reduces its power
consumption, but still consumes high power for the generation and distribution of global clocks.
Besides, the single-ended 64-phase coupled oscillator in the injection-locked phase rotator may
suffer from instability caused by potential false locking.
2.7.3 Proposed power-efficient clocking scheme
Fig. 2.48 shows the proposed clocking scheme that combines the merits of Prior Art 2 and
Prior Art 3. An injection-locked multiphase generator is employed to convert the differential
global reference clocks to quadrature clocks, both at frequency f0. Notably, only
a 4-phase injection-locked multiphase generator is needed to feed the next stage instead of the
8-phase one in Prior Art 2, which dramatically reduces the design complexity. The injection-locked
phase rotator, which comprises digital logic, a fully differential 64-phase coupled oscillator, and
output buffers, precisely rotates the phases of the input quadrature clocks.
Fig. 2.49 shows details of the injection-locked phase rotator that is part of the phase adjuster.
The digital logic is configured to receive four clock signals CK0, CK180, CK90, CK270 from the
multiphase generator. The digital logic also receives a control word from the digital CDR logic that indicates
how the phase of the four clock signals should be adjusted. In response to the control bits, the digital
logic outputs appropriate injection signals across a 64-bit wide bus to access any of 64 injection
Figure 2.49: Schematic of the proposed phase adjuster
Figure 2.50: Schematic of the proposed 64-phase fully-differential oscillator, which is
implemented as 16 four-phase fully differential cross coupled sub-oscillators.
sites (shown in Fig. 2.50 and Fig. 2.51) in the fully differential 64-phase coupled oscillator. More
specifically, the digital logic outputs a 64-bit word (corresponding to injection positions) that is
supplied to the fully differential 64-phase coupled oscillator, which outputs four phase-corrected
clock signals CLKI , CLKIB, CLKQ, CLKQB (or output clocks from P[0], P[16], P[32], P[48]).
Those outputs are passed, respectively, through output buffers, and may then be supplied to sub-
ADCs.
Fig. 2.50 is a schematic diagram showing components of the proposed injection-locked phase
rotator, which is implemented as 16 four-phase fully differential cross-coupled oscillators
(sub-oscillator slices). These 16 sub-oscillator slices are further coupled using passive coupling elements
(e.g., resistive or capacitive) to generate 64 uniformly spaced phases. One sub-oscillator slice is
shown in detail in Fig. 2.50. Pairs of series-connected forwarding inverter cells are cross-coupled
to each other. Further, two pairs of cross-coupled inverters ensure that signals carried by the
cross-connected branches of the sub-oscillator slice remain as differentiated as possible. Each injection
inverter receives one of the 64 signals output by the digital logic to control or adjust the
phase of the clock signals P[0], P[16], P[32], P[48] output by the fully differential 64-phase coupled
oscillator.
As shown in Fig. 2.50, 16 sub-oscillator slices are provided, resulting in 16 separate
two-differential-stage sub-oscillator slices (n = 0 to 15) that provide a resolution of 2π/64 for phase control. Each
sub-oscillator slice, which is referred to as an “inner ring,” determines the oscillating frequency.
The connection among the 16 sub-oscillator slices, which is referred to as an “outer ring,” defines
the phase resolution. The proposed architecture described herein is an improvement over the prior
art approach [59], which suffers from dependence between the inner rings and outer ring and leads
to incorrect phase resolution at higher oscillating frequencies. In contrast, due to the differential
nature of the sub-oscillator slices, the proposed architecture provides for the independence of each
inner ring, thus enabling stability at all conditions and preventing false locking.
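The phase bookkeeping in the paragraph above can be checked with a short sketch. It assumes, as an idealization, that node P[n] sits at exactly n steps of 2π/64; the names `NUM_PHASES`, `NUM_SLICES`, and `node_phase_deg` are illustrative and are not taken from the design.

```python
NUM_PHASES = 64   # phases produced by the coupled oscillator (from the text)
NUM_SLICES = 16   # sub-oscillator slices, i.e. inner rings (from the text)


def node_phase_deg(n: int) -> float:
    """Ideal phase of node P[n] in degrees, assuming uniform 2*pi/64 spacing."""
    return (n % NUM_PHASES) * 360.0 / NUM_PHASES


# Phase-control resolution set by the outer ring: 360 / 64 degrees per step.
phase_step_deg = 360.0 / NUM_PHASES

# Each four-phase slice contributes 64 / 16 = 4 of the 64 phases.
phases_per_slice = NUM_PHASES // NUM_SLICES

# The four output taps named in the text should be 90 degrees apart.
quadrature_taps = [node_phase_deg(n) for n in (0, 16, 32, 48)]

print(phase_step_deg, phases_per_slice, quadrature_taps)
```

Under this idealization the step is 5.625 degrees, and the taps P[0], P[16], P[32], P[48] land on 0, 90, 180, and 270 degrees.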
Fig. 2.51 shows internal connectivity of the injection-locked phase rotator, that is, Fig. 2.51
shows the interconnectedness of the components of 16 sub-oscillator slices of the fully differential
64-phase coupled oscillator shown in Fig. 2.50. In Fig. 2.51, forwarding inverter cells shown as
black dots (four of which make up each sub-oscillator slice) are coupled via an “inner ring”. Orange
dashed lines represent the connectivity established by the cross-coupled inverters of each
sub-oscillator slice, referred to as an “inner ring cc”. An “outer ring”, depicted by a black broken
line, represents how the passive coupling elements couple individual forwarding
inverter cells to each other. Purple arrows represent injection positions corresponding to
Figure 2.51: A schematic showing internal connectivity of the proposed injection-locked phase
rotator.
injection inverters, of which there are 64, corresponding to the 64-bit-wide connection between the digital
logic and the fully differential 64-phase coupled oscillator, as shown in Fig. 2.49. Finally, outputs
for the four multiphase clock signals CLKI, CLKIB, CLKQ, CLKQB may come from four outputs
(i.e., output buffers in Fig. 2.49) driven by any four nodes that are 90 degrees apart from each other,
such as those labelled P[0], P[16], P[32], P[48].
Another important difference between the prior art [57] and the proposed architecture is the im-
plementation of injection locking. Two major implementations are shown in Fig. 2.52. In Fig. 2.52a,
only a single-ended signal is needed to short the differential stages in the oscillator by turning on the
NMOS between them. In Fig. 2.52b, a pair of differential signals is required to drive two injection
inverters separately. Though the former solution is convenient to implement, it suffers from a potential
mismatch between differential signals. As illustrated in Fig. 2.53, the impact of the injection
signals on the differential stages is identical when I_inj and −I_injb are the same. However, I_inj and −I_injb




Figure 2.52: Two methods for the implementation of injection locking: (a) injection locking with
a single-ended signal by shorting the differential stages through NMOS; (b) injection locking
with a pair of differential signals through two inverters.
Figure 2.53: Issues of injection locking with a single-ended signal by shorting the differential
stages through NMOS.
applications. Since a double-frequency signal, 2f0, is injected into the oscillator in [59], this mismatch is
eliminated by synchronizing twice per period. But in the proposed scheme, where only frequency
f0 is generated and distributed, the injection implementation in Fig. 2.52a causes a severe error between
CLKI and CLKIB (CLKQ and CLKQB). Instead, the injection implementation in Fig. 2.52b
takes advantage of the differential inputs and avoids any possible mismatch.
To sum up, the phase control scheme described herein combines the benefits of a multiphase
generator with a dynamic multiphase injection locking (DMIL) technique for phase control. The
clocking scheme described provides a low power solution for global clock generation and distribu-
tion for multi-lane receivers. Further, this method achieves high accuracy and low power for phase
rotation correction and frequency error correction.
Chapter 3: Analysis of Injection-Locked Ring Oscillators for Quadrature
Clock Generation in Wireline or Optical Transceivers
3.1 Introduction
Injection-locked ring oscillators (ILROs) have been successfully demonstrated to be a low-area,
low-power, low-jitter, quadrature (IQ) clock generation solution for various applications, including
wireless transceivers [60], high-performance microprocessors [61], multi-channel serial-link
receivers for multi-core processing and networking applications [62], and the transceivers embedded
in field-programmable gate arrays (FPGAs) [63].
Fig. 3.1a shows the quadrature clock generator (QCG) in a multi-lane wireline or optical re-
ceiver with a clock data recovery (CDR) based clocking scheme [57]. A global clock generator,
which may be a frequency synthesizer or a crystal oscillator, supplies the reference and distributes
it differentially across the chip for low power consumption. Accurate quadrature clocks are
generated locally by the QCG. Phase interpolators (PIs) in each lane, controlled by the feedback signals
from digital CDR logic, compensate for the frequency and timing error between the incoming data and
quadrature clocks and feed the analog-to-digital converters (ADCs) in each RX. Since there is no
frequency error between the incoming data and clock reference with a feedforward clocking scheme
(Fig. 3.1b), a delay line replaces the PIs in each lane, and the outputs of the QCG directly feed the ADCs.
Fig. 3.1c shows the QCG in a multi-lane wireline or optical transmitter [32] [33]. A global clock
generator and distribution are developed locally, similar to the receiver. A delay line and a QCG
provide accurate quadrature clocks to a four-to-one multiplexer (4:1 MUX) in each TX lane.
Fig. 3.2 depicts two options for the implementation of the QCG. The QCG based on frequency
divider (FD) in Fig. 3.2a requires double frequency clocks (2 fclk) as the inputs, resulting in high





Figure 3.1: (a) Quadrature clock generator (QCG) in a multi-lane wireline or optical receiver with
a clock data recovery (CDR) based clocking scheme, (b) with a feedforward clocking scheme; (c)
QCG in a multi-lane wireline or optical transmitter.
duty cycle mismatches before the frequency divider directly transform to output quadrature (IQ)
errors, a power-hungry duty cycle correction is needed in this solution. Fig. 3.2b shows the QCG
based on an ILRO, where a differential (i.e., two-phase) signal at fclk is the input, and quadrature




Figure 3.2: Two architectures for QCG: (a) based on a frequency divider (FD); (b) based on an
injection-locked ring oscillator.
solution eliminates the power in the duty cycle correction and reduces the global clock genera-
tion and distribution power. Since ILROs are sensitive to process, voltage, and temperature (PVT)
variations and suffer from a limited locking range, a correction loop, such as a low-power frequency
tracking loop (FTL) [64] [65] or a quadrature locked loop (QLL) [38], needs to be used. Typically,
an ILRO-based QCG achieves better performance and a wider locking range.
In this chapter, we analyze QCGs based on ILROs. A single-ended or differential (i.e., two-
phase) signal provided by the global clock generator is used as an injection signal. The ring os-
cillator typically has two or four differential stages to generate a multi-phase clock. This means
that only some of the RO stages receive an injection signal, and some do not, resulting in a partial
injection. Operation of the ILRO with strong injection is preferred to obtain a large locking range,
low phase noise, and low jitter [66] [67]. In modern processes, inverter-based ring
oscillators are preferred for their ease of implementation, high energy efficiency, and scalability with the
technology. However, the partial injection into the RO leads to an imbalance between the injection
stages and non-injection stages, including amplitude and phase error. This then impacts crucial
design parameters, such as IQ mismatch, injection locking range, and jitter performance. Therefore,
its design trade-offs are different from those of the injection-locked LC oscillator and the ILRO with full
multi-phase injection.




Figure 3.3: Design trade-offs in QCGs with injection-locked LC oscillators [68]: (a) the
relationship between f0 and jitter performance; (b) the relationship between f0 and IQ mismatch;
design trade-offs in ILRO-based QCGs: (c) the relationship between f0 and jitter performance;
(d) the relationship between f0 and IQ mismatch.
noise of the QCG is at the minimum value when the injection frequency f_inj is equal to the
oscillator intrinsic frequency f0, i.e., f_inj = f0, as shown in Fig. 3.3a. The upper and lower boundaries
of the locking range are symmetrical, i.e., Δf_up = Δf_dn = Δf. The IQ mismatch is zero when
f_inj = f0 (Fig. 3.3b), so the frequencies f0 of these oscillators are designed to be equal to the
injection frequency f_inj. Higher injection strength (or coupling strength) improves the locking range
and phase noise at the cost of power consumption.
The analysis in [68] is not applicable to ILRO-based QCGs since the quality factor (Q) value
is unknown. The phase domain response (PDR) analysis [69] [70] predicts the locking range and
optimal phase noise, but it is based on simulation and does not provide intuitive insights into the
design trade-offs. In this chapter, we present a modified frequency-domain analysis that captures
the imbalance in amplitude and explains the detailed operation in ILRO-based QCG. This analysis
predicts the IQ mismatch, locking range, amplitude changes, and phase relationship, as shown
in Fig. 3.3c and Fig. 3.3d. The calculated results are validated by simulations of a QCG circuit
implementation in a 28 nm CMOS technology. The analysis further reveals the design trade-offs
among jitter performance, power consumption, and IQ mismatch for ILRO-based QCG. These
trade-offs are employed in [71].
We review the previous injection-locking analysis techniques in Section 3.2 and propose the
modified frequency-domain analysis in Section 3.3. Simulation results and design trade-offs are
discussed in Section 3.4, and Section 3.5 presents the conclusions.
3.2 Existing Analysis Techniques for ILROs
The theoretical analysis helps designers understand circuit performance and the fundamental
design trade-offs. For ILROs, specifically for quadrature ILROs, we pay attention to jitter perfor-
mance, power consumption, locking range, and IQ mismatches. In this section, we review existing
analysis techniques, including frequency-domain, time-domain, and phase-domain-response analysis,
and summarize their advantages and disadvantages. Because of the complexity of ILROs, no
single analysis can capture all their characteristics precisely.
3.2.1 Frequency-domain analysis
As shown in Fig. 3.4, a classical phasor method ([66] and its references) is proposed to depict
injection-locked LC oscillators. Given the open-loop characteristics, including the value of induc-
tance, capacitance, and resistance, we can calculate the amplitude-frequency and phase-frequency
response in the LCR tank. Therefore, the phase shift before and after injection locking is illustrated
directly through a current phasor diagram; in addition, the injection locking range is calculated. The
analysis provides an intuitive interpretation of Adler’s equation [72] and has been widely used to
describe the principle of injection locking. It is noted that the transfer function of the LC tank must
Figure 3.4: Model of the LC oscillator used in traditional frequency-domain analysis.
be known for the frequency-domain analysis.
Instead of the LCR parameters in injection-locked LC oscillators, the transfer function based
on the resistance and capacitance (RC) values is calculated for ILROs [73]. A similar phasor
analysis is then employed, assuming that only the fundamental frequency (injection frequency) is in the
injection loop and the higher-order harmonics are filtered out. This method can also be applied to
non-harmonic oscillators [74] and to strong-injection cases [75].
In summary, frequency-domain analysis introduces an intuitive phasor diagram, predicting the
locking range of ILROs. However, the previous publications [73] [74] [75] only discuss odd-stage
ILROs with full multi-phase injection.
3.2.2 Time-domain analysis
In the time-domain analysis, each stage in the ring oscillator is modeled as a delay cell, and its
delay is calculated by charging and discharging of an RC model [76] [77] [78] [79]. This analysis
better captures the nonlinear circuit behavior through the use of time-domain waveforms; how-
ever, the derivations are based on the assumption that the transconductor gm clips at the input
voltage zero-crossing points. Since a clipping transconductor acts as a limiter, it cannot depict the
imbalance between the injection and non-injection stages in an asymmetrical or partial injection
locking oscillator. Though a time-stepped approximation [80] [81] could be used to describe a real
transconductor, the analysis becomes complex and non-intuitive with the introduction of several
other assumptions. In addition, time-domain analysis requires the exact RC values for differential
equations, but these values are difficult to estimate in ring oscillators.
3.2.3 Phase-domain-response analysis
The phase-domain response (PDR) analysis is developed from the impulse sensitivity function
(ISF) of an oscillator that is used to model phase noise [82] [83]. The injection signal impacts both
the amplitude and phase of the oscillator, and its amplitude change decays while the phase perturbation
remains. The impact on the phase change depends on the injection position relative to the oscillator
phase and repeats at the intrinsic frequency f0. This periodic phase change can be simulated by
imposing a single pulse on the oscillator. Based on this simulation result, a phase-domain model is
introduced and analyzed in [69] [84]. The theoretical analysis of PDR, especially for large-signal
injection in wireline or optical transceivers, is derived in [70], but it is limited to LC oscillators.
In summary, the PDR analysis can predict the locking range and noise bandwidth in ring oscil-
lators with weak injection [69] [84] and LC oscillators [70]. It captures the impact of an asymmet-
rical or partial injection, which is instructive for the design of ILROs. However, PDR analysis is a
simulation-based analysis, and it offers limited design insights.
3.3 Modified Frequency-Domain Analysis for ILRO-based QCG
3.3.1 Basis of frequency-domain analysis for RO
For high-speed, low-power clocks in the wireline or optical system, the transition time occu-
pies a large portion of a cycle. Hence the voltage outputs in ring oscillators approach sinusoidal
waveforms rather than square-wave waveforms. Therefore, it is feasible to assume that a single
dominant frequency exists in such ILROs for frequency-domain analysis.
Fig. 3.5a shows the model of an odd N-stage injection-locked ring oscillator, in which Gm, C,
and R represent transconductance, capacitance, and resistance respectively in each stage. Since N
is an odd number, the minus sign in each transconductance can be replaced by an ideal inverter




Figure 3.5: (a) Model for an odd N-stage ILRO; (b) a simplified model of (a) with an ideal
inverter in the loop [73].
and its amplitude and phase vs. frequency are shown in Fig. 3.6a. An injection signal at frequency
f_inj is introduced into a single stage, e.g., stage 1, without loss of generality. There are two important
assumptions behind this model.
• Assumption 1: only the fundamental frequency component exists in the ILRO.
• Assumption 2: in a steady state, the total phase shift around the loop in the ILRO is 2π rad.
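When the injection signal is zero, all the N stages are identical. If we assume that f0 is the
intrinsic frequency of the oscillator, the load impedance and the phase shift of each stage are:
$$Z_L(f_0) = \frac{R}{1 + j2\pi f_0 RC} \tag{3.1a}$$
$$\varphi = \tan^{-1}(2\pi f_0 RC) = \frac{\pi}{N} \tag{3.1b}$$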




Figure 3.6: (a) Amplitude-frequency and phase-frequency response of the impedance ZL of RC
tank; (b) current phasor diagram
where φ is the phase shift in each stage.
When the injection signal is applied, the ILRO oscillates at the injection frequency f_inj in
a steady state. The angle between the injection current I_inj and oscillator output current I_osc is
defined as θ. A phase shift η from the RC tank is generated due to the difference between f_inj and f0,
as shown in Fig. 3.6b. Based on these assumptions, and using φ1 to φN to represent the total phase
shift in each stage, we can derive:
$$\varphi_2 = \dots = \varphi_N = \frac{\pi}{N} + \eta = \tan^{-1}(2\pi f_{inj} RC) \tag{3.2a}$$
$$\sum_{i=1}^{N} \varphi_i = \pi \tag{3.2b}$$
Combining (3.1) and (3.2a), we get (3.3) by using a Taylor series.
$$\eta \approx \frac{\tan(\pi/N)}{1 + \tan^2(\pi/N)} \cdot \frac{f_{inj} - f_0}{f_0} \tag{3.3}$$
Combining (3.2b) and (3.3), we can solve for the current amplitudes I_tot and I_inj, as shown in
Fig. 3.6b. As a result, for a given injection current strength |I_inj|/|I_osc|, we get certain f0 and θ in
a one-to-one relationship and can calculate the upper and lower boundaries of the locking range.
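As a quick sanity check of the first-order approximation in (3.3), the sketch below compares it with the exact arctangent expression from (3.2a). It assumes the free-running condition tan(π/N) = 2πf0RC, which is what (3.2a) reduces to when η = 0 and f_inj = f0. The function names and the variable `gamma` (the ratio f_inj/f0) are illustrative only.

```python
import math


def eta_exact(n_stages: int, gamma: float) -> float:
    """Exact extra phase shift from (3.2a): atan(2*pi*f_inj*R*C) - pi/N."""
    x = math.tan(math.pi / n_stages)   # assumed equal to 2*pi*f0*R*C
    return math.atan(x * gamma) - math.pi / n_stages


def eta_taylor(n_stages: int, gamma: float) -> float:
    """First-order Taylor approximation of eta, as written in (3.3)."""
    x = math.tan(math.pi / n_stages)
    return x / (1.0 + x * x) * (gamma - 1.0)


# Compare both expressions for a three-stage ring near gamma = f_inj / f0 = 1.
for gamma in (0.98, 1.00, 1.02):
    print(gamma, eta_exact(3, gamma), eta_taylor(3, gamma))
```

For a three-stage ring and a 2% frequency offset, the two expressions differ only in the second-order term, so (3.3) is a reasonable simplification close to f0 and becomes less accurate as the offset grows.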
3.3.2 Modified frequency-domain analysis for quadrature RO
The model above works for an odd-stage ring oscillator with a number of stages N greater than or
equal to 3, where the phase shift φ from the RC tank in each stage is between 0 and π/2. However,
this model cannot be applied to the quadrature RO in Fig. 3.7a. Since the number of stages is
four, each stage requires a π/2 phase shift, which is not achievable with an RC tank. We assume
this limitation is overcome by adding the currents of the forward and cross-coupled paths. Besides
the two assumptions in the conventional frequency-domain analysis in the previous section, we
introduce one more assumption:
• Assumption 3: at a steady state, the loop gain in the ILRO is 1.
Fig. 3.7b shows the model of the quadrature RO in Fig. 3.7a: R and C stand for the resistance
and capacitance at each stage; V0, V90, V180, and V270 represent the voltages of the four stages, and
they are π/2 apart from each other and have an identical amplitude Vs; Gm_fwd and Gm_cc are
the transconductances in the forward path and cross-coupled path, respectively, and generate the
corresponding output currents I_fwd and I_cc. We define the ratio between the cross-coupled inverter
and the forward inverter transconductances as k:
$$k = \frac{G_{m\_cc}}{G_{m\_fwd}} \tag{3.4}$$







Fig. 3.7c shows the current phasor diagram. In each stage, I_osc, which is the sum of I_fwd and I_cc,
generates a phase shift φs. Their relationships are:
$$\varphi_{s1} = \varphi_{s2} = \tan^{-1}\left(\frac{I_{cc1}}{I_{fwd1}}\right) = \tan^{-1}(k) \tag{3.5a}$$











Figure 3.7: (a) Schematic of the differential quadrature RO; (b) frequency-domain model of the
differential quadrature RO; (c) current phasor diagram of the differential quadrature RO.
We define the phase shift φrc from the RC tank in each stage. From the phase constraint in
Assumption 2 and the gain constraint in Assumption 3, we derive:
$$\varphi_{s1} + \varphi_{rc} = \pi/2 \tag{3.6a}$$
$$|V_{90}| = V_s = |I_{osc1}| \left|\frac{R}{1 + j2\pi f_0 RC}\right| \tag{3.6b}$$
Combining (3.5) with (3.6), we obtain:
$$2\pi f_0 RC = 1/k \tag{3.7a}$$





which reveals the relationship between the values of R, C, Gm_fwd, and f0 for a given k.
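The phase and gain constraints above can be checked numerically for a chosen k and f0. This is a sketch under two assumptions that are not quoted from the text: the cross-coupled current leads the forward current by 90 degrees, so that |I_osc1| = sqrt(1 + k^2)·Gm_fwd·Vs, and the unity-loop-gain condition then gives Gm_fwd·R = 1/k. The function name `quadrature_ro_check` is illustrative.

```python
import math


def quadrature_ro_check(k: float, f0_hz: float):
    """Return (RC, phi_s + phi_rc, loop gain) for the free-running quadrature RO."""
    rc = 1.0 / (2.0 * math.pi * f0_hz * k)           # RC product from (3.7a)
    phi_s = math.atan(k)                              # current-summing phase, (3.5a)
    phi_rc = math.atan(2.0 * math.pi * f0_hz * rc)    # phase shift of the RC tank

    # |I_fwd + j*I_cc| normalized to Gm_fwd*Vs (assumes a 90-degree offset).
    i_osc_norm = abs(complex(1.0, k))
    # |R / (1 + j*2*pi*f0*R*C)| normalized to R, as in (3.6b).
    z_norm = abs(1.0 / complex(1.0, 2.0 * math.pi * f0_hz * rc))
    gm_fwd_r = 1.0 / k                                # assumed gain condition
    loop_gain = gm_fwd_r * i_osc_norm * z_norm
    return rc, phi_s + phi_rc, loop_gain


rc, phase_sum, loop_gain = quadrature_ro_check(k=0.3, f0_hz=2.5e9)
print(rc, phase_sum, loop_gain)
```

With these assumptions the per-stage phase adds up to π/2 and the loop gain evaluates to 1 for any positive k, which is consistent with (3.6a) and Assumption 3.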
3.3.3 Modified frequency-domain analysis for ILRO-based QCG
Fig. 3.8a and Fig. 3.8b show the schematic and the model of an ILRO-based QCG. Because of
the injection current I_inj, non-injection stages and injection stages become unbalanced. We define





Compared with the oscillator in Fig. 3.7, the oscillator now operates at the injection frequency f_inj
rather than the intrinsic frequency f0, and we introduce a variable γ = f_inj/f0 to depict the frequency
shift. The amplitudes of the injection stages and non-injection stages change after the
injection, so we introduce two variables a1 and a2, and define the amplitudes as a1Vs and a2Vs,
respectively. Because of the symmetry, the differential phases in injection stages and non-injection stages
remain π. However, the quadrature phases can no longer be π/2, and we define the quadrature phases as
π/2 − ε or π/2 + ε.
Fig. 3.8c shows the phase diagram of an ILRO-based QCG with a two-phase injection. Based





Figure 3.8: (a) Schematic of the ILRO-based QCG; (b) frequency-domain model of the
ILRO-based QCG; (c) current phasor diagram of the ILRO-based QCG.
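$$[\ldots]\left|\frac{[\ldots]}{j2\pi f_{inj}RC + 1}\right| = |a_1 V_s| \tag{3.9a}$$
$$\varphi'_{s1} - \psi + \varphi'_{[\ldots]}\;[\ldots]$$
$$[\ldots]\left|\frac{[\ldots]}{j2\pi f_{inj}RC + 1}\right| = |a_2 V_s| \tag{3.9c}$$
$$(\varphi'_{s2} - \varepsilon) + \varphi'_{[\ldots]}\;[\ldots]$$
$$[\ldots]\left[\sqrt{(a_1 k\cos\varepsilon)^2 + (a_2 + a_1 k\sin\varepsilon)^2} + \sqrt{1+k^2}\,\beta\cos\theta\right]^2 + [\ldots]$$
$$\left[\frac{[\ldots]}{a_2 + a_1 k\sin\varepsilon}\right] - \arctan\left[\frac{\sqrt{1+k^2}\,\beta\sin\theta}{\sqrt{(a_1 k\cos\varepsilon)^2 + (a_2 + a_1 k\sin\varepsilon)^2} + [\ldots]}\right][\ldots]$$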
Combined with (3.7), (3.8), and the current phasor diagram in Fig. 3.8c, we can rewrite (3.9) as
(3.10).
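The square-root and arctangent terms in (3.10) can be read as a phasor sum at one injection stage. The following is only a sketch, under assumptions that are not spelled out in the text above: currents are normalized to Gm_fwd·Vs; the forward current comes from a non-injection stage (amplitude a2); the cross-coupled current comes from the complementary injection stage (amplitude k·a1) and is rotated by π/2 − ε relative to the forward current; and the injection strength is taken as β = |I_inj|/|I_osc| with the free-running |I_osc| = sqrt(1 + k^2)·Gm_fwd·Vs. The symbols i_osc1 and i_tot1 are introduced here for illustration.

```latex
\begin{align*}
i_{osc1} &= \sqrt{(a_1 k\cos\varepsilon)^2 + (a_2 + a_1 k\sin\varepsilon)^2}, &
\varphi'_{s1} &= \arctan\frac{a_1 k\cos\varepsilon}{a_2 + a_1 k\sin\varepsilon},\\[4pt]
i_{tot1} &= \sqrt{\left(i_{osc1} + \sqrt{1+k^2}\,\beta\cos\theta\right)^2
            + \left(\sqrt{1+k^2}\,\beta\sin\theta\right)^2}, &
\psi &= \arctan\frac{\sqrt{1+k^2}\,\beta\sin\theta}{i_{osc1} + \sqrt{1+k^2}\,\beta\cos\theta}.
\end{align*}
```

With β = 0, ε = 0, and a1 = a2 = 1, these expressions reduce to i_osc1 = sqrt(1 + k^2) and φ′s1 = arctan(k), which agrees with the free-running result in (3.5a).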
For given design parameters k and β, there are five variables a1, a2, ε, γ, and θ in (3.10).
We use θ as an independent variable and derive the relationship between θ and the other four
variables. Since analytical solutions for (3.10) are not obtainable, we employ the "fsolve" function
in MATLAB for numerical solutions by sweeping θ from −π to π. However, we still need to define
the initial values for the variables and make an approximation.
For the initial values of the variables, we use their values before the injection. a1 = a2 = 1,











































Figure 3.9: Solutions of (3.10) with initial conditions and approximation (3.11), when k = 0.3,
f_inj = 2.5 GHz, and β = 0.1.






























Figure 3.10: Relationship between intrinsic frequency f0 and IQ mismatch in the proposed
analysis.













For k = 0.3, f_inj = 2.5 GHz, and β = 0.1, Fig. 3.9 shows solutions of (3.10) for θ varying
from −π (−180°) to π (180°). The Y-axis on the left shows the IQ mismatch ε in degrees, and
the Y-axis on the right shows the dimensionless values of a1, a2, and γ.
From Fig. 3.9, we derive the relationship between f0 and ε, shown as the blue curve in Fig. 3.10, where
f0 = f_inj/γ. However, as shown in Fig. 3.8c, the increase of θ and the decrease of φ′s1, altogether,


































Figure 3.11: Relationship between θ and φ′s1 and its derivative dφ′s1/dθ in the proposed analysis.
Figure 3.12: Schematic of the testbench.




Fig. 3.11 plots θ vs. φ′s1 and θ vs. dφ′s1/dθ. Finally, the stable solution region is marked in Fig. 3.10
with the diamond markers.
As a result, we can get one-to-one relationships between injection included angle θ, intrinsic
frequency f0, and IQ mismatch ε for an ILRO-based QCG. Fig. 3.10 also compares our solutions
with the solutions in [38]. The previous analysis plots a linear relationship between f0 and ε and
cannot predict the locking range.




























































































Figure 3.13: Calculated and simulated results at injection frequency f_inj = 2.5 GHz and injection
strength β = 0.1. (a) the relationship between amplitude ratio a1, a2, injection included angle θ,
and intrinsic frequency f0; (b) the relationship between IQ mismatch ε , jitter at injection stages,
jitter at non-injection stages, and f0.
3.4 Simulation Results and Design Trade-offs
The proposed theory is verified with simulations in 28 nm CMOS technology. As shown
in Fig. 3.12, the sizes of the PMOS and NMOS of the forward inverters are 4.4 µm/30 nm and
4 µm/30 nm, and the sizes of the PMOS and NMOS of the cross-coupled inverters are 2.2 µm/30 nm
and 2 µm/30 nm. We tune the load capacitor C to change the intrinsic frequency f0 of the oscillator.
A sinusoidal current at 2.5 GHz or 7 GHz is injected between the 90° and 270° nodes
of the oscillator as a differential injection signal. The test bench is simulated in the TT corner, at
27 °C, and with a 0.9 V supply. We calculate the signal amplitudes from RMS values and their angles from





























































































Figure 3.14: Calculated and simulated results at injection frequency f_inj = 2.5 GHz and injection
strength β = 0.2. (a) the relationship between amplitude ratio a1, a2, injection included angle θ,
and intrinsic frequency f0; (b) the relationship between IQ mismatch ε , jitter at injection stages,
jitter at non-injection stages, and f0.
zero-crossing points in the simulation. In our test bench, the cross-coupled strength k is 0.24 from
the simulation, and we simulate with injection strength β at 0.1 and 0.2. The oscillator consumes
2.96 mW in the above cases.
Fig. 3.13, Fig. 3.14, Fig. 3.15, and Fig. 3.16 show the calculated and simulated results at f_inj =
2.5 GHz with β = 0.1, f_inj = 2.5 GHz with β = 0.2, f_inj = 7 GHz with β = 0.1, and
f_inj = 7 GHz with β = 0.2, respectively, when the oscillator is injection locked. In part (a) of
each figure, we plot the normalized amplitude a1, a2, and the angle between injection current and
oscillator output current θ vs. intrinsic frequency f0. The calculated values of these variables and
locking range from the proposed analysis in Section 3.3 generally match the simulated results. We



























































































Figure 3.15: Calculated and simulated results at injection frequency f_inj = 7 GHz and injection
strength β = 0.1. (a) the relationship between amplitude ratio a1, a2, injection included angle θ,
and intrinsic frequency f0; (b) the relationship between IQ mismatch ε , jitter at injection stages,
jitter at non-injection stages, and f0.
explain the amplitude changes and the phase relationship between injection signals and oscillating
signals with the proposed analysis, which helps us understand the detailed operation in ILRO-based
QCGs.
In part (b) of each figure, we plot the calculated and simulated IQ mismatch ε and the simulated
jitter at the injection stages and the non-injection stages vs. f0. The results confirm the trends sketched
in Fig. 3.3c and Fig. 3.3d. In an LC-based QCG, the IQ mismatch ε = 0 when f0 = f_inj [68]. Therefore,
the intrinsic frequency f0 is designed and calibrated to be equal to f_inj to achieve the best jitter and
IQ mismatch performance. However, in an ILRO-based QCG, the IQ mismatch ε is close to its
maximum when f0 = f_inj. Zero IQ mismatch points, i.e., ε = 0, are located outside


























































































Figure 3.16: Calculated and simulated results at injection frequency f_inj = 7 GHz and injection
strength β = 0.2. (a) the relationship between amplitude ratio a1, a2, injection included angle θ,
and intrinsic frequency f0; (b) the relationship between IQ mismatch ε , jitter at injection stages,
jitter at non-injection stages, and f0.
of the locking range even with strong injection (β = 0.2). With the help of a phase correction
loop (FTL [64] [65] or QLL [38]) in Fig. 3.2b, an ILRO-based QCG could achieve ε = 0, but the
jitter performance is much worse. When f0 = f_inj, the jitters are 131 fs and 59.4 fs in Fig. 3.14
and Fig. 3.16, but their IQ mismatches are 5.35 degrees and 5.10 degrees, respectively; when the
IQ mismatches reach the minimum value of 0.40 degrees, the jitters are 392 fs and 169 fs, which
are almost three times the optimal jitter values. As a result, ILRO-based QCGs cannot
achieve optimal IQ mismatch and optimal jitter simultaneously, as LC-based QCGs can; specifically, the best IQ
mismatch nearly coincides with the worst jitter performance in ILRO-based QCGs and vice versa. For
the applications in Fig. 3.1a and Fig. 3.1c, IQ mismatch determines PIs’ differential nonlinearity
(DNL) and integral nonlinearity (INL); for the applications in Fig. 3.1b, IQ mismatch translates
into a phase offset in the ADC sampling clocks. Jitter performance directly affects clock quality.
Therefore, there is a basic design trade-off between IQ mismatch and jitter performance in ILRO-
based QCG for system designers.
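The jitter penalty quoted above can be reproduced with simple arithmetic. The sketch below uses only the four jitter values given in the text for β = 0.2; the helper name `jitter_ratio` is illustrative.

```python
def jitter_ratio(jitter_at_f0_equals_finj_fs: float,
                 jitter_at_min_iq_mismatch_fs: float) -> float:
    """Ratio of the jitter at minimum IQ mismatch to the jitter at f0 = f_inj."""
    return jitter_at_min_iq_mismatch_fs / jitter_at_f0_equals_finj_fs


# Values quoted in the text (beta = 0.2).
ratio_2g5 = jitter_ratio(131.0, 392.0)   # Fig. 3.14, f_inj = 2.5 GHz
ratio_7g = jitter_ratio(59.4, 169.0)     # Fig. 3.16, f_inj = 7 GHz

print(round(ratio_2g5, 2), round(ratio_7g, 2))
```

Both ratios come out slightly below 3 (about 2.99 and 2.85), which is the "almost three times" penalty paid for operating at the minimum IQ mismatch.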
The proposed analysis also explains that f0 should always be smaller than f_inj for a stable
minimum IQ mismatch. In addition, the upper locking range Δf_up and lower locking range Δf_dn are
not equal due to the imbalance in the ILRO. From Fig. 3.13b, Fig. 3.14b, Fig. 3.15b, and Fig. 3.16b, a
larger injection strength β improves the best jitter performance and widens the injection locking range,
but also increases the maximum IQ mismatch error. For systems that require zero IQ mismatch,
since the ε = 0 points are located outside (or at the edge) of the locking range, increasing the injection strength
cannot enlarge the noise bandwidth; thus, it cannot improve the jitter performance in [38].
Though there are differences between the calculated and simulated values, their trends match
well. The offsets are mainly from the harmonics in the RO and the higher-order effects of the transconductances
Gm_fwd and Gm_cc, which are the limitations of frequency-domain analysis.
3.5 Conclusions
The quadrature-clock generator (QCG) is an essential building block in the wireline or optical
data link transceivers. Injection-locked ring-oscillator (ILRO) based QCGs transform a global, dif-
ferential reference clock into four-phase quadrature clocks. They have been widely used in multiple
applications due to their low power, low area, and technology scalability. Since an asymmetrical
or partial injection leads to changes in both phase and amplitude for a ring oscillator, the de-
sign trade-offs for ILRO-based QCG are different from traditional LC-based QCG. This chapter
presents a modified frequency-domain analysis for four-stage differential ILROs by introducing an
amplitude constraint. We explain the amplitude changes and the phase relationship in ILRO-based
QCGs. Furthermore, this chapter shows that ILRO-based QCGs achieve the best IQ mismatch with
nearly the worst jitter performance and vice versa, and that f0 should always be smaller than f_inj
for a stable minimum IQ mismatch. Also, we cannot improve the jitter performance by increasing
injection strength β for an ILRO-based QCG with ε = 0. The proposed analysis is validated by
simulations in 28 nm CMOS technology.
Chapter 4: An Out-of-Band IM3 Cancellation Technique for Wideband
Wireless Receivers
4.1 Introduction
Modern wideband receiver architectures [21] [22] [23] [24] [85] [86] [87] [88] [89] [90] [91]
[92] [93] [94] [95] [96] [97] [98] [99] [100] [101] [102] [103] [104] provide a broad tuning range
to support multi-band, multi-standard wireless communications in next generation systems. How-
ever, in such systems, out-of-band (OB) blockers can degrade the receiver sensitivity due to the
limited external input filtering. In Fig. 4.1, we review the mechanisms that can affect the receiver
sensitivity. At low blocker powers, the receiver noise figure (NF) and the associated thermal noise
floor dominate the in-band (IB) noise. Once the blocker power becomes sufficiently large, reciprocal
mixing with the local-oscillator (LO) OB phase noise or with OB noise from the blocker(s)
dominates the IB interference at baseband. For every dB increase in blocker power, the IB interference
grows by 1 dB. This receiver performance is often quantified for a single blocker by measuring the
blocking NF and gain compression (evaluated with the B1dB). However, when two blockers are
present, the IB interference is typically dominated by the third-order intermodulation (IM3), since
it increases by 3 dB for each dB of blocker power increase. As a result, wideband receivers sub-
jected to intermodulating, multi-band interferers require very large effective OB third-order input
intercept points (IIP3). Several receiver architectures have been proposed to improve the intrin-
sic OB-IIP3 of wideband receivers. The mixer-first receiver (Fig. 4.2a) improves the linearity by
removing the low-noise amplifier (LNA) from the RF front end resulting in impressive OB IIP3
[22] [23] [85] [86] [87] [88] [89] [90], as well as good noise performance [24] [91] [92] [93] [94]
[95]. However, given the lack of an LNA, LO leakage is high (often around −60 dBm); LO leak-
age cancellation techniques can provide some limited suppression but at the cost of large area and
Figure 4.1: Wideband receiver desensitization mechanisms.
power consumption [96]. On-chip, high-Q, band-pass, N-path filters (Fig. 4.2b) can replace off-
chip, tunable RF filters to reject the OB interferers [97] [98] [99]. However, they introduce extra
loss and thus a noise-figure degradation, and again LO leakage; they further consume significant
power at high frequency. The frequency-translational noise-cancelling (FTNC) receiver [21] [100]
(Fig. 4.2c) attains superior NF by canceling the thermal noise of the matching resistance, but it has
only a moderate OB-IIP3 and again suffers from LO leakage. In [101], the impedance matching is
implemented with a common-gate (CG) low-noise transconductance amplifier (LNTA) to eliminate
the LO leakage, but a 2.5 V supply is required to create sufficient headroom for high linearity.
Receivers with frequency-selective feedback [102] [103] [104] (Fig. 4.2d) employ a notch filter in
the feedback loop to suppress blockers; efficient implementations of the notch filter rely on N-path
(NP) realizations, which again result in LO leakage.
Digital-assisted cancellation (Fig. 4.2e) has also been used to improve linearity without in-






Figure 4.2: Existing approaches to achieve high OB linearity in wideband receivers: (a)
mixer-first receiver; (b) a receiver preceded by an on-chip, N-path band-pass, RF filter; (c)
frequency-translational noise-cancelling (FTNC) receiver; (d) a receiver with frequency-selective
feedback; (e) a receiver with an RF, IM3-cancellation path.
intermodulation products for wideband RF input signals. The products were then downconverted,
equalized and subtracted in the baseband from the corrupted signal in the main path. A voltage-mode
RF LNA is needed before the AP to realize impedance matching and improve noise performance.
However, since the IM3 products produced by the transconductance amplifier (gm cell) cannot
be cancelled, the linearity performance of the receiver is significantly degraded. Besides, the
tuning range is limited by the inductor-based LNA.
We propose a cancellation approach to remove IM3 products in LNTA-based wideband re-
ceivers using a baseband auxiliary path [106] that significantly increases the OB-IIP3 while over-
coming important challenges associated with previous solutions. We demonstrate the approach in
Figure 4.3: Concept of the proposed IM3 cancellation approach using a baseband auxiliary path
for LNTA-based receivers.
an FTNC receiver that achieves low noise, good LO isolation, and high OB IIP3 after IM3 product
cancellation. The chapter is organized as follows: Section 4.2 introduces the cancellation
technique and analyzes its operation. Circuit implementation and considerations are presented in
Section 4.3 and measurement results are presented in Section 4.4, followed by conclusions in
Section 4.5.
4.2 IM3 Cancellation Using A Baseband Auxiliary Path
RF-current-mode receivers use an LNTA followed by a passive current-driven mixer [21] [94]
[101] [107] [108] and have become very popular for wideband applications since they offer su-
perior linearity performance over LNA-based architectures operating in RF voltage mode. The
low-pass input impedance of the baseband transimpedance amplifier (TIA) is frequency translated
by the passive mixer into a band-pass load for the LNTA that suppresses the OB interferers. As a
result, the LNTA's input linearity remains the main limitation for OB receiver linearity. The key
challenge in performing distortion cancellation in baseband is that the sources of the IM3 products, i.e. the
OB blockers, are filtered by the RF front end and are not available in the digital baseband. We now
describe an IM3 distortion cancellation technique that leverages the RF-current-mode architecture
to make efficient cancellation in the baseband possible.
4.2.1 Concept
The proposed IM3 cancellation approach (Fig. 4.3) adds an auxiliary path (AP) to the main path
(MP) of an RF-current-mode receiver; the MP consists of the LNTA, the passive, current-driven
mixer and the TIA. The baseband AP connects to the mixer to first capture the OB interferers,
and then reconstructs the IM3 products. The detailed operation of this architecture is as follows.
At node 1, the wideband LNTA receives the desired IB signal as well as the strong OB interferers
that introduce IB IM3 products. At node 2, the IB and OB signals are coherently down-converted
and then split into the MP and AP respectively. The desired IB signal with unwanted IM3 products
is filtered and amplified at node 3 in the MP. The OB interferers are captured by the capacitively
coupled current buffer at node 4 in the AP. Cubing circuits reconstruct the IM3 products at node 5,
and the TIAs filter and amplify them at node 6. Finally, after gain and phase shifting, the AP’s
output is added to the MP’s output at node 7 resulting in cancellation of the IM3 in the MP and
OB linearity enhancement at node 8.
4.2.2 Design considerations for the auxiliary path
A key requirement is that the addition of the AP does not degrade the performance of the
MP. First, we review the impact on the load impedance seen by the LNTA. The 25% duty-cycle
current-driven passive mixer (Fig. 4.4) translates the low-pass TIA input impedance into a band-
pass LNTA load impedance around the LO frequency; the input impedance of the mixer, Zin, is
given by [109]:
\[ Z_{in}(\omega) = R_{SW} + 4 \sum_{n=-\infty}^{\infty} |a_n|^2 Z_{BB}(\omega - n\omega_{LO}) \tag{4.1} \]
Figure 4.4: Impact of the auxiliary path (AP) on the equivalent input baseband impedance (ZBB)
presented to the passive mixer by the baseband TIA: (a) without the auxiliary path; (b) with the
auxiliary path.
In a standard receiver, a capacitor CG is placed at the TIA input to shunt the OB signals to
ground (Fig. 4.4a); in this work, the capacitor CG is connected into the AP, which has equivalent
input impedance ZG, rather than ground (Fig. 4.4b). The baseband load impedance for both cases

















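\[ A(\omega) = \frac{\cdots}{(1 + ja\omega)(1 + jb\omega)} \tag{4.2c} \]
\[ Z_{BB1} = \frac{j\omega C_F R_F r_o + R_F + r_o}{\cdots} \tag{4.3} \]
\[ Z_{BB2} = \frac{(j\omega)^2 C_F C_G R_F r_o Z_G + j\omega [C_G Z_G (R_F + r_o) + C_F R_F r_o] + R_F + r_o}{(j\omega)^2 C_G C_F R_F [r_o + Z_G (1 + A(\omega))] + j\omega [C_G Z_G (1 + A(\omega)) + C_F R_F + C_G (R_F + r_o)] + (1 + A(\omega))} \tag{4.4} \]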
where A(ω) is the two-pole open-loop gain of the TIA. Solving for ZBB, we obtain (4.3) and
(4.4). Ideally, we assume that both ω and CG are large enough. As a result, 1/( jωCG) becomes
much smaller than ZBB due to the reduction of A(ω) according to (4.2). The OB equivalent input
impedances are then:
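\[ Z_{in1} \approx R_{SW} \tag{4.5a} \]
\[ Z_{in2} \approx \cdots \left\{ \cdots \left[ \frac{\left( \frac{1}{j\omega C_F} \,\|\, R_F \right) + r_o}{1 + A(\omega)} \right] \right\} \tag{4.5b} \]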
A larger Zin2 due to the AP introduces extra distortion from baseband circuits and generates
more thermal noise, thus degrading both the linearity and noise performance of the MP. Since RSW
is always designed to be less than a few Ohms, a very-low input resistance ZG is needed in the AP
to minimize performance penalties on the MP.
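The effect of ZG on the baseband load can be explored numerically. The sketch below is not the author's derivation: it assumes the TIA input impedance has the form ((1/jωCF ‖ RF) + ro)/(1 + A(ω)) that appears in (4.5b), places CG in series with ZG across the TIA input as in Fig. 4.4b, and uses hypothetical component values and pole locations (the `TIA` dictionary and all helper names are illustrative only).

```python
import math

def par(z1, z2):
    """Parallel combination of two impedances."""
    return z1 * z2 / (z1 + z2)

def open_loop_gain(w, a0, a, b):
    """Two-pole open-loop gain A(w) = a0 / ((1 + j a w)(1 + j b w))."""
    return a0 / ((1 + 1j * a * w) * (1 + 1j * b * w))

def z_tia(w, rf, cf, ro, a0, a, b):
    """Assumed TIA input impedance: ((1/jwCF || RF) + ro) / (1 + A(w))."""
    zf = par(rf, 1 / (1j * w * cf))
    return (zf + ro) / (1 + open_loop_gain(w, a0, a, b))

def z_bb(w, zg, cg, **tia):
    """Baseband load: TIA input in parallel with CG in series with ZG."""
    return par(z_tia(w, **tia), zg + 1 / (1j * w * cg))

# Hypothetical values, chosen only to exercise the formulas.
TIA = dict(rf=3e3, cf=5.3e-12, ro=100.0, a0=400.0, a=2e-7, b=1.6e-10)
CG = 50e-12

if __name__ == "__main__":
    w = 2 * math.pi * 100e6  # 100 MHz baseband offset
    for zg in (0.0, 5.0, 50.0):
        print(f"ZG = {zg:5.1f} ohm -> ZBB = {z_bb(w, zg, CG, **TIA):.2f} ohm")
```

Setting `zg = 0` reproduces the case without the AP (CG tied to ground), so the same function covers both panels of Fig. 4.4.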
Another important AP design consideration is the conversion gain Aint for OB interferers from
node 1 to node 4 (Fig. 4.3). With too large Aint , the input of the cubing circuits gets saturated by
the strong OB interferers, and unwanted high order distortion products would be generated; for
too small Aint , the final output after cancellation would suffer from a noise penalty because of the
higher NF in the AP. In this work, we design an extra-low-input-impedance current buffer (CB)
that provides variable conversion gain for OB interferers. Additionally, a large bandwidth for Aint is
required to capture OB interferers over a wide frequency range.
4.2.3 Design of the current buffer in the AP
Taking input impedance and gain requirements for the CB into consideration, we now review




Figure 4.5: Design of the current buffer (CB) in the auxiliary path: (a) the connections for the
differential CB; (b) the comparison of the equivalent input impedance from three possible CB
implementations; (c) CB with passive components; (d) operational amplifier based CB; (e) the
proposed low-input-impedance CB with programmable gain.
compares the equivalent input impedance from three possible solutions for CB (shown in Fig. 4.5c,
Fig. 4.5d, and Fig. 4.5e, respectively). In Fig. 4.5c, a passive solution is shown with a variable re-
sistor directly connected to the capacitor CG, which transforms the OB current into a voltage and
feeds to the AP circuits through an AC coupling capacitor. Almost no extra power is consumed,
for all components are passive. However, the high input resistance significantly degrades the lin-
earity and noise performance of the MP as discussed in the previous section, especially when large
conversion gain Aint is required. An active current buffer is the only way to get a sufficiently
low impedance while maintaining a sufficient Aint. In Fig. 4.5d, an operational-amplifier (op-amp)
based CB is employed to reduce the input impedance. In [110], an inverter is used as simple op-










where Rf_d is the feedback resistor, which determines the gain for the OB interferers, ro_d is the
output resistance, and gm_d is the transconductance of the inverter. For ZG_d = 5 Ω, gm_d > 200 mS
is needed for each branch, which leads to a large W/L ratio and very high power consumption. In
addition, to reduce the flicker noise corner, the area of the amplifier should be large. We instead employ
the low-input-impedance CB shown in Fig. 4.5e. M12 is the common-gate transistor, while the voltage
at the source of M13 is connected to the gate of M11, forming a feedback loop with more current
going through M11. Following the derivation in [111], the equivalent input impedance of the current
buffer is dramatically reduced to:
\[ Z_{G\_e} = \frac{1}{g_{m\_M11} \cdot g_{m\_M12} \cdot r_{o\_M13} + g_{m\_M12}} \tag{4.7} \]
where gm_M11, gm_M12, ro_M13 are the transconductances and output resistance for the MOSFET
M11, M12, and M13 respectively. M14 replicates the input current, and then outputs the voltage
with a pair of programmable pseudo differential resistors. Common-mode feedback (CMFB) cir-
cuits provide the required DC bias point for the next stage. The capacitor Cx is used to reduce the
distortion from high order RF signal feedthrough, thus improving the linearity.
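As a quick numerical illustration of (4.7), the following sketch evaluates the input impedance of the proposed CB and compares it with that of a plain common-gate stage, 1/gm_M12. The device values are hypothetical and are not taken from the design; the function names are illustrative.

```python
def zg_proposed(gm_m11, gm_m12, ro_m13):
    """Input impedance of the feedback-boosted current buffer, per (4.7)."""
    return 1.0 / (gm_m11 * gm_m12 * ro_m13 + gm_m12)

def zg_common_gate(gm_m12):
    """Input impedance of a plain common-gate stage (no feedback), ~1/gm."""
    return 1.0 / gm_m12

# Hypothetical small-signal values: 10 mS transconductances, 2 kOhm output resistance.
GM11, GM12, RO13 = 10e-3, 10e-3, 2e3

if __name__ == "__main__":
    z_e = zg_proposed(GM11, GM12, RO13)
    z_cg = zg_common_gate(GM12)
    print(f"ZG_e = {z_e:.2f} ohm, plain CG = {z_cg:.1f} ohm, reduction = {z_cg / z_e:.1f}x")
```

Written as 1/[gm_M12 (1 + gm_M11 · ro_M13)], (4.7) shows that the feedback loop lowers the common-gate input impedance by the factor 1 + gm_M11 · ro_M13, which is how a few-ohm input impedance can be reached without the very large gm required by the inverter-based buffer.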
4.2.4 Cancellation of IM3 products
We now derive how the IM3 product cancellation works. For simplicity, we neglect the higher
order intermodulation distortions. We start with a model for the transmitter as shown in Fig. 4.6a.
The transmitted baseband signals s(t) consist of the I path signal I (t) and Q path Q(t) as s(t) =
I (t) + jQ(t), and s̄(t) is the conjugated signal of s(t). All signals are up-converted with carrier




Figure 4.6: (a) Model for the transmitter; (b) mathematical model for the proposed receiver for
IM3 products cancellation.
resulting output can be presented as:
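\[
\begin{aligned}
y(t) &= 2k_T [I(t)\cos(\omega_0 t) - Q(t)\sin(\omega_0 t)] \\
     &= 2k_T \cdot \mathrm{Re}[s(t)e^{j\omega_0 t}] \\
     &= k_T s(t)e^{j\omega_0 t} + k_T \bar{s}(t)e^{-j\omega_0 t}
\end{aligned}
\tag{4.8a}
\]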
Fig. 4.6b shows the mathematical model for the receiver with the proposed IM3 cancellation. The
input RF signal is denoted x(t) and is proportional to the transmitted signal y(t), so x(t) = kD y(t),
where kD is the coefficient for signal attenuation. The LO signal is represented as LO(t); α
and β are the transconductance gains of the first-order term and third-order term in the LNTA,
respectively; λ is the gain provided by the baseband in the MP; γ represents the reconstructed
total gain in the AP for the IM3 products. The output signals from the MP and the AP in the receiver
Figure 4.7: Two cases for strong OB interferers; Case 1: two OB interferers located at one side of
the LO signal. Case 2: two OB interferers located at different sides of the LO signal.
can be written as qM(t) and qA(t), where kLO is the LO amplitude:
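\[ q_M(t) = \beta\lambda x^3(t) LO(t) \tag{4.9a} \]
\[ LO(t) = k_{LO} (e^{j\omega_0 t} + e^{-j\omega_0 t}) \tag{4.9b} \]
\[ q_A(t) = \alpha\gamma x^3(t) LO^3(t) \tag{4.9c} \]
\[ LO^3(t) = k_{LO}^3 (e^{j3\omega_0 t} + e^{-j3\omega_0 t} + 3e^{j\omega_0 t} + 3e^{-j\omega_0 t}) \tag{4.9d} \]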
We analyze two cases for strong OB interferers (Fig. 4.7): the two modulated signals are on
the same side of the LO frequency in case 1, and on different sides in case 2. In both cases, two
modulated interferers are injected without the wanted signal. In case 1, the carrier frequencies of
the two interferers are ω0 + ∆ω and ω0 + 2∆ω, and the corresponding input signals x1(t) and x2(t) are:
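\[ x_1(t) = k_T k_D [s_1(t)e^{j(\omega_0+\Delta\omega)t} + \bar{s}_1(t)e^{-j(\omega_0+\Delta\omega)t}] \tag{4.10a} \]
\[ x_2(t) = k_T k_D [s_2(t)e^{j(\omega_0+2\Delta\omega)t} + \bar{s}_2(t)e^{-j(\omega_0+2\Delta\omega)t}] \tag{4.10b} \]
and
\[
\begin{aligned}
x(t) &= x_1(t) + x_2(t) \\
     &= k_T k_D [s_1(t)e^{j(\omega_0+\Delta\omega)t} + \bar{s}_1(t)e^{-j(\omega_0+\Delta\omega)t} + s_2(t)e^{j(\omega_0+2\Delta\omega)t} + \bar{s}_2(t)e^{-j(\omega_0+2\Delta\omega)t}]
\end{aligned}
\tag{4.11}
\]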
Using (4.12), the terms in the expansion of x3(t) appear at the following center frequencies:
[ω0; −ω0; ω0 + ∆ω; −ω0 − ∆ω; ω0 + 2∆ω; −ω0 − 2∆ω; ω0 + 3∆ω; −ω0 − 3∆ω; 3ω0 + 3∆ω;
−3ω0 − 3∆ω; 3ω0 + 4∆ω; −3ω0 − 4∆ω; 3ω0 + 5∆ω; −3ω0 − 5∆ω; 3ω0 + 6∆ω; −3ω0 − 6∆ω].
Because ω0 >> ∆ω >> BB BW (of the transmitted signals), the output signals after
the low-pass filter in the MP, qM(t), and in the AP, qA(t), are given in (4.13), with the cubic expansion written out in (4.12):
\[
\begin{aligned}
(a + b + c + d)^3 = {} & a^3 + b^3 + c^3 + d^3 + 3a^2b + 3a^2c + 3a^2d + 3b^2a + 3b^2c + 3b^2d + 3c^2a \\
& + 3c^2b + 3c^2d + 3d^2a + 3d^2b + 3d^2c + 6abc + 6abd + 6acd + 6bcd
\end{aligned}
\tag{4.12}
\]
\[ q_M(t) = k_T k_D \beta\lambda k_{LO} [s_1^2(t)\bar{s}_2(t) + s_1(t)\bar{s}_2^2(t)] \tag{4.13a} \]
\[ q_A(t) = 3 k_T k_D \alpha\gamma k_{LO}^3 [s_1^2(t)\bar{s}_2(t) + s_1(t)\bar{s}_2^2(t)] \tag{4.13b} \]
Since qM(t) and qA(t) are linearly related, the IM3 products can be cancelled. In case 2, the carrier frequencies of the two
interferers are ω0 + ∆ω and ω0 − 2∆ω, and the corresponding input signals x1(t) and x2(t) are:
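\[ x_1(t) = k_T k_D [s_1(t)e^{j(\omega_0+\Delta\omega)t} + \bar{s}_1(t)e^{-j(\omega_0+\Delta\omega)t}] \tag{4.14a} \]
\[ x_2(t) = k_T k_D [s_2(t)e^{j(\omega_0-2\Delta\omega)t} + \bar{s}_2(t)e^{-j(\omega_0-2\Delta\omega)t}] \tag{4.14b} \]
and
\[
\begin{aligned}
x(t) &= x_1(t) + x_2(t) \\
     &= k_T k_D [s_1(t)e^{j(\omega_0+\Delta\omega)t} + \bar{s}_1(t)e^{-j(\omega_0+\Delta\omega)t} + s_2(t)e^{j(\omega_0-2\Delta\omega)t} + \bar{s}_2(t)e^{-j(\omega_0-2\Delta\omega)t}]
\end{aligned}
\tag{4.15}
\]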
Now, the terms in the expansion of x3(t) appear at the following center frequencies: [3ω0;
−3ω0; ω0 + ∆ω; −ω0 − ∆ω; ω0 − 2∆ω; −ω0 + 2∆ω; ω0 + 4∆ω; −ω0 − 4∆ω; ω0 − 5∆ω; −ω0 + 5∆ω;
3ω0 + 3∆ω; −3ω0 − 3∆ω; 3ω0 − 3∆ω; −3ω0 + 3∆ω; 3ω0 − 6∆ω; −3ω0 + 6∆ω]. Similarly, the
output signals after the low-pass filter in the MP, qM(t), and in the AP, qA(t), are:
\[ q_M(t) = 0 \tag{4.16a} \]
\[ q_A(t) = 3 k_T k_D \alpha\gamma k_{LO}^3 [s_1^2(t) s_2(t) + \bar{s}_1(t)\bar{s}_2^2(t)] \tag{4.16b} \]
In order to make y · qA(t) = qM (t) = 0, the coefficient y can be set to 0. In conclusion, for these
two scenarios with two strong interferers, the distortions in the MP and the reconstructed distor-
tions in the AP are correlated in our proposed architecture, and they can be cancelled completely.
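The case-1 argument can be checked with a small time-domain simulation. The sketch below is a simplified model under stated assumptions, not the measured system: unmodulated real tones stand in for the interferers, the LNTA is a memoryless polynomial αx + βx³, the filters are ideal brick-wall filters, the AP sees only the linear LNTA term, and all frequencies are scaled to hypothetical integer values (LO at 400, tones at 500 and 597, so the IM3 product lands 3 units from the LO). A single real coefficient g is fitted at one input amplitude and reused at another.

```python
import numpy as np

def lowpass(sig, fs, cutoff):
    """Ideal brick-wall low-pass filter applied in the frequency domain."""
    spec = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    spec[freqs > cutoff] = 0.0
    return np.fft.irfft(spec, n=sig.size)

def receiver_paths(amp, alpha=1.0, beta=-0.1, fs=4096.0, n=4096,
                   f_lo=400.0, f1=500.0, f2=597.0):
    """Return (q_m, q_a): in-band MP output and reconstructed AP output."""
    t = np.arange(n) / fs
    x = amp * (np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t))
    lo = np.cos(2 * np.pi * f_lo * t)
    # Main path: LNTA nonlinearity, down-conversion, narrow baseband filter.
    q_m = lowpass((alpha * x + beta * x**3) * lo, fs, cutoff=10.0)
    # Auxiliary path: capture down-converted blockers, cube them, filter.
    blockers = lowpass(alpha * x * lo, fs, cutoff=300.0)
    q_a = lowpass(blockers**3, fs, cutoff=10.0)
    return q_m, q_a

# Fit the cancellation coefficient at one amplitude (least squares) ...
q_m_cal, q_a_cal = receiver_paths(amp=0.1)
g = -np.dot(q_m_cal, q_a_cal) / np.dot(q_a_cal, q_a_cal)

# ... and apply it unchanged at a different amplitude.
q_m, q_a = receiver_paths(amp=0.2)
residual = q_m + g * q_a
cancellation_db = 10 * np.log10(np.mean(q_m**2) / (np.mean(residual**2) + 1e-300))
```

Because both the MP distortion and the AP reconstruction scale with the cube of the input amplitude in this idealized model, the fitted coefficient (here g = −4β/α³) is independent of power. The measured dependence of the best coefficients on input power discussed in Section 4.4.2 comes from higher-order terms that this sketch deliberately omits.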
4.2.5 NF after IM3 product cancellation
The NF increases after IM3 cancellation. Since the IM3 products from the MP and AP should
be equalized, the noise after IM3 cancellation is determined by output power of IM3 products and





where PIM3_M, PIM3_A, Pn_M, and Pn_A are the in-band IM3 product power and in-band output noise power
in the MP and AP, respectively, and h = PIM3_A/PIM3_M. To minimize the noise penalty, we need
a small Pn_A and a large h. Pn_A is determined by the noise performance of the cubing circuits,
specifically the last stage in the AP. The noise from the current buffer is significantly attenuated
by the cubing circuits. Therefore, the more power is dissipated in the cubing circuits, the smaller
Pn_A becomes. In this work, h can be changed with the variable resistor in Fig. 4.5e. However, the
cubic term can be compressed in the cubing circuits if h is too large. In our design, Pn_M is about
2 dB higher than Pn_A and h is about 2 according to simulation. As a result, the NF impact is about
1 dB.
Figure 4.8: The schematic of the proposed system including chip, PCBs, and DSP.
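The quoted NF impact can be sanity-checked with a simple noise budget. The model below is an assumption made for illustration, not a reproduction of the chapter's noise equation: the AP output is scaled by 1/h in power so that its IM3 product matches the MP, so the AP adds noise power Pn_A/h to Pn_M, and the penalty is 10·log10(1 + Pn_A/(h·Pn_M)).

```python
import math

def nf_penalty_db(pn_m_minus_pn_a_db, h):
    """NF penalty (dB) under the assumed model: total noise = Pn_M + Pn_A / h.

    pn_m_minus_pn_a_db: how many dB Pn_M is above Pn_A.
    h: ratio of AP to MP in-band IM3 power (linear).
    """
    pn_a_over_pn_m = 10 ** (-pn_m_minus_pn_a_db / 10)
    return 10 * math.log10(1 + pn_a_over_pn_m / h)

if __name__ == "__main__":
    # Values quoted in the text: Pn_M about 2 dB above Pn_A, h about 2.
    print(f"NF penalty = {nf_penalty_db(2.0, 2.0):.2f} dB")
```

With the simulated values from the text this model gives roughly 1.2 dB, in line with the stated impact of about 1 dB, and it shows directly why a small Pn_A and a large h both reduce the penalty.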
4.3 Implementation of An FTNC Receiver with IM3 Cancellation
4.3.1 Schematic of the proposed receiver
A detailed diagram of the system (Fig. 4.8) shows the chip schematic, the printed-circuit board
(PCB) for baseband amplification and A/D conversion, and the DSP (implemented with MATLAB
code). An FTNC receiver is realized with single-ended common-source (CS) and common-gate
(CG) LNTAs. Current-reuse cascode cells in both LNTAs maintain high output impedances.
The CG LNTA provides 50 Ω input impedance for the receiver, while the high gm of the CS LNTA,
which is 160 mS, reduces the NF. The LNTAs drive 4-phase current-mode passive mixers clocked
by 25%-duty-cycle non-overlapping clock signals for quadrature down-conversion. The desired
signals in the MP are filtered and amplified by fully-differential TIAs. In the CG path (CGP),
the OB currents are shunted to ground with capacitors at the input of the TIAs. In the CS path
(CSP), the proposed low-input-impedance current buffers are connected to the shunt capacitors to
collect the OB interferer currents for the AP. The outputs of the current buffers are buffered to the
HP_AUX output to provide wideband spectral awareness. High-speed ADCs can be employed
to precisely locate those interferers, or the total OB interferer power can be detected with a sub-
sampling technique. For cancellation, the extracted interferers are processed with cubing circuits
and amplified with TIAs in the AP.
On the baseband PCB (PCB_BB), discrete components including buffers (BUF), variable gain
amplifiers (VGA), and voltage combiners (Comb) are built for each output baseband signal from
the chip. The VGA gain is controlled with a digital to analog converter (DAC) through a micro-
controller unit (MCU). The oscilloscopes act as analog to digital converters (ADC) in this system
to sample the six output baseband signals. Noise cancellation for FTNC and IM3 product
cancellation are executed in the digital domain in Matlab code after the sampling. An additional PCB
(PCB_Combiner) has been built to provide analog noise-cancellation verification.
4.3.2 Circuit implementation of the building blocks
Fig. 4.9 shows the schematics of important building blocks, including CS LNTA, CG LNTA,
cubing circuits and TIA. For the design of the CS LNTA, a cascoded configuration is used to boost
the output impedance of the LNTAs to enhance power efficiency. A single-ended RF signal is in-
jected into both the NMOS and PMOS of the CS LNTA after two on-chip decoupling capacitors;
Vbp, Vbn1 and Vbn2 provide DC bias for the cascode transistors; the output voltage passes through
an RC low-pass filter and is compared with the reference voltage which serves as a common mode
feedback (CMFB) loop to provide the DC operating point for the next stage; Clnta and Rlnta are
introduced as Miller compensation to improve the stability of the CMFB loop. A similar structure
is employed for the CG LNTA, but the input RF signal drives the sources of both the NMOS and
PMOS. In addition, the transistors with CS connection are designed to help locate the best noise
cancellation point for analog noise-cancellation verification.
Figure 4.9: The schematic of CS LNTA, CG LNTA, cubing circuits and TIA.
The cubing stage is similar to the design in [105], and it can suppress the linear term of the
input signal with respect to the IM3 products. The schematic of cubing circuits includes: squaring
circuits, balun, amplifier, and double balanced mixer as a multiplier. In the first stage, the squar-
ing circuits obtain the second-order term of the differential input signals S0+ S0− at output S1d
while maintaining the first-order term within the differential NMOS pairs. The squaring operation
performed by a MOS squaring circuit can avoid the generation of higher-order intermodulation
products. Next, an active balun is used to transfer the single-ended squared signals into differential
signals. To provide the common-mode rejection for the balun and to reduce the mismatch, dummy
squaring circuits are employed in the first stage, and their output S1c is applied to the negative input
terminal of the balun. The differential voltages S2+ S2− are then amplified in the third stage. In
the last stage, the double balanced mixer mixes the input squared term S3+ S3− with S0+ S0−,
resulting in the differential cubing term of S0+ S0− at output S4+ S4−. CMFB is used to provide
the DC operating point for subsequent TIAs.
Figure 4.10: The block diagram for the DSP in Matlab.
The TIAs consist of active, fully differential op-amps along with feedback resistors and capacitors.
The values of the resistors are directly related to the conversion gain of the receiver. In this
design, about 3 kΩ and 12 kΩ are used for the CSP and CGP, respectively, to achieve nearly 40 dB
of conversion gain in both paths. The capacitors are then selected for a baseband bandwidth of
10 MHz. All the resistors and capacitors are programmable to overcome PVT variations. The two-stage
op-amp implemented in the MP has 52 dB open-loop gain, 320 MHz gain-bandwidth product
(GBW), and 4.5 mW power consumption, while a one-stage op-amp is used in the AP with a
2.4 mW power consumption.
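As a rough guide to the capacitor sizing, the sketch below computes the feedback capacitance that gives a 10 MHz corner with the quoted feedback resistors. It assumes an ideal op-amp and a single-pole response with f = 1/(2π·RF·CF); the actual programmable values in the design are not stated in the text, so the results are estimates only.

```python
import math

def feedback_cap(rf_ohm, bw_hz):
    """Feedback capacitance for a single-pole TIA corner at bw_hz (ideal op-amp)."""
    return 1.0 / (2 * math.pi * rf_ohm * bw_hz)

if __name__ == "__main__":
    for path, rf in (("CSP", 3e3), ("CGP", 12e3)):
        cf = feedback_cap(rf, 10e6)
        print(f"{path}: RF = {rf / 1e3:.0f} kOhm -> CF = {cf * 1e12:.2f} pF")
```

Under this assumption the CSP needs about 5.3 pF and the CGP about 1.3 pF for the same 10 MHz baseband bandwidth.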
4.3.3 Implementation of the DSP
The 12 baseband signals coming from the chip (CSP, CGP and AP, each with differential I
and Q) are amplified with VGAs and converted to 6 single-ended signals (CS_I, CS_Q, CG_I,
CG_Q, AUX_I and AUX_Q) on the PCB, and are processed in Matlab after being sampled by
the oscilloscopes. After the I/Q correction block, which is designed to compensate for the gain
and phase mismatch in the AP, the block diagram of the DSP (Fig. 4.10) can be divided into two
sections: noise cancellation and IM3 product cancellation. For noise cancellation, we need to find
coefficients gni, gnq, pni and pnq such that MP_I and MP_Q each have the best SNR:
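\[ MP\_I = CS\_I + g_{ni} \left( p_{ni} \cdot CG\_I + \sqrt{1 - p_{ni}^2} \cdot CG\_Q \right) \tag{4.18a} \]
\[ MP\_Q = CS\_Q + g_{nq} \left( p_{nq} \cdot CG\_I + \sqrt{1 - p_{nq}^2} \cdot CG\_Q \right) \tag{4.18b} \]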
We search these coefficients among the feasible solutions and calculate the SNR of the output
signal MP_I and MP_Q after FFT at each point. For IM3 cancellation, similarly, the output signal
OUT_I and OUT_Q can be represented as:
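\[ OUT\_I = MP\_I + g_{di} \left( p_{di} \cdot AP\_I + \sqrt{1 - p_{di}^2} \cdot AP\_Q \right) \tag{4.19a} \]
\[ OUT\_Q = MP\_Q + g_{dq} \left( p_{dq} \cdot AP\_I + \sqrt{1 - p_{dq}^2} \cdot AP\_Q \right) \tag{4.19b} \]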
We search gdi, gdq, pdi and pdq among the feasible solutions to find the minimal output power for
OUT_I and OUT_Q. The DSP can be implemented with just multipliers and adders.
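The coefficient search for (4.19a) can be sketched as follows. This is an illustrative Python version, not the Matlab code used in this work: it sweeps the phase coefficient p on a grid and, as a shortcut that replaces a grid search over the gain, solves for the gain g in closed form by least squares at each p. The function name and the synthetic test signals are hypothetical.

```python
import numpy as np

def fit_im3_coeffs(mp, ap_i, ap_q, num_p=2001):
    """Find (g, p) minimizing the power of mp + g*(p*ap_i + sqrt(1-p^2)*ap_q).

    Returns (residual_power, g, p). p is swept over [-1, 1]; for each p the
    optimal g follows from a one-dimensional least-squares fit.
    """
    best = (np.inf, 0.0, 0.0)
    for p in np.linspace(-1.0, 1.0, num_p):
        combo = p * ap_i + np.sqrt(1.0 - p * p) * ap_q
        g = -np.dot(mp, combo) / np.dot(combo, combo)
        power = np.mean((mp + g * combo) ** 2)
        if power < best[0]:
            best = (power, g, p)
    return best

# Synthetic example: the MP carries an IM3 tone with an unknown phase offset
# relative to the quadrature AP outputs.
t = np.arange(4096) / 4096.0
ap_i = np.cos(2 * np.pi * 3 * t)
ap_q = np.sin(2 * np.pi * 3 * t)
mp = 0.5 * np.cos(2 * np.pi * 3 * t + 0.7)

residual_power, g_d, p_d = fit_im3_coeffs(mp, ap_i, ap_q)
```

Because the square-root term is non-negative, p alone spans only half of the phase plane; allowing g to take either sign covers the other half, which is why a signed gain is used here.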
The group delay difference between the MP and AP limits the IM3 product cancellation espe-
cially for the wideband interferers. In this work, a third-order all pass IIR fractional group delay
filter (FGDF) [112] is employed to compensate group delay. Only six registers, six multipliers, and
four adders are used in this filter.
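For illustration, a third-order all-pass fractional-delay filter can be generated with the Thiran design. This is a common textbook choice and is an assumption here; the filter in [112] may be designed differently. The delay D is expressed in samples and must exceed N − 1 for a stable order-N filter. The helper names below are hypothetical.

```python
import numpy as np
from math import comb

def thiran_allpass(delay, order=3):
    """Thiran all-pass coefficients (b, a) with low-frequency group delay `delay`.

    a[k] = (-1)^k * C(N, k) * prod_{n=0..N} (D - N + n) / (D - N + k + n),
    and the numerator is the reversed denominator (all-pass structure).
    """
    a = np.ones(order + 1)
    for k in range(1, order + 1):
        prod = 1.0
        for n in range(order + 1):
            prod *= (delay - order + n) / (delay - order + k + n)
        a[k] = (-1) ** k * comb(order, k) * prod
    b = a[::-1].copy()
    return b, a

def group_delay(b, a, w, dw=1e-4):
    """Numerical group delay (in samples) at normalized frequency w (rad/sample)."""
    def phase(om):
        z = np.exp(-1j * om * np.arange(len(a)))
        return np.angle(np.dot(b, z) / np.dot(a, z))
    return -(phase(w + dw) - phase(w - dw)) / (2 * dw)

b_fgdf, a_fgdf = thiran_allpass(delay=3.3, order=3)
```

A third-order section of this form has three independent denominator coefficients mirrored in the numerator, which is consistent with the small register and multiplier count quoted above.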
4.4 Experimental Results
A photo of the complete measurement setup is shown in Fig. 4.11. In our measurements, an
off-chip 180 degree hybrid drives the differential LO inputs. The prototype chip was fabricated in
65 nm CMOS GP (Fig. 4.12a). The total size including pads is 1.5 mm × 1 mm, and the active area
is 0.84 mm². Thirty percent of the active chip area is occupied by the MIM capacitor CG for the





















Figure 4.12: (a) Die photo; (b) measured and simulated S11; (c) measured conversion gain of the
MP (from node 1 to node 3 in Fig. 4.3) and AP (from node 1 to node 4 in Fig. 4.3); (d) measured
IIP3 and B1dB vs. blocker offset frequency; (e) measured NF vs. BB offset frequency at 1 GHz
LO frequency; (f) measured IIP3 and B1dB vs. LO frequency.
Figure 4.13: The measured LO leakage power vs. LO frequency.
CSP, CGP, and AP. The availability of higher capacitance density will help reduce the chip size.
The chip is packaged in a 48-pin QFN and mounted on an FR-4 PCB.
4.4.1 Performance without cancellation
Good wideband matching is achieved, as shown by the measured S11 (Fig. 4.12b). The operating
range of the receiver is tested from 0.5 GHz to 2.5 GHz. The conversion gain of the receiver front
end and the BB BW are set to about 40 dB and about 10 MHz, respectively.
Fig. 4.12c shows the measured conversion gains of the MP (from node 1 to node 3 in Fig. 4.3)
and AP (from node 1 to node 4 in Fig. 4.3). OB signals at offsets from 21.7 MHz to 320 MHz
from the LO frequency are captured from the HP_AUX outputs.
Fig. 4.12d shows the linearity performance of the receiver versus interferer offset frequency
∆ω without IM3 product cancellation. At a 1 GHz LO frequency, the two interferers are injected
at ωLO +∆ω and ωLO + 2∆ω− 3 MHz. The OB-B1dB at 100 MHz and 240 MHz offset frequency
are −9.5 dBm and −6.5 dBm, while the OB-IIP3 at 100 MHz and 240 MHz offset frequency are
2 dBm and 5 dBm.
Fig. 4.12e illustrates the NF versus BB offset frequency of the receiver after noise cancellation,
with 3.3 dB NF at 1 GHz LO frequency. Fig. 4.12f depicts the linearity and NF versus LO
frequency. The NF varies from 3.2 dB to 5.3 dB, while the OB B1dB and IIP3 are between −10 dBm
and −5.5 dBm, and between 2.3 dBm and 6 dBm, respectively. Both NF and linearity increase at higher LO
frequency. At 1 GHz, the power consumption of the LNTAs, LOs, and baseband circuits is 11.5 mW,
20.4 mW and 4.1 mW, respectively, resulting in a total receiver power consumption of 36 mW from a
1.2 V supply, and an extra 34 mW when the AP is turned on. The measured OB-IIP2 is 54 dBm.
For blockers larger than −40 dBm, the IM3 products are much larger than the IM2 products.
The measured LO leakage power vs. LO frequency is shown in Fig. 4.13, and it remains the
same before and after IM3 cancellation. The curve follows the changes in S11 in Fig. 4.12b. The
LO leakage power is −92 dBm and −88 dBm at 1 GHz and 2 GHz LO frequency respectively.
4.4.2 Cancellation of IM3 of two-tone interferers
At a 1 GHz LO frequency, two sine-wave OB interferers located at 1.1 GHz and 1.197 GHz
are injected into the input of the receiver. The IM3 product falls at 3 MHz in BB. In a two-tone
cancellation scenario, all output BB signals from the CSP, CGP, and AP are sampled by the
oscilloscope at a rate of 40.96 MSps. Both noise cancellation and IM3 product cancellation are
processed in Matlab. Fig. 4.14a shows the measured spectrum with a 2048-point FFT on waveform
snapshots of 50 µs (averaged from 10 sets of measurements), giving a resolution bandwidth (RBW)
of 20 kHz. Because of the spectral leakage in the FFT calculation, a 60 kHz RBW is used for an
accurate power estimate in the measurements.
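The FFT parameters quoted above follow directly from the sampling rate and FFT length; the short sketch below reproduces the arithmetic (function names are illustrative).

```python
def fft_rbw_hz(fs_hz, n_fft):
    """Bin spacing (resolution bandwidth) of an n_fft-point FFT at sample rate fs_hz."""
    return fs_hz / n_fft

def snapshot_duration_s(fs_hz, n_fft):
    """Duration of the waveform snapshot covered by one FFT."""
    return n_fft / fs_hz

FS_HZ = 40.96e6   # oscilloscope sampling rate from the text
N_FFT = 2048      # FFT length from the text

rbw = fft_rbw_hz(FS_HZ, N_FFT)                 # 20 kHz per bin
duration = snapshot_duration_s(FS_HZ, N_FFT)   # 50 microseconds
bins_in_60khz = 60e3 / rbw                     # 3 bins integrated per power estimate
```

Integrating over three adjacent bins (60 kHz) captures the tone power that spectral leakage spreads into the neighbours of the main bin.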
In Fig. 4.14a, the two lines at the bottom show the measured input-referred noise floor (per
60 kHz) before and after IM3 product cancellation. As the input power of the interferer increases
up to −14 dBm, the noise floor before IM3 cancellation (corresponding to blocker NF) remains
almost the same, while the noise floor after IM3 cancellation grows with a slope of 1 dB/dB when
the input power is large enough. This increase is caused by the mixing between interferer and noise in
the AP. The two upper lines in the figure indicate the power of the IM3 products before and after
cancellation, showing that the IM3 products can be cancelled almost down to the noise floor. At
−14 dBm power input, the IM3 products are cancelled by 56.4 dB and the effective IIP3 is as high



















Figure 4.14: Measured two-tone cancellation including the IM3 products and noise before/after
cancellation: (a) input interferers located at 1.1 GHz and 1.197 GHz with 1 GHz LO frequency,
(b) input interferers located at 2.1 GHz and 2.197 GHz with 2 GHz LO frequency.
LO frequency, we obtain similar results (Fig. 4.14b) with IM3 product cancellation of 55 dB and
effective OB IIP3 of 31.2 dBm.



























Figure 4.15: Measured two-tone cancellation with 1 GHz LO frequency (a) IM3 product
cancellation with recalibration (calibration at each input power) and without recalibration
(calibration once at single input power −15 dBm); (b) IM3 products and noise before/after
cancellation vs. offset frequency of interferers at −15 dBm input power.
The cancellation shown in Fig. 4.14 is recalibrated in the DSP at each input power. In Fig. 4.15a,
the two-tone cancellation without recalibration (i.e. calibration once at a single input power of
−15 dBm) is compared with the two-tone cancellation with recalibration (i.e. calibration at each input
power). Because of the impact of higher-order (such as 5th and 7th) intermodulation distortions,
the best cancellation coefficients (mainly pdi and pdq for phase shifting) vary with the input power
level. The mathematical explanation of the impact and its general solution are presented in [113].
Fig. 4.15b shows two tone cancellation versus interferer offset frequency at −15 dBm input power
with 1 GHz LO frequency, and we demonstrate that cancellation works well across a wide LO
range and offset frequency.
4.4.3 Cancellation of IM3 of modulated interferers
The demonstration of cancellation for a single-tone interferer and one modulated interferer
is shown in Fig. 4.16a. For an LO frequency of 1 GHz, one −15 dBm 1.1 GHz sine-wave OB
interferer and −15 dBm 1.2 GHz 10 MSps QPSK modulated interferer are received. Output BB







































































Figure 4.16: Measured single-tone interferer and one 10 MHz QPSK modulated interferer
cancellation at −15 dBm input power for each interferer: (a) single-tone interferer located at
1.1 GHz and the center frequency of modulated interferer located at 1.2 GHz with 1 GHz LO
frequency; (b) single-tone interferer located at 2.1 GHz and the center frequency of modulated
interferer located at 2.2 GHz with 2 GHz LO frequency; (c) measured cancellation at 1 GHz LO
frequency and calculated cancellation with constant group delay (GD); (d) measured cancellation
at 1 GHz LO frequency vs. interferer offset frequency.
complex signals before and after IM3 cancellation are shown in this plot, resulting in 23 dB IM3
products cancellation on average across 5 MHz BB BW. Fig. 4.16b depicts cancellation results
when one −15 dBm 2.1 GHz sine-wave tone and one −15 dBm 2.2 GHz 10 MSps QPSK modulated
signal are applied with a 2 GHz LO frequency. An IM3 product cancellation of 21.5 dB is
demonstrated.
The noise floor is an important constraint for cancellation. Since IM3 product cancellation of 56 dB
is achieved with 60 kHz noise BW at −15 dBm input power, the maximum cancellation for 10 MHz
noise BW can be calculated as: 56 dB − 10 · log10(10 MHz/60 kHz) = 33.8 dB. Group delay differences
between the MP and AP are another crucial factor limiting the cancellation. In this measurement,
we use a 40.96 MHz sampling rate, which corresponds to a group delay (GD) difference of at most
1/(2 · 40.96 MHz) = 12.21 ns in theory. However, in reality, the group delay difference is not a constant value,
and the gain difference is also frequency dependent, which makes an exact model hard to derive.
Fig. 4.16c compares the theoretical calculated results with measured results for IM3 cancellation
of modulated interferers. Also, it shows the achievable cancellation versus IF frequency, and il-
lustrates the impact of group delay. In this figure, the input signals are one −15 dBm 1.1 GHz
sine-wave OB interferer and −15 dBm 1.2 GHz 10 MSps QPSK modulated interferer at 1 GHz
LO. Next, we plot four curves under different assumptions: if our receiver is ideal and there is no
group delay between the MP and AP, the cancellation is 33.8 dB and is limited by the noise floor
(diamond markers); if we deliberately add 2 ns of constant group delay between the MP and AP,
the maximum average cancellation reduces to 28.2 dB (square markers); if we deliberately add
12.2 ns of constant group delay, the average cancellation becomes 15.2 dB (circle markers); the
measured cancellation from Fig. 4.16a (no markers) results in 23 dB average cancellation.
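The two limits quoted in this paragraph are simple to reproduce. The sketch below recomputes the noise-limited cancellation when the noise bandwidth widens from 60 kHz to 10 MHz, and the half-sample-period bound on the group delay difference; the function names are illustrative.

```python
import math

def noise_limited_cancellation_db(ref_cancel_db, ref_bw_hz, bw_hz):
    """Scale a noise-limited cancellation figure from ref_bw_hz to bw_hz."""
    return ref_cancel_db - 10 * math.log10(bw_hz / ref_bw_hz)

def max_group_delay_s(fs_hz):
    """Half of one sample period, the bound on GD difference used in the text."""
    return 1.0 / (2 * fs_hz)

cancel_10mhz_db = noise_limited_cancellation_db(56.0, 60e3, 10e6)  # about 33.8 dB
gd_max_s = max_group_delay_s(40.96e6)                              # about 12.21 ns
```

These numbers correspond to the ideal (diamond-marker) curve and to the 12.2 ns constant-delay (circle-marker) case in Fig. 4.16c, respectively.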
Fig. 4.16d shows the modulated interferer IM3 product cancellation versus interferer offset
frequency at −15 dBm input power with 1 GHz LO frequency. At 50 MHz interferer offset frequency,
the reduction in cancellation results from the increase of the noise floor after IM3 cancellation.
For the case of two modulated interferers, cancellation measurements are shown in Fig. 4.17.
The measurements are performed with 1 GHz and 2 GHz LO frequencies. Two −15 dBm 5 MSps
QPSK interferers are applied, located at 1.1 GHz and 1.2 GHz for the 1 GHz LO and at 2.1 GHz
and 2.2 GHz for the 2 GHz LO. The measurements show 18.8 dB and 17.3 dB cancellation.
According to the IM3 equation, the theoretical cancellation is 1.76 dB lower than in the case of
one tone and one 10 MHz BW modulated signal.
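The 1.76 dB figure equals 10 · log10(1.5). One reading that gives this ratio, offered here as an
assumption and not as the IM3 equation referred to above, is that the IM3 product at 2·f1 − f2
occupies roughly twice the bandwidth of the interferer at f1 plus the bandwidth of the interferer
at f2. Under that assumption the IM3 noise bandwidth grows from 10 MHz (a tone plus a 10 MHz
QPSK signal) to 15 MHz (two 5 MHz QPSK signals), and the noise-limited cancellation drops by the
corresponding amount. The Python sketch below works through this arithmetic; the function names
are hypothetical, and the symbol rates are treated as occupied bandwidths.

```python
import math


def im3_bandwidth_hz(bw_f1_hz, bw_f2_hz):
    """Assumed occupied bandwidth of the IM3 product at 2*f1 - f2.

    The interferer at f1 enters squared and the interferer at f2 enters once, so the
    product is taken to span about 2*bw_f1 + bw_f2. This is an illustrative assumption.
    """
    return 2.0 * bw_f1_hz + bw_f2_hz


def noise_bandwidth_penalty_db(ref_bw_hz, new_bw_hz):
    """Loss in noise-limited cancellation when the noise bandwidth grows."""
    return 10.0 * math.log10(new_bw_hz / ref_bw_hz)


if __name__ == "__main__":
    tone_plus_qpsk_hz = im3_bandwidth_hz(0.0, 10e6)  # tone at f1, 10 MHz QPSK at f2
    two_qpsk_hz = im3_bandwidth_hz(5e6, 5e6)         # 5 MHz QPSK at both f1 and f2
    penalty_db = noise_bandwidth_penalty_db(tone_plus_qpsk_hz, two_qpsk_hz)
    print(tone_plus_qpsk_hz / 1e6, two_qpsk_hz / 1e6, round(penalty_db, 2))
```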

Figure 4.17: Measured cancellation for two 5 MHz QPSK modulated interferers: (a) interferers
centered at 1.1 GHz and 1.2 GHz with 1 GHz LO frequency; (b) interferers centered at 2.1 GHz
and 2.2 GHz with 2 GHz LO frequency.
4.4.4 Discussion and comparison to the state of the art
A performance summary of our cancellation technique and a comparison with state-of-the-art
approaches are shown in Table 4.1. Although other solutions (mixer-first receivers, band-pass
filtering techniques, FTNC receivers, and selective-LNA receiver front-ends) can achieve high
linearity and a low noise figure, they all suffer from LO leakage. Earlier IM3 cancellation
approaches can eliminate the LO leakage issue but achieve only limited OB linearity, BB
bandwidth, and tuning range. Our approach combines FTNC and IM3 cancellation and demonstrates
a wide tuning range, low noise figure, good LO isolation, and high OB linearity with >5 MHz
baseband bandwidth.
Table 4.1: Comparison to the state-of-the-art high linearity receivers.
FTNC + IM3 Cancellation
Technology (nm) 65 28 65 28 65 65 45 65 40 180 32 130
Supply (V) 1.2/2.5 1.2 1.1/1.25 1/1.5 1.2 1.2 1.2 1.2 1.2 1.8 2 1.2/2.7
RF Range (GHz) 0.1~2.4 0.1~2.0 0.1~0.6 0.4~3.5 0.7~3.8 0.1~1.5 0.2~8 0.1~1.2 0.08~2.7 0.1~1.8 0.4~6 N/A
BB BW (MHz) 10 6.5 2 15~50 10 2 10 4 1 1 7.5 1.92
Conv. Gain (dB) 40~70 16 35 35 40 38 21 25 40~70 48 12 30.5
OB B1dB (dBm) 10 13 15 4.6 -6 -6 12 7 N/A N/A 17 N/A
OB-IIP2 (dBm) 56 90 N/A 64 65 50 88 N/A 54 70 N/A 58
LO leakage (dBm) N/A N/A -60 -62 -79 @1G / -71 @2G N/A -65 -64 -65 N/A -52 N/A
Power (mW) 37-70 38~96 34~80 38~75 27.4~75.4 11 80 38~75 35.1~78 24~37 81-109 34.7 36 36 + 34

4.5 Conclusions
A cancellation technique has been presented to cancel the IM3 distortion from the RF front end
in an LNTA-based wideband receiver. The technique only requires a baseband auxiliary path. The
technique has been demonstrated in a 0.5-2.5 GHz FTNC receiver in 65 nm CMOS that achieves
40 dB conversion gain, 8.8 MHz BB BW, 3.3 dB NF, −6.5 dBm OB-B1dB and 5 dBm OB-IIP3
before IM3 cancellation at 1 GHz LO. When cancellation is turned on and two-tone interferers of
−14 dBm are present at the input, the IM3 products are cancelled by 56.4 dB and the effective
OB-IIP3 is 32.5 dBm with 34 mW extra power consumption. For two −15 dBm modulated interferers,
18.8 dB of IM3 cancellation has been demonstrated over 10 MHz. The LO leakage of the receiver
is < −84 dBm.
Chapter 5: Conclusions and Future Work
In the final chapter of this thesis, the scope and depth of the research work are revisited and
summarized in Section 5.1. Suggestions for future research are outlined in Section 5.2.
5.1 Conclusions
This thesis discusses the design and theory of power-efficient optical transceivers with
System-in-Package (SiP) 2.5D integration for data centers. Our design is implemented in 28 nm
CMOS technology and integrated with custom photonic integrated circuits and a silicon interposer.
This project is the first 28 nm CMOS tapeout at Columbia University and requires unique
packaging and integration. With limited resources, we overcome difficulties in modeling and layout
and develop our own solutions for digital circuit design, ESD, and I/O pads. We propose an
innovative transimpedance amplifier (TIA) and a single-ended to differential (S2D) converter for a
low-voltage high-sensitivity receiver; a low-power transmitter with a four-to-one serializer,
programmable output drivers, AC coupling units, and custom pads; and a forward clocking scheme
with an improved quadrature locked loop (QLL). In addition, we analyze the inverter-based
shunt-feedback TIA and derive an explicit trade-off among sensitivity, data rate, and power
consumption for optical link designers. We also study clock and data recovery (CDR) based
clocking schemes for optical links and propose a power-efficient solution based on an
injection-locked phase rotator.
Next, we propose a modified frequency-domain analysis for quadrature clock generators (QCGs)
based on injection-locked ring oscillators (ILROs). We capture the imbalances in amplitude and
phase caused by asymmetrical or partial injection locking. The proposed analysis shows the design
tradeoffs in the QCG phase correction loop for multi-lane wireline or optical transceivers.
This thesis also discusses the design of high-linearity wireless wideband receivers. We review
their applications in software-defined radio (SDR) and TV white space (TVWS) communications,
investigate the state-of-the-art high-linearity solutions, and propose an out-of-band (OB) IM3
cancellation technique using a baseband auxiliary path for wideband LNTA-based receivers. The
proposed approach is implemented in a 65 nm CMOS frequency-translational noise-cancelling
(FTNC) receiver, and it achieves high OB linearity, low noise figure, and good LO isolation.
In summary, this thesis offers five original contributions:
• Design and implementation of a power-efficient optical transceiver in 28 nm CMOS with
2.5D integration.
• Analysis of the inverter-based shunt-feedback TIA and derivation of the trade-off among
sensitivity, data rate, and power consumption for optical link designers.
• Proposal of a power-efficient solution based on an injection-locked phase rotator for optical
transceivers.
• Frequency-domain analysis of ILROs for quadrature clock generation.
• Design and implementation of an OB IM3 cancellation technique for wireless wideband
receivers.
5.2 Future Work
Despite the rapid development of optical transceivers, photonic devices, and heterogeneous
integration in the last decade, commercial products, including wavelength-division-multiplexing
(WDM) silicon photonic links, are still rare in the market. Since the monolithic solution [16]
suffers from the limitation of the technology nodes (45 nm and 32 nm), high waveguide loss, low
photodiode responsivity, and low photodiode bandwidth, the implementation in this thesis, which
connects separate electrical transceivers and photonic devices through a silicon interposer,
is still worth investing in. Circuit designers should work closely with optical link designers and
packaging designers to explore the best architecture for WDM silicon photonic links.
Based on the proposed analysis of ILROs, my colleagues and I developed a high-performance
multiple-phase clock generator [71] in 65 nm CMOS technology while this thesis was being written.
The proposed architecture achieves the best tradeoff among phase accuracy, power consumption,
and jitter performance. However, the phase interpolator (PI) performance is still worse than
expected. The overall clocking scheme performance can be improved by optimizing the PI
structure and implementation.
Although the cancellation technique shown in this thesis achieves high OB linearity, its
linearity in the adjacent channels is limited by the baseband filter in wideband LNTA-based
receivers. Its performance can be improved by incorporating linear periodically time-varying
(LPTV) circuits in the receiver front-end [114] [115]. Using the recent concept of Filtering by
Aliasing (FA) [116], a sharp filter can be achieved for cancellation in the adjacent channels.
References
[1] Q. Cheng, M. Bahadori, M. Glick, S. Rumley, and K. Bergman, “Recent advances in op-
tical technologies for data centers: A review”, Optica, vol. 5, no. 11, pp. 1354–1370, Nov.
2018.
[2] N. Abrams, Q. Cheng, M. Glick, M. A. Jezzini, P. E. Morrissey, P. O’Brien, and K. Bergman,
“Silicon photonic 2.5D multi-chip module transceiver for high-performance data centers”,
Journal of Lightwave Technology, vol. 38, no. 13, pp. 3346–3357, Jul. 2020.
[3] Y. Shen, X. Meng, Q. Cheng, S. Rumley, N. Abrams, A. Gazman, E. Manzhosov, M. S.
Glick, and K. Bergman, “Silicon photonics for extreme scale systems”, Journal of Lightwave
Technology, vol. 37, no. 2, pp. 245–259, Jan. 2019.
[4] C. Sun, M. T. Wade, Y. Lee, J. S. Orcutt, L. Alloatti, M. S. Georgas, A. S. Waterman, J. M.
Shainline, R. R. Avizienis, S. Lin, B. R. Moss, R. Kumar, F. Pavanello, A. H. Atabaki, H. M.
Cook, A. J. Ou, J. C. Leu, Y.-H. Chen, K. Asanović, R. J. Ram, M. A. Popović, and V. M.
Stojanović, “Single-chip microprocessor that communicates directly using light”, Nature,
vol. 528, no. 7583, pp. 534–538, 2015.
[5] K. Yu, C. Li, H. Li, A. Titriku, A. Shafik, B. Wang, Z. Wang, R. Bai, C. Chen, M.
Fiorentino, P. Y. Chiang, and S. Palermo, “A 25 Gb/s hybrid-integrated silicon photonic
source-synchronous receiver with microring wavelength stabilization”, IEEE Journal of
Solid-State Circuits, vol. 51, no. 9, pp. 2129–2141, Sep. 2016.
[6] H. Li, Z. Xuan, A. Titriku, C. Li, K. Yu, B. Wang, A. Shafik, N. Qi, Y. Liu, R. Ding, T.
Baehr-Jones, M. Fiorentino, M. Hochberg, S. Palermo, and P. Y. Chiang, “A 25 Gb/s, 4.4 V-
swing, ac-coupled ring modulator-based WDM transmitter with wavelength stabilization
in 65nm CMOS”, IEEE Journal of Solid-State Circuits, vol. 50, no. 12, pp. 3145–3159,
Dec. 2015.
[7] S. Saeedi, S. Menezo, G. Pares, and A. Emami, “A 25 Gb/s 3D-Integrated CMOS/Silicon-
Photonic Receiver for Low-Power High-Sensitivity Optical Communication”, Journal of
Lightwave Technology, vol. 34, no. 12, pp. 2924–2933, Jun. 2016.
[8] M. Rakowski, Y. Ban, P. De Heyn, N. Pantano, B. Snyder, S. Balakrishnan, S. Van Huylen-
broeck, L. Bogaerts, C. Demeurisse, F. Inoue, K. J. Rebibis, P. Nolmans, X. Sun, P. Bex,
A. Srinivasan, J. De Coster, S. Lardenois, A. Miller, P. Absil, P. Verheyen, D. Velenis, M.
Pantouvaki, and J. Van Campenhout, “Hybrid 14 nm FinFET - silicon photonics technol-
ogy for low-power Tb/s/mm2 optical I/O”, in 2018 IEEE Symposium on VLSI Technology,
Jun. 2018, pp. 221–222.
[9] F. Y. Liu, D. Patil, J. Lexau, P. Amberg, M. Dayringer, J. Gainsley, H. F. Moghadam, X.
Zheng, J. E. Cunningham, A. V. Krishnamoorthy, E. Alon, and R. Ho, “10-Gbps, 5.3-mW
optical transmitter and receiver circuits in 40-nm CMOS”, IEEE Journal of Solid-State
Circuits, vol. 47, no. 9, pp. 2049–2067, Sep. 2012.
[10] Y. Chen, M. Kibune, A. Toda, A. Hayakawa, T. Akiyama, S. Sekiguchi, H. Ebe, N. Imaizumi,
T. Akahoshi, S. Akiyama, S. Tanaka, T. Simoyama, K. Morito, T. Yamamoto, T. Mori, Y.
Koyanagi, and H. Tamura, “A 25 Gb/s hybrid integrated silicon photonic transceiver in
28 nm CMOS and SOI”, in 2015 IEEE International Solid-State Circuits Conference—
(ISSCC) Digest of Technical Papers, Feb. 2015, pp. 1–3.
[11] T. Aoki, S. Sekiguchi, T. Simoyama, S. Tanaka, M. Nishizawa, N. Hatori, Y. Sobu, A.
Sugama, T. Akiyama, A. Hayakawa, H. Muranaka, T. Mori, Y. Chen, S. Jeong, Y. Tanaka,
and K. Morito, “Low-crosstalk simultaneous 16-channel × 25 Gb/s operation of high-
density silicon photonics optical transceiver”, Journal of Lightwave Technology, vol. 36,
no. 5, pp. 1262–1267, Mar. 2018.
[12] IEEE, https://eps.ieee.org/technology/heterogeneous-integration-
roadmap/2019-edition.html.
[13] J. Macri, “AMD’s next generation GPU and high bandwidth memory architecture: FURY”,
in 2015 IEEE Hot Chips 27 Symposium (HCS), Aug. 2015, pp. 1–26.
[14] Intel, Intel Agilex FPGAs and SoCs, https://www.intel.com/content/
www/us/en/products/programmable/fpga/agilex.html.
[15] NVIDIA, NVIDIA Tesla, https://www.nvidia.com/en-us/data-center/
tesla/.
[16] M. Wade, “TeraPHY: A chiplet technology for low-power, high-bandwidth in-package op-
tical I/O”, in 2019 IEEE Hot Chips 31 Symposium (HCS), 2019, pp. i–xlviii.
[17] T. Haque, M. Bajor, Y. Zhang, J. Zhu, Z. Jacobs, R. Kettlewell, J. Wright, and P. R. Kinget,
“A direct RF-to-information converter for reception and wideband interferer detection
employing pseudo-random LO modulation”, in 2017 IEEE Radio Frequency Integrated
Circuits Symposium (RFIC), Jun. 2017, pp. 268–271.
[18] T. Haque, M. Bajor, Y. Zhang, J. Zhu, Z. A. Jacobs, R. B. Kettlewell, J. Wright, and
P. R. Kinget, “A reconfigurable architecture using a flexible LO modulator to unify
high-sensitivity signal reception and compressed-sampling wideband signal detection”, IEEE
Journal of Solid-State Circuits, vol. 53, no. 6, pp. 1577–1591, Jun. 2018.
[19] F. Bruccoleri, E. A. M. Klumperink, and B. Nauta, “Noise cancelling in wideband CMOS
LNAs”, in 2002 IEEE International Solid-State Circuits Conference. Digest of Technical
Papers, vol. 1, Feb. 2002, 406–407 vol.1.
[20] F. Bruccoleri, E. A. M. Klumperink, and B. Nauta, “Wide-band CMOS low-noise amplifier
exploiting thermal noise canceling”, IEEE Journal of Solid-State Circuits, vol. 39, no. 2,
pp. 275–282, Feb. 2004.
[21] D. Murphy, H. Darabi, A. Abidi, A. A. Hafez, A. Mirzaei, M. Mikhemar, and M. C. F.
Chang, “A blocker-tolerant, noise-cancelling receiver suitable for wideband wireless ap-
plications”, IEEE Journal of Solid-State Circuits, vol. 47, no. 12, pp. 2943–2963, Dec.
2012.
[22] Y. Xu and P. R. Kinget, “A switched-capacitor RF front end with embedded programmable
high-order filtering”, IEEE Journal of Solid-State Circuits, vol. 51, no. 5, pp. 1154–1167,
May 2016.
[23] Y. Lien, E. Klumperink, B. Tenbroek, J. Strange, and B. Nauta, “A high-linearity CMOS
receiver achieving +44dBm IIP3 and +13dBm B1dB for SAW-less LTE radio”, in 2017
IEEE International Solid-State Circuits Conference. Digest of Technical Papers, Feb. 2017,
pp. 412–413.
[24] Y. Lien, E. Klumperink, B. Tenbroek, J. Strange, and B. Nauta, “A mixer-first receiver with
enhanced selectivity by capacitive positive feedback achieving +39dBm IIP3 and <3dB
noise figure for SAW-less LTE radio”, in 2017 IEEE Radio Frequency Integrated Circuits
Symposium (RFIC), Jun. 2017, pp. 280–283.
[25] M. Bahadori, S. Rumley, R. Polster, A. Gazman, M. Traverso, M. Webster, K. Patel, and
K. Bergman, “Energy-performance optimized design of silicon photonic interconnection
networks for high-performance computing”, in Design, Automation Test in Europe Confer-
ence Exhibition, 2017, Mar. 2017, pp. 326–331.
[26] E. Sackinger, Analysis and design of transimpedance amplifiers for optical receivers. Hobo-
ken, New Jersey: Wiley, 2018.
[27] B. Razavi, Design of integrated circuits for optical communications. Hoboken, New Jersey:
Wiley, 2012.
[28] A. Sharif-Bakhtiar and A. Chan Carusone, “A 20 Gb/s CMOS optical receiver with limited-
bandwidth front end and local feedback IIR-DFE”, IEEE Journal of Solid-State Circuits,
vol. 51, no. 11, pp. 2679–2689, Nov. 2016.
[29] M. H. Nazari and A. Emami-Neyestanak, “A 24-Gb/s double-sampling receiver for ultra-
low-power optical communication”, IEEE Journal of Solid-State Circuits, vol. 48, no. 2,
pp. 344–357, Feb. 2013.
[30] K. R. Lakshmikumar, A. Kurylak, M. Nagaraju, R. Booth, R. K. Nandwana, J. Pampanin,
and V. Boccuzzi, “A process and temperature insensitive CMOS linear TIA for 100 Gb/s/
λ PAM-4 optical links”, IEEE Journal of Solid-State Circuits, vol. 54, no. 11, pp. 3180–
3190, Nov. 2019.
[31] M. G. Ahmed, M. Talegaonkar, A. Elkholy, G. Shu, A. Elmallah, A. Rylyakov, and P. K.
Hanumolu, “A 12-Gb/s -16.8-dBm OMA sensitivity 23-mW optical receiver in 65-nm
CMOS”, IEEE Journal of Solid-State Circuits, vol. 53, no. 2, pp. 445–457, Feb. 2018.
[32] D. Li, G. Minoia, M. Repossi, D. Baldi, E. Temporiti, A. Mazzanti, and F. Svelto, “A low-
noise design technique for high-speed CMOS optical receivers”, IEEE Journal of Solid-
State Circuits, vol. 49, no. 6, pp. 1437–1447, Jun. 2014.
[33] I. Ozkaya, A. Cevrero, P. A. Francese, C. Menolfi, T. Morf, M. Brändli, D. M. Kuchta, L.
Kull, C. W. Baks, J. E. Proesel, M. Kossel, D. Luu, B. G. Lee, F. E. Doany, M. Meghelli,
Y. Leblebici, and T. Toifl, “A 64-Gb/s 1.4-pJ/b NRZ optical receiver data-path in 14-nm
CMOS FinFET”, IEEE Journal of Solid-State Circuits, vol. 52, no. 12, pp. 3458–3473,
Dec. 2017.
[34] A. Sharif-Bakhtiar, M. G. Lee, and A. C. Carusone, “Low-power CMOS receivers for short
reach optical communication”, in 2017 IEEE Custom Integrated Circuits Conference (CICC),
Apr. 2017, pp. 1–8.
[35] D. Abdelrahman and G. E. R. Cowan, “Noise analysis and design considerations for
equalizer-based optical receivers”, IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 66, no. 8, pp. 3201–3212, Aug. 2019.
[36] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, “A double-tail latch-
type voltage sense amplifier with 18ps setup+hold time”, in 2007 IEEE International Solid-
State Circuits Conference. Digest of Technical Papers, Feb. 2007, pp. 314–605.
[37] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung,
“Improved sense-amplifier-based flip-flop: Design and measurements”,
IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[38] M. Raj, S. Saeedi, and A. Emami, “A wideband injection locked quadrature clock gener-
ation and distribution technique for an energy-proportional 16-32 Gb/s optical receiver in
28nm FDSOI CMOS”, IEEE Journal of Solid-State Circuits, vol. 51, no. 10, pp. 2446–
2462, Oct. 2016.
[39] A. H. Atabaki, S. Moazeni, F. Pavanello, H. Gevorgyan, J. Notaros, L. Alloatti, M. T. Wade,
C. Sun, S. A. Kruger, H. Meng, K. A. Qubaisi, I. Wang, B. Zhang, A. Khilo, C. V. Baiocco,
M. A. Popović, V. M. Stojanović, and R. J. Ram, “Integrating photonics with silicon
nanoelectronics for the next generation of systems on a chip”, Nature, vol. 556, no. 7701,
pp. 349–354, 2018.
[40] J. E. Proesel, Z. Toprak-Deniz, A. Cevrero, I. Ozkaya, S. Kim, D. M. Kuchta, S. Lee, S. V.
Rylov, H. Ainspan, T. O. Dickson, J. F. Bulzacchelli, and M. Meghelli, “A 32 Gb/s, 4.7
pJ/bit optical link with -11.7 dBm sensitivity in 14-nm FinFET CMOS”, IEEE Journal of
Solid-State Circuits, vol. 53, no. 4, pp. 1214–1226, Apr. 2018.
[41] T. Takemoto, H. Yamashita, T. Yazaki, N. Chujo, Y. Lee, and Y. Matsuoka, “A 25-to-
28 Gb/s high-sensitivity (−9.7 dBm) 65 nm CMOS optical receiver for board-to-board in-
terconnects”, IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2259–2276, Oct.
2014.
[42] S. Saeedi and A. Emami, “A 25 Gb/s 170µW/Gb/s optical receiver in 28 nm CMOS for
chip-to-chip optical communication”, in 2014 IEEE Radio Frequency Integrated Circuits
Symposium, Jun. 2014, pp. 283–286.
[43] E. Sackinger, “On the noise optimum of FET broadband transimpedance amplifiers”, IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 12, pp. 2881–2889,
Dec. 2012.
[44] J. Li, X. Zheng, A. V. Krishnamoorthy, and J. F. Buckwalter, “Scaling trends for picojoule-
per-bit WDM photonic interconnects in CMOS SOI and FinFET processes”, Journal of
Lightwave Technology, vol. 34, no. 11, pp. 2730–2742, Jun. 2016.
[45] A. H. Ahmed, A. Sharkia, B. Casper, S. Mirabbasi, and S. Shekhar, “Silicon-photonics
microring links for datacenters—Challenges and opportunities”, IEEE Journal of Selected
Topics in Quantum Electronics, vol. 22, no. 6, pp. 194–203, Nov. 2016.
[46] K. T. Settaluri, C. Lalau-Keraly, E. Yablonovitch, and V. Stojanović, “First principles op-
timization of opto-electronic communication links”, IEEE Transactions on Circuits and
Systems I: Regular Papers, vol. 64, no. 5, pp. 1270–1283, May 2017.
[47] R. Polster, Y. Thonnart, G. Waltener, J. Gonzalez, and E. Cassan, “Efficiency optimization
of silicon photonic links in 65-nm CMOS and 28-nm FDSOI technology nodes”, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 12, pp. 3450–
3459, Dec. 2016.
[48] D. Li, G. Minoia, M. Repossi, D. Baldi, E. Temporiti, A. Mazzanti, and F. Svelto, “A low-
noise design technique for high-speed CMOS optical receivers”, IEEE Journal of Solid-
State Circuits, vol. 49, no. 6, pp. 1437–1447, Jun. 2014.
[49] R. G. Smith and S. D. Personick, “Receiver design for optical fiber communication sys-
tems”, Topics in applied physics: Semiconductor devices for optical communication, no.
39, 1980.
[50] A. A. Abidi, “Gigahertz transresistance amplifiers in fine line NMOS”, IEEE Journal of
Solid-State Circuits, vol. 19, no. 6, pp. 986–994, Dec. 1984.
[51] E. Sackinger, Broadband circuits for optical fiber communication. Hoboken, New Jersey:
Wiley, 2005.
[52] D. Johns and K. Martin, Analog integrated circuit design. New York: Wiley, 1997.
[53] E. Sackinger, “The transimpedance limit”, IEEE Transactions on Circuits and Systems I:
Regular Papers, vol. 57, no. 8, pp. 1848–1856, Aug. 2010.
[54] J. J. Morikuni, A. Dharchoudhury, Y. Leblebici, and S. M. Kang, “Improvements to the
standard theory for photoreceiver noise”, Journal of Lightwave Technology, vol. 12, no. 7,
pp. 1174–1184, Jul. 1994.
[55] P. A. Francese, T. Toifl, P. Buchmann, M. Brändli, C. Menolfi, M. Kossel, T. Morf, L. Kull,
and T. M. Andersen, “A 16 Gb/s 3.7 mW/Gb/s 8-tap DFE receiver and baud-rate CDR
with 31 kppm tracking bandwidth”, IEEE Journal of Solid-State Circuits, vol. 49, no. 11,
pp. 2490–2502, Nov. 2014.
[56] A. Cevrero, I. Ozkaya, P. A. Francese, M. Brandli, C. Menolfi, T. Morf, M. Kossel, L. Kull,
D. Luu, M. Dazzi, and T. Toifl, “A 100Gb/s 1.1pJ/b PAM-4 rx with dual-mode 1-tap PAM-
4 / 3-tap NRZ speculative DFE in 14nm CMOS FinFET”, in 2019 IEEE International
Solid-State Circuits Conference - (ISSCC), Feb. 2019, pp. 112–114.
[57] S. Chen, L. Zhou, I. Zhuang, J. Im, D. Melek, J. Namkoong, M. Raj, J. Shin, Y. Frans, and
K. Chang, “A 4-to-16GHz inverter-based injection-locked quadrature clock generator with
phase interpolators for multi-standard I/Os in 7nm FinFET”, in 2018 IEEE International
Solid-State Circuits Conference - (ISSCC), Feb. 2018, pp. 390–392.
[58] E. Monaco, G. Anzalone, G. Albasini, S. Erba, M. Bassi, and A. Mazzanti, “A 2–11 GHz
7-bit high-linearity phase rotator based on wideband injection-locking multi-phase gener-
ation for high-speed serial links in 28-nm CMOS FDSOI”, IEEE Journal of Solid-State
Circuits, vol. 52, no. 7, pp. 1739–1752, Jul. 2017.
[59] Y. Huang and B. Chen, “An 8b injection-locked phase rotator with dynamic multiphase
injection for 28/56/112Gb/s Serdes application”, in 2019 IEEE International Solid-State
Circuits Conference - (ISSCC), Feb. 2019, pp. 486–488.
[60] P. Kinget, R. Melville, D. Long, and V. Gopinathan, “An injection-locking scheme for pre-
cision quadrature generation”, IEEE Journal of Solid-State Circuits, vol. 37, no. 7, pp. 845–
851, Jul. 2002.
[61] L. Zhang, A. Carpenter, B. Ciftcioglu, A. Garg, M. Huang, and H. Wu, “Injection-locked
clocking: A low-power clock distribution scheme for high-performance microprocessors”,
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 9, pp. 1251–
1256, Sep. 2008.
[62] K. Hu, T. Jiang, J. Wang, F. O’Mahony, and P. Y. Chiang, “A 0.6 mW/Gb/s, 6.4-7.2 Gb/s
serial link receiver using local injection-locked ring oscillators in 90 nm CMOS”, IEEE
Journal of Solid-State Circuits, vol. 45, no. 4, pp. 899–908, Apr. 2010.
[63] J. Savoj, K. Hsieh, P. Upadhyaya, F. An, J. Im, X. Jiang, J. Kamali, K. W. Lai, D. Wu, E.
Alon, and K. Chang, “Design of high-speed wireline transceivers for backplane commu-
nications in 28nm CMOS”, in Proceedings of the IEEE 2012 Custom Integrated Circuits
Conference, Sep. 2012, pp. 1–4.
[64] S. Kim, H. Ko, S. Cho, J. Lee, S. Shin, M. Choo, H. Chi, and D. Jeong, “A 2.5GHz
injection-locked ADPLL with 197fsrms integrated jitter and -65dBc reference spur using
time-division dual calibration”, in 2017 IEEE International Solid-State Circuits Confer-
ence (ISSCC), Feb. 2017, pp. 494–495.
[65] S. Cho, S. Kim, M. Choo, H. Ko, J. Lee, W. Bae, and D. Jeong, “A 2.5–5.6 GHz
subharmonically injection-locked all-digital PLL with dual-edge complementary switched injection”,
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 9, pp. 2691–
2702, Feb. 2018.
[66] B. Razavi, “A study of injection locking and pulling in oscillators”, IEEE Journal of Solid-
State Circuits, vol. 39, no. 9, pp. 1415–1424, Sep. 2004.
[67] A. Mirzaei, M. E. Heidari, R. Bagheri, and A. A. Abidi, “Multi-phase injection widens lock
range of ring-oscillator-based frequency dividers”, IEEE Journal of Solid-State Circuits,
vol. 43, no. 3, pp. 656–671, Mar. 2008.
[68] A. Mirzaei, M. E. Heidari, R. Bagheri, S. Chehrazi, and A. A. Abidi, “The quadrature LC
oscillator: A complete portrait based on injection locking”, IEEE Journal of Solid-State
Circuits, vol. 42, no. 9, pp. 1916–1932, Sep. 2007.
[69] D. Dunwell and A. C. Carusone, “Modeling oscillator injection locking using the phase
domain response”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60,
no. 11, pp. 2823–2833, Nov. 2013.
[70] A. Elkholy, M. Talegaonkar, T. Anand, and P. Kumar Hanumolu, “Design and analysis of
low-power high-frequency robust sub-harmonic injection-locked clock multipliers”, IEEE
Journal of Solid-State Circuits, vol. 50, no. 12, pp. 3160–3174, Dec. 2015.
[71] Z. Wang, Y. Zhang, Y. Onizuka, and P. R. Kinget, “A high accuracy, multi-phase injection-
locked, 8-phase, 7GHz clock generator in 65nm with 7bit phase interpolators for high-
speed data links”, in 2021 IEEE International Solid-State Circuits Conference - (ISSCC).
[72] R. Adler, “A study of locking phenomena in oscillators”, Proceedings of the IRE, vol. 34,
no. 6, pp. 351–357, Jun. 1946.
[73] J. Chien and L. Lu, “Analysis and design of wideband injection-locked ring oscillators with
multiple-input injection”, IEEE Journal of Solid-State Circuits, vol. 42, no. 9, pp. 1906–
1915, Sep. 2007.
[74] F. Yuan and Y. Zhou, “Frequency-domain study of lock range of non-harmonic oscillators
with multiple multi-tone injections”, IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 60, no. 6, pp. 1395–1406, Jun. 2013.
[75] B. Hong and A. Hajimiri, “A phasor-based analysis of sinusoidal injection locking in LC
and ring oscillators”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol.
66, no. 1, pp. 355–368, Jan. 2019.
[76] G. R. Gangasani and P. R. Kinget, “A time-domain model for predicting the injection lock-
ing bandwidth of nonharmonic oscillators”, IEEE Transactions on Circuits and Systems II:
Express Briefs, vol. 53, no. 10, pp. 1035–1038, Oct. 2006.
[77] M. Farazian, P. S. Gudem, and L. E. Larson, “Stability and operation of injection-locked
regenerative frequency dividers”, IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 57, no. 8, pp. 2006–2019, Aug. 2010.
[78] A. Tofangdarzade and A. Jalali, “An efficient method to analyze lock range in ring oscil-
lators with multiple injections”, IEEE Transactions on Circuits and Systems II: Express
Briefs, vol. 62, no. 11, pp. 1013–1017, Nov. 2015.
[79] A. A. Hafez and C. K. Yang, “Analysis and design of superharmonic injection-locked
multipath ring oscillators”, IEEE Transactions on Circuits and Systems I: Regular Papers,
vol. 60, no. 7, pp. 1712–1725, Jul. 2013.
[80] A. Kabbani, D. Al-Khalili, and A. J. Al-Khalili, “Technology-portable analytical model for
DSM CMOS inverter transition-time estimation”, IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 22, no. 9, pp. 1177–1187, Sep. 2003.
[81] A. A. Abidi, “Phase noise and jitter in CMOS ring oscillators”, IEEE Journal of Solid-State
Circuits, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
[82] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators”, IEEE
Journal of Solid-State Circuits, vol. 33, no. 2, pp. 179–194, Feb. 1998.
[83] P. Bhansali and J. Roychowdhury, “Gen-Adler: The generalized Adler’s equation for in-
jection locking analysis in oscillators”, in 2009 Asia and South Pacific Design Automation
Conference, Jan. 2009, pp. 522–527.
[84] P. Maffezzoni and S. Levantino, “Phase noise of pulse injection-locked oscillators”, IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 10, pp. 2912–2919,
Oct. 2014.
[85] M. C. M. Soer, E. A. M. Klumperink, Z. Ru, F. E. van Vliet, and B. Nauta, “A 0.2-to-2.0
GHz 65nm CMOS receiver without LNA achieving >11dBm IIP3 and < 6.5 dB NF”, in
IEEE International Solid-State Circuits Conference. Digest of Technical Papers, Feb. 2009,
222–223,223a.
[86] C. Andrews and A. C. Molnar, “A passive mixer-first receiver with digitally controlled
and widely tunable RF interface”, IEEE Journal of Solid-State Circuits, vol. 45, no. 12,
pp. 2696–2708, Dec. 2010.
[87] I. Choi and B. Kim, “A passive mixer-first receiver front-end without external components
for mobile TV applications”, in 2013 IEEE Radio Frequency Integrated Circuits Sympo-
sium (RFIC), Jun. 2013, pp. 145–148.
[88] F. Lin, P. I. Mak, and R. P. Martins, “An RF-to-BB-current-reuse wideband receiver with
parallel N-path active/passive mixers and a single-MOS pole-zero LPF”, IEEE Journal of
Solid-State Circuits, vol. 49, no. 11, pp. 2547–2559, Nov. 2014.
[89] S. Hameed and S. Pamarti, “A time-interleaved filtering-by-aliasing receiver front-end with
>70dB suppression at <4 *bandwidth frequency offset”, in 2017 IEEE Radio Frequency
Integrated Circuits Symposium (RFIC), Feb. 2017, pp. 418–419.
[90] Y. Xu and P. R. Kinget, “A chopping switched-capacitor RF receiver with integrated blocker
detection, +31dBm OB-IIP3, and +15dBm OB-B1dB”, in 2016 IEEE Symposium on VLSI
Technology, Jun. 2016, pp. 1–2.
[91] C. Wu, Y. Wang, B. Nikolic, and C. Hull, “A passive-mixer-first receiver with LO leakage
suppression, 2.6dB NF, > 15dBm wide-band IIP3, 66dB IRR supporting non-contiguous
carrier aggregation”, in 2015 IEEE Radio Frequency Integrated Circuits Symposium (RFIC),
May 2015, pp. 155–158.
[92] A. Nejdel, M. Abdulaziz, M. Tormanen, and H. Sjoland, “A positive feedback passive
mixer-first receiver front-end”, in 2015 IEEE Radio Frequency Integrated Circuits Sympo-
sium (RFIC), May 2015, pp. 79–82.
[93] A. Nejdel, H. Sjoland, and M. Tormanen, “A noise-cancelling receiver front-end with fre-
quency selective input matching”, IEEE Journal of Solid-State Circuits, vol. 50, no. 5,
pp. 1137–1147, May 2015.
[94] Z. Lin, P. I. Mak, and R. P. Martins, “2.4 A 0.028mm2 11mW single-mixing blocker-
tolerant receiver with double-RF N-path filtering, S11 centering, +13dBm OB-IIP3 and
1.5-to-2.9dB NF”, in 2015 IEEE International Solid-State Circuits Conference. Digest of
Technical Papers, Feb. 2015, pp. 1–3.
[95] H. Westerveld, E. Klumperink, and B. Nauta, “A cross-coupled switch-RC mixer-first tech-
nique achieving +41dBm out-of-band IIP3”, in 2016 IEEE Radio Frequency Integrated
Circuits Symposium (RFIC), May 2016, pp. 246–249.
[96] S. Jayasuriya, D. Yang, and A. Molnar, “A baseband technique for automated LO leakage
suppression achieving < -80dBm in wideband passive mixer-first receivers”, in Proceed-
ings of the IEEE 2014 Custom Integrated Circuits Conference, Sep. 2014, pp. 1–4.
[97] A. Ghaffari, E. A. M. Klumperink, M. C. M. Soer, and B. Nauta, “Tunable high-Q N-path
band-pass filters: modeling and verification”, IEEE Journal of Solid-State Circuits, vol. 46,
no. 5, pp. 998–1010, May 2011.
[98] M. Darvishi, R. van der Zee, E. A. M. Klumperink, and B. Nauta, “Widely tunable 4th
order switched Gm-C band-pass filter based on N-Path filters”, IEEE Journal of Solid-
State Circuits, vol. 47, no. 12, pp. 3105–3119, Dec. 2012.
[99] M. Darvishi, R. van der Zee, and B. Nauta, “Design of active N-path filters”, IEEE Journal
of Solid-State Circuits, vol. 48, no. 12, pp. 2962–2976, Dec. 2013.
[100] H. Hedayati, V. Aparin, and K. Entesari, “A +22dBm IIP3 and 3.5dB NF wideband receiver
with RF and baseband blocker filtering techniques”, in 2014 IEEE Symposium on VLSI
Technology, Jun. 2014, pp. 1–2.
[101] J. Zhu and P. R. Kinget, “A field-programmable noise-canceling wideband receiver with
high-linearity hybrid class-AB-C LNTAs”, in Proceedings of the IEEE 2015 Custom Inte-
grated Circuits Conference, Sep. 2015, pp. 1–4.
[102] J. W. Park and B. Razavi, “A 20mW GSM/WCDMA receiver with RF channel selection”,
in 2014 IEEE International Solid-State Circuits Conference. Digest of Technical Papers,
Feb. 2014, pp. 256–257.
[103] J. Zhu, H. Krishnaswamy, and P. R. Kinget, “Field-programmable LNAs with interferer-
reflecting loop for input linearity enhancement”, IEEE Journal of Solid-State Circuits, vol.
50, no. 2, pp. 556–572, Feb. 2015.
[104] C.-K. Luo, P. S. Gudem, and J. F. Buckwalter, “A 0.4–6-GHz 17-dBm B1dB 36-dBm
IIP3 channel-selecting low-noise amplifier for SAW-Less 3G/4G FDD diversity receivers”,
IEEE Trans. Microw. Theory Techn., vol. 64, no. 4, pp. 1110–1121, Apr. 2016.
[105] E. A. Keehr and A. Hajimiri, “Equalization of third-order intermodulation products in
wideband direct conversion receivers”, IEEE Journal of Solid-State Circuits, vol. 43, no.
12, pp. 2853–2867, Dec. 2008.
[106] Y. Zhang, J. Zhu, and P. R. Kinget, “An FTNC receiver with +32.5dBm effective OB-
IIP3 using baseband IM3 cancellation”, in 2017 IEEE Radio Frequency Integrated Circuits
Symposium (RFIC), Jun. 2017, pp. 3–6.
[107] E. Sacchi, I. Bietti, S. Erba, L. Tee, P. Vilmercati, and R. Castello, “A 15 mW, 70 kHz
1/f corner direct conversion CMOS receiver”, in Proceedings of the IEEE 2003 Custom
Integrated Circuits Conference, Sep. 2003, pp. 459–462.
[108] Z. Ru, E. A. M. Klumperink, G. J. M. Wienk, and B. Nauta, “A software-defined radio
receiver architecture robust to out-of-band interference”, in 2009 IEEE International Solid-
State Circuits Conference. Digest of Technical Papers, Feb. 2009, pp. 230–231.
[109] A. Mirzaei, H. Darabi, J. C. Leete, and Y. Chang, “Analysis and optimization of direct-
conversion receivers with 25% duty-cycle current-driven passive mixers”, IEEE Trans.
Circuits Syst. I, vol. 57, no. 9, pp. 2353–2366, Sep. 2010.
[110] M. Mikhemar, D. Murphy, A. Mirzaei, and H. Darabi, “A cancellation technique for reciprocal-
mixing caused by phase noise and spurs”, IEEE Journal of Solid-State Circuits, vol. 48,
no. 12, pp. 3080–3089, Dec. 2013.
[111] P. R. Kinget and M. S. J. Steyaert, “A 1-GHz CMOS up-conversion mixer”, IEEE Journal
of Solid-State Circuits, vol. 32, no. 3, pp. 370–376, Mar. 1997.
[112] J. P. Thiran, “Recursive digital filters with maximally flat group delay”, IEEE Trans. Circuit
Theory, vol. 18, no. 6, pp. 659–664, Nov. 1971.
[113] E. A. Keehr and A. Hajimiri, “Successive regeneration and adaptive cancellation of higher
order intermodulation products in RF receivers”, IEEE Trans. Microw. Theory Techn., vol.
59, no. 5, pp. 1379–1396, May 2011.
[114] S. Hameed, N. Sinha, M. Rachid, and S. Pamarti, “A programmable receiver front-end
achieving >17dBm IIP3 at <1.25×BW frequency offset”, in 2016 IEEE International Solid-
State Circuits Conference (ISSCC), Jan. 2016, pp. 446–447.
[115] S. Hameed and S. Pamarti, “A time-interleaved filtering-by-aliasing receiver front-end with
>70dB suppression at <4× bandwidth frequency offset”, in 2017 IEEE International Solid-
State Circuits Conference (ISSCC), Feb. 2017, pp. 418–419.
[116] M. Rachid, S. Pamarti, and B. Daneshrad, “Filtering by aliasing”, IEEE Transactions on
Signal Processing, vol. 61, no. 9, pp. 2319–2327, May 2013.
