Injection Locked Clocking and Transmitter Equalization Techniques for Chip to Chip Interconnects by Raj, Mayank
Injection Locked Clocking and 
Transmitter Equalization Techniques 
for Chip to Chip Interconnects 
 
 
 
Thesis by 
Mayank Raj 
 
In Partial Fulfillment of the 
Requirements for the Degree of  
Doctor of Philosophy 
 
 
 
 
 
CALIFORNIA INSTITUTE OF TECHNOLOGY 
Pasadena, California 
2015 
(Defended 31 October 2014)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 2014 
Mayank Raj 
All Rights Reserved 
To my family
Acknowledgements 
My graduate studies at Caltech have been a very enriching experience and a source of enormous 
educational and personal growth. This would not have been possible without the guidance and 
support of the following individuals: 
First among these is my advisor, Prof. Azita Emami. She always believed in my ideas and 
motivated me to pursue them irrespective of the risks. She taught me that circuit design is not just 
about meeting benchmarks but also about building innovative systems.  Her support, advice and 
encouragement have been a defining and essential part of my journey through graduate school. 
I feel very honored and privileged to have worked with her. 
I would like to thank the members of my candidacy and defense committees, Prof. Ali 
Hajimiri, Prof. David Rutledge, Prof. Sander Weinreb and Prof. Hyuck Choo, for their 
willingness to participate in and evaluate my research, and for their probing questions and 
valuable input.   
I have greatly benefitted from interacting with an amazing group of colleagues at Caltech. I 
thank my fellow group members Matthew Loh, Juhwan Yoo, Meisam Hornarvar Nazari, 
Manuel Monge, Saman Saeedi, Abhinav Agarwal, Krishna Settaluri, Angie Wang and Kaveh 
Hosseini. I am especially thankful to Matthew Loh, Meisam Hornarvar Nazari and Juhwan 
Yoo for helping me learn the basics of high speed design and testing at the beginning of my 
graduate studies. An essential part of an IC designer’s life is spending long nights in the office 
during impending tape-out deadlines. I am indebted to my friend Manuel Monge for being by 
my side during those times. My special thanks to Saman Saeedi for collaborating with me on 
the optical receiver chip. I also extend my gratitude to my friends Kaushik Dasgupta, Amirreza 
Safaripour, Kaushik Sengupta, Steven Bowers, Behrooz Abiri, Firooz Alfatouni, Florian Bohn, 
Hua Wang and Alex Pai in Prof. Ali Hajimiri’s group.   
My stay at Caltech would not have been complete without the Badminton Club. I would like 
to thank my friends from the club, namely, Siddhartha Pathak, Remey Mevel, Chinthaka 
Mallikarachchi, Kaushik Dasgupta, Phong Nguyen, Surendranath Somala, Varun Bhalerao, 
Shriharsh Tendulkar, Yi Cao, Cindy Wang and Qi Wen Li. I had great fun organizing and 
participating in tournaments and practice sessions. The club injected my life with enthusiasm 
and energy, which is frequently drained away by the demands of graduate study. 
I spent three very enjoyable months at Xilinx Inc. in San Jose, CA working with a group of 
very talented IC designers. I would like to thank Ken Chang, Jafar Savoj, Didem Turker, and 
Parag Upadhyaya for this wonderful experience. 
During my undergraduate days at IIT Kanpur I was fortunate to have a wonderful mentor in 
Prof. Shafi Qureshi who became my bachelor's thesis advisor. The encouragement and guidance 
provided by him proved to be instrumental in deciding my career path. I also got an opportunity 
to do an internship at the University of Michigan, Ann Arbor, under Prof. David Blaauw and 
Prof. Dennis Sylvester, which helped me to shape my research interests and get exposure to state 
of the art research facilities as an undergraduate. Overall, IIT Kanpur provided a great ambience 
and strong foundation, which helped me develop as an individual and prepared me to tackle the 
challenges awaiting in graduate school.  
Throughout the course of my graduate studies, I have received a tremendous amount of 
support from my parents, Dr. Bipin Bihari Sharma and Kanchan Sharma, and my little sister, 
Dr. Neha Raj. I am forever indebted to them for all the sacrifices that they made for me. Finally, 
special thanks to my wife, Shilpi Sneha, whose immeasurable love guided me through the latter 
half of this journey. I would never have been even close to where I am today if it were not for 
my family and for that, I dedicate this thesis to them. 
 
Abstract 
Semiconductor technology scaling has enabled drastic growth in the computational capacity of 
integrated circuits (ICs). This constant growth drives an increasing demand for high bandwidth 
communication between ICs. Electrical channel bandwidth has not been able to keep up with 
this demand, making I/O link design more challenging. Interconnects which employ optical 
channels have negligible frequency dependent loss and provide a potential solution to this I/O 
bandwidth problem. Apart from the type of channel, efficient high-speed communication also 
relies on generation and distribution of multi-phase, high-speed, and high-quality clock signals. 
In the multi-gigahertz frequency range, conventional clocking techniques have encountered 
several design challenges in terms of power consumption, skew and jitter. Injection-locking is a 
promising technique to address these design challenges for gigahertz clocking. However, its 
small locking range has been a major obstacle to its widespread adoption. 
In the first part of this dissertation we describe a wideband injection locking scheme in an LC 
oscillator. Phase locked loop (PLL) and injection locking elements are combined symbiotically 
to achieve wide locking range while retaining the simplicity of the latter. This method does not 
require a phase frequency detector or a loop filter to achieve phase lock. A mathematical analysis 
of the system is presented and an expression for the new locking range is derived. A locking range 
of 13.4 GHz–17.2 GHz (25%) and an average jitter tracking bandwidth of up to 400 MHz are 
measured in a high-Q LC oscillator. This architecture is used to generate quadrature phases from 
a single clock without any frequency division. It also provides high frequency jitter filtering while 
retaining the low frequency correlated jitter essential for forwarded clock receivers.  
To improve the locking range of an injection locked ring oscillator, a quadrature locked loop 
(QLL) is introduced. The inherent dynamics of an injection locked quadrature ring oscillator are 
used to improve its locking range from 5% (7–7.4 GHz) to 90% (4–11 GHz). The QLL is used to 
generate accurate clock phases for a four channel optical receiver using a forwarded clock at 
quarter-rate. The QLL drives an injection locked oscillator (ILO) at each channel without any 
repeaters for local quadrature clock generation. Each local ILO has deskew capability for phase 
alignment. The optical receiver uses the inherent frequency to voltage conversion provided by 
the QLL to dynamically body bias its devices. The wide locking range of the QLL helps to achieve 
a reliable data-rate of 16–32 Gb/s, and adaptive body biasing aids in maintaining an ultra-low 
power consumption of 153 pJ/bit. 
From the optical receiver we move on to a non-linear equalization technique for 
a vertical-cavity surface-emitting laser (VCSEL) based optical transmitter, which enables low-power, 
high-speed optical transmission. A non-linear time domain optical model of the VCSEL is built 
and evaluated for accuracy. The modelling shows that, while conventional FIR-based 
pre-emphasis works well for LTI electrical channels, it is not optimal for the non-linear optical 
frequency response of the VCSEL. Based on simulations of the model, an optimal 
equalization methodology is derived. The equalization technique is used to achieve a data-rate 
of 20 Gb/s with a power efficiency of 0.77 pJ/bit. 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
Contents 
 
Acknowledgements
Abstract
Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Optical Interconnects
1.2 Injection Locked Clocking in Parallel Links
1.3 Organization
Chapter 2: Background
2.1 Metrics of High-Speed Interconnect
2.2 Clocking
2.3 Sub-rate Clocking
2.4 Clock Jitter
2.4.1 Random Jitter
2.4.2 Deterministic Jitter
2.5 Types of Jitter
2.5.1 Period Jitter
2.5.2 Cycle-to-Cycle Jitter
2.5.3 Time Interval Error (TIE)
2.5.4 Phase Noise (Integrated RMS Jitter)
2.6 Injection Locking Background
2.7 VCSEL based Optical Transmitter
Chapter 3: Wideband Injection Locking Scheme and Quadrature Phase Generation in LC Oscillator
3.1 System Architecture
3.1.1 Comparison with ILPLL
3.1.2 Common Mode Injection
3.1.3 Implementation Details
3.1.4 System Analysis in Locked State
3.1.5 Quadrature Phase Generation
3.2 Mathematical Analysis
3.3 Measurement Results
3.3.1 Locking Range and RMS Jitter
3.3.2 Jitter Transfer Function
3.3.3 Quadrature Accuracy and Deskew
3.4 Summary
Chapter 4: Quadrature Locked Loop (QLL)
4.1 Proposed Approach
4.2 Mathematical Analysis
4.2.1 Behavioral Modelling
4.3 Circuit Implementation
4.3.1 Transient Simulation
4.4 QLL Based Clocking
4.5 Hardware Measurements
4.5.1 Locking Range and Integrated Jitter
4.5.2 Reference and Supply Noise Filtering
4.5.3 Quadrature Accuracy
4.5.4 Power Consumption
4.5.5 Comparison with Prior Art
4.6 Summary
Chapter 5: QLL Based Clocking for a Four Channel Quarter-Rate Optical Receiver
5.1 System Architecture
5.1.1 Optical Receiver
5.1.2 Adaptive Body Biasing
5.2 Deskew
5.2.1 Symmetric Injection
5.3 Hardware Measurements
5.3.1 Test Setup
5.3.2 Receiver BER Measurements
5.3.3 Deskew Range
5.3.4 Power Consumption
5.3.5 Comparison with Prior Art
5.4 Summary
5.5 QLL: Future Work
Chapter 6: VCSEL Modelling and Equalization
6.1 Background
6.2 Speed Limitations
6.3 VCSEL Modelling for Simulation
6.3.1 Simplified Approach
6.3.2 Electrical Model
6.3.3 Optical Model
6.3.4 Complete Model
6.4 Model Evaluation
6.5 VCSEL Equalization Methodology
6.5.1 Conventional FIR-Based Pre-Emphasis
6.5.2 Proposed Equalization Technique
6.6 Simulated Results
6.7 Circuit Implementation
6.8 Experimental Results
6.8.1 Optical Measurement Setup
6.8.2 Measured Eye-Diagrams
6.9 Summary
Chapter 7: Conclusion
List of Abbreviations
Bibliography
 
 
 
 
 
 
  
List of Figures 
Figure 1.1: Scaling in microprocessors. .................................................................................... 2 
Figure 1.2: Microprocessor core count scaling (left) and microprocessor clock frequency scaling 
(right) [2] (data from ISSCC trends 2012). ................................................................... 2 
Figure 1.3: Scaling of common wireline I/O standards (top) [3] and block diagram of chip to 
chip links in a computer server. ................................................................................... 3 
Figure 1.4: Forwarded clock parallel link. ................................................................................ 6 
Figure 2.1: (a) Basic clocked high-speed link. (b) Typical receiver data eye-diagram with voltage 
and timing margins (Vm and Tm). (c) Translation of eye-diagram to bathtub curve. .... 10 
Figure 2.2: (a) Source synchronous (forwarded clock) link. (b) Plesiochronous (embedded clock) 
link. ........................................................................................................................... 12 
Figure 2.3: Block diagram of a quarter-rate receiver. .............................................................. 13 
Figure 2.4: Components of jitter. ........................................................................................... 14 
Figure 2.5: Different types of jitter measurements. ................................................................. 17 
Figure 2.6: Relationship between period, cycle-to-cycle, and TIE jitter. ................................. 17 
Figure 2.7: Phase noise plot and integrated jitter measurement. ............................................. 18 
Figure 2.8: Injection locked oscillator. ................................................................................... 19 
Figure 2.9: Vector field for (2.3). ............................................................................................ 20 
Figure 2.10: Phase noise of the injected output as a function of the phase noise of VCO and input 
signals. ...................................................................................................................... 23 
Figure 2.11: (a) Cross-section of a VCSEL. (b) Die micrograph of a VCSEL. ........................ 24 
Figure 2.12: VCSEL L-I curve. .............................................................................................. 25 
Figure 2.13: VCSEL bandwidth limitations. .......................................................................... 26 
Figure 2.14: Current-mode VCSEL driver. ............................................................................ 27 
Figure 3.1: (a) LC oscillator with injection. (b) Variation of locking range with Q for a constant 
injection strength of 0.1. (c) Variation of power consumption with Q for a constant 
oscillation amplitude of 600 mV. (d) Improvement in locking range vs. power 
consumption for a constant injection strength and oscillation amplitude 
(Simulation). ............................................................................................................. 29 
  
xiii 
Figure 3.2: Block diagram of (a) proposed system and (b) Injection locked phase locked loop 
(ILPLL). ................................................................................................................... 30 
Figure 3.3: Schematic of the proposed system. The input to the common mode of the varactors 
contains 2f and DC components. The DC component brings the natural frequency close 
to the frequency of the reference clock and the 2f component does the injection lock. 31 
Figure 3.4: Simulation results, (a) θ vs. ref. frequency, (b) α vs. ref. frequency, (c) fo – finj vs. ref. 
frequency, (d) DC characteristic of the transmission gate, (e) Vctrl at 14 GHz and 16.5 
GHz clock reference. ................................................................................................. 33 
Figure 3.5: Schematic of the proposed system for quadrature phase generation. .................... 35 
Figure 3.6: System level block diagrams showing injection and PLL feedbacks. .................... 36 
Figure 3.7: New locking range fLnew and regular locking range fL. (b) Transient solutions to 
proposed system (3.7) and regular ILO (3.3). ............................................................. 38 
Figure 3.8: Variation of fLnew with Δo. ..................................................................................... 39 
Figure 3.9: Simulated frequency behavior of Q of the inductor. ............................................. 40 
Figure 3.10: (a)-(e) Measured locked output signals at several reference frequencies. (f) Setup for 
locking range and RMS jitter measurement. (g) Measured input and output j itter at 
different reference frequencies. .................................................................................. 41 
Figure 3.11: (a) Measurement setup for generating PM signal reference. (b) Setup for measuring 
the spectrum of reference and output signals ............................................................. 42 
Figure 3.12: (a) Measured jitter transfer function for 14 GHz, 15 GHz and 16 GHz reference 
frequencies. (b) Response to low frequency (10 MHz) and high frequency (1 GHz) jitter.
 .................................................................................................................................. 43 
Figure 3.13: (a) Measured percentage quadrature phase error vs. reference frequency (b) 
Measured quadrature phase waveforms at 14 GHz (c) Measured quadrature phase 
waveforms at 15 GHz. .............................................................................................. 44 
Figure 3.14: Measured maximum phase shift of the replica oscillator at different reference 
frequencies. ............................................................................................................... 46 
Figure 3.15: Die Micrograph. (A) shows the details of the high-Q inductor and (B) shows the 
placement of the varactors. ........................................................................................ 46 
Figure 4.1: Simulated variation in oscillation frequency of a ring oscillator with change in supply 
voltage for 28nm and 65nm technologies. ................................................................. 49 
Figure 4.2: Histogram of change in fo in a ring oscillator with process variation. .................... 49 
  
xiv 
Figure 4.3: Phase error in a ring oscillator due to injection. .................................................... 50 
Figure 4.4: Multi-phase injection in a ring oscillator. ............................................................. 50 
Figure 4.5: Deriving the quadrature phase error expression in a two stage ring oscillator ....... 52 
Figure 4.6: Quadrature error in unlocked case (a) close to lock (b) far from lock. ................... 53 
Figure 4.7: MQPE vs. fo for a fixed finj of 7GHz ..................................................................... 54 
Figure 4.8: Effect of injection strength on MQPE .................................................................. 55 
Figure 4.9: Block diagram of the proposed system (QLL) ...................................................... 56 
Figure 4.10: Design of the loop filter. ..................................................................................... 58 
Figure 4.11: Transient locking characteristics of Simulink model of QLL for two different loop 
filters. ........................................................................................................................ 59 
Figure 4.12: Simulink model of QLL (top) and Matlab code to extract the linear state-space 
model around the operating point. ............................................................................ 60 
Figure 4.13: Step response and transfer function of linearized QLL Simulink model for different 
loop bandwidths (Small signal behavior). .................................................................. 61 
Figure 4.14: Circuit architecture of QLL. .............................................................................. 62 
Figure 4.15: Ring oscillator based ILO circuit schematic. ...................................................... 64 
Figure 4.16: (a) Transient locking characteristics of QLL. (b) Ring oscillator characteristics. . 65 
Figure 4.17: Locking transient for two different initial conditions .......................................... 66 
Figure 4.18: Die micrograph and layout details. .................................................................... 67 
Figure 4.19: Phase noise and integrated jitter measurements for 8GHz (electrical and optical) 
and 11GHz (electrical). ............................................................................................. 68 
Figure 4.20: Measured phase noise of the locked QLL output across the entire locking 
range. ........................................................................................................................ 69 
Figure 4.21: Measurement setup for generating FM signal reference. (b) Setup for measuring the 
spectrum of output signals. ........................................................................................ 70 
Figure 4.22: (a) Measured Jitter transfer function for 8GHz reference. (b) Response to low 
frequency (10 MHz) and high frequency (1 GHz) jitter. ............................................ 71 
Figure 4.23: QLL response to supply noise compared to unlocked (no reference) case. .......... 72 
Figure 4.24: Measured  quadrature phase error vs. reference frequency and measured quadrature 
phase waveforms at 5, 8 and 11GHz. ........................................................................ 73 
Figure 4.25: (a) Power consumption of the QLL vs. frequency. (b) Power efficiency of the QLL 
vs. frequency. ............................................................................................................ 75 
Figure 5.1: QLL based clock distribution architecture for a 4 channel optical receiver ........... 79 
  
xv 
Figure 5.2: Single channel quarter-rate receiver. .................................................................... 80 
Figure 5.3: (a) FD SOI MOS structure (b) Threshold voltage (Vth) variation with back bias 
(Vb) ........................................................................................................................... 82 
Figure 5.4: Simulated ring oscillator characteristics. .............................................................. 82 
Figure 5.5: Deskewing in forwarded clock links; (a) conventional (b) proposed. .................... 83 
Figure 5.6: Jitter transfer function characteristics of PLL, DLL and ILO. .............................. 84 
Figure 5.7: QLL based deskewing architecture (single channel). ............................................ 84 
Figure 5.8: Symmetric vs. two phase injection. (a) Two phase injection architecture. (b) 
Symmetric injection architecture. (c) Simulation based comparison of two phase and 
symmetric injection. .................................................................................................. 86 
Figure 5.9: Chip micrograph and layout details. .................................................................... 87 
Figure 5.10: Test setup for optical receiver. ............................................................................ 88 
Figure 5.11: Measured eye diagram (a) and BER (b) with PRBS 15 optical data at 32Gb/s. .. 89 
Figure 5.12: BER vs. optical power (receiver sensitivity) at different data-rates (top). Optical 
sensitivity vs. data rate (bottom). ............................................................................... 90 
Figure 5.13: Measured deskewed waveform for 32Gb/s data. ............................................... 91 
Figure 5.14: Power consumption breakdown at 32Gb/s (top) and energy efficiency per bit across 
different data rates. .................................................................................................... 92 
Figure 5.15: QLL based clocking for an n-channel forwarded clock receiver (left). Proposed 
clocking scheme for a single channel forwarded clock receiver (right). ....................... 94 
Figure 5.16 (a) Conventional QLL architecture. (b) Modified QLL architecture to add 
deskew. ..................................................................................................................... 95 
Figure 5.17: MQPE for the QLL without deskew and with deskew. ...................................... 97 
Figure 5.18: Transient locking characteristics of the modified QLL and regular QLL. (a) Initial 
frequency (finit)=5.75GHz (b) finit=8.4GHz ................................................................. 98 
Figure 5.19: Deskewing by changing d1, in the modified QLL. .............................................. 99 
Figure 6.1: Cross-section of a VCSEL ................................................................................... 100 
Figure 6.2: VCSEL L-I curve ................................................................................................ 101 
Figure 6.3: VCSEL small signal AC characteristics [45]. ...................................................... 104 
Figure 6.4: Simplified, non-linear VCSEL modeling. ............................................................ 106 
Figure 6.5: VCSEL electrical parasitics. ................................................................................ 107 
Figure 6.6: Optical model of a VCSEL. ................................................................................ 108 
Figure 6.7: Combined model for simulating a VCSEL. ......................................................... 109 
  
Figure 6.8: VCSEL modelling: comparing the measured (top) and simulated (bottom). ........ 110 
Figure 6.9: Simulated modulation bandwidth variation with bias current. ............................ 111 
Figure 6.10: Transmitter equalization boosts the high frequency component to achieve a flat 
response. .................................................................................................................. 112 
Figure 6.11: Pulse response of channel (right) before and after pre-emphasis. ....................... 113 
Figure 6.12: Block diagram of a transmitter with n-tap FIR-based equalization. ................... 113 
Figure 6.13: VCSEL pulse response for (a) isolated 1, (b) isolated 0, (c) responses superimposed.
 ................................................................................................................................. 114 
Figure 6.14: VCSEL pulse responses for different bias currents. ............................................ 115 
Figure 6.15: Proposed equalization technique....................................................................... 115 
Figure 6.16: Proposed method for selecting teq. ..................................................................... 116 
Figure 6.17: Simulated optical eye-diagrams with and without equalization. (a) 20Gb/s high 
current, (b) 20Gb/s low current, (c) 30Gb/s. ............................................................ 117 
Figure 6.18: Circuit architecture. .......................................................................................... 119 
Figure 6.19: Conventional CML-to-CMOS structure used for digital clock generation from an 
analog input. ............................................................................................................ 120 
Figure 6.20: QLL based CML-to-CMOS conversion and quadrature phase generation. ....... 120 
Figure 6.21: Chip micrograph and layout details. ................................................................. 121 
Figure 6.22:  Butt coupling proves too lossy and noisy for VCSEL measurements. ............... 122 
Figure 6.23: Optical measurement setup. .............................................................................. 123 
Figure 6.24: Measured VCSEL optical output at 16Gb/s (PRBS-15), with and without 
equalization. ............................................................................................................ 124 
Figure 6.25: Measured optical eye-diagram for PRBS-15 data at 20Gb/s. (a) Unequalized (b) 
Equalized. ................................................................................................................ 125 
Figure 7.1: Constant growth of the required I/O bandwidth according to ITRS. .................. 127 
Figure 7.2: Number of injection locking based wireline publications in International Solid-State 
Circuits Conference (ISSCC) across a decade. .......................................................... 128 
 
 
 
 
 
  
List of Tables 
Table 3.1: Performance comparison for wideband injection locked LC oscillator. ..................45 
Table 4.1: Performance comparison for QLL. .......................................................................74 
Table 5.1: Performance comparison for 4 channel optical receiver. ........................................93 
Table 6.1: Typical VCSEL electrical parasitics values. ..........................................................106  
Table 6.2: Typical VCSEL optical modelling parameters. ....................................................108 
Table 6.3: Summary of simulated improvement by the proposed VCSEL equalization 
technique............................................................................................................................118 
 
 
 
 
 
 
 
 
 
 
 
  
Chapter 1: Introduction 
We are living in an era where the transistors in ICs (integrated circuits) outnumber 
Earth’s population (Figure 1.1). The relentless pursuit of Moore’s law has enabled our journey 
from the first Intel 4004 microprocessor in 1971, with a modest 2.3k transistors, to the modern 
Oracle SPARC M7 microprocessor with an astounding 10 billion transistors (Figure 1.1). This 
remarkable growth has made today’s ICs highly complex systems with many communicating 
and processing components. 
In the early stages of CMOS technology, the integration of more and smaller transistors allowed 
increasing complexity in the design of processing and communication units, and led to a trend 
of rising clock speeds (Figure 1.2). This approach provided a tremendous improvement 
in processing speed and power efficiency until 2004, when designers ran into the problem of 
increased power consumption. It turned out that scaling clock frequency further yielded only a 
marginal improvement in processing performance while incurring a significant power penalty 
[1]. Power reduction became mandatory and a trend towards lower clock frequencies 
started, as shown in the frequency trends chart in Figure 1.2. The performance loss resulting 
from lower clock frequencies was compensated for by increased parallelism: designers 
employed a parallel computing approach through multi-core processors (Figure 1.2). Present-day 
high-performance microprocessors have tens of cores on a single chip and an aggregate 
performance of hundreds of gigaflops (billions of floating point operations per second). In the 
near future, processors are expected to have hundreds of cores to enable exascale computing. 
For the entire system to benefit from this increased computation throughput, the off-chip 
input/output (I/O) bandwidth should also scale. High aggregate bandwidth can be achieved by 
employing large numbers of inputs and outputs per chip as well as high data rates per I/O. This 
has led to the widespread use of parallel links, where interfaces between chips employ tens to 
hundreds of I/O links in parallel to achieve their aggregate bandwidth targets. However, the 
number of pins does not scale as fast due to physical connection and area limitations, so the 
per-pin bandwidth also needs to increase. Figure 1.3 shows that the per-pin data rate has 
approximately doubled every four years across a variety of diverse I/O standards ranging from 
DDR to graphics to high-speed Ethernet. 
 
 
 
Figure 1.2: Microprocessor core count scaling (left) and microprocessor clock frequency 
scaling (right) [2] (data from ISSCC trends 2012). 
 
 
Figure 1.1: Scaling in microprocessors. 
  
 
Figure 1.3: Scaling of common wireline I/O standards (top) [3] and block diagram of chip 
to chip links in a computer server. 
 
However, as we reach the limits of electrical channel bandwidth, continuing along this trend 
for I/O scaling becomes more and more difficult.  
 
1.1 Optical Interconnects 
Electrical interconnects are conventionally the main platform for data communication. 
However, due to their limited bandwidth, scaling their data rate proves to be very challenging. 
Channel bandwidth degradation is the result of many physical effects, including skin effect, 
dielectric loss, and reflections due to impedance discontinuities. As a consequence, high data 
rate pulses transmitted through these channels broaden to more than a unit interval (UI), 
creating intersymbol interference (ISI) with preceding and succeeding bits, which 
ultimately degrades the signal-to-noise ratio (SNR). A common approach in the design 
of high-speed serial links over bandwidth-limited channels is to employ equalization techniques 
to cancel the destructive effects of ISI. Typical equalization techniques include decision feedback 
equalization (DFE) [4], feed-forward equalization (FFE) [5] and continuous-time linear 
equalization (CTLE) [6] at the receiver, and FFE at the transmitter [7]. However, the power and 
area overhead associated with equalization makes it difficult to achieve the target bandwidth 
within a realistic power budget. As a result, rather than being technology limited, current 
high-speed I/O link designs are fast becoming channel and power limited. 
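To make the effect of ISI and transmit-side FFE concrete, the following minimal sketch applies a 2-tap FIR filter to a UI-spaced pulse response. The pulse response and tap values are hypothetical numbers chosen only for illustration, and `convolve` and `isi_ratio` are helper names introduced here; a practical transmitter would also scale the taps to respect its peak output swing.

```python
def convolve(x, h):
    """Discrete linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def isi_ratio(pulse):
    """Sum of all non-main cursor magnitudes divided by the main cursor."""
    main = max(abs(p) for p in pulse)
    return (sum(abs(p) for p in pulse) - main) / main


# Hypothetical UI-spaced channel pulse response: main cursor plus two post-cursors.
channel_pulse = [1.0, 0.5, 0.25]
# Hypothetical 2-tap transmit FFE: main tap and one negative post-cursor tap.
ffe_taps = [1.0, -0.5]

# The equalized pulse response is the channel response filtered by the FFE taps.
equalized_pulse = convolve(ffe_taps, channel_pulse)

print("ISI ratio before equalization:", isi_ratio(channel_pulse))
print("ISI ratio after equalization: ", isi_ratio(equalized_pulse))
```

In this toy case the negative post-cursor tap cancels the first two post-cursors and leaves only a small residual third post-cursor, which is the basic trade-off of a short FIR equalizer.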
A promising solution to the I/O bandwidth problem is the use of optical inter-chip 
communication links. The negligible frequency dependent loss of optical channels provides the 
potential for optical link designs to fully utilize increased data rates provided through CMOS 
technology scaling without excessive equalization complexity. Optics also allow very high 
information density through wavelength division multiplexing (WDM). However, optical links 
do require additional circuits that interface to the optical sources and detectors. Thus, in order 
to achieve the potential link performance advantages, emphasis must be placed on using efficient 
optical devices and low-power, area-efficient interface circuits at the transmitter and receiver ends. 
For optical transmitters, vertical-cavity surface-emitting lasers (VCSELs) [8], [9] are often 
used for electrical-to-optical conversion. A VCSEL is a semiconductor laser diode which emits 
light perpendicular to its top surface. These surface-emitting lasers offer several 
manufacturing advantages over conventional edge-emitting lasers, including wafer-scale testing 
and dense 2D array production. They can be modulated directly by varying the laser 
current, which is an advantage over multiple-quantum-well modulators [10] and ring resonator 
modulators [11], both of which require a separate continuous-wave laser source. Modulators also 
require high-voltage-swing electrical inputs, making them difficult to integrate with modern 
CMOS technology. These properties make VCSELs a viable candidate for low-power, 
low-cost optical modulation. 
Typical optical receivers use a photodiode to sense the high-speed optical power and produce 
an input current. This photocurrent is then converted to a voltage and amplified sufficiently for 
data resolution, conventionally using a transimpedance amplifier (TIA). However, as data rates 
increase, TIA-based approaches have become more and more power hungry [12]. New 
techniques such as the integrating front-end [13] and double-sampling [14] have improved optical 
receivers’ power consumption remarkably. These approaches have paved the way for massively 
parallel optical communications. However, fully exploiting the potential of these low-power 
techniques requires innovations on the clocking front as well. In conventional clocking 
schemes that employ a global phase-locked loop (PLL) locked reference and a clock distributed 
digitally through buffer chains and clock grids, the power required to constantly switch the large 
capacitive loads can consume 40% of the chip’s total power budget [15]. Thus, alternative 
low-power clocking schemes are required for the next generation of massively parallel optical links. 
Overall, for optical interconnects to become viable alternatives to established electrical links, 
they must be low cost and have competitive energy and area efficiency metrics. To address the 
power consumption requirements of future optical interconnects, this dissertation describes a 
low-power clocking circuit for a four-channel quarter-rate optical receiver and a low-power 
VCSEL-based optical transmitter. 
 
1.2 Injection Locked Clocking in Parallel Links 
In communication systems, the generation and distribution of a synchronizing clock is a 
fundamental task. Two types of clocking architecture are common in today’s multi-Gb/s I/Os. 
The first is the embedded clock (EC) architecture [16], where timing information is extracted 
from the data by performing clock and data recovery (CDR). A per-pin CDR proves too power 
hungry and complicated for parallel links with multiple data channels. Hence, for simplicity and 
better power efficiency, a synchronous forwarded clock (FC) architecture [17] is generally 
adopted in parallel links. A typical block diagram of the FC architecture is shown in Figure 1.4. 
It consists of a single clock line and multiple data lines. The cost and power overhead of the FC 
circuits are amortized across multiple links in the system. Examples of source-synchronous 
parallel links include memory interfaces such as DDR3 [18], and chip-to-chip interfaces such as 
HyperTransport [19] and QuickPath [20]. 
In FC links, the clock pattern, sent on a separate but similar channel, is used in the receiver to 
sample the data pattern at the optimum point. At multi-Gb/s speeds, each data channel may 
have a phase mismatch, or skew, with respect to the reference clock. This necessitates 
per-channel “deskewing”, which is performed by the timing recovery circuit (Figure 1.4). This 
circuit may be based on a phase-locked loop (PLL), a delay-locked loop (DLL) or an injection 
locked (IL) architecture. The pros and cons of each are discussed below. 
Jitter on the forwarded clock is correlated with jitter on the data because both are generated 
by the same transmitter. Hence, jitter performance is improved by retiming the data with a clock 
that tracks correlated jitter on the forwarded clock [21]. However, since the delays of the data and 
clock paths typically differ by several UIs, very high-frequency jitter will appear out of phase at 
the receiver and should not be tracked. To account for latency mismatch and sample the data 
pattern at the optimum point, a clock deskew mechanism is used to optimally shift the forwarded 
clock. DLLs in conjunction with phase interpolators (PIs) are commonly used to deskew the 
clock phase. However, due to its all-pass jitter transfer characteristic, a DLL cannot filter the 
high-frequency jitter [17]. In fact, the high-frequency jitter may even be amplified due to the finite 
bandwidth of the delay line of the DLL [22]. High-frequency clock jitter can be filtered by using 
a PLL in conjunction with PIs, owing to the inherent low-pass jitter transfer characteristic of a 
PLL. However, this low-pass phase transfer characteristic diminishes useful jitter components 
(i.e., those that are correlated to the data channel jitter), which could result in suboptimum 
performance and lower clock recovery bandwidth. PLLs also have other disadvantages such as 
susceptibility to jitter accumulation and stability issues. 
 
Figure 1.4: Forwarded clock parallel link. 
 
Injection locked oscillators (ILOs) are a power- and area-efficient alternative to PLLs and 
DLLs. As discussed in Chapter 2, an ILO can be modelled as a first-order PLL and hence can be 
used to filter high-frequency jitter. But unlike a PLL, an ILO has a higher jitter tracking bandwidth 
and thus does not filter out the useful low-frequency correlated jitter [17]. Additionally, an ILO 
can perform clock deskew by introducing a frequency offset between the ILO’s free-running 
frequency and the injected frequency. The first-order nature of injection locking proves very 
useful as it ensures no peaking and guarantees stability. This makes the design of injection-locking-based 
circuits very simple compared to a PLL. 
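The difference in jitter tracking can be illustrated with a single-pole low-pass jitter transfer function, which is the first-order model mentioned above. The sketch below is only an illustration under stated assumptions: the 400 MHz ILO bandwidth echoes the tracking bandwidth reported later for the LC oscillator, the 10 MHz PLL bandwidth is a hypothetical value, and the PLL is also approximated as first order for comparison, whereas a real PLL is usually higher order and may exhibit peaking.

```python
import math


def first_order_jtf(f_jitter_hz, f_bw_hz):
    """Magnitude of a single-pole low-pass jitter transfer function."""
    return 1.0 / math.sqrt(1.0 + (f_jitter_hz / f_bw_hz) ** 2)


ILO_BW_HZ = 400e6  # assumed ILO jitter tracking bandwidth
PLL_BW_HZ = 10e6   # hypothetical PLL loop bandwidth, for comparison only

# Fraction of input jitter that appears on the recovered clock at each jitter frequency.
for f_hz in (1e6, 100e6, 400e6, 4e9):
    ilo = first_order_jtf(f_hz, ILO_BW_HZ)
    pll = first_order_jtf(f_hz, PLL_BW_HZ)
    print(f"{f_hz / 1e6:7.0f} MHz   ILO tracks {ilo:.3f}   PLL tracks {pll:.3f}")
```

With these assumed bandwidths, jitter at 100 MHz is still largely tracked by the ILO but mostly filtered by the narrower loop, while jitter well above both bandwidths is attenuated in either case.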
Although injection locking is well suited to timing recovery in forwarded clock applications, the 
fundamental hindrance of all injection-locked systems is their small locking range. Ring 
and LC oscillators typically have a maximum locking range of 10% [23] [24]. This problem 
is exacerbated by scaling, as process, voltage and temperature (PVT) variations make it difficult 
to design reliable systems with small locking ranges. We propose techniques to enhance the 
locking ranges of LC and ring oscillators to ensure reliable operation of injection locking based 
techniques in forwarded clock architectures. 
One of the challenges that arise at higher data rates is timing and synchronization. As the UI 
(unit-interval) size, or bit time, decreases, the receiver has smaller and smaller timing margin 
and clocking becomes more difficult. In a full-rate link, the period of the clock is the same as the 
length of a UI and, for example, a 10Gb/s link will operate with a 10GHz clock. At multi-Gb/s 
data rates, however, the high-frequency clocks required for this approach consume large 
amounts of power and complicate the process of timing recovery. As a result, designers use sub-
rate clocking schemes. These are essentially multiplexing/demultiplexing schemes, where the 
clock operates at some integer fraction of the data rate and the data is transmitted and/or 
received using multiple phases of a clock period. Particularly popular are half-rate and quarter-
rate schemes. An essential prerequisite for these is the generation of quadrature phase clocks at 
low power overhead. Both ring and LC based dividers have been frequently used for quadrature 
phase generation [25]. However, they operate at twice the desired frequency and hence tend to be 
power inefficient. Quadrature phase generation through ring ILOs without frequency division 
leads to phase inaccuracies [26]. In this dissertation, we describe techniques for injection locking 
based wideband accurate quadrature phase generation in LC and ring oscillators without any 
frequency division. 
  
1.3 Organization 
This dissertation is composed of three major parts. Chapter 2 provides a review of clocking in 
high-speed data transmission systems. Metrics used for characterizing clock and data in high-
speed links are introduced. Injection locking dynamics are discussed. Basics of the VCSEL based 
optical transmitter are introduced.  
Chapter 3 describes a novel technique for wideband injection locking in an LC oscillator. We 
show how PLL and injection-locking elements can be combined symbiotically to achieve a wide 
locking range while retaining the simplicity of the latter. A mathematical analysis of the system 
is presented and the expression for the new locking range is derived. A locking range of 13.4 
GHz–17.2 GHz (25%) and an average jitter tracking bandwidth of up to 400 MHz are measured 
in a high-Q LC oscillator. This architecture is used to generate quadrature phases from a single 
clock without any frequency division. It also provides high frequency jitter filtering while 
retaining the low frequency correlated jitter essential for forwarded clock receivers.  
A unique injection locking technique called the QLL (Quadrature Locked Loop) is 
introduced in Chapter 4. It utilizes the inherent dynamics of the injection locked quadrature ring 
oscillator to improve its locking range from 5% (7-7.4GHz) to 90% (4-11GHz). The QLL is used 
to generate accurate clock phases for a four channel optical receiver using a forwarded clock at 
quarter-rate. Chapter 5 details the QLL based clocking for a four channel quarter-rate optical 
receiver. The QLL drives an ILO at each channel without any repeaters for local quadrature 
clock generation. Each local ILO has deskew capability for phase alignment. The optical-
receiver uses the inherent frequency to voltage conversion provided by the QLL to dynamically 
body bias its devices. The wide locking range of the QLL helps to achieve a reliable data-rate of 
16-32Gb/s, and adaptive body biasing aids in maintaining an ultra-low energy consumption of 
153fJ/bit. 
From an optical receiver we move on to discussing a VCSEL based optical transmitter in 
Chapter 6. A non-linear time domain optical model of the VCSEL is built and evaluated for 
accuracy. Based on the simulations of the model, an optimum equalization methodology to 
enable low-power, high-speed optical transmission is derived. The equalization technique is used 
to achieve a data-rate of 20Gb/s with power efficiency of 0.77pJ/bit.  
Conclusions of the work are presented in Chapter 7.  
  
Chapter 2: Background 
In this chapter we develop the framework for discussions in the later chapters. We start with a 
quick review of the metrics of a high-speed link. Next we delve into the details of clocking in 
high-speed interconnects. We describe the nature of timing uncertainty (jitter) in clocks and the 
common techniques used to characterize it. Then we describe the fundamentals of injection 
locking, a promising technique for high-performance clock generation and distribution. We end 
this chapter by discussing a fundamental building block of an optical transmitter, the 
vertical-cavity surface-emitting laser (VCSEL). 
 
2.1 Metrics of High-Speed Interconnect 
Figure 2.1 (a) shows the components and configuration of the basic clocked link. It consists of a 
transmitter, receiver, and channel. The transmitter (Tx) converts the digital data into an 
electrical/optical signal and launches it on the channel. Since the signal sent down the channel 
exists in the continuous time analog domain, the purpose of the receiver (Rx) is to determine the 
optimum decision point, in time and amplitude, in order to estimate the original bit-stream and 
minimize errors. Since a link’s receiver needs to convert an analog signal back into digital data, 
there is always a probability that bit errors will occur. Thus, an important metric called the 
bit-error rate (BER) is used to measure the reliability of data communication links. A link’s 
maximum data rate is usually specified at a specific BER (e.g. $10^{-12}$) to guarantee the robustness 
of the overall system. In an additive white Gaussian noise (AWGN) channel, the BER is 
classically characterized by the voltage margin, Vm, at the sampling point [27]: 
 
 
$$\mathrm{BER} = e^{-\frac{(V_m / V_r)^2}{2}} \qquad (2.1)$$
 
where Vr  is the root-mean square (RMS) voltage noise; since Gaussian noise is assumed, this 
is equivalent to the noise standard deviation.  
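
As a quick numerical check of Equation (2.1), the short sketch below evaluates the BER for a given margin-to-noise ratio and inverts the expression to find the ratio needed for a target BER. The function names are introduced here for illustration only, and the result holds only under the AWGN assumption stated above.

```python
import math


def ber_from_margin(v_margin, v_noise_rms):
    """Equation (2.1): BER = exp(-(Vm / Vr)^2 / 2)."""
    return math.exp(-((v_margin / v_noise_rms) ** 2) / 2.0)


def margin_ratio_for_ber(target_ber):
    """Invert Equation (2.1) to get the required Vm / Vr for a target BER."""
    return math.sqrt(-2.0 * math.log(target_ber))


ratio = margin_ratio_for_ber(1e-12)
print(f"Vm/Vr needed for BER = 1e-12: {ratio:.2f}")
print(f"BER at that ratio: {ber_from_margin(ratio, 1.0):.3e}")
```

Under this expression, a BER of $10^{-12}$ requires the voltage margin to be a little more than seven times the RMS noise voltage.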
  
 
 
Figure 2.1: (a) Basic clocked high-speed link. (b) Typical receiver data eye-diagram with 
voltage and timing margins (Vm and Tm). (c) Translation of eye-diagram to bathtub curve. 
 
Besides voltage noise, the second major contributor to BER is timing uncertainty at the 
receiver. Like voltage noise, this uncertainty is a random process, and it is characterized by the 
jitter of the receiver clock as well as that of the transmitted signal. Both sources of jitter shift the 
sampling point away from its optimum, reducing the voltage margin and 
degrading the BER. This effect is of particular concern as data rates increase, since jitter can 
become a substantial portion of a data period (also known as a unit interval, UI). As a result, 
timing margin can become a larger concern than voltage margin in high-speed links [28]. A 
helpful and common tool for visualizing the effects of noise and jitter on a link is the eye diagram, 
which is generated by superimposing many UIs of the data signal (Figure 2.1(b)). 
In addition to the eye diagram, the bathtub curve is another diagnostic tool for performing 
signal integrity analysis. Bathtub curves are usually created by measuring the BER while 
sweeping the sampling clock over the bit time. Figure 2.1(c) shows a typical bathtub curve. 
Bathtub curves are useful tools for characterizing the performance of the receiver: they show how 
tolerant the system is to sampling clock jitter, as well as the amount of horizontal and 
vertical eye opening. 
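
The shape of a bathtub curve can be reproduced with a very simple model, sketched below. It assumes that the only impairment is Gaussian random jitter on the data edges at 0 and 1 UI, with a hypothetical RMS value of 0.03 UI and a transition density of 0.5. These values, the function names and the model itself are illustrative assumptions and do not describe the measurements in this work.

```python
import math


def q_function(x):
    """Tail probability of a standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bathtub_ber(sample_phase_ui, rj_rms_ui, transition_density=0.5):
    """BER when sampling at a given phase between data edges at 0 and 1 UI."""
    left = q_function(sample_phase_ui / rj_rms_ui)
    right = q_function((1.0 - sample_phase_ui) / rj_rms_ui)
    return transition_density * (left + right)


RJ_RMS_UI = 0.03  # hypothetical RMS random jitter, in UI

# Sweep the sampling phase across the bit time, as in a bathtub measurement.
phases = [i / 20 for i in range(1, 20)]
curve = [(p, bathtub_ber(p, RJ_RMS_UI)) for p in phases]

# Horizontal eye opening at a target BER of 1e-12.
open_phases = [p for p, ber in curve if ber <= 1e-12]
eye_opening_ui = max(open_phases) - min(open_phases) if open_phases else 0.0

for p, ber in curve:
    print(f"phase {p:.2f} UI   BER {ber:.2e}")
print(f"eye opening at BER 1e-12: {eye_opening_ui:.2f} UI")
```

The BER is lowest at the center of the bit time and rises steeply towards the edges, which gives the curve its characteristic bathtub shape.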
 
2.2 Clocking 
One of the challenges that arise at higher data rates is timing and synchronization. As the UI 
size decreases, the receiver has a smaller and smaller timing margin and clocking naturally 
becomes more difficult. In order to provide a framework for discussion on this subject, it is 
helpful to outline several common clocking styles: 
 
- Synchronous: In a synchronous link, the transmitter and receiver clocks are assumed to have 
the same frequency and phase. This is generally only a tenable assumption at low data rates. 
- Mesochronous: In a mesochronous link, the transmitter and receiver clocks are assumed to 
have the same frequency, but may be out of phase. A popular sub-set of this category is the 
source-synchronous link, where the clock is generated at the transmitter and forwarded along 
with the data. These are also known as forwarded clock links. 
- Plesiochronous: In a plesiochronous link, the transmitter and receiver clocks may have slight 
differences in frequency. The receiver is required to align its clock by extracting timing 
information from the incoming data stream. These are also known as embedded clock links. 
- Asynchronous: An asynchronous link is not really clocked at all. Rather, it uses either 
control symbols inserted in the data stream itself or handshaking signals to convey timing 
information. 
  
 
As the mesochronous/source-synchronous and plesiochronous styles are most frequently 
adopted for high-speed interconnect design, they shall be the focus of the discussion here. 
In source-synchronous links (Figure 2.2 (a)), the TX transmits its clock on a separate channel 
along with multiple data channels. The RX uses this forwarded clock as a frequency reference. 
However, at high data-rates, the inter-signal skew can be a significant percentage of the symbol 
interval and thus these links need to perform per-pin skew compensation [29] to ensure that data 
is optimally sampled. The timing recovery circuit receives the forwarded clock and performs 
jitter filtering and deskewing. Forwarded clock links are used in dense parallel links. Examples 
of such links include memory interfaces such as DDR3, and chip-to-chip interfaces such as 
HyperTransport and QuickPath. 
 
 
Figure 2.2: (a) Source synchronous (forwarded clock) link. (b) Plesiochronous (embedded 
clock) link. 
  
In contrast, plesiochronous schemes, shown in Figure 2.2 (b), use independent clock sources 
in the TX and RX. The TX does not forward a clock and the RX performs its own clock recovery, 
i.e., it uses the timing information embedded in the incoming data to position the sampling 
clock. It needs to track both the frequency and the phase of the incoming clock. The lower 
routing overhead makes plesiochronous links popular for communication between add-in cards 
and over server backplanes (e.g. PCI-Express [30]), which generally have to travel longer 
distances than the source-synchronous links described previously. 
 
2.3 Sub-rate Clocking 
At multi-Gb/s data rates, the high-frequency clocks required for a “full-rate” architecture 
consume large amounts of power and complicate the process of timing recovery. As a result, 
designers use sub-rate clocking schemes. These are essentially multiplexing/demultiplexing 
schemes, where the clock operates at some integer fraction of the data rate and the data is 
transmitted and/or received using multiple phases of a clock period. Although it is, in principle, 
possible to generate as many phases of the clock as desired and lower the clock rate arbitrarily, 
practical concerns typically limit link implementations to half and quarter-rates. Figure 2.3 
shows an example of a quarter-rate receiver. The timing recovery circuit generates the 
quadrature clocks from a single-phase clock reference (Rx Clock). With increasing data rates, 
half-rate and quarter-rate clocking are becoming more prevalent; consequently, reliable, 
low-power quadrature phase generation has become a fundamental building block in high-speed 
transceivers. 
 
Figure 2.3: Block diagram of a quarter-rate receiver. 
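
The demultiplexing nature of sub-rate clocking can be sketched in a few lines. The example below assumes an idealized quarter-rate receiver in which clock phase k samples every fourth bit starting at bit k; the function names and the bit pattern are hypothetical and serve only as an illustration.

```python
def subrate_clock(data_rate_gbps, n_phases):
    """Clock frequency (GHz) and sampling phases (degrees) of a 1/n-rate receiver."""
    clock_ghz = data_rate_gbps / n_phases
    phases_deg = [360.0 * k / n_phases for k in range(n_phases)]
    return clock_ghz, phases_deg


def demultiplex(bits, n_phases):
    """Lane k holds the bits sampled by clock phase k: bits k, k+n, k+2n, ..."""
    return [bits[k::n_phases] for k in range(n_phases)]


# Quarter-rate reception of 32Gb/s data: four phases of an 8GHz clock.
clock_ghz, phases_deg = subrate_clock(32.0, 4)
lanes = demultiplex([1, 0, 1, 1, 0, 0, 1, 0], 4)

print(f"clock: {clock_ghz} GHz, phases: {phases_deg}")
for k, lane in enumerate(lanes):
    print(f"phase {phases_deg[k]:5.1f} deg samples bits {lane}")
```

Adjacent clock phases are spaced by one UI of the data, which is why accurate quadrature spacing matters for a quarter-rate receiver.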
 
2.4 Clock Jitter 
Jitter can be defined as “short-term variations of a signal with respect to its ideal position in 
time” (International Telecommunication Union [31]). As clocks and communication 
channels run at higher frequencies, the data UI becomes smaller and smaller. Thus, I/O systems 
become more susceptible to deviations of a clock’s output transitions from their ideal positions. 
Excessive jitter can increase the bit error rate (BER) of a communication link by causing bits of 
the data stream to be transmitted or sampled at the wrong instant. An accurate understanding of 
jitter is therefore necessary for ensuring the reliability of a system. The two major components of 
jitter are random jitter and deterministic jitter (Figure 2.4). 
 
 
Figure 2.4: Components of jitter. 
 
  
2.4.1 Random Jitter 
Random jitter (RJ) is timing noise that cannot be predicted because it has no discernible pattern. 
The random component in jitter is due to the noise inherent in electrical circuits and typically 
exhibits a Gaussian distribution. This noise interacts with the slew rate of signals to produce 
timing errors at the switching points causing the random jitter. RJ is Gaussian because it results 
from the composite effects of many uncorrelated noise sources (central limit theorem). Because 
of its Gaussian distribution, its instantaneous noise value is mathematically unbounded and so 
it is characterized by its standard deviation (RMS) value.  
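To first order, the conversion of voltage noise into random jitter at a switching point is the noise voltage divided by the slew rate of the edge. The sketch below applies this relation to hypothetical numbers (1 mV RMS noise on an edge that slews 0.5 V in 10 ps); the function name and values are assumptions for illustration.

```python
def rms_jitter_from_noise(v_noise_rms, slew_rate_v_per_s):
    """First-order RMS timing error caused by voltage noise at a switching point."""
    return v_noise_rms / slew_rate_v_per_s


# Hypothetical edge: 0.5 V swing in 10 ps, with 1 mV RMS noise at the crossing.
slew_rate = 0.5 / 10e-12  # volts per second
rj_rms_s = rms_jitter_from_noise(1e-3, slew_rate)

print(f"slew rate: {slew_rate:.2e} V/s")
print(f"random jitter: {rj_rms_s * 1e15:.1f} fs RMS")
```

The relation shows why faster edges are less sensitive to the same amount of voltage noise.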
2.4.2 Deterministic Jitter 
Deterministic jitter (DJ) is timing jitter that is repeatable and predictable. It is not intrinsic or 
random and has a specific source. It is often periodic and narrowband. Sources of DJ are 
generally related to imperfections in the behavior of a device or transmission media but may also 
be due to power supply noise, cross-talk, or signal modulation. DJ can be further subclassified 
into periodic jitter and data-dependent jitter. For example, jitter caused by interfering noise from a 
switching power supply is periodic, because the noise has the same frequency as the 
switching power supply. In contrast, an example of data-dependent jitter is intersymbol 
interference (ISI) caused by an isochronous 8B/10B [32] coded serial data stream. Both types of 
DJ are linearly additive and always have a specific source, i.e. they are correlated to (or caused 
by) something. This jitter component has a non-Gaussian probability density function and is 
always bounded in amplitude. DJ is characterized by its bounded, peak-to-peak, value. 
 
2.5 Types of Jitter 
There are different types of jitter, based on the techniques used for measuring it. They are 
described below. 
2.5.1 Period Jitter 
Period jitter is the deviation in cycle time of a clock signal with respect to the ideal period over 
a number of cycles. Figure 2.5 shows period jitter measurements P1, P2, and P3, which simply 
measure the period of each clock cycle in the waveform. From these measurements the average 
clock period as well as the standard deviation and the peak-to-peak value can be calculated. The 
standard deviation and the peak-to-peak value are frequently referred to as the RMS value and 
the peak-to-peak period jitter, respectively. Period jitter is mostly used in digital systems for 
calculating timing margins. 
2.5.2 Cycle-to-Cycle Jitter 
Cycle-to-cycle jitter is the difference in a clock’s period from one cycle to the next. It is indicated 
by C1 and C2 in Figure 2.5. It measures how much the clock period changes between any two 
adjacent cycles. Thus, the cycle-to-cycle jitter can be found by applying a first-order difference 
operation to the period jitter. Cycle-to-cycle jitter is typically reported as a peak value, which 
defines the maximum change in period between any two consecutive clock cycles; it may also be 
expressed as an RMS (standard deviation) value. Because it compares two adjacent clock cycles, 
the cycle-to-cycle jitter measurement is used to determine high-frequency jitter in applications. 
It is interesting to note that no knowledge of the ideal edge locations of the reference clock is 
required in order to calculate either the period jitter or the cycle-to-cycle jitter. 
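
Both quantities can be computed directly from a list of measured edge times, without reference to an ideal clock. The sketch below uses hypothetical rising-edge timestamps in picoseconds and helper names introduced here; it reports the RMS value as the population standard deviation, which is one possible convention.

```python
import statistics


def periods_from_edges(edge_times):
    """Period of each clock cycle from consecutive rising-edge timestamps."""
    return [t1 - t0 for t0, t1 in zip(edge_times, edge_times[1:])]


def period_jitter(periods):
    """RMS (population standard deviation) and peak-to-peak period jitter."""
    return statistics.pstdev(periods), max(periods) - min(periods)


def cycle_to_cycle_jitter(periods):
    """First-order difference of the periods: change between adjacent cycles."""
    return [p1 - p0 for p0, p1 in zip(periods, periods[1:])]


# Hypothetical rising-edge timestamps (ps) of a clock with a period near 100 ps.
edges_ps = [0.0, 101.0, 199.0, 300.0, 402.0]

periods_ps = periods_from_edges(edges_ps)
rms_ps, pk_pk_ps = period_jitter(periods_ps)
c2c_ps = cycle_to_cycle_jitter(periods_ps)
c2c_peak_ps = max(abs(c) for c in c2c_ps)

print("periods (ps):", periods_ps)
print(f"period jitter: {rms_ps} ps RMS, {pk_pk_ps} ps peak-to-peak")
print("cycle-to-cycle jitter (ps):", c2c_ps, "peak:", c2c_peak_ps)
```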
2.5.3 Time Interval Error (TIE) 
The time interval error (TIE) measures how far each active edge of the clock varies from its ideal 
position. The TIE is shown in Figure 2.5 by the measurements T1 through T4. For this 
measurement to be performed, the ideal edges must be known or estimated. As shown in Figure 
2.6, TIE may also be obtained by integrating the period jitter, after first subtracting the nominal 
(ideal) clock period from each measured period. TIE is important because it shows the 
cumulative effect that even a small amount of period jitter can have over time. TIE 
measurements are especially useful when examining the behavior of transmitted data streams, 
where the reference clock is typically recovered from the data signal using a Clock/Data 
Recovery (CDR) circuit. A large TIE value shows that the CDR circuit is not able to properly 
track the variation in the incoming data stream. TIE is expressed as an RMS which measures 
the standard deviation of the timing errors, and peak-to-peak, which measures the difference of 
the minimum and maximum timing errors. 
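
The three time-domain measurements above can be sketched in a few lines. The following is a minimal illustration, not a measurement procedure from this work: the function name `jitter_metrics`, the choice of the first edge as the timing reference, and the example clock are assumptions made for the sketch.

```python
from statistics import pstdev

def jitter_metrics(edges, nominal_period):
    """Period jitter, cycle-to-cycle jitter and TIE from rising-edge times (seconds)."""
    periods = [b - a for a, b in zip(edges, edges[1:])]   # P1, P2, ...
    c2c = [b - a for a, b in zip(periods, periods[1:])]   # C1, C2, ... (first difference)
    # TIE by integration: running sum of (measured period - nominal period).
    # Assumption: the first edge is the timing reference, so TIE is zero there.
    tie, acc = [], 0.0
    for p in periods:
        acc += p - nominal_period
        tie.append(acc)
    return {
        "period_rms": pstdev(periods),
        "period_pp": max(periods) - min(periods),
        "c2c_rms": pstdev(c2c),
        "c2c_peak": max(abs(c) for c in c2c),
        "tie_rms": pstdev(tie),
        "tie_pp": max(tie) - min(tie),
        "tie": tie,
    }

# Hypothetical example: a 1 ns clock in which every period is 1 ps too long.
T0 = 1e-9
edges = [n * (T0 + 1e-12) for n in range(6)]
m = jitter_metrics(edges, T0)
```

In this example the period and cycle-to-cycle values show essentially no variation, yet the TIE grows by 1 ps per cycle, which illustrates how a small period error accumulates over time.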
 
  
 
 
Figure 2.6: Relationship between period, cycle-to-cycle, and TIE jitter. 
 
 
Figure 2.5: Different types of jitter measurements. 
  
2.5.4 Phase Noise (Integrated RMS Jitter) 
Phase noise is measured in the frequency domain, and is the ratio of the noise power, normalized
to a 1 Hz bandwidth at a given offset from the carrier signal, to the power of the carrier signal.
Integrated RMS jitter is measured by integrating the phase noise across specified frequency
offsets from the carrier signal. It compares the amount of energy present in the specified
frequency offsets from the carrier signal (fc) with the energy of the carrier signal by integrating
the area under the phase noise plot. It is expressed in seconds. Figure 2.7 shows a phase noise
plot for a carrier signal at fc, and the shaded region between f1 and f2 represents the integrated
RMS jitter. Mathematically it is defined as
 
\[ \text{RMS Integ. Jitter} = \frac{\sqrt{2\int_{f_1}^{f_2} 10^{\frac{PN(f)}{10}}\,df}}{2\pi f_c} \tag{2.2} \]
 
Integrated RMS jitter proves very useful in I/O design as it can be used to precisely show the 
effects of jitter addition or jitter filtering by the transmitter or receiver on the reference clock. 
Different I/O protocols use different frequency offsets to make integrated RMS jitter 
measurements. As an example, SONET (Synchronous Optical Networking) [33] uses a 
frequency offset of 12 kHz to 20 MHz from the carrier signal in order to integrate the area under 
the phase noise plot and measure phase jitter. Fiber Channel [34] uses a frequency offset of 637 
kHz to 10 MHz from the carrier signal in order to integrate the area under the phase noise plot 
and measure phase jitter. 
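
As an illustration of (2.2), the sketch below integrates a tabulated phase noise profile with the trapezoidal rule. It assumes PN(f) is given in dBc/Hz at a list of offset frequencies; the function name, the flat -120 dBc/Hz profile and the 10 GHz carrier are hypothetical values chosen only for the example. A real profile spanning several decades would need finer, typically logarithmically spaced, sample points.

```python
import math

def integrated_rms_jitter(offsets_hz, pn_dbc_hz, fc_hz):
    """Evaluate (2.2) by trapezoidal integration of 10^(PN/10) over the offset band."""
    area = 0.0
    for i in range(len(offsets_hz) - 1):
        y0 = 10.0 ** (pn_dbc_hz[i] / 10.0)       # linear noise-to-carrier ratio per Hz
        y1 = 10.0 ** (pn_dbc_hz[i + 1] / 10.0)
        area += 0.5 * (y0 + y1) * (offsets_hz[i + 1] - offsets_hz[i])
    return math.sqrt(2.0 * area) / (2.0 * math.pi * fc_hz)   # seconds

# Hypothetical profile: flat -120 dBc/Hz over the 12 kHz to 20 MHz band, 10 GHz carrier.
offsets = [12e3, 20e6]
pn = [-120.0, -120.0]
jitter = integrated_rms_jitter(offsets, pn, 10e9)
```

For this assumed profile the result is on the order of 100 fs.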
 
 
Figure 2.7: Phase noise plot and integrated jitter measurement. 
  
2.6 Injection Locking Background 
In the multi-gigahertz frequency range, conventional clocking techniques have encountered 
several design challenges in terms of power consumption, skew and jitter. Injection-locking is a 
promising technique to address these design challenges for gigahertz clocking. We describe the 
fundamentals of injection locking dynamics in order to develop a framework for discussion in  
later chapters. 
Oscillator injection locking is a well known and deeply studied phenomenon. The 17th-century
Dutch scientist Christiaan Huygens noticed that the pendulums of two clocks on the wall moved
in unison if the clocks were hung close to each other [35]. He postulated that the coupling of the 
mechanical vibrations through the wall drove the clocks into synchronization. It has also been 
observed that humans left in isolated bunkers reveal a “free-running” sleep-wake period of about 
25 hours [36] but, when brought back to nature, they are injection-locked to the Earth’s cycle. 
This phenomenon also occurs in many other biological systems, such as the synchronized 
flashing of fireflies, the singing of certain crickets, and heartbeat patterns linked to breathing 
speed. The technique of injection locking has recently gained substantial attention in CMOS 
communication circuits. Recent applications include quadrature voltage-controlled oscillators 
(VCOs) [26], frequency dividers  [37], frequency multipliers  [38], clock recovery  [39], and jitter 
filtering and phase deskew  [24]. 
 
 
 
Figure 2.8: Injection locked oscillator. 
  
When an external signal (ωinj) is applied to an oscillator (ωo), then under the right conditions 
the latter ceases to be an autonomous circuit and synchronizes to the external signal with a 
constant phase delay (θ) (Figure 2.8). The conditions under which this happens have been 
investigated by Adler [40]. Mathematically, the injection locking process can be described by: 
 
\[ \frac{d\theta}{dt} = \omega_o - \omega_{inj} - \omega_L \sin(\theta) \tag{2.3} \]
 
Here ωL is called the locking range. For LC oscillators ωL can be shown to be [41] equal to   
 
\[ \omega_L = \frac{\omega_o}{2Q} \times k \tag{2.4} \]
 
In (2.4) Q is the quality factor of the LC tank and ωo is the natural frequency of oscillation.
k is the relative injection strength (Iinj/Iosc) (Figure 2.8). For an n-stage ring oscillator ωL can be
shown to be [29] equal to

\[ \omega_L = \frac{\omega_o}{\left(\frac{n}{2}\right)\sin\left(\frac{\pi}{n/2}\right)} \times k \tag{2.5} \]
 
We can analyze (2.3) by its vector fields (Figure 2.9). When ωL < |ωo-ωinj| there are no fixed
points, hence no stable solutions exist. When ωL > |ωo-ωinj| there are two fixed points (A and B).
Of the two fixed points, the point at which |θ| is less than π/2 (A) is stable, and the point at
which |θ| is greater than π/2 (B) is unstable.
 
Figure 2.9: Vector field for (2.3). 
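
The fixed-point analysis of (2.3) can be checked numerically. The sketch below is an illustration under stated assumptions: the function name and the example frequencies are hypothetical, and stability is judged from the sign of the slope of the right-hand side of (2.3) at each fixed point.

```python
import math

def adler_fixed_points(w_o, w_inj, w_L):
    """Fixed points of d(theta)/dt = w_o - w_inj - w_L*sin(theta), wrapped to (-pi, pi]."""
    x = (w_o - w_inj) / w_L
    if abs(x) > 1.0:
        return []                              # outside the locking range: no fixed point
    theta_a = math.asin(x)                     # |theta_a| <= pi/2
    theta_b = math.pi - theta_a if theta_a >= 0 else -math.pi - theta_a
    points = []
    for theta in (theta_a, theta_b):
        slope = -w_L * math.cos(theta)         # derivative of the right-hand side
        points.append((theta, "stable" if slope < 0 else "unstable"))
    return points

# Hypothetical example: 50 MHz frequency offset, 100 MHz locking range.
two_pi = 2.0 * math.pi
pts = adler_fixed_points(two_pi * 10.05e9, two_pi * 10e9, two_pi * 0.1e9)
no_lock = adler_fixed_points(two_pi * 10.05e9, two_pi * 10e9, two_pi * 0.02e9)
```

With these assumed values the stable point lies near π/6 and the unstable point near 5π/6, while the second call returns an empty list because the offset exceeds the locking range. At the exact edge of the locking range the two points merge and the slope test is no longer meaningful.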
  
 
Within the lock range, the steady state output frequency will always track the injected 
frequency and the phase difference between the injected and ILO output becomes constant. 
 
\[ \theta = \sin^{-1}\left(\frac{\omega_o - \omega_{inj}}{\omega_L}\right) \tag{2.6} \]
 
As (2.6) suggests, for small frequency offsets the phase shift is approximately linear with 
respect to (ωo-ωinj). This property is utilized for ILO-based clock phase shifting or deskewing. 
The transient phase response of the ILO can be obtained by integrating (2.3) with respect to 
time:  
 
 
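\[ \theta = 2\tan^{-1}\left[\frac{\omega_L}{\omega_o - \omega_{inj}} - \frac{\omega_b}{\omega_o - \omega_{inj}}\tanh\left(\frac{\omega_b t}{2}\right)\right] \tag{2.7} \]

where

\[ \omega_b = \sqrt{\omega_L^2 - (\omega_o - \omega_{inj})^2} \tag{2.8} \]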
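
As a cross-check of (2.7) and (2.8), the sketch below integrates (2.3) numerically with a fixed-step Runge-Kutta scheme and compares the result with the closed form. It assumes the initial phase implied by (2.7) at t = 0, namely 2tan⁻¹(ωL/(ωo-ωinj)); the function names, the 50 MHz offset and the 100 MHz locking range are hypothetical values for illustration.

```python
import math

def adler_closed_form(t, dw, w_L):
    """Equation (2.7); dw = w_o - w_inj, valid for 0 < |dw| < w_L."""
    w_b = math.sqrt(w_L**2 - dw**2)                       # (2.8)
    return 2.0 * math.atan(w_L / dw - (w_b / dw) * math.tanh(w_b * t / 2.0))

def adler_rk4(theta0, dw, w_L, t_end, steps):
    """Integrate (2.3) with a fixed-step fourth-order Runge-Kutta scheme."""
    def f(th):
        return dw - w_L * math.sin(th)
    h, th = t_end / steps, theta0
    for _ in range(steps):
        k1 = f(th)
        k2 = f(th + 0.5 * h * k1)
        k3 = f(th + 0.5 * h * k2)
        k4 = f(th + h * k3)
        th += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return th

dw = 2.0 * math.pi * 50e6          # assumed frequency offset
w_L = 2.0 * math.pi * 100e6        # assumed locking range
t_end = 20e-9
theta0 = adler_closed_form(0.0, dw, w_L)          # initial phase implied by (2.7)
theta_num = adler_rk4(theta0, dw, w_L, t_end, 2000)
theta_ref = adler_closed_form(t_end, dw, w_L)
theta_lock = math.asin(dw / w_L)                  # steady state from (2.6)
```

Under these assumptions the numerical and closed-form phases agree closely, and both settle toward the steady-state value given by (2.6).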
 
Although accurate, (2.7) gives limited intuition. To gain more insight we linearize (2.3)
around the stable point θo. From (2.6) we have sin(θo)=(ωo-ωinj)/ωL. We replace θ with θo+θn
given θn << θo. Here θn is the time varying component. Thus (2.3) becomes
 
\[ \frac{d(\theta_o + \theta_n)}{dt} = \omega_o - \omega_{inj} - \omega_L \sin(\theta_o + \theta_n) \tag{2.9} \]
 
Noticing that the derivative of θo is 0, (2.9) can be further simplified to: 
 
\[ \frac{d\theta_n}{dt} = \omega_o - \omega_{inj} - \omega_L \sin(\theta_o)\cos(\theta_n) - \omega_L \sin(\theta_n)\cos(\theta_o) \tag{2.10} \]
 
As θn is small we set cos(θn) = 1 and sin(θn) = θn in (2.10):

\[ \frac{d\theta_n}{dt} = \omega_o - \omega_{inj} - \omega_L \sin(\theta_o) - \omega_L \theta_n \cos(\theta_o) \tag{2.11} \]
 
  
Replacing sin(θo)=(ωo-ωinj)/ωL we have

\[ \frac{d\theta_n}{dt} = -\omega_L \theta_n \cos(\theta_o) \tag{2.12} \]
 
Using (2.6) and (2.8) we can show that 
\[ \frac{d\theta_n}{dt} = -\omega_b \theta_n \tag{2.13} \]
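
The intermediate step between (2.12) and (2.13) can be written out explicitly. This is a short worked derivation, assuming the stable fixed point, for which cos(θo) is positive:

```latex
\begin{aligned}
\cos(\theta_o) &= \sqrt{1 - \sin^2(\theta_o)}
  = \sqrt{1 - \frac{(\omega_o - \omega_{inj})^2}{\omega_L^2}}
  = \frac{\sqrt{\omega_L^2 - (\omega_o - \omega_{inj})^2}}{\omega_L}
  = \frac{\omega_b}{\omega_L}, \\
\omega_L \cos(\theta_o) &= \omega_b .
\end{aligned}
```

Substituting the last line into (2.12) gives (2.13).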
 
(2.13) is a first order response. Thus, ILOs are functionally equivalent to a first order PLL 
[37] where input phase noise is low pass filtered. Therefore in the frequency domain we can 
write this relationship as 
 
\[ JTF_{in} = \frac{Jitter(s)_{output}}{Jitter(s)_{input}} = \frac{1}{1 + \frac{s}{\omega_b}} \tag{2.14} \]
 
In a similar manner to a PLL, corresponding VCO noise is high pass filtered: 
 
 
\[ JTF_{VCO} = \frac{Jitter(s)_{output}}{Jitter(s)_{vco}} = \frac{\frac{s}{\omega_b}}{1 + \frac{s}{\omega_b}} \tag{2.15} \]
 
In totality, if Sinj is the phase noise of the injected signal and SVCO is the phase noise of the 
VCO, then the phase noise of the locked output Sout (assuming Sinj and SVCO are uncorrelated) 
can be given as 
\[ S_{out} = |JTF_{in}|^2 S_{inj} + |JTF_{VCO}|^2 S_{VCO} \tag{2.16} \]
 
   Figure 2.10 shows the typical phase noise (Sout) of the locked output for given Sinj and SVCO. 
For frequencies below the JTFin (2.14) bandwidth (typically very high: hundreds of MHz [17]) 
Sout follows the phase noise of the reference (Sinj), and for frequencies beyond the JTFin bandwidth 
it follows the phase noise of the VCO (SVCO). This proves useful for forwarded clock parallel links 
where the presence of a low phase noise clock reference allows for ‘clean’ clock generation by 
injection locked based timing recovery. 
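
The behavior described by (2.14) to (2.16) can be sketched numerically. The example below is an illustration only: the function names, the 300 MHz corner frequency and the flat -130 dBc/Hz and -90 dBc/Hz noise levels are assumptions, not values taken from this work.

```python
import math

def jtf_in_mag2(f, f_b):
    """|JTF_in|^2 from (2.14): first-order low-pass with corner frequency f_b."""
    return 1.0 / (1.0 + (f / f_b) ** 2)

def jtf_vco_mag2(f, f_b):
    """|JTF_VCO|^2 from (2.15): first-order high-pass with corner frequency f_b."""
    r2 = (f / f_b) ** 2
    return r2 / (1.0 + r2)

def locked_output_pn_dbc(f, f_b, s_inj_dbc, s_vco_dbc):
    """Equation (2.16), with phase noise given and returned in dBc/Hz at offset f."""
    s_inj = 10.0 ** (s_inj_dbc / 10.0)
    s_vco = 10.0 ** (s_vco_dbc / 10.0)
    s_out = jtf_in_mag2(f, f_b) * s_inj + jtf_vco_mag2(f, f_b) * s_vco
    return 10.0 * math.log10(s_out)

# Hypothetical values: 300 MHz corner, clean reference, noisier free-running VCO.
F_B = 300e6
low_offset = locked_output_pn_dbc(1e3, F_B, -130.0, -90.0)     # follows the reference
high_offset = locked_output_pn_dbc(100e9, F_B, -130.0, -90.0)  # follows the VCO
```

At offsets far below the corner the output tracks the reference noise, and far above it the output tracks the VCO noise, as stated above.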
 
  
 
Figure 2.10: Phase noise of the injected output as a function of the phase noise of VCO 
and input signals. 
 
The first order nature of injection locking proves very useful as it ensures no peaking and
guarantees stability. This makes the design of injection locking based circuits very simple
compared to the design of a PLL. However, injection locking is inherently a narrowband process.
The locking range (ωL) is typically very small. (2.4) and (2.5) may suggest that ωL can be
increased indefinitely by simply increasing the injection strength (k), but it should be noted that
(2.4) and (2.5) are accurate only for weak injection (k<<1). At higher injection strengths the
relationship between ωL and k becomes much weaker [41]. Hence, even with strong injection,
ring and LC oscillators typically have a maximum locking range of 10% [23] [24]. This makes
injection locking less suitable for wideband applications. In addition, it makes the system prone
to process, voltage and temperature (PVT) variations.
In this dissertation we propose two architectures that tackle this issue. The two techniques
relate to two kinds of oscillator common in today’s CMOS designs: LC oscillators and ring
oscillators.
2.7 VCSEL based Optical Transmitter 
The rapid scaling of CMOS technology continues to increase the processing power of 
microprocessors and the storage volume of memories. This increases the need for high
bandwidth interconnection between chips, which can be achieved by employing large numbers
of inputs and outputs (IOs) per chip as well as high data rates per IO. As microprocessor system 
interface data rates have grown, the electrical channel has started to hamper performance. To 
alleviate this bottleneck, microprocessor interfaces have adopted advanced equalization 
techniques such as linear equalization, DFE, and optimized interconnect topologies. The power 
and area overhead associated with equalization make it difficult to achieve target bandwidth 
with a realistic power budget.  A promising solution to the I/O bandwidth problem is the use of 
optical inter-chip communication links. This section gives an overview of the key optical link 
component, namely, the optical transmitter.  
Multi-Gb/s optical links exclusively use coherent laser light due to its low divergence and 
narrow wavelength range. Modulation of this laser light is possible by directly modulating the 
laser intensity through changing the laser’s electrical drive current. A popular coherent laser light 
source used in optical transmitters is the vertical-cavity surface-emitting laser (VCSEL). 
 
 A VCSEL is a semiconductor laser diode which emits light perpendicular to its top surface
(Figure 2.11). VCSELs have important practical advantages compared with edge-emitting
semiconductor lasers. They can be tested and characterized directly after growth, i.e. before the
wafer is cleaved. Furthermore, it is possible to combine a VCSEL wafer with an array of optical
elements (like collimator lenses) and then dice the composite wafer instead of mounting the
optical elements individually for each VCSEL. This allows for low cost mass production of laser
products. The most common emission wavelengths of VCSELs are in the range of 750-980 nm
[42] [43], as obtained with the GaAs/AlGaAs material system. While VCSELs appear to be an
ideal source due to their ability to both generate and modulate light, they also suffer from some
serious bandwidth limitations.

Figure 2.11: (a) Cross-section of a VCSEL. (b) Die micrograph of a VCSEL.

As shown in Figure 2.12, a VCSEL emits optical power that’s a linear function of the current 
flowing through the device once a threshold current, Ith, is reached and stimulated emission, or 
lasing, occurs. As the threshold current magnitude is a function of the active area current density, 
it is often reduced by confining the current with an oxide aperture. Typical values of Ith vary from 
0.5mA to 1mA [44]. Once the VCSEL begins lasing, the optical output power is related to the 
input current by the slope efficiency η  (typically 0.3-0.5mW/mA), and a high contrast ratio 
between a logic “one” signal and a logic “zero” signal can be achieved by placing the “zero” 
current value near threshold. While a low “zero” level current allows for high contrast, a speed 
limitation does exist due to the VCSEL bandwidth being a function of the device current. 
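
The static L-I relationship described above can be sketched as a piecewise-linear model. The function names are hypothetical, and the default threshold (0.7 mA) and slope efficiency (0.4 mW/mA) are assumed values picked from within the typical ranges quoted in the text; the model ignores thermal roll-off and any bandwidth effects.

```python
import math

def vcsel_optical_power_mw(i_ma, i_th_ma=0.7, slope_mw_per_ma=0.4):
    """Piecewise-linear L-I curve: no output below threshold, linear above it."""
    return max(0.0, slope_mw_per_ma * (i_ma - i_th_ma))

def contrast_ratio_db(i_one_ma, i_zero_ma, **kw):
    """Ratio of the logic-one to the logic-zero optical power, in dB."""
    p1 = vcsel_optical_power_mw(i_one_ma, **kw)
    p0 = vcsel_optical_power_mw(i_zero_ma, **kw)
    if p0 <= 0.0:
        return math.inf        # zero level at or below threshold: ideal model emits nothing
    return 10.0 * math.log10(p1 / p0)

# Moving the "zero" current closer to threshold raises the contrast ratio.
near_threshold = contrast_ratio_db(5.0, 1.0)
far_from_threshold = contrast_ratio_db(5.0, 2.0)
```

This shows only the contrast side of the trade-off; the accompanying loss of bandwidth at low bias current is not captured by a static model.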
 
 
Figure 2.12: VCSEL L-I curve. 
 
A VCSEL has inherent bandwidth limitations. Its bandwidth is limited by a combination of
electrical parasitics and the electron-photon interaction described by a set of second-order rate
equations. Figure 2.13 shows the small-signal ac response of the VCSEL for different bias
currents [45]. The modulation characteristics vary as the bias current changes. This dependence
of the VCSEL bandwidth on its bias current makes its modulation response highly non-linear.
This is markedly different from the response of an electrical channel, which is linear. The details
of VCSEL response modelling and non-linearity will be discussed in Chapter 6.   
 
 
Figure 2.13: VCSEL bandwidth limitations. 
 
Current-mode drivers are typically used to modulate VCSELs due to the direct relationship
between drive current and optical output power. A typical VCSEL output driver is
shown in Figure 2.14, with a differential stage steering current between the optical device and a
dummy load, and an additional static current source used to bias the VCSEL sufficiently above
the threshold current in order to ensure adequate bandwidth. Often the output stage uses a
separate higher voltage supply because VCSEL diode knee voltages (typically 1.7 V)
exceed normal CMOS supplies. As data rates scale, designers have begun to implement
transmitter equalization circuitry to compensate for VCSEL bandwidth constraints. A VCSEL
equalization technique that takes into account the inherent non-linearity in its high speed
response will be introduced in Chapter 6.
 
  
 
Figure 2.14: Current-mode VCSEL driver. 
  
  
Chapter 3: Wideband Injection 
Locking Scheme and Quadrature 
Phase Generation in LC Oscillator 
Injection-locked oscillators (ILOs) have been used in many wireline receivers because of their
simple implementation and instantaneous locking characteristics. However, their application is
hindered by their limited locking range compared with alternative techniques such as
phase-locked loops (PLLs). Recent standards [46] require operation with data rates that span more
than 10% of the nominal frequency. Therefore transceivers must operate reliably over this range.
A large locking range is also desirable to counter the inevitable PVT variations in modern scaled
technologies.
The injection range of an LC ILO is inversely proportional to the Q of the tank [41]. For this
reason low-Q tanks have been used [24] to increase the locking range in an LC ILO, but this
comes at the expense of higher power consumption, as shown in Figure 3.1. Intricate frequency-
tracking mechanisms such as a reference PLL have also been used to set the oscillator’s natural
frequency so that it is within the injection range of the reference clock [39]. This adds
design complexity and an area/power penalty to the otherwise simple circuit, thus offsetting the
merits of an injection-locked system.  
Another important requirement of wireline receivers that employ half-rate and quarter-rate
architectures is the generation of accurate quadrature phases. Injection-locked LC dividers have
been frequently used for generating quadrature phases [25], but they require complementary
clocks at twice the desired frequency, which tends to be power inefficient. Quadrature phase
generation from a single clock phase without any frequency division is highly desirable for
half-rate and quarter-rate CDR architectures.   
   We propose a method for wideband injection locking in an LC oscillator that maintains the
simplicity of an injection locked system. We also describe an extension of this method to produce
quadrature phases from a single reference clock without any frequency division. The system has
a wide jitter tracking bandwidth, which makes it useful for forwarded clock receivers [17].
 
 
Figure 3.1: (a) LC oscillator with injection. (b) Variation of locking range with Q for a 
constant injection strength of 0.1. (c) Variation of power consumption with Q for a 
constant oscillation amplitude of 600 mV. (d) Improvement in locking range vs. power 
consumption for a constant injection strength and oscillation amplitude (Simulation). 
 
  
This chapter is organized as follows. Section 3.1 describes the system architecture. Section 3.2
presents a mathematical analysis describing the dynamics of the system. Measurement results
are presented in Section 3.3. Finally, the last section summarizes the work and presents the
conclusions.
 
3.1 System Architecture 
Figure 3.2(a) shows the simplified block diagram of the proposed system. It consists of three 
basic elements, namely VCO, mixer and buffer. The buffered VCO output is mixed with the 
input reference and the resultant signal is fed back to the VCO to complete the feedback 
architecture.  
 
 
Figure 3.2: Block diagram of (a) proposed system and (b) Injection locked phase locked 
loop (ILPLL). 
 
  
3.1.1 Comparison with ILPLL 
In the locked state, an ILO can be modeled as a first order PLL [37]. A first-order PLL comprises
a VCO, a mixer and a low pass filter. In this work we propose to eliminate the loop filter
altogether. The resultant high frequency component of the mixer is used to perform injection
locking. This is different from an ILPLL structure (Figure 3.2(b)), which consists of a full PLL
with additional injection in the VCO to improve its phase noise characteristics. Additionally,
unlike the ILPLL, both IL and PLL actions are performed at the same node using common-mode
injection in the varactors.  
 
 
 
 
Figure 3.3: Schematic of the proposed system. The input to the common mode of the 
varactors contains 2f and DC components. The DC component brings the natural 
frequency close to the frequency of the reference clock and the 2f component does the 
injection lock. 
  
3.1.2 Common Mode Injection   
In most LC oscillators, the control voltage of the varactor is used to set the frequency of 
oscillation, fo. In such architectures the instantaneous voltage oscillation at the output node 
results in transient changes in the capacitance (Figure 3.3). Due to this effect, the voltage of the 
common-node A has an extra frequency component at 2fo [47]. Similarly, if we inject a 2fo 
component at the varactors’ common node, then the mixing action of the varactors will inject a 
current at fo into the tank. However, such a circuit will constitute a frequency divider, which is 
not desirable in many applications. We will describe the basic principles of the proposed 
architecture that avoids such a division and provides a very wide locking range. 
3.1.3 Implementation Details 
Figure 3.3 shows the basic schematic of the proposed wideband injection locking system. A 
complementary transmission gate is used as a single balanced passive mixer. The output of the 
LC oscillator is buffered by the CML to CMOS stage. The transmission gate is driven by the 
outputs of the buffer and the reference clock is used as the input. The output of the transmission 
gate is directly fed to the varactors in the LC oscillator, thereby completing the loop.  
3.1.4 System Analysis in Locked State 
In the locked state the output of the transmission gate contains a high frequency 2f component
and a DC component. The value of the DC component is determined by the phase difference
between the reference and buffer output (α) and is proportional to cos(α) (Figure 3.4(d)). The
phase difference between the oscillator output and the injected clock (θ) is given by [41]:

\[ \sin(\theta) = \frac{\omega_o - \omega_{inj}}{\omega_L} \tag{3.1} \]

Assuming a constant delay, Δo, through the CML to CMOS buffer, the phase difference
between the clock and buffer output, α, is given by

\[ \alpha = \theta + \Delta_o \times (2\pi f_{inj}) \tag{3.2} \]

Therefore the DC component of the switch output is dependent on θ. In the unlocked state,
the DC component brings fo close to finj (PLL action) and the 2finj component performs the
injection lock. Thus the phase difference θ becomes dependent on the reference frequency,
which enables wideband locking. Figure 3.4(e) shows the simulated varactor control voltages
under locked conditions for two frequencies (14 GHz and 16.5 GHz). The DC levels are different
and have the corresponding 2finj components superimposed on them.
Figure 3.4(a) shows the simulated oscillator output phase difference (θ) versus input
frequency. θ is smaller at lower frequencies and it increases as frequency increases. This is in
accordance with the DC characteristic of the transmission gate (Figure 3.4(d)) and the phase
difference between the CML to CMOS output and the reference clock (α). The fact that CML
to CMOS buffers add a constant delay across all frequencies helps increase the injection range
as it amplifies the phase shift when frequency increases (3.2). This helps the switch output to cover
the entire voltage range (0-Vdd), as shown in Figure 3.4(d).

Figure 3.4: Simulation results, (a) θ vs. ref. frequency, (b) α vs. ref. frequency, (c) fo – finj
vs. ref. frequency, (d) DC characteristic of the transmission gate, (e) Vctrl at 14 GHz and
16.5 GHz clock reference.

It is important to clarify that the proposed work achieves a wider locking range due to the
PLL-like loop, which automatically brings the center frequency of the oscillator within the
injection range. The inherent properties of a VCO-only system, like injection range and jitter
tracking bandwidth, remain intact and are still a function of the Q of the oscillator. Our
methodology removes the need for a loop filter, so that the system can have a high jitter
tracking bandwidth.
3.1.5 Quadrature Phase Generation 
For quadrature phase generation a secondary matched LC oscillator is coupled to the primary
in a QVCO configuration. Figure 3.5 shows the schematic of the quadrature phase generation
circuit. Anti-phase coupling is achieved using PMOS differential pairs. The strength of the
coupling is controlled by varying the tail current of the PMOS differential pair. A coupling factor
of above 25% was used to provide sufficient oscillation reliability [48].
The control voltage of the secondary is generated from the output of the transmission gate 
after sending it through a passive low pass filter, consisting of two RC sections in series  with 
R=1 kΩ and C=80 fF. The passive filter is chosen to reduce power consumption and values of 
RC are chosen to have a 3dB bandwidth of 1 GHz, which provides more than 50dB of 
attenuation to the 2f component and allows the DC component to pass through. This has two 
effects. Firstly it allows both oscillators to have the same fo and secondly it ensures that there is 
no coupling between them through the varactors’ common mode. A two stage RC section is 
chosen for more efficient isolation as it provides sharper (-40dB/dec) attenuation without 
slowing down the feedback loop, as the 3dB bandwidth doesn’t need to be too small. This 
isolation is important for generating accurate quadrature phases as it ensures that coupling 
between primary and secondary oscillators is solely anti-phase through the PMOS differential 
pairs and there is no in-phase coupling through the varactors. If not attenuated, in-phase 
coupling would force the phases of the oscillators to be aligned. 
  
ILOs have been frequently used for clock de-skewing applications [24]. Equation (3.1) suggests
that the phase of the output clock can be varied by changing the fo of the oscillator. In our
architecture the phase of the replica oscillator can be adjusted by changing the bias of secondary
varactors VarA and VarB, which are chosen to be more than seven times smaller than the main
varactors (Figure 3.5). Secondary varactors are controlled externally and are not a part of the
loop. Thus the sizing of the secondary varactors presents a trade-off between de-skew range and
locking range. Sizes of the secondary varactors were kept much smaller than primary varactors so
that locking range is minimally altered. To provide sufficient de-skew the control voltages of VarA
and VarB were altered in opposite directions. This phase controllability is also important for clock
receiver applications, where exact quadrature phases may not be required due to polarization-mode
dispersion effects [49].
 
 
 
Figure 3.5: Schematic of the proposed system for quadrature phase generation. 
 
  
3.2 Mathematical Analysis 
In this section we propose a mathematical model of our system and derive the new effective
locking range. To simplify the analysis we delink the IL and PLL aspects of our design. Figure
3.6 shows both IL and PLL characteristics. Injection is modeled as an additive input. The output
tracks the input (ωinj) except for a phase difference θ, which may be time varying. The PLL part
of the system consists of a mixer with a gain of γ and a constant delay of Δo. The mixer has inputs
from the reference clock and the delayed version of the LC oscillator output. The output of the
mixer goes to the common mode of the varactors, which through their mixing action convert it to
an equivalent injection at ωinj.
 
 
 
Figure 3.6: System level block diagrams showing injection and PLL feedbacks. 
 
The injection locking dynamics for weak injection Vosc >> Vinj are governed by the famous 
Adler’s equation [40]: 
 
\[ \frac{d\theta}{dt} = \omega_o - \omega_{inj} - \omega_L \sin(\theta) \tag{3.3} \]
 
Here ωL is the locking range defined as

\[ \omega_L = \frac{\omega_o}{2Q} \times \frac{V_{inj}}{V_{osc}} \tag{3.4} \]
 
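To take into account the PLL action we replace ωo by ωo+Kvco*Vctrl:

\[ \frac{d\theta}{dt} = \omega_o + K_{vco}V_{ctrl} - \omega_{inj} - \omega_L \sin(\theta) \tag{3.5} \]

where

\[ V_{ctrl} = \gamma V\cos(\alpha) + \gamma V\cos(2\omega_{inj}t + \alpha) \tag{3.6} \]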
 
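However, we have already taken the 2ωinj component into account in the form of injection, so we
are left with

\[ \frac{d\theta}{dt} = \omega_o + K_{vco\gamma}\cos(\theta + \omega_{inj}\Delta_o) - \omega_{inj} - \omega_L \sin(\theta) \tag{3.7} \]

To make (3.7) comparable to Adler’s equation we modify it to have only a single sinusoid:

\[ \frac{d\theta}{dt} = \omega_o - \omega_{inj} - \left[\{\omega_L + K_{vco\gamma}\sin(\omega_{inj}\Delta_o)\}\sin(\theta) - K_{vco\gamma}\cos(\theta)\cos(\omega_{inj}\Delta_o)\right] \tag{3.8} \]

where

\[ K_{vco\gamma} = K_{vco}\gamma V \quad \text{and} \quad \alpha = \theta + \omega_{inj}\Delta_o \tag{3.9} \]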
 
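We therefore have

\[ \begin{aligned} \frac{d\theta}{dt} &= \omega_o - \omega_{inj} - \omega_{Lnew}\{\sin(\theta)\cos(\phi) - \sin(\phi)\cos(\theta)\} \\ &= \omega_o - \omega_{inj} - \omega_{Lnew}\sin(\theta - \phi) \end{aligned} \tag{3.10} \]

where φ and ωLnew are defined by

\[ \tan(\phi) = \frac{K_{vco\gamma}\cos(\omega_{inj}\Delta_o)}{\omega_L + K_{vco\gamma}\sin(\omega_{inj}\Delta_o)} \tag{3.11} \]

\[ \omega_{Lnew} = \sqrt{K_{vco\gamma}^2 + \omega_L^2 + 2\omega_L K_{vco\gamma}\sin(\omega_{inj}\Delta_o)} \tag{3.12} \]

In the locked state dθ/dt = 0, so for a real solution,

\[ \left|\frac{\omega_o - \omega_{inj}}{\omega_{Lnew}}\right| = |\sin(\theta - \phi)| \le 1 \tag{3.13} \]

\[ |\omega_o - \omega_{inj}| \le |\omega_{Lnew}| \tag{3.14} \]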
 
Thus the new effective locking range is ωLnew. It can be inferred from (3.12) that for all values
of Δo such that

\[ \Delta_o < \frac{\pi}{\omega_{inj}} \tag{3.15} \]
 
ωLnew will be greater than ωL, and hence the improvement in locking range.  For a maximum 
reference frequency of 18 GHz, the upper limit of Δ o is 27.7 ps.  
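
The expressions (3.11), (3.12) and (3.15) are easy to evaluate numerically, and the rewriting of (3.8) into (3.10) can be spot-checked at an arbitrary phase. In the sketch below the function names are hypothetical, and the regular locking range (0.2 GHz) and the product Kvco·γ·V (1.6 GHz) are assumed illustrative values, not parameters reported in this work; only the 20 ps delay and the 18 GHz maximum frequency come from the text.

```python
import math

def new_locking_range(w_L, k_vg, w_inj, delay):
    """Equation (3.12): effective locking range with the mixer feedback path."""
    return math.sqrt(k_vg**2 + w_L**2 + 2.0 * w_L * k_vg * math.sin(w_inj * delay))

def phase_offset(w_L, k_vg, w_inj, delay):
    """Angle defined by (3.11); atan2 keeps the correct quadrant."""
    return math.atan2(k_vg * math.cos(w_inj * delay),
                      w_L + k_vg * math.sin(w_inj * delay))

def max_delay(f_inj_hz):
    """Bound (3.15): pi / w_inj, which equals 1 / (2 * f_inj)."""
    return 1.0 / (2.0 * f_inj_hz)

TWO_PI = 2.0 * math.pi
w_L, k_vg = TWO_PI * 0.2e9, TWO_PI * 1.6e9      # assumed illustrative values
w_inj, delay = TWO_PI * 15e9, 20e-12
w_Lnew = new_locking_range(w_L, k_vg, w_inj, delay)
phi = phase_offset(w_L, k_vg, w_inj, delay)

# Spot check: the sinusoidal terms of (3.7) and (3.10) agree at an arbitrary phase.
theta = 0.3
terms_37 = w_L * math.sin(theta) - k_vg * math.cos(theta + w_inj * delay)
terms_310 = w_Lnew * math.sin(theta - phi)
delay_limit = max_delay(18e9)
```

With these assumed values the new locking range comes out at roughly 1.8 GHz, and the delay bound at 18 GHz evaluates to just under 28 ps.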
 
 
Figure 3.7: (a) New locking range fLnew and regular locking range fL. (b) Transient solutions to
proposed system (3.7) and regular ILO (3.3).
 
Figure 3.7(a) shows a plot of the new locking range fLnew and the regular locking range fL based
on (3.12) and (3.4) respectively. It predicts an average new locking range of 1.8 GHz, which is a
9-fold improvement over that of a regular injection locked LC oscillator. To further examine the
system, a Simulink-based behavioral model was designed. Using this model, transient solutions to
(3.3) and (3.7) were calculated for the case where the oscillator natural frequency (fo) is 13 GHz
and the injected frequency (finj) is 14.8 GHz. Figure 3.7(b) clearly shows that our proposed system
locks to the injected frequency because of its extended locking range, whereas the regular ILO
fails to do so as the injected frequency is well beyond its locking range.
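
A comparable transient experiment can be sketched without Simulink by integrating (3.3) and (3.7) directly. This is not the behavioral model used in this work: the integrator, the regular locking range (0.2 GHz) and the product Kvco·γ·V (2.5 GHz) are assumptions chosen so that the example sits clearly inside the extended locking range; fo, finj and the 20 ps delay follow the text.

```python
import math

def rk4(rhs, theta0, t_end, steps):
    """Fixed-step fourth-order Runge-Kutta integration of d(theta)/dt = rhs(theta)."""
    h, theta = t_end / steps, theta0
    for _ in range(steps):
        k1 = rhs(theta)
        k2 = rhs(theta + 0.5 * h * k1)
        k3 = rhs(theta + 0.5 * h * k2)
        k4 = rhs(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta

TWO_PI = 2.0 * math.pi
w_o = TWO_PI * 13.0e9       # natural frequency, from the text
w_inj = TWO_PI * 14.8e9     # injected frequency, from the text
w_L = TWO_PI * 0.2e9        # assumed regular locking range
k_vg = TWO_PI * 2.5e9       # assumed K_vco * gamma * V
delay = 20e-12              # buffer delay chosen in the text

def regular_ilo(theta):     # right-hand side of (3.3)
    return w_o - w_inj - w_L * math.sin(theta)

def proposed(theta):        # right-hand side of (3.7)
    return (w_o + k_vg * math.cos(theta + w_inj * delay)
            - w_inj - w_L * math.sin(theta))

t_end, steps = 5e-9, 5000
theta_regular = rk4(regular_ilo, 0.0, t_end, steps)
theta_proposed = rk4(proposed, 0.0, t_end, steps)
residual_hz = abs(proposed(theta_proposed)) / TWO_PI   # remaining frequency error
```

Under these assumptions the phase of the proposed system settles to a constant value (the residual frequency error vanishes), while the phase of the regular ILO keeps slipping by many cycles, which is the qualitative behavior reported for Figure 3.7(b).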
Spectre-based simulations reveal a single-sided locking range (fLnew) of 1.7 GHz, 1.8 GHz and
2.1 GHz for the reference frequencies 13 GHz, 15 GHz, and 17 GHz respectively. Comparing the
simulation results with the predictions of our mathematical model (Figure 3.7(a)) reveals a
locking range mismatch of -0.1 GHz, 0 GHz, and 0.3 GHz at 13 GHz, 15 GHz, and 17 GHz
respectively. The mismatch can be attributed to the fact that the simple mathematical model does
not take into account the variation of parameters like Kvco and Q with frequency.
Figure 3.8 shows the behavior of fLnew with variation in Δo. Initially fLnew increases as Δo 
increases but as Δo increases to 30ps, fLnew starts decreasing. This clearly shows that there is an 
optimum Δo for maximum locking range. We choose Δo to be 20ps, to maximize the locking 
range. 
 (3.10) suggests the dynamics of the proposed system are similar to those of the injection
locked VCO only system, as described by Adler’s equation (3.3). Jitter tracking bandwidth of a
simple ILO is proportional to its locking range (ωL), as derived in [24]:

\[ BW = \omega_L \frac{K + \cos(\theta)}{(1 + K\cos(\theta))^2} \tag{3.16} \]

where K is the injection strength.

Figure 3.8: Variation of fLnew with Δo.

Thus the proposed system has a similar jitter transfer function to that of the usual ILO i.e. a 
first-order PLL [37]. However, due to its larger locking range (ωLnew), it has a higher tracking 
bandwidth than a conventional ILO for a given Q and injection strength (3.16). The jitter from 
the incident signal is filtered by the low-pass characteristic of the noise transfer function, and the 
output signal tracks the phase variations of the incident signal within the loop bandwidth. 
Measured results for jitter transfer show a first order behavior with -20dB/dec attenuation 
(Figure 3.12(a)).  
The phase of the oscillator is fixed for a given frequency as shown in Figure 3.4(a). However,
the phase of the replica oscillator can be changed by controlling the bias of the secondary
varactors VarA and VarB. The replica oscillator is not part of the feedback loop, hence the
de-skew relationship is described by (3.1). This would suggest a total de-skew range of 180°.
However, measured results show an average de-skew range of 140° (Figure 3.14). This is due to
the size of the secondary varactors, which are not large enough to change the natural frequency
of the oscillator for a full 180° phase shift.
 
 
 
Figure 3.9: Simulated frequency behavior of Q of the inductor. 
  
3.3 Measurement Results 
A prototype has been designed and fabricated in 65 nm CMOS technology, with a 1 V supply
voltage. nMOS transistors in accumulation mode were used to implement the varactors, with the
control voltage applied to the drain/source. Spiral inductors of value 0.67 nH were designed to
have a simulated Q of over 14 in the frequency range of interest (Figure 3.9). They were
constructed using the thick top two metal layers, with an added ground mesh for Q enhancement.
The die micrograph (Figure 3.15) shows their octagonal structure, each of size 110 × 110 μm². A
high-Q design was chosen to substantiate the efficacy of the proposed locking range extension
technique, as injection locking range is inversely proportional to Q in standard ILOs [41].  
The key ILO parameters based on design methodology and simulation results are described 
in Figure 3.7. 
3.3.1 Locking Range and RMS Jitter 
In our measurement setup (Figure 3.10(f)), an external signal generator is used to provide the 
reference clock used for injection. The frequency of the reference clock was varied and output 
 
Figure 3.10: (a)-(e) Measured locked output signals at several reference frequencies. (f) 
Setup for locking range and RMS jitter measurement. (g) Measured input and output j itter 
at different reference frequencies. 
 
  
42 
waveforms were observed on a sampling oscilloscope (Figure 3.10(a-e)). A locking range of 13.4 
GHz – 17.2 GHz was measured, which translates to 24.8% around the center frequency. The 
achieved locking range is limited by the varactor tuning range. The power consumption depends 
on the frequency of operation and varies between 8.5 mW and 9.5 mW going from low to high 
frequencies. For comparison, a previous design [24] uses a low-Q (2.5) inductor to achieve a 
maximum locking range of 12% with strong injection while consuming 13.1 mW for a single 
injection locked LC oscillator. 
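The 24.8% figure follows from normalizing the measured span to its midpoint. The short sketch below reproduces that arithmetic, assuming the center frequency is taken as the arithmetic mean of the two band edges; the function name is illustrative and not part of the original work.

```python
def fractional_locking_range(f_low_ghz, f_high_ghz):
    """Return the locking range as a percentage of the center frequency.

    Assumption: the center frequency is the arithmetic mean of the band edges.
    """
    center = (f_low_ghz + f_high_ghz) / 2.0
    return 100.0 * (f_high_ghz - f_low_ghz) / center


# Measured band edges quoted in the text: 13.4 GHz and 17.2 GHz.
print(f"{fractional_locking_range(13.4, 17.2):.1f} %")
```

With the measured edges of 13.4 GHz and 17.2 GHz the span is 3.8 GHz around a 15.3 GHz center, which is consistent with the quoted value.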
The RMS jitter of the reference and the output waveforms was also measured across several 
frequencies in the locking range and is plotted in Figure 3.10(g). A maximum RMS jitter 
addition of 0.15 ps is observed at 17 GHz, which is expected considering that the system output 
goes through several buffers to drive the output stage. 
 
 
Figure 3.11: (a) Measurement setup for generating PM signal reference. (b) Setup for 
measuring the spectrum of reference and output signals 
 
  
3.3.2 Jitter Transfer Function 
The jitter transfer function was measured using the test setup shown in Figure 3.11. In this setup, 
a secondary clock (fjitter) was mixed with the primary clock (fo) to generate an amplitude 
modulated (AM) signal. This signal was transformed to a phase modulated (PM) signal by on-chip 
CML-CMOS converters. The PM signal was used as the new reference clock. The secondary clock 
frequency (fjitter) was varied from 10 MHz to 2 GHz for each fo, and the spectral components 
of the output and the reference were measured at the carrier (fo) and sideband (fjitter) 
frequencies (Figure 3.11(b)) using a spectrum analyzer. 
 
 
Figure 3.12: (a) Measured jitter transfer function for 14 GHz, 15 GHz and 16 GHz 
reference frequencies. (b) Response to low frequency (10 MHz) and high frequency (1 
GHz) jitter. 
 
  
 Measurements were made (Figure 3.12(a)) for three reference frequencies (14 GHz, 15 GHz, 
and 16 GHz), and an average jitter tracking bandwidth (JTB) of 400 MHz was recorded. High 
JTB helps in retaining the low frequency jitter while eliminating high frequency jitter as depicted 
in Figure 3.12(b). It is important to retain the low frequency jitter in forwarded clock receivers 
as low frequency jitter is correlated with the data [17]. 
 
3.3.3 Quadrature Accuracy and Deskew 
Quadrature phase accuracy was confirmed by measuring the phase difference between the 
outputs of the two oscillators after careful calibration of the measurement setup. A maximum 
offset of 2.8% (from 90°) is observed between the two phases at 15 GHz (Figure 3.13(a)). The bias 
voltages of VarA and VarB (Figure 3.5) were fixed while making quadrature accuracy measurements. 
They were then varied from 0 to Vdd to measure the maximum phase shift of the replica oscillator 
(Figure 3.14). 
 
Figure 3.13: (a) Measured percentage quadrature phase error vs. reference frequency (b) 
Measured quadrature phase waveforms at 14 GHz (c) Measured quadrature phase 
waveforms at 15 GHz. 
 
  
 Table 3.1 compares the performance of the proposed system with similar works. We achieve 
the best locking range compared to other injection locked systems and our high Q LC oscillator 
design allows us to achieve excellent jitter performance at a lower power consumption. 
 
 
 
| Parameter | This work [50] [51] | [24] | [25] | [17] | [52] |
|---|---|---|---|---|---|
| Injection arch. | PLL aided ILO | ILO | IL Divider | MILO-ILO | PILO |
| Oscillator arch. | LC | LC | LC | Ring | LC |
| Process technology | 65nm CMOS | 45nm CMOS | 90nm CMOS | 65nm CMOS | 130nm CMOS |
| Injection range | 24.8% (13.4GHz - 17.2GHz) | 12% (12.6GHz - 14.3GHz) | 18.1% | __ | __ |
| RMS jitter | 0.82ps (at 13.5 GHz) | 1.4ps (at 13.5 GHz) | __ | 1.4ps (at 3.2 GHz) | 0.13ps (at 3.2 GHz) |
| Average jitter tracking BW | 400 MHz | 200 – 700MHz | __ | 25 – 300MHz | __ |
| Active area | 0.3 x 0.11mm² | 0.15mm² | 0.026mm² | 0.03mm² | 0.4mm² |
| Supply voltage | 1 V | 1.1 V | 1.2 V | 1 V | __ |
| Average power consumption | 9 mW (LC oscillators 65 % and buffers 35 %) | 13.1 mW (for single LC osc.) | 6.4 mW | 6.8 mW (for entire Tx) | 28.6 mW (single LC oscillator) |
| Average de-skew | 140° | 160° | NA | 400° | __ |
| Quadrature phase error | 2.8% from 90° at 15GHz | NA | 90° ± 1.8° | __ | NA |
Table 3.1: Performance comparison for wideband injection locked LC oscillator. 
 
  
 
Figure 3.14: Measured maximum phase shift of the replica oscillator at different reference 
frequencies. 
 
 
Figure 3.15: Die Micrograph. (A) shows the details of the high-Q inductor and (B) shows 
the placement of the varactors. 
 
3.4 Summary 
A new locking scheme for extended injection range in an LC oscillator was introduced and 
analyzed. The dynamics of the system were derived and the new locking range was proven to 
be better than that of a conventional ILO.  The technique breaks the existing tradeoff between 
  
power consumption and locking range in LC oscillators. The system requires only a single clock 
phase for operation. Quadrature phase generation was demonstrated by adding a secondary 
coupled oscillator to the system. The wide locking range of the proposed system eliminates the 
need for center-frequency adjustment.  
Our work ensures that injection locking can be reliably used in a half-rate or quarter-rate 
forwarded clock I/O architecture with minimal power overhead and reduced clock jitter (higher 
Q). Our approach is scalable because as data rates increase it becomes easier to have high Q 
inductors on chip. 
 
  
  
Chapter 4: Quadrature Locked Loop (QLL) 
The rise in the aggregate bandwidth of microprocessors has led to an insatiable demand for 
massively parallel low-power links with high data rates. This has imposed stringent requirements 
on on-chip clock generation and distribution. Ring oscillator (RO) based injection-locked (IL) 
clocking has been used in the past [53] to provide a low-power, low-area and low-jitter solution. 
ROs are easily integrated in a standard CMOS process and have a smaller on-chip area than 
LC tank based oscillators, making them suitable for dense parallel links. Ring based 
injection-locked oscillators (ILOs) can also be used to generate quadrature phases from a reference clock 
[26] without frequency division, which is desirable for half-rate and quarter-rate CDR 
architectures. 
However, an ILO inherently has a small locking range [23], making it less suitable for wideband 
applications such as the transceivers embedded in field-programmable gate arrays (FPGAs) 
[54]. In addition, drift in the free running frequency due to process, voltage and temperature (PVT) 
variations may lead to poor jitter performance and locking failures [55]. Scaling worsens the 
situation as smaller feature size makes the ROs’ free running frequency more susceptible to PVT 
variations. This fact is exemplified in Figure 4.1. It shows percentage change in natural 
frequency of a simple five stage ring oscillator with change in supply voltage for 28nm and 65nm 
technologies. The variation in 28nm can be about 20% for a 100mV change in VDD. Figure 4.2 
shows a simulated histogram of the change of a ring oscillator’s fo with process variation in 28nm 
technology. A 3σ variation of 0.95GHz is observed around an oscillation frequency of 10GHz.  
For robust performance the locking range should be several times bigger than the variation in 
natural frequency but maximum locking range in ring based ILOs is only about 10% [23]. 
   Adding a PLL to an ILO provides frequency tracking. However, PLL aided techniques 
have second order characteristics that lead to jitter peaking. They also add design complexity 
and power consumption [56]. A simple frequency-locked-loop (FLL) is not sufficient to 
compensate for the drift as the output of an injection-locked oscillator is always fixed at the 
desired frequency, and FLL only comes to action after system loses lock [55]. 
  
 We present a novel frequency tracking method that exploits the dynamics of the injection 
locking process in a quadrature ring oscillator to increase the effective locking range. We also 
show that the resultant system is still a first order system, unlike an injection locked phase locked 
loop (IL PLL). Additionally, this system is used to generate accurate quadrature clock phases 
for a four channel quarter-rate optical receiver.  
Generating quadrature phases at low area and power overhead from a reference clock is 
desirable for quarter-rate forwarded clock architectures. Both ring and LC based dividers have 
 
Figure 4.1: Simulated variation in oscillation frequency of a ring oscillator with change in 
supply voltage for 28nm and 65nm technologies.  
 
 
Figure 4.2: Histogram of change in fo in a ring oscillator with process variation. 
 
 
  
been frequently used for quadrature phase generation. However, because they operate at twice 
the desired frequency they tend to be power inefficient. Quadrature phase generation through 
ring ILOs without frequency division leads to phase inaccuracies [23] (Figure 4.3). Previous 
works have tried to solve this issue using multiphase injection with RC-CR filters (Figure 4.4). 
This results in significant additional power consumption in the buffers driving the passive filter. 
Also, polyphase filters limit the locking range and only work with pure sinusoidal signals [26]. We 
propose a power efficient approach to accurate quadrature phase generation without frequency 
division.  
This chapter is organized as follows: Section I describes the system architecture. Section II 
presents a mathematical and behavioral analysis describing the dynamics of the system. Circuit 
implementation and clocking for four channel quarter-rate optical receiver are discussed in 
Sections III and IV respectively. Hardware measurement results are presented in Section V. 
Finally, Section VI summarizes the work and presents the conclusions. 
 
Figure 4.3: Phase error in a ring oscillator due to injection. 
 
 
Figure 4.4: Multi-phase injection in a ring oscillator. 
 
  
4.1 Proposed Approach 
We propose a novel frequency tracking method that exploits the dynamics of injection locking 
in a quadrature ring oscillator to increase the effective locking range and produce accurate 
quadrature phases. When a ring oscillator with natural frequency fo is injected with an external 
signal with frequency finj, the outputs of the ring oscillator incur a phase mismatch error if fo is 
not equal to finj [23].  We prove that the mean of this error, i.e., mean quadrature phase error 
(MQPE), contains information about the difference between the natural frequency of the 
oscillator and injected frequency (i.e. |finj − fo|) in both locked and unlocked states. A phase 
detector and a low pass filter are used to measure the MQPE. Their output is used in a negative 
feedback configuration to set the natural frequency of the ring oscillator, thereby nullifying 
|finj − fo| and the quadrature phase error. This loop provides frequency tracking, thereby assuring 
wideband injection. We call this technique a quadrature locked loop, or QLL for short (Figure 
4.9). 
In this section we derive an expression for the MQPE. To do so we first quantify the phase 
error caused by injection. Figure 4.5 shows a two stage differential ring oscillator with a 
natural frequency of fo; thus both delay stages have an inherent delay of 1/(4fo). One of the delay 
stages (A) is injected with a signal at finj. Injection causes the delay of stage A to change to 
1/(4fo) + Δ and the oscillator oscillates at a frequency f (not necessarily a constant) instead of fo. 
The delay of the other stage (B) stays the same, thus 

$$\mathrm{Delay}_{IQ}(t) = \frac{1}{4 f_o} \tag{4.1}$$
 
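But as the frequency of oscillation is f, the phase delay can be expressed as 

$$\mathrm{Delay}_{IQ}(\theta) = \frac{1}{4 f_o} \times 2\pi f = \frac{\pi}{2} \times \frac{f}{f_o} \tag{4.2}$$

Now from (4.2) we can calculate the quadrature error as 

$$\mathrm{Quad.\ Error} = \mathrm{Delay}_{IQ}(\theta) - \frac{\pi}{2} = \frac{\pi}{2}\left(\frac{f}{f_o} - 1\right) = \frac{\pi}{2}\left(\frac{\omega}{\omega_o} - 1\right) \tag{4.3}$$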
  
 
With this result (4.3) we can move ahead to calculating the MQPE. We do so by separately 
analyzing the locked and unlocked cases. 
In the locked state f(t) = finj (a constant), hence 

$$\mathrm{MQPE} = \frac{\pi}{2}\left(\frac{f_{inj}}{f_o} - 1\right) = \frac{\pi}{2}\left(\frac{\omega_{inj}}{\omega_o} - 1\right) \tag{4.4}$$
 
To calculate the variation of the quadrature phase error in the unlocked state, we need the 
variation of the instantaneous frequency of the oscillator in the unlocked state. Given that 
ω = ωinj + dθ/dt, this is easily calculated by differentiating (2.7): 

$$\omega = \omega_{inj} + \frac{\omega_b^2}{\omega_o - \omega_{inj}} \times \frac{\sec^2\left(\frac{\omega_b t}{2}\right)}{1 + \left(\frac{\omega_L}{\omega_o - \omega_{inj}} + \frac{\omega_b}{\omega_o - \omega_{inj}} \tan\left(\frac{\omega_b t}{2}\right)\right)^2} \tag{4.5}$$

where 

$$\omega_b = \sqrt{(\omega_o - \omega_{inj})^2 - \omega_L^2} \tag{4.6}$$

(4.5) shows that in the unlocked state the instantaneous frequency (ω) beats with a frequency 
ωb. Thus, as suggested by (4.3), the quadrature phase error also beats with frequency ωb 
as shown in Figure 4.6. This periodicity allows us to calculate the MQPE in the unlocked state 
by averaging (4.3) over one beat period, from 0 to 2π/ωb. 

Figure 4.5: Deriving the quadrature phase error expression in a two stage ring oscillator 
  
 
 
 
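$$\mathrm{MQPE} = \frac{1}{2\pi/\omega_b} \int_0^{2\pi/\omega_b} \frac{\pi}{2} \times \left(\frac{\omega(t)}{\omega_o} - 1\right) dt \tag{4.7}$$

$$\mathrm{MQPE} = \frac{\pi}{2}\left[\frac{\omega_{inj}}{\omega_o} - 1 + \frac{\omega_b}{2\pi\omega_o}\left\{\theta\left(\frac{2\pi}{\omega_b}\right) - \theta(0)\right\}\right] \tag{4.8}$$

θ varies by 2π over one period (2.7), thus we have 

$$\mathrm{MQPE} = \frac{\pi}{2}\left[\frac{\omega_{inj}}{\omega_o} + \frac{\omega_b}{\omega_o} - 1\right] = \frac{\pi}{2}\left[\frac{f_{inj}}{f_o} + \frac{f_b}{f_o} - 1\right] \tag{4.9}$$

Figure 4.6: Quadrature error in unlocked case (a) close to lock (b) far from lock. 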
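The step from (4.7) to (4.9) can be checked numerically: averaging the instantaneous frequency of (4.5) over one beat period should give ωinj + ωb when ωo > ωinj. The sketch below does this with a midpoint rule. The operating point (fo = 7.5 GHz, finj = 7 GHz, fL = 175 MHz) and the function name are illustrative assumptions, and the sec²/tan expression is rewritten in terms of sin and cos only to avoid the singularity at ωb·t/2 = π/2.

```python
import math


def mean_unlocked_frequency(w_o, w_inj, w_l, steps=4000):
    """Average the instantaneous frequency of (4.5) over one beat period.

    All arguments are angular frequencies in rad/s. The oscillator must be
    unlocked, i.e. |w_o - w_inj| > w_l. Returns (mean frequency, w_b).
    """
    delta = w_o - w_inj
    w_b = math.sqrt(delta ** 2 - w_l ** 2)  # beat frequency, (4.6)
    a = w_l / delta
    b = w_b / delta
    total = 0.0
    for k in range(steps):
        # x = w_b * t / 2 sweeps 0..pi over one beat period (midpoint rule).
        x = math.pi * (k + 0.5) / steps
        # sec^2(x) / (1 + (a + b*tan x)^2) with numerator and denominator
        # multiplied by cos^2(x), so the value stays finite at x = pi/2.
        shape = 1.0 / (math.cos(x) ** 2
                       + (a * math.cos(x) + b * math.sin(x)) ** 2)
        total += w_inj + (w_b ** 2 / delta) * shape
    return total / steps, w_b


# Illustrative operating point (assumed values, not measured data).
w_inj = 2 * math.pi * 7.0e9
w_o = 2 * math.pi * 7.5e9
w_l = 2 * math.pi * 175e6
mean_w, w_b = mean_unlocked_frequency(w_o, w_inj, w_l)
print("numerical mean / (w_inj + w_b) =", mean_w / (w_inj + w_b))
```

A ratio very close to one indicates that the mean frequency term inside (4.9) agrees with the time average of (4.5) at this operating point.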
  
(4.3) and (4.9) form the cornerstones of the theory of QLL. Figure 4.7 shows the variation of 
MQPE with change in fo for a fixed finj of 7GHz and injection strength (k) of 0.05. It has two 
distinct regions, locked and unlocked. As expected, the MQPE is 0 for finj=fo. In the locked state 
the MQPE increases (almost linearly) as |finj − fo| increases. MQPE goes to zero asymptotically 
(never reaching it) as |finj − fo| increases in the unlocked state. This suggests that the MQPE is 
a measure of the sign of finj − fo in both locked and unlocked states. This in turn implies that a 
quadrature phase error detector can be used as a phase frequency detector (PFD) in an injection 
locking environment. Hence the quadrature error can indeed be used in a feedback system to set 
the natural frequency (fo) of the oscillator such that fo=finj, thereby boosting the effective locking 
range.  
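The shape of the MQPE curve described above can be sketched directly from (4.4) and (4.9). The code below is a minimal illustration and not the model used to produce Figure 4.7. It assumes a locking range fL of 175 MHz (a hypothetical value), and it extends (4.9) to fo < finj by letting the beat term change sign, on the assumption that θ then retreats by 2π per beat period instead of advancing.

```python
import math


def mqpe(f_o, f_inj, f_l):
    """Mean quadrature phase error in radians, following (4.4) and (4.9).

    f_o   natural frequency of the oscillator (Hz)
    f_inj injected frequency (Hz)
    f_l   one-sided locking range of the ILO (Hz), an assumed parameter
    """
    delta = f_o - f_inj
    if abs(delta) <= f_l:
        # Locked: the oscillator runs at f_inj, see (4.4).
        return (math.pi / 2) * (f_inj / f_o - 1)
    # Unlocked: the mean frequency is pulled to f_inj +/- f_b, see (4.9).
    f_b = math.sqrt(delta ** 2 - f_l ** 2)
    sign = 1.0 if delta > 0 else -1.0  # assumed extension for f_o < f_inj
    return (math.pi / 2) * ((f_inj + sign * f_b) / f_o - 1)


F_INJ = 7e9   # injected frequency used in Figure 4.7
F_L = 175e6   # hypothetical locking range
for f_o in (6.0e9, 6.9e9, 7.0e9, 7.1e9, 8.0e9):
    print(f"fo = {f_o / 1e9:.2f} GHz  MQPE = {math.degrees(mqpe(f_o, F_INJ, F_L)):+.3f} deg")
```

Under these assumptions the sign of the MQPE follows the sign of finj − fo in both regions, and its magnitude shrinks again once the oscillator is far outside the locking range.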
 
Figure 4.7: MQPE vs. fo for a fixed finj of 7GHz 
 
An interesting feature of this technique is that the MQPE itself can be controlled by changing 
the injection strength. As shown in Figure 4.8, increasing the injection strength (K) increases the 
intrinsic locking range of the injection locked oscillator (2.5), hence widening the linear region. 
This fact proves useful as injection strength can be controlled externally, allowing off-chip 
control of the MQPE.  
  
 
 
Figure 4.8: Effect of injection strength on MQPE 
 
Figure 4.9 shows the block diagram of the proposed system. It consists of an injection locked 
two stage differential ring oscillator.  Instantaneous quadrature error is measured by using a 
phase detector (PD), which takes the I and Q phases of the clock from an ILO as inputs. The 
error is averaged using a charge pump and a loop filter, and fed back to the oscillator’s Vctrl 
(Figure 4.9). The loop tracks the changes in the injected frequency and natural frequency of the 
oscillator until their difference |finj − fo| is minimized, assuring a wide locking range.  
This technique obviates the need for a phase frequency detector (PFD) and its speed 
limitations. Wide jitter tracking bandwidth inherent to IL helps in preserving the correlated low 
frequency jitter and suppressing the uncorrelated high frequency jitter. In addition, since the 
reference clock is not used by the PD, it does not need to be rail to rail. As described in the next 
sections, QLL has a first order response, assuring stability without jitter peaking. 
 
  
 
 
Figure 4.9: Block diagram of the proposed system (QLL) 
 
 
4.2 Mathematical Analysis 
In this section we propose a mathematical model of our system. We analyze the effect of the 
quadrature error correcting loop on the injection locking dynamics and derive the dynamics of 
the overall system. We show that the overall system can be designed to have a first order 
behavior, and bolster our claims with Simulink based behavior modelling and measured results. 
The dynamics of the system are similar to those of a normal injection locked oscillator (2.3), 
except that ωo is no longer fixed. The value of ωo is set by the loop as 

$$\frac{d\theta}{dt} = \omega_o + K_{VCO} V_{ctrl} - \omega_{inj} - \omega_L \sin(\theta) \tag{4.10}$$
 
  
Vctrl is generated by low pass filtering the transient quadrature phase error with a loop filter 
‘H’. We therefore have 

$$V_{ctrl}(t) = H\left(\frac{\omega_{inj} + \frac{d\theta}{dt}}{\omega_o + K_{VCO} V_{ctrl}} - 1\right) \tag{4.11}$$
 
Using (4.10) we can simplify (4.11) to 

$$V_{ctrl}(t) = H\left(\frac{-\omega_L \sin(\theta)}{\omega_o + K_{VCO} V_{ctrl}}\right) \tag{4.12}$$
 
At equilibrium dθ/dt=0 and ωo+KVCOVctrl=ωinj. Substituting these values in (4.10) we get that 
in equilibrium, θ=0. 
The highly non-linear nature of (4.11) and (4.12) makes it difficult to obtain a convenient closed 
form solution. However, we can still gain some insight into how the loop behaves with regard 
to input noise by linearizing about the equilibrium point (i.e. θ=0). We replace θ with θn, given 
|θn| ≪ 1 (small signal assumption): 

$$\frac{d\theta_n}{dt} \approx K_{VCO} \Delta V_{ctrl} - \omega_L \theta_n \tag{4.13}$$

$$\Delta V_{ctrl} \approx H\left(\frac{-\omega_L \theta_n}{\omega_o}\right) \tag{4.14}$$

Substituting the value of ΔVctrl from (4.14) in (4.13) we get 

$$\frac{d\theta_n}{dt} \approx -K_{VCO} H\left(\frac{\omega_L \theta_n}{\omega_o}\right) - \omega_L \theta_n \tag{4.15}$$
 
where H denotes a low pass filter in frequency domain (Figure 4.10) with bandwidth ωfilter such 
that ωfilter<<ωL. ωL is the locking range of the regular ILO as in (2.5). 
  
  
 
Figure 4.10: Design of the loop filter. 
 
If θn varies faster than ωfilter then $H\left(\frac{\omega_L \theta_n}{\omega_o}\right) \approx 0$ and we have 

$$\frac{d\theta_n}{dt} = -\omega_L \theta_n \tag{4.16}$$
 
This is similar to a first order PLL response with bandwidth ωL, characteristic of an injection 
locked system (2.3).  
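If θn varies slower than ωfilter then $H\left(\frac{\omega_L \theta_n}{\omega_o}\right) \approx \frac{\omega_L \theta_n}{\omega_o}$ and we have 

$$\frac{d\theta_n}{dt} = -\left(\frac{K_{VCO}}{\omega_o} + 1\right) \omega_L \theta_n \tag{4.17}$$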
 
This is also a first order PLL response with a bandwidth higher than ωL. The exact bandwidth 
is not important in this case because the variation in θn is much slower than ωL. 
So overall the system allows all the variations in θn slower than ωL to go through, and 
attenuates all variations faster than ωL with a -20dB/dec (first order) slope. This is an important 
conclusion. It essentially means that allowing the quadrature error correction loop to run much 
slower than the injection locking loop ensures that the system has a first order response with 
the same bandwidth as that of an ILO, i.e., ωL.  
  
We verify the accuracy of our derivations by simulating a more accurate behavioral model in 
Simulink. Actual chip measurement results will also be used to bolster the accuracy of our 
modeling. 
 
4.2.1 Behavioral Modelling 
In order to investigate the stability of the system with greater accuracy a behavioral model was 
constructed in Simulink by implementing (4.11) and (4.12) as shown in Figure 4.12. The model 
is initialized to set fo to 5GHz and finj to 7GHz. The ILO’s inherent locking range (fL) was set to 
175MHz. Figure 4.11 shows the transient response of the QLL Simulink model for two different 
loop bandwidths: 100kHz (<< fL) in the first case and 20MHz (comparable to fL) in the second. 
In both cases the system attains the same final locked 
state, i.e., θ=2nπ and f=7GHz. However, there are some important differences. In the first case 
the transient has a first order response with no overshoot, whereas in the second case there is 
significant ringing in the transient response, indicating a reduced stability margin.  
 
 
 
 
Figure 4.11: Transient locking characteristics of Simulink model of QLL for two different 
loop filters.  
  
 
Figure 4.12: Simulink model of QLL (top) and Matlab code to extract the linear state-
space model around the operating point. 
 
To further analyze the stability of the QLL model, we linearized the Simulink model around 
the equilibrium point using Matlab's “linmod” function. Once the state-space model 
was determined, the step response and system transfer function were calculated for different 
quadrature error correction bandwidths.  
Once again the inherent locking range (fL) of the ILO was fixed to 175MHz and we simulated 
the linearized model for two different loop bandwidths. In the first case the loop bandwidth was 
set to 100kHz (<< fL). A first order step response was observed: there was no overshoot in 
the step response and no peaking in the system transfer function (Figure 4.13), which decays 
at -20dB/dec. 
In the second case we set the loop bandwidth to 20MHz, which is much closer to fL. We 
observed ringing in the step response, and the system transfer function had some peaking and a 
second order (-40dB/dec) decay. We used this modelling insight in our circuit design. 
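The same trend can be seen by hand from (4.15) if H is assumed to be a single-pole low pass filter with unity DC gain, H(s) = ωfilter/(s + ωfilter). This is an assumption made for illustration; the actual loop filter of Figure 4.10 and the Simulink model may differ. The characteristic equation of (4.15) then becomes s² + (ωL + ωfilter)s + ωL·ωfilter·(1 + KVCO/ωo) = 0. The sketch below computes its roots for the two loop bandwidths, using a hypothetical normalized gain KVCO/ωo = 10.

```python
import cmath
import math


def qll_poles(f_l_hz, f_filter_hz, gain):
    """Poles of the linearized QLL (4.15) for a single-pole loop filter.

    f_l_hz      locking range of the ILO (Hz)
    f_filter_hz bandwidth of the assumed single-pole loop filter (Hz)
    gain        dimensionless ratio K_VCO / w_o (hypothetical value)
    """
    w_l = 2 * math.pi * f_l_hz
    w_f = 2 * math.pi * f_filter_hz
    b = w_l + w_f                 # coefficient of s
    c = w_l * w_f * (1 + gain)    # constant term
    root = cmath.sqrt(b * b - 4 * c)
    return (-b + root) / 2, (-b - root) / 2


for f_filter in (100e3, 20e6):
    p1, p2 = qll_poles(175e6, f_filter, gain=10.0)
    kind = "real poles (no ringing)" if p1.imag == 0 else "complex poles (ringing)"
    print(f"loop filter {f_filter / 1e6:g} MHz: {kind}")
```

With these assumed values the slow loop gives two real poles, while the 20MHz loop gives a complex pair, which is in line with the ringing observed in the linearized model. The threshold between the two cases depends on the assumed gain.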
 
 
The model suggests that in order for the system to be stable the secondary loop needs to run 
much slower than the bandwidth of the injection locking itself. If the above condition is assured 
 
Figure 4.13: Step response and transfer function of linearized QLL Simulink model for 
different loop bandwidths (Small signal behavior). 
  
then the bandwidth of the system is the bandwidth of the ILO (fL). This is similar to our 
theoretical analysis in the previous section.  
 
4.3 Circuit Implementation 
Figure 4.14 shows the circuit diagram of the major sections of the QLL.  The reference clock 
can be injected both electrically and optically. A trans-impedance amplifier (TIA) based optical 
front-end is used in the latter case. The TIA consists of an inverter with a resistor of value 4kΩ, 
connected in feedback. The bandwidth of the TIA is more than 10GHz. The TIA’s output 
voltage amplitude (150 mV) is sufficient for the IL architecture because of its high voltage gain 
 
Figure 4.14: Circuit architecture of QLL. 
 
  
[53]. The electrical input is provided directly by an on-chip 50Ω transmission line. An analog 
multiplexor is used to select between the electrical and optical (from TIA) inputs. The selected 
input is fed to the single-ended to differential converter, which consists of an NMOS with 
symmetrical drain and source loads. The differential outputs from the drain and source are 180° 
apart within an 11GHz bandwidth. Outputs from the single-ended to differential converter are ac 
coupled to the ILO injection ports.  
 Each ILO (Figure 4.15) consists of a V/I converter and a two-stage, cross-coupled, pseudo 
differential current-starved ring oscillator. A two-stage ring oscillator architecture is chosen to 
minimize power consumption. The bias circuit is designed such that current starvation is 
achieved in both PMOS and NMOS in the inverters of the ring oscillator for a 50% duty cycle. 
Current injection is achieved by NMOS differential V/I converters without resistive loading 
which helps in mitigating the interaction with the DC bias at the injection point [23]. The sizes 
of the NMOS differential pair are chosen to reduce the parasitic loading while fully steering the 
current source.  
A simple XOR-XNOR based phase detector takes the I and Q phases of the clock from the 
ILO as inputs. It generates Up and Dn signals containing the instantaneous quadrature error 
information. The error is averaged using a simple charge pump and a loop filter consisting of a 
capacitor of value 1pF.  The charge pump consists of an amplifier with an NMOS differential 
pair and diode connected PMOS loads. The differential output of the amplifiers is converted to 
a single ended output by current mirroring. The body biases of the NMOS differential pair in the 
charge pump are used for externally calibrating out the current mismatch in the charge pump. The 
bandwidth of the charge pump filter is digitally controllable, by altering the load on the 
differential pair. The output of the charge pump and loop filter is fed back to the oscillator’s Vctrl, 
thereby completing the loop. 
 
4.3.1 Transient Simulation 
Figure 4.16 (a) shows the transient locking characteristics (frequency and Vctrl) of the proposed 
QLL. For the simulation, the injected frequency was fixed to 7GHz and the initial frequency of 
the oscillator was 7.7GHz, such that the system was outside its locking range. 
 The locking takes place in three different stages. When the system is in the unlocked state 
the loop brings the frequency of the oscillator close to the injected frequency. When the 
  
frequency of the oscillator comes within the injection locking range of the ILO, frequency lock 
is achieved. However, the phase still keeps changing. The loop changes the Vctrl of the oscillator 
until the quadrature error is nullified i.e. when fo=finj. This negative feedback loop ensures that 
fo=finj and there is no phase error in the outputs. Figure 4.16 (b) shows ring oscillator’s frequency 
vs. control voltage characteristics. In the final locked state the Vctrl settles to 0.61mV such that 
the natural frequency of the ring oscillator is equal to 7GHz (Figure 4.16 (b)). 
Transient simulations were repeated to show that the QLL has inherent frequency detection 
in both directions as shown in Figure 4.17. The injected frequency was kept at 7GHz and the 
initial frequency was kept at 7.75GHz (>7GHz) in one case and at 6.65GHz (<7GHz) in the 
other. The system locks, in both cases, to the injected frequency. The difference in locking times 
arises from the dependence of MQPE on fo (4.4). 
 
 
Figure 4.15: Ring oscillator based ILO circuit schematic. 
 
  
 
Figure 4.16: (a) Transient locking characteristics of QLL. (b) Ring oscillator 
characteristics. 
 
 
  
 
Figure 4.17: Locking transient for two different initial conditions 
 
 
4.4 QLL Based Clocking  
To validate the QLL as a robust building block, QLL based ultra-low-power clocking is 
demonstrated for a four channel, quarter-rate optical receiver (Figure 5.1).  
The QLL is used to generate accurate quadrature clock phases from a single phase of 
electrical/optical clock input. The four phases are distributed without any repeaters and sent to 
local ring oscillators, which are placed near the clocked optical receivers. The local ring 
oscillators are injection locked to the global clock and the frequency of oscillation is varied to 
control the phase of the local ring oscillator's output (deskew). The data receivers have a 
quarter-rate architecture and hence require accurate quadrature phases. Symmetric injection with 
four clock phases ensures that quadrature accuracy is maintained even with deskew. The optical 
receiver uses the inherent frequency-to-voltage conversion provided by the QLL to dynamically body 
bias its devices. QLL based clocking is described in greater detail in the next 
chapter. 
  
4.5 Hardware Measurements  
The test chip for the QLL was fabricated in a 28nm FD SOI CMOS process. The die micrograph 
and core detail are presented in Figure 4.18. Core area is 60μm x 50μm, in a 5mm x 1.1mm die. 
The top metal layers are designed to be compatible with copper-pillar flip-chip bonding as well 
as bond-wire. 
 
 
 
Figure 4.18: Die micrograph and layout details. 
 
  
4.5.1 Locking Range and Integrated Jitter 
In our measurement setup, an external signal generator (Anritsu N5181B) is used to provide the 
reference clock used for injection. The frequency of the reference clock was varied and output 
waveforms were observed on an Agilent 86100D sampling oscilloscope. To demonstrate the 
increase in locking range we disable the loop and set the Vctrl (Figure 4.9) of the ILO at VDD/2. 
Without the quadrature phase error tracking, a locking range of 7-7.4GHz (5%) is observed at  
 
 
Figure 4.19: Phase noise and integrated jitter measurements for 8GHz (electrical and 
optical) and 11GHz (electrical).  
 
  
an injection strength (K) of 0.05. With the loop activated the locking range improves to 4-11GHz 
(90%). The achieved locking range is limited by the tuning range of the ring oscillator. In order 
to measure the response of the QLL to fast changes in frequency, the frequency of the reference 
clock was changed in steps of 2GHz with each step having a time duration of 1ms (equipment 
limited). The large bandwidth of the QLL allows it to sustain 2GHz step changes in 
frequency without losing lock.  
Figure 4.19 shows the measured phase noise of the output of the QLL in both locked and 
unlocked states at 8GHz. A 40dB improvement in phase noise is observed at 1MHz offset between 
the locked and unlocked states. Integrated output jitter (100kHz-1GHz) values of 558fs and 577fs 
are measured at 8GHz for electrical and optical inputs respectively. At the highest locking frequency 
(11GHz) the integrated output jitter is 642fs. Figure 4.20 shows the measured phase noise (at 
10MHz offset) of the locked QLL across the entire locking range. A phase noise variation of 
only 6dB is observed as the frequency is varied from 4GHz to 11GHz. Thus, the QLL 
maintains low phase noise performance across its entire locking range.   
 
 
Figure 4.20: Measured phase noise of the locked QLL output across the entire locking 
range. 
 
  
4.5.2 Reference and Supply Noise Filtering 
The jitter transfer function is measured using the test setup shown in Figure 4.21. In this setup, 
a secondary clock (fjitter) is mixed with the 90° phase shifted primary clock (fo). This signal is 
transformed to a narrowband FM signal by summing it with a non-phase shifted primary clock. 
The resultant FM signal is used as the new reference clock. The secondary clock frequency (fjitter) 
is varied from 1kHz to 1.2GHz for each fo, and the spectral components of the output and the 
reference are measured at the carrier (fo) and sideband (fjitter) frequencies (Figure 4.21) using an 
Agilent E4440A spectrum analyzer.  
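The reason this arrangement yields an angle modulated reference can be illustrated numerically. For a small modulation index m, cos(ωo·t) + m·cos(ωj·t)·sin(ωo·t) is approximately cos(ωo·t − m·cos(ωj·t)), i.e. a carrier whose phase is modulated by the secondary tone. The sketch below compares the two. It assumes an ideal multiplier as the mixer, a unit-amplitude carrier and m = 0.05; these are illustrative assumptions, not parameters of the measurement setup.

```python
import math


def max_narrowband_error(f_o, f_jitter, m, samples=20000):
    """Largest gap between the summed signal and an ideal phase modulated carrier.

    f_o      carrier (primary clock) frequency in Hz
    f_jitter secondary clock frequency in Hz
    m        modulation index, assumed small
    """
    worst = 0.0
    duration = 1.0 / f_jitter  # one period of the secondary clock
    for k in range(samples):
        t = duration * k / samples
        carrier = math.cos(2 * math.pi * f_o * t)
        shifted = math.sin(2 * math.pi * f_o * t)   # 90 degree shifted carrier
        tone = m * math.cos(2 * math.pi * f_jitter * t)
        summed = carrier + tone * shifted           # mixer output plus carrier
        ideal = math.cos(2 * math.pi * f_o * t - tone)
        worst = max(worst, abs(summed - ideal))
    return worst


print(max_narrowband_error(8e9, 100e6, 0.05))
```

The residual scales roughly with m², which is why the approximation only holds for narrowband (small index) modulation.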
 
 
Figure 4.21: (a) Measurement setup for generating FM signal reference. (b) Setup for 
measuring the spectrum of output signals. 
 
Figure 4.22 (a) shows the measured jitter transfer function of the system for a reference 
frequency of 8GHz. It has a low-pass characteristic with a jitter tracking bandwidth (JTB) of 
150MHz and a -20dB/dec attenuation, suggestive of a first-order system. High JTB helps in 
retaining the low frequency jitter while eliminating high frequency jitter as depicted in Figure 
  
4.22 (b). It is important to retain the low frequency jitter in forwarded clock receivers as low 
frequency jitter is correlated with the data [17]. 
Ring oscillators are susceptible to power supply variations [57]. Power supply variations 
directly translate into phase noise and jitter in the ring oscillators’ output as their oscillation 
frequency is inversely proportional to VDD (Figure 4.1). Substrate noise also directly affects the 
total oscillator jitter and is found to be strongly correlated to supply variations [57]. High 
frequency noise on the supply can be reduced by adding bypass capacitors. However, low frequency 
VDD noise is more difficult to eliminate with bypass capacitors because of significant area penalty. 
Injection locking helps in suppressing low frequency VDD noise as shown in Figure 4.23. VDD 
 
Figure 4.22: (a) Measured Jitter transfer function for 8GHz reference. (b) Response to low 
frequency (10 MHz) and high frequency (1 GHz) jitter. 
 
  
noise transfer has a high pass transfer function with a bandwidth of 150MHz and a -20dB/dec 
attenuation. This is complementary to the jitter transfer function measurement (Figure 4.22) and 
characteristic of a first order injection locked system as predicted by (2.14) and (2.15). The 
measurement is made by adding sinusoidal noise ranging from 10MHz to 1GHz on the VDD 
using a bias tee and then measuring the relative frequency sidebands on the output in unlocked 
and locked cases (Figure 4.23).  
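The complementary behaviour of the two measurements can be illustrated with ideal first order responses sharing the same 150MHz corner. This is a sketch under the assumption of ideal single-pole low pass and high pass magnitudes; it is not a fit to the measured data, and the function names are illustrative.

```python
import math


def jitter_lowpass_db(f, f_c):
    """Magnitude (dB) of an ideal first order low pass response at offset f."""
    return -10 * math.log10(1 + (f / f_c) ** 2)


def supply_highpass_db(f, f_c):
    """Magnitude (dB) of the complementary first order high pass response."""
    return -10 * math.log10(1 + (f_c / f) ** 2)


F_C = 150e6  # bandwidth quoted in the text for both measurements
for f in (10e6, 150e6, 1e9):
    print(f"{f / 1e6:6.0f} MHz  reference jitter {jitter_lowpass_db(f, F_C):7.2f} dB  "
          f"supply noise {supply_highpass_db(f, F_C):7.2f} dB")
```

In this idealized picture, reference jitter at 10MHz passes almost unattenuated while supply noise at the same offset is strongly suppressed, and the situation reverses towards 1GHz.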
 
 
Figure 4.23: QLL response to supply noise compared to unlocked (no reference) case. 
 
4.5.3 Quadrature Accuracy 
Quadrature phase accuracy between the QLL outputs is confirmed by measuring
their phase difference. The quadrature output phases (I and Q) of the QLL are selected using an
on-chip digital multiplexer. Quadrature error is measured in a two-step process. First, the ‘I’
phase is selected and its phase difference with the input reference is measured. Then the digital
bit to the multiplexer is altered to select the ‘Q’ phase and its phase difference with the input
reference is measured. The difference between the two measured values provides the quadrature
phase error. This multiplexing ensures that the I and Q phases share the same signal path, so
a more accurate measurement can be made. Figure 4.24 shows the measured quadrature
accuracy across 4-11GHz and the corresponding 3σ error margins. An average offset of 1.5°
(from 90°) is observed between the two phases across the entire locking range.
 
Figure 4.24: Measured quadrature phase error vs. reference frequency and measured
quadrature phase waveforms at 5, 8 and 11GHz.
 
4.5.4 Power Consumption 
A power efficient two-stage ring oscillator and the simplicity of injection locking ensure that the
QLL circuit consumes only 2-2.8mW for 4-11GHz operation. As shown in Figure 4.25 (a), the
power consumption increases with operation frequency. This is due to the digital nature of the
ring oscillator. The power consumed per unit of operating frequency (Figure 4.25 (b)) decreases
as frequency increases, making the QLL suitable for high-speed applications.
  
4.5.5  Comparison with Prior Art 
 
Table 4.1: Performance comparison of the QLL

| | This work [58] [59] | [26] | [23] | [55] | [56] |
|---|---|---|---|---|---|
| Architecture | QLL | ILO | ILO | IL-PLL | PPM IL |
| Oscillator | CMOS Ring | CMOS Ring | CMOS Ring | CMOS Ring | CMOS Ring |
| Technology | 28nm FD SOI | 250nm BiCMOS | 90nm CMOS | 65nm CMOS | 20nm CMOS |
| Locking range | 4GHz - 11GHz | 340MHz | 203MHz | __ | __ |
| Output Integrated Jitter (σ) | 558fs-577fs* (at 8GHz), 642fs (at 11GHz) (100kHz-1GHz) | __ | <1.5ps (RMS Jitter at 2.5GHz) | 0.7ps at 1.2GHz (10kHz-40MHz) | 434fs/268fs at 15GHz (100kHz-1GHz) |
| I/Q error | 1.5° | 0.7°** | 4.5° | NA | NA |
| Active Area | 0.003mm² | 0.09mm² | 0.026mm² | 0.022mm² | 0.044mm² |
| Supply | 1V | 3V | 1.2V | __ | 1.25/1.1V |
| Power Diss. (P) at (F) | 2.77mW at 11GHz | 15mW at 2.7GHz | 1.3mW at 2GHz | 0.97mW at 1.2GHz | 46.2mW at 15GHz |
| Figure of Merit (FOM) | -250dB | __ | -238dB | -244dB | -247dB |

* Optical clock input **Not measured directly

Table 4.1 compares the QLL with prior art. The QLL based frequency tracking technique allows
us to achieve the best locking range and robust I/Q performance compared to other works. Our
jitter and power performance is comparable with the state-of-the-art. We achieve the best figure
of merit (FOM), which is defined as

$$\mathrm{FOM} = 10\log\left[\left(\frac{\sigma}{1\,\mathrm{s}}\right)^{2}\cdot\frac{P}{1\,\mathrm{mW}}\cdot\frac{1\,\mathrm{GHz}}{F}\right] \qquad (4.18)$$

where σ is the RMS integrated jitter, P is the power consumption and F is the frequency of
operation. Thus the lowest FOM will be achieved by a system with lowest jitter and power
consumption at the highest frequency of operation.
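The figure of merit in (4.18) can be checked numerically. The sketch below is a minimal illustration, not part of the original measurement flow: the function name is hypothetical, and pairing each jitter value from Table 4.1 with the listed power and frequency (for example 642fs with 2.77mW at 11GHz for this work, and the 268fs figure for [56]) is an assumption.

```python
import math


def fom_db(sigma_s, power_mw, freq_ghz):
    """FOM of (4.18): 10*log10[(sigma / 1 s)^2 * (P / 1 mW) * (1 GHz / F)]."""
    return 10.0 * math.log10(sigma_s ** 2 * power_mw / freq_ghz)


# (label, RMS jitter in s, power in mW, frequency in GHz) read from Table 4.1.
# Which jitter value goes with which power/frequency entry is an assumption.
ENTRIES = [
    ("This work", 642e-15, 2.77, 11.0),
    ("[23]", 1.5e-12, 1.3, 2.0),
    ("[55]", 0.7e-12, 0.97, 1.2),
    ("[56]", 268e-15, 46.2, 15.0),
]

if __name__ == "__main__":
    for label, sigma_s, power_mw, freq_ghz in ENTRIES:
        print(f"{label:10s} FOM = {fom_db(sigma_s, power_mw, freq_ghz):7.1f} dB")
```

With these pairings the computed values round to the FOM entries listed in Table 4.1, which suggests the table was evaluated in the same way.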
 
 
 
Figure 4.25: (a) Power consumption of the QLL vs. frequency. (b) Power efficiency of the 
QLL vs. frequency.  
  
4.6 Summary 
A new frequency tracking technique based on the quadrature phase error cancellation in an 
injection locked ring oscillator was introduced and analyzed. The technique improves the ILOs’ 
locking range from 5.5% (7-7.4GHz) to 90% (4-11GHz) without using a phase frequency 
detector (PFD). The dynamics of the system were derived and were shown to have first order 
characteristics. This guarantees stability without peaking, unlike a second order injection locked 
PLL.  The system was used to generate accurate quadrature phases, without any frequency 
division, from a single phase of reference clock input, supplied electrically or optically. A power 
efficient two stage ring oscillator, combined with the low jitter performance of the ILO, allows 
us to achieve the best FOM. 
The theory of the QLL also applies to subharmonic and superharmonic injection locked
quadrature ring oscillators. Moreover, because the phase detector in the QLL loop uses only the
phases from the ILO for comparison, this technique could easily be extended to
wideband injection in injection locking based frequency multipliers [17] and CDRs [39].
 
 
 
 
  
  
Chapter 5: QLL Based Clocking for a Four Channel Quarter-Rate Optical Receiver
As discussed in Chapter 1, integrated circuit scaling has enabled a huge growth in processing 
capability, which necessitates a corresponding increase in inter-chip communication bandwidth. 
This trend is expected to continue, requiring an increase in both the per-pin data rate and the
number of I/O pins.
While I/O circuit performance benefits from technology scaling, the bandwidth of electrical 
channels does not scale with the same trend. Especially as the data rate increases, they exhibit 
excessive frequency-dependent loss, which results in significant inter-symbol interference (ISI). 
In order to continue scaling data rates, equalization techniques can be employed to compensate 
for the ISI. However, the power and area overhead associated with equalization make it difficult 
to achieve target bandwidth with a realistic power budget.  
A promising solution to the I/O bandwidth problem is the use of optical inter-chip 
communication links. The negligible frequency dependent loss of optical channels provides the 
potential for optical link designs to fully utilize increased data rates provided through CMOS 
technology scaling without excessive equalization complexity. Optics also allow very high 
information density through wavelength division multiplexing (WDM). Hybrid integration of 
optical devices with electronics has been demonstrated to achieve high performance [60] and 
recent advances in silicon photonics have led to fully integrated optical signaling [61]. These 
approaches pave the way for massively parallel optical communications. In order for optical 
interconnects to become viable alternatives to established electrical links, they must be low cost 
and have competitive energy and area efficiency metrics. Dense arrays of optical detectors 
require very low-power, sensitive, and compact optical receiver circuits. Existing designs for the
input receiver, such as TIAs, require large power consumption to achieve high bandwidth and
low noise, and can occupy a large area due to bandwidth enhancement inductors. In addition to the
receiver circuits, the clocking circuit also needs to be more power efficient. In conventional
clocking schemes that employ a global PLL-locked reference and a digitally distributed clock 
through buffer chains and clock grids, the power required to constantly switch the large 
capacitive loads can consume 40% of the chip’s total power budget [15]. In recent years, 
injection-locked clocking has been proposed as a solution for reducing power consumption of 
the clock network [53]. Also, as discussed in Chapter 1, ILOs are well suited for forwarded clock 
receivers because of their high bandwidth jitter filtering properties [17] and easy deskewing. 
However, ILOs are plagued by their small locking range making them susceptible to PVT 
variations and unsuitable for wideband receivers.  
In the previous chapter we introduced the quadrature locked loop (QLL), a frequency
tracking technique that increases the locking range of the ring based quadrature injection locked
oscillator. This technique was used to generate accurate quadrature phases from a single phase
of an electrical/optical clock without any frequency division. In this chapter, we use QLL based
clocking for a four channel, quarter-rate, forwarded clock optical receiver. The QLL
generates accurate clock phases for all four channels from a forwarded clock at
quarter-rate. It drives an ILO at each channel, without any repeaters, for local quadrature
clock generation, ensuring low power clocking. Each local ILO has deskew capability for phase
alignment. The wide locking range of the QLL ensures reliable operation across a wide range of
data rates. A compact low-power optical receiver [62] maintains its per-bit energy consumption across
16Gb/s-32Gb/s through adaptive body biasing (BB), using the Vctrl generated by the QLL.
This chapter is organized as follows: Section 5.1 describes the system architecture of the optical
receiver and adaptive body biasing. In Section 5.2 we describe the QLL based deskewing
technique. Hardware measurement results for the optical receiver are presented in Section 5.3.
Section 5.4 summarizes the work and presents conclusions. Finally, in Section 5.5, we propose an
extra dimension to the QLL idea that will be useful for future applications.
5.1 System Architecture 
The clocking structure is shown in Figure 5.1. The optical receiver has four optical data inputs 
and one forwarded clock (electrical/optical) input. The optical clock is converted to an electrical 
clock using a TIA as mentioned in the previous chapter. The electrical clock is then sent to a 
global QLL circuit. The QLL generates four quadrature phases. The four phases are distributed
without any repeaters and sent to local ring oscillators, which are placed near the clocked optical
receivers. The local ring oscillators are injection locked to the global clock, and their frequency of
oscillation is varied to control the phase of the local ring oscillator’s output (deskew). The data
receivers have a quarter rate architecture and hence require accurate quadrature phases.
Symmetric injection with four clock phases ensures that quadrature accuracy is maintained even
with deskew. The details are described in Section 5.2.1.
 
 
Figure 5.1: QLL based clock distribution architecture for a 4 channel optical receiver 
 
5.1.1 Optical Receiver  
An optical receiver uses a photodiode to convert an incoming optical signal to electrical current.
If a simple resistor is used to convert the current of a photodiode to a voltage, for a target
signal-to-noise ratio (SNR) and a given photodiode capacitance, the input time constant (RC) severely
limits the bandwidth and data rate of the receiver. In order to increase the RC bandwidth while
maintaining the same gain, transimpedance amplifiers (TIAs) are commonly employed. The
overall bandwidth of conventional TIAs is chosen to be 1/(RC).
Such high-bandwidth TIAs are highly analog, power hungry, and do not scale well with 
technology. A more recent approach uses an integrating front-end and a resistor termination 
with a time constant that is much larger than the bit interval (RC >> Tb) [14]. Dynamic offset
modulation is then used to provide a constant voltage at its input regardless of the data sequence.  
Figure 5.2 shows the top-level architecture of the adaptive receiver (single channel) with
dynamic BB using Vctrl of the QLL. The first stage of the receiver is a low-power TIA with a 3kΩ
feedback resistor. The TIA’s output is sampled at the end of two consecutive bits (Vn, Vn+1) and
these samples are compared to resolve each bit. The TIA provides isolation between the PD’s
capacitance and the sampling capacitors, which reduces the charge-sharing effect and enables the
use of ultra-low capacitance photodetectors in scaled silicon photonic technologies. In addition,
for a given PD capacitance, the S/H capacitors can be chosen to be bigger (even comparable to the
PD’s capacitance) to relieve kT/C noise. This had been an important bottleneck in double sampling
optical receivers in the past [14], [63]. The sampling capacitors are followed by an amplifier, which
also provides isolation between the sampling nodes and the sense-amp to minimize kickback. As in
[62], the dynamic offset modulation employed at the output of the amplifier introduces an offset so
that the sense-amp differential input is always constant regardless of the preceding bit sequence.
The sense-amp is followed by an SR-latch to retrieve the NRZ data. A de-multiplexing
factor of four is achieved immediately after the TIA using quarter-rate clocked samplers.

Figure 5.2: Single channel quarter-rate receiver.
The quarter-rate architecture of the receiver necessitates accurate quadrature clocks. In
addition, because there are multiple channels, per channel deskewing is needed to align the
clock to the data. We explain the details of accurate quadrature phase generation and deskewing
in the following sections.
5.1.2 Adaptive Body Biasing 
The optical receiver implementation shown in Figure 5.2 has analog building blocks with bias 
currents. These are biased to provide the maximum bandwidth and gain for operation at the 
highest data rates, thus consuming maximum power. For operation at lower data rates a high 
bandwidth is not required. However, since the bandwidth of the analog components does not
change with data rate, power is ‘wasted’. This degrades the power efficiency (the
energy per-bit) of the optical receiver at lower data-rates [14], [63], [62].
It is advantageous to bias the circuits adaptively so as to reduce the bias current (and hence 
power) of the analog components at lower data rates. This requires information about the data 
rate and a method to use this information to change the bias currents of the analog components. 
The former is provided by the QLL as it generates the Vctrl (Figure 5.4) which is dependent on 
the input clock frequency, hence the data rate. The latter is achieved by taking advantage of the 
FD SOI (fully depleted silicon on insulator) technology as described below. 
The prototype chip is fabricated in 28nm FD SOI CMOS. In the FD SOI CMOS process,
the channel forms in an ultra-thin (7nm) layer of intrinsic silicon over a layer of buried oxide
(BOX) (Figure 5.3 (a)). Given the extreme thinness of the buried oxide layer (25nm) and the
conducting layer under the BOX, the effect of body biasing (BB) is stronger than in a
standard CMOS process. By connecting the transistor bodies to a bias network in the circuit
layout rather than to supply or ground, Vth of the transistors can be tuned by 80mV per 1V
modulation of VBB (Figure 5.3 (b)). This proves crucial in adaptively body biasing the critical
devices in the amplifier and the TIA.
 
 
Figure 5.3: (a) FD SOI MOS structure (b) Threshold voltage (Vth) variation with back bias 
(Vb) 
 
 
 
Figure 5.4: Simulated ring oscillator characteristics. 
 
The Vctrl generated by the QLL follows the ring oscillator’s characteristics as shown in Figure
5.4, i.e. as the reference frequency increases the Vctrl decreases from 1 to 0. The body bias
generator is designed so that its transfer function from Vctrl of the QLL to the VBB generator
outputs lets the receiver’s building blocks work optimally at any given data-rate. By fitting the
transfer function of the body bias generator from Vctrl of the QLL to the body bias of the respective
blocks, the gain-bandwidth product of the TIA and the amplifier’s gain are adaptively set to be
proportional to the data-rate. The optical receiver and body bias generator were designed by Saman Saeedi.
 
5.2 Deskew 
We have used a forwarded clock (FC) architecture for the four channel optical receiver. At
multi-Gb/s speeds each data channel may have phase mismatch or skew with respect to the reference
clock. This necessitates a per channel phase shift or “deskewing”. The conventional approach
to achieving this is to use a PLL/DLL followed by a phase interpolator (PI) or a voltage
controlled delay line (VCDL) [64] (Figure 5.5 (a)). As discussed in Chapter 1, PLL based systems
generally have second order characteristics which may lead to jitter peaking. Also, their small
jitter tracking bandwidth leads to filtering of useful correlated (with data) jitter. DLLs, on the
other hand, have an all pass characteristic, which allows the high frequency uncorrelated jitter
to pass through. Injection locking based systems (Figure 5.5 (b)), in contrast, have first
order characteristics with a high tracking bandwidth, so they do not filter out the useful correlated
jitter but still suppress the uncorrelated high frequency jitter. Figure 5.6 summarizes the properties
of PLL, DLL and ILO. Compared to traditional VCDL or PI based approaches, ILO-based
deskew provides better supply noise rejection (Figure 4.23) and lower power [29].

Figure 5.5: Deskewing in forwarded clock links; (a) conventional (b) proposed.
 
 
Figure 5.6: Jitter transfer function characteristics of PLL, DLL and ILO. 
 
 
 
Figure 5.7: QLL based deskewing architecture (single channel). 
  
Figure 5.7 (b) shows the architecture of the QLL based clocking for a single channel. The 
QLL is used to generate accurate clock phases for a four channel optical receiver using a 
forwarded clock at quarter-rate. The QLL drives an ILO at each channel, without any repeaters, 
for local quadrature clock generation for the quarter rate receiver. Due to their high sensitivity 
[15], ILOs can operate with a very small input amplitude (100mV); this allows the reference clock
to be distributed without repeaters, with low power. Figure 5.7 shows the structure of the local
ILO. It has the same two stage pseudo differential architecture as the ring oscillator used in the
QLL (Figure 4.15). The Vctrl generated by the QLL is also distributed to the local ILOs. This is
used to set the natural frequency of the ILO (fo) equal to the injected frequency (finj). It
ensures that the local ILOs do not go out of lock as the data rate changes. To invoke deskew,
the fo of the local ILO is varied externally (Figure 5.7). All four phases of the clock generated by
the QLL are distributed and used for symmetric injection in the local ILOs. This ensures no
quadrature mismatch even with deskew, as described in greater detail in the next section.
 
5.2.1 Symmetric Injection 
As described earlier in Chapter 2, deskew in an ILO (locked at finj) can be performed by 
varying the natural frequency of oscillation (fo) of the oscillator. The amount of deskew is given 
by 
 
 
$$\mathrm{deskew} = \sin^{-1}\left(\frac{f_o - f_{inj}}{f_l}\right) \qquad (5.1)$$
 
where fl is the locking range of the ILO.  
If the input clock is injected in only one of the delay stages, the asymmetry between the 
effective delay of the delay stages leads to quadrature phase mismatch between I and Q phases 
of the oscillator. As derived in Chapter 4, it is given by 
 
 
$$\mathrm{Quad.\ Error} = \frac{\pi}{2}\left(\frac{f_{inj}}{f_o} - 1\right) \qquad (5.2)$$
 
Combining (5.1) and (5.2) we get 
 
  

$$\mathrm{Quad.\ Error} = \frac{\pi}{2}\left(\frac{-f_l \sin(\mathrm{deskew})}{f_l \sin(\mathrm{deskew}) + f_{inj}}\right) \qquad (5.3)$$
 
(5.3) suggests that as the deskew increases so does the magnitude of the quadrature error. So 
as fo is varied to invoke deskew, the I and Q phases of the ILO don’t shift by an equal amount. 
Inaccuracies in the quadrature phases may lead to increased BER in the quarter rate receiver. 
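
The trade-off expressed by (5.1)-(5.3) can be illustrated with a short numerical sketch. The function names are hypothetical, and the locking range of 0.4GHz is an assumed value chosen only for illustration; the 8GHz injected frequency corresponds to the quarter-rate clock at 32Gb/s.

```python
import math


def deskew_rad(f_o, f_inj, f_l):
    """Deskew angle of (5.1); requires |f_o - f_inj| <= f_l."""
    return math.asin((f_o - f_inj) / f_l)


def quad_error_rad(f_o, f_inj):
    """Quadrature error of (5.2) for injection into a single delay stage."""
    return (math.pi / 2.0) * (f_inj / f_o - 1.0)


def quad_error_from_deskew_rad(deskew, f_inj, f_l):
    """Quadrature error of (5.3), written in terms of the deskew angle."""
    shift = f_l * math.sin(deskew)
    return (math.pi / 2.0) * (-shift / (shift + f_inj))


if __name__ == "__main__":
    f_inj = 8e9  # injected frequency in Hz (quarter-rate clock at 32Gb/s)
    f_l = 0.4e9  # locking range in Hz: hypothetical value for illustration
    for f_o in (7.8e9, 8.0e9, 8.2e9):
        d = deskew_rad(f_o, f_inj, f_l)
        q = quad_error_rad(f_o, f_inj)
        print(f"fo = {f_o / 1e9:.1f} GHz  deskew = {math.degrees(d):6.1f} deg"
              f"  quad. error = {math.degrees(q):6.2f} deg")
```

Under these assumed numbers, a 30 degree deskew costs roughly 2 degrees of quadrature error, and the error is zero only when fo equals finj.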
  
 
The trade-off between deskew and quadrature error is broken by injecting all four phases of the
clock generated by the QLL into both the delay elements of the ILO (Figure 5.8 (b)). This
symmetric injection of the clock varies the delay of both delay elements by an equal
amount. Thus, even when the fo of the ILO is varied, the inherent symmetry in the delay
elements keeps the phase relationship between the I and Q phases constant, resulting in
no quadrature error. This is illustrated by the simulation of ILOs with two phase (clock
and clock bar) and symmetric injection, as shown in Figure 5.8 (c). The Vctrl of the two ILOs is
varied to change their fo. This leads to quadrature error in the former case, whereas in the latter
the phase relationship between the I and Q phases remains 90°.

Figure 5.8: Symmetric vs. two phase injection. (a) Two phase injection architecture. (b)
Symmetric injection architecture. (c) Simulation based comparison of two phase and
symmetric injection.
 
5.3 Hardware Measurements 
The test chip is fabricated in a 28nm FD SOI CMOS process. The die micrograph and core detail
are presented in Figure 5.9. The core area is 300μm x 60μm, in a 5mm x 1.1mm die. The top
metal layers are designed to be compatible with copper-pillar flip-chip bonding as well as
bond-wire. The clock output from the QLL is symmetrically distributed to all four local ILOs with a
total trace length of 260μm (Figure 5.9).

Figure 5.9: Chip micrograph and layout details.
5.3.1 Test Setup 
The optical test setup is shown in Figure 5.10. For optical testing, the receiver is bonded to a
photodiode with a responsivity of 0.9A/W (Figure 5.9). The total capacitance at the input node
was estimated to be 120fF. The optical beam from a 1550nm distributed feedback (DFB) laser
is modulated by a high speed Mach-Zehnder modulator (MZM) and coupled to the photodiode
with a single-mode fiber. The optical fiber is placed close to the photodiode aperture using a
micro-positioner (butt coupling). As the beam has a Gaussian profile, the gap between the fiber
tip and the photodetector causes optical intensity loss. The combined optical loss due to the optical
coupling and the optical connector is measured to be 2.8dB. The quarter-rate clock generated by the
pattern generator was used as the (electrical) reference for the QLL.

Figure 5.10: Test setup for optical receiver.
5.3.2 Receiver BER Measurements 
The functionality of the receiver is validated using the PRBS-7, 9, 15 sequences generated by the
pattern generator (Figure 5.10). Each of the four channels is tested separately. Figure 5.11 (a)
shows the recovered quarter-rate data eye diagram for 32Gb/s optical data, for one of the
channels. Figure 5.11 (b) shows the bathtub curves for 32Gb/s and 20Gb/s. Error free (BER=10^-12)
operation is shown over 0.16UI and 0.33UI for 32 and 16Gb/s respectively. The maximum
achievable data-rate (32Gb/s) is limited by the maximum data-rate of the external pseudo
random bit sequence (PRBS) generator.

Figure 5.11: Measured eye diagram (a) and BER (b) with PRBS 15 optical data at 32Gb/s.
 
 
Figure 5.12: BER vs. optical power (receiver sensitivity) at different data-rates (top). 
Optical sensitivity vs. data rate (bottom). 
 
Optical receiver sensitivity is defined as the minimum optical power that a receiver needs to
operate reliably with error free (BER=10^-12) operation. Figure 5.12 shows the measured BER as
the optical power is varied for different data rates. From this information we derive the optical
sensitivity as shown in Figure 5.12. The receiver achieves a sensitivity better than -12dBm at
16Gb/s, which degrades to -10dBm at 28Gb/s and -8.8dBm at 32Gb/s. Sensitivity degradation
with increased data rate is mainly due to the reduced bit interval and integration time.
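
To give a feel for these sensitivity figures, the sketch below converts them from dBm to optical power and to an average photocurrent using the 0.9A/W responsivity quoted in the test setup. The helper names are hypothetical, and treating the quoted sensitivities as the average optical power arriving at the photodiode (with no further coupling loss applied) is an assumption.

```python
def dbm_to_uw(power_dbm):
    """Convert optical power from dBm to microwatts."""
    return 1e3 * 10.0 ** (power_dbm / 10.0)


def photocurrent_ua(power_dbm, responsivity_a_per_w):
    """Average photocurrent in microamps (uW times A/W gives uA)."""
    return dbm_to_uw(power_dbm) * responsivity_a_per_w


RESPONSIVITY_A_PER_W = 0.9  # photodiode responsivity quoted in the test setup

if __name__ == "__main__":
    # (data rate in Gb/s, sensitivity in dBm) as quoted in the text
    for rate_gbps, sens_dbm in ((16, -12.0), (28, -10.0), (32, -8.8)):
        print(f"{rate_gbps} Gb/s: {sens_dbm:6.1f} dBm = {dbm_to_uw(sens_dbm):6.1f} uW"
              f"  -> {photocurrent_ua(sens_dbm, RESPONSIVITY_A_PER_W):6.1f} uA")
```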
5.3.3 Deskew Range 
The amount of phase shift allowed by the local ILO is measured by varying the deskew control (shown
in Figure 5.7) from 0 to VDD at 8GHz for 32Gb/s operation. An Agilent 86100D sampling
oscilloscope is used to record the ILO waveforms for different values of the Vctrl. A total deskew
range of 137° is measured. The optical receiver needs a maximum deskew range of 90° because
of its quarter-rate architecture, so the measured deskew range of 137° is sufficient.
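
The relation between clock phase and time makes the 90° requirement concrete: one quarter of the 8GHz clock period equals one unit interval at 32Gb/s. The sketch below is a minimal illustration with hypothetical helper names.

```python
def phase_to_time_ps(phase_deg, clock_hz):
    """Time shift in picoseconds for a phase shift of a clock at clock_hz."""
    return phase_deg / 360.0 * 1e12 / clock_hz


def unit_interval_ps(rate_bps):
    """Bit period (unit interval) in picoseconds for a given data rate."""
    return 1e12 / rate_bps


if __name__ == "__main__":
    clock_hz = 8e9   # quarter-rate clock for 32Gb/s data
    rate_bps = 32e9
    print(f"1 UI at 32Gb/s      = {unit_interval_ps(rate_bps):.2f} ps")
    print(f"90 deg at 8GHz      = {phase_to_time_ps(90.0, clock_hz):.2f} ps")
    print(f"137 deg at 8GHz     = {phase_to_time_ps(137.0, clock_hz):.2f} ps")
```

The measured 137° range therefore spans about one and a half unit intervals at 32Gb/s.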
 
 
Figure 5.13: Measured deskewed waveform for 32Gb/s data. 
 
5.3.4 Power Consumption 
The receiver’s power breakdown and power efficiency (energy per-bit) are shown in Figure 5.14.
Total power consumption per channel at the highest data rate (32Gb/s) is 4.87mW. The QLL
and local ILOs consume a third of the total power.
To show the efficacy of the adaptive body biasing scheme, two sets of measurements are taken,
with the adaptive VBB generator on and off (Figure 5.14). When the adaptive VBB generator is active,
the per-bit energy improves from 103fJ/b at 32Gb/s to 94fJ/b at 16Gb/s. Without
body biasing, the per-bit energy at 16Gb/s is 160fJ/b.
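
The energy figures above follow from simple arithmetic, sketched below with hypothetical helper names. The interpretation that the 94fJ/b and 160fJ/b values refer to the same portion of the receiver (so that their ratio is a like-for-like saving) is an assumption.

```python
def energy_per_bit_fj(power_mw, rate_gbps):
    """Energy per bit in fJ: mW / (Gb/s) is pJ/bit, times 1000 gives fJ/bit."""
    return power_mw / rate_gbps * 1e3


def relative_saving(with_bb_fj, without_bb_fj):
    """Fractional energy saving obtained with adaptive body biasing."""
    return 1.0 - with_bb_fj / without_bb_fj


if __name__ == "__main__":
    total = energy_per_bit_fj(4.87, 32.0)
    print(f"Total per channel at 32Gb/s: {total:.1f} fJ/b")
    print(f"Saving at 16Gb/s with body biasing: {relative_saving(94.0, 160.0):.1%}")
```

The total of about 152fJ/b is consistent with the 103fJ/bit data and 50fJ/bit clock split reported in Table 5.1, and with the clocking taking roughly a third of the power.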
 
 
 
Figure 5.14: Power consumption breakdown at 32Gb/s (top) and energy efficiency per bit
across different data rates (bottom).
  
5.3.5 Comparison with Prior Art 
 
Table 5.1: Performance comparison of the optical receiver.

| | This work [58] | [62] | [65] | [66] |
|---|---|---|---|---|
| Technology | 28nm FD SOI | 28nm CMOS | 65nm CMOS | 28nm CMOS |
| Data-Rate | 32Gb/s | 25Gb/s | 28Gb/s | 28Gb/s |
| Efficiency | 103fJ/bit data and 50fJ/bit clock | 170fJ/bit* | 3.25pJ/bit | 1.03pJ/bit |
| Active area | 0.3x0.06mm² (4 channel) | 0.0018mm² | 3.25mm² | 0.318mm² |
| Sensitivity (Optical) | -8.8dBm at 32Gb/s | -6.8dBm at 25Gb/s | -9.7dBm at 25Gb/s | -6dBm at 10Gb/s |

* Excludes clocking.

Table 5.1 compares the optical receiver with prior art. Low power QLL based clocking and body
biasing help achieve the highest efficiency compared to the state-of-the-art. Ring oscillator based
clock distribution helps achieve a very compact design, smallest compared to other works.

5.4 Summary
In the previous chapter we introduced the idea of the quadrature locked loop (QLL), a frequency
tracking technique that increases the locking range of the ring based quadrature injection locked
oscillator. This technique was used to generate accurate quadrature phases from a single phase
of an electrical/optical clock without any frequency division. In this chapter, we introduced QLL
based clocking for a four channel quarter-rate optical receiver. It validated the QLL as a robust
building block for future designs.
The system was implemented in 28nm FD SOI CMOS and supports data rates up to 32Gb/s.
The unique properties of the FD SOI technology were used in conjunction with the
QLL and optical receiver to achieve an ultra-low energy consumption of 153fJ/bit. Experimental
results validated the feasibility of the QLL and the optical receiver for ultra-low-power,
high-data-rate, and highly parallel optical links.
  
5.5 QLL: Future Work 
In this last section we explore an additional dimension to the QLL circuit that can be used by
future IC designers. In the previous parts of this chapter we explained how a combination of a
QLL and ILOs could be used for deskewing. While this yields a more power efficient solution
for parallel links, standalone QLL based deskewing would be more suitable for single channel
forwarded clock receivers (Figure 5.15).
 
 
Figure 5.15: QLL based clocking for an n-channel forwarded clock receiver (left). 
Proposed clocking scheme for a single channel forwarded clock receiver (right). 
 
In the conventional QLL architecture (Figure 5.16 (a)), the Vctrl generated by the phase
detector and the low pass filter is used to set the delay of both delay elements (A and B). A simple
but crucial change allows us to add deskew capability to the QLL. Instead of using the Vctrl
to control the delay of both delay elements, we use it for only one of the elements (Figure
5.16 (b)). The delay of the other delay element is kept outside the QLL loop and controlled
externally. The external control over the delay of one of the delay elements is used to add
asymmetry to the delay of the two delay elements. So instead of having a delay of d each as in
Figure 5.16 (a), they have delays d1 and d2. This forced asymmetry in delays is used for
deskewing. In the stable state of the conventional QLL architecture, the oscillator is locked to finj
and the quadrature error reaches zero (4.4), i.e. fo=finj. Combining this with (2.3) suggests that in this
case the phase difference ‘θ’ between the injected and the locked output signal is zero, so there
is no deskew in this case. However, in the modified QLL structure the locked state is different,
due to the asymmetry in the delay stages. Again, the oscillator is locked to finj and the quadrature
error reaches zero. To ensure zero quadrature error, the phase delay across delay element B
must be π/2. In the time domain this implies

$$d_2 = \frac{\pi}{2}\times\frac{1}{2\pi f_{inj}} = \frac{1}{4 f_{inj}} \qquad (5.4)$$

Thus we can rewrite (5.4) as

$$f_{inj} = \frac{1}{4 d_2} \qquad (5.5)$$

Figure 5.16: (a) Conventional QLL architecture. (b) Modified QLL architecture to add
deskew.

The natural frequency of oscillation (fo) is dependent on the total delay of A and B (Figure
5.16 (b))

$$f_o = \frac{1}{2(d_1 + d_2)} \qquad (5.6)$$

Thus from (5.5) and (5.6) we infer that $f_o \neq f_{inj}$ if $d_1 \neq d_2$, and the deskew angle
follows from (2.4), exactly as in (5.1):

$$\mathrm{deskew} = \sin^{-1}\left(\frac{f_o - f_{inj}}{f_l}\right)$$

Independent control over d1 allows us to vary fo (5.6) and thus control the deskew (5.1). It is also
instructive to calculate the MQPE in this modified QLL architecture. The phase delay between the
I and Q phases at a frequency f is
 
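$$\mathrm{Delay}_{IQ}(\theta) = d_2 \times 2\pi f \qquad (5.7)$$

Using (5.7) and defining $m = d_2/(d_1 + d_2)$, we have

$$\mathrm{Delay}_{IQ}(\theta) = \frac{\pi}{2}\times 2m \times \frac{f}{f_o} \qquad (5.8)$$

Using the same steps as in Chapter 4, we can derive the MQPE in the locked state as

$$\mathrm{MQPE} = \frac{\pi}{2}\left[2m\times\frac{f_{inj}}{f_o} - 1\right] \qquad (5.9)$$

And in the unlocked state as

$$\mathrm{MQPE} = \frac{\pi}{2}\left[2m\times\left(\frac{f_{inj}}{f_o} + \frac{f_b}{f_o}\right) - 1\right] \qquad (5.10)$$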
  
where fb is the same as in (4.6). It should be noted that (5.9) and (5.10) reduce to (4.4) and (4.9)
for d1=d2, i.e., m=1/2. Figure 5.17 shows a plot of MQPE vs. fo for the regular (no deskew) and
modified QLL (with deskew). The modified QLL has the familiar locked and unlocked regions
like those of the regular version, but there are some marked differences. In the locked region the
slope of the line is higher for the modified QLL, and in the unlocked region, instead of
going asymptotically to zero, the MQPE keeps increasing with |fo - finj|. This is expected,
because for the modified QLL fo is varied by only changing d2, while keeping d1 fixed at 1/(4finj).
So as |fo - finj| increases in the locked region, the MQPE increases both due to injection locking
dynamics and due to the increased asymmetry between d1 and d2, which is taken into account by m
in (5.9). This leads to a steeper slope than that of the regular QLL. In the unlocked case the
injection locking dynamics cause the MQPE to reduce as |fo - finj| increases. However, the inherent
asymmetry increases further with increased |fo - finj|, and overshadows the decrease in MQPE
due to injection. As in the locked case, the inherent asymmetry is represented by m in (5.10).
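
The locked state of the modified QLL can be sketched numerically from the relations of this section: the loop forces d2 = 1/(4 finj), the natural frequency is fo = 1/(2(d1 + d2)), the deskew follows the arcsine expression, and the locked-state MQPE is given by (5.9). The function names are hypothetical, and the 7GHz injected frequency with a 0.4GHz locking range is an assumed operating point used only for illustration.

```python
import math


def locked_state(d1_s, f_inj_hz, f_l_hz):
    """Return (d2, fo, deskew in rad, m) for an externally set delay d1."""
    d2_s = 1.0 / (4.0 * f_inj_hz)          # loop forces pi/2 across element B
    f_o_hz = 1.0 / (2.0 * (d1_s + d2_s))   # natural frequency from total delay
    x = (f_o_hz - f_inj_hz) / f_l_hz
    if abs(x) > 1.0:
        raise ValueError("fo is outside the assumed locking range")
    m = d2_s / (d1_s + d2_s)               # asymmetry factor used in (5.9)
    return d2_s, f_o_hz, math.asin(x), m


def mqpe_locked_rad(m, f_inj_hz, f_o_hz):
    """Locked-state MQPE of (5.9)."""
    return (math.pi / 2.0) * (2.0 * m * f_inj_hz / f_o_hz - 1.0)


if __name__ == "__main__":
    f_inj = 7e9  # injected frequency in Hz (assumed operating point)
    f_l = 0.4e9  # locking range in Hz: hypothetical value for illustration
    for d1 in (34e-12, 1.0 / (4.0 * f_inj), 37e-12):
        d2, f_o, deskew, m = locked_state(d1, f_inj, f_l)
        print(f"d1 = {d1 * 1e12:5.2f} ps  fo = {f_o / 1e9:.3f} GHz"
              f"  deskew = {math.degrees(deskew):6.1f} deg"
              f"  MQPE = {mqpe_locked_rad(m, f_inj, f_o):.1e} rad")
```

In this sketch the MQPE evaluates to zero (up to rounding) for every d1, because the loop has already set d2 to a quarter period, while the sign of the deskew follows the sign of d2 - d1.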
 
 
Figure 5.17: MQPE for the QLL without deskew and with deskew. 
 
The increased MQPE in the modified QLL is further exemplified in transient simulations.
As shown in Figure 5.18, the modified QLL locks faster than the regular QLL for the same initial
states. The injected frequency was set at 7GHz; in case (a) the initial frequency (finit) was set
to 5.75GHz and in case (b) it was set to 8.4GHz. In both cases the QLL with deskew
locks faster than the regular QLL.
 
  
 
Figure 5.18: Transient locking characteristics of the modified QLL and regular QLL. (a) 
Initial frequency (finit)=5.75GHz (b) finit=8.4GHz 
 
As shown in Figure 5.19, once the modified QLL reaches the stable state (A), deskew can be 
performed by varying d1. If d1 is decreased, fo increases and the new equilibrium with zero 
quadrature error is achieved at point B, leading to both I and Q phases having a positive phase 
shift. Similarly, to initiate a negative skew, d1 is increased. An important advantage of QLL 
based deskewing compared to deskewing in a simple ILO is that, in the latter case, there is 
quadrature mismatch in the I and Q phases with deskewing, but in the former, the loop nullifies 
the quadrature mismatch, thus the I and Q phases move together. 
  
   
 
Figure 5.19: Deskewing by changing d1, in the modified QLL. 
 
   
  
Chapter 6: VCSEL Modelling and Equalization
6.1  Background 
As the bandwidth demand for traditionally electrical wireline interconnects has accelerated, 
optics has become an increasingly attractive alternative for interconnects within computing 
systems. Multi-Gb/s optical links exclusively use coherent laser light due to its low divergence 
and narrow wavelength range. Modulation of this laser light is possible by directly modulating 
the laser intensity through changing the laser’s electrical drive current (Figure 6.2). A popular 
coherent laser light source used in optical transmitters is the Vertical-Cavity Surface-Emitting 
Laser (VCSEL). 
 
 
Figure 6.1: Cross-section of a VCSEL 
 
A VCSEL is a semiconductor laser diode which emits light perpendicular to its top surface
(Figure 6.1). VCSELs have important practical advantages compared with edge-emitting
semiconductor lasers. They can be tested and characterized directly after growth, i.e. before the
wafer is cleaved. Furthermore, it is possible to combine a VCSEL wafer with an array of optical 
elements (like collimator lenses) and then dice the composite wafer instead of mounting the 
optical elements individually for each VCSEL. This allows for low cost mass production of laser 
products. The most common emission wavelengths of VCSELs are in the range of 750-980nm 
[33] [34], as obtained with the GaAs/AlGaAs material system. While VCSELs appear to be the 
ideal source due to their ability to both generate and modulate light, serious inherent bandwidth 
limitations do exist. As data-rates scale, designers have begun to implement transmitter 
equalization circuitry to compensate for VCSEL bandwidth constraints. However, traditional 
equalization techniques do not take into account the non-linearity in the VCSEL’s response, 
leading to suboptimal performance. A VCSEL modelling and equalization technique that takes 
into account the inherent non-linearity in its high speed response is introduced. 
This chapter is organized as follows: Section 6.2 describes the speed limitations in the VCSEL. 
In Section 6.3 we describe the proposed VCSEL modelling technique. Section 6.4 evaluates the 
model for accuracy. A new VCSEL equalization methodology that takes into account the 
inherent non-linearity of the VCSEL is presented in Section 6.5. Section 6.6 discusses the 
simulated improvement based on the equalization technique. The circuit implementation for 
the VCSEL transmitter is presented in Section 6.7. Hardware measurement results for the 
optical transmitter are presented in Section 6.8. Finally, the chapter is concluded in Section 6.9. 
 
 
Figure 6.2: VCSEL L-I curve 
 
  
6.2 Speed Limitations 
VCSEL bandwidth is limited by a combination of electrical parasitics and the electron-photon 
interaction described by a set of second-order rate equations.  
VCSEL optical bandwidth is regulated by two coupled differential equations which describe 
the interaction of the electron density, N, and the photon density, Np [67]. The rate of the electron 
density change is set by the number of carriers injected into the laser cavity volume, V, via the 
device current I, and the number of carriers lost via desired stimulated and non-desired 
spontaneous and non-radiative recombination: 
 
\[
\frac{dN}{dt} = \frac{I}{qV} - \frac{N}{\tau_{sp}} - G N N_p \tag{6.1}
\]
 
where τsp is the non-radiative and spontaneous emission lifetime and G is the stimulated 
emission coefficient. Photon density change is governed by the number of photons generated by 
stimulated and spontaneous emission and the number of photons lost due to optical absorption 
and scattering: 
 
\[
\frac{dN_p}{dt} = G N N_p + \beta_{sp}\frac{N}{\tau_{sp}} - \frac{N_p}{\tau_p} \tag{6.2}
\]
 
where βsp is the spontaneous emission coefficient and τp is the photon lifetime. Combining 
the two rate equations and performing the Laplace transform yields the following second-order 
low-pass transfer function of optical power Popt for a given input current: 
 
\[
\frac{P_{opt}(s)}{I(s)} = \frac{h \nu v_g \alpha_m}{q} \times \frac{G N_p}{s^2 + s\left(G N_p + \frac{1}{\tau_{sp}}\right) + \frac{G N_p}{\tau_p}} \tag{6.3}
\]
 
where vg is the light group velocity and αm is the VCSEL mirror loss coefficient.  
Rewriting (6.3) in terms of empirical parameters and defining H(f) =Popt(jf)/I(jf) we have:    
 
  
 
\[
H(f) = \mathrm{const} \times \frac{f_r^2}{f_r^2 - f^2 + j\left(\frac{f}{2\pi}\right)\gamma} \tag{6.4}
\]
 
(6.4) is a second-order low-pass transfer function with peaking. The VCSEL relaxation 
oscillation frequency fr, which is related to the effective bandwidth, is equal to: 
 
 
\[
f_r = \frac{1}{2\pi}\sqrt{\frac{G N_p}{\tau_p}} \tag{6.5}
\]
 
The photon density (Np) is directly proportional to the amount of injected current above 
threshold [67], thus: 
 
\[
f_r = D\sqrt{I - I_{th}} \tag{6.6}
\]
 
 In (6.6) D (also called D-factor) denotes the rate at which the resonance frequency increases 
with bias current (I) [67]. The damping factor (γ) is proportional to the square of the resonance 
frequency [67]: 
 
\[
\gamma = K f_r^2 + \gamma_o \tag{6.7}
\]
 
The K in (6.7) is called the K-factor. It sets the maximum intrinsic modulation bandwidth of 
the VCSEL. γo is called the damping factor offset. From (6.6) and (6.7), it is evident that, with 
increasing bias current, there is an associated increase of the resonance frequency and therefore 
also of the damping factor. Initially, the modulation bandwidth increases with current, but 
eventually, the damping factor becomes large and the system becomes critically damped, which 
sets an upper limit to the modulation bandwidth (Figure 6.8 (b)). In addition to the intrinsic 
limitation of the VCSEL modulation bandwidth due to damping, there are extrinsic limitations. 
One such limit is the thermal limit caused by the heating of the active region induced by the bias 
current passing through the resistive elements of the VCSEL, which causes the output power to 
saturate [68].  
 
  
Another extrinsic bandwidth limitation comes from the capacitance of the VCSEL, which in 
combination with the series resistance (which is mainly determined by the resistance of the 
DBRs), forms a low-pass RC filter that shunts the modulation current outside the active region 
at frequencies above the bandwidth of the filter. We can account for the effect of the low pass 
parasitic by adding a pole at fp in (6.4): 
 
 
\[
H(f) = \mathrm{const} \times \frac{f_r^2}{f_r^2 - f^2 + j\left(\frac{f}{2\pi}\right)\gamma} \times \frac{1}{1 + j\left(\frac{f}{f_p}\right)} \tag{6.8}
\]
 
The dependence of the resonance frequency (fr) and damping factor (γ) on the bias current (I) 
(Figure 6.3) makes the effective frequency response of the VCSEL non-linear when used for data 
modulation. Due to the large change in I between the zero (I0) and one (I1) values (of data), the 
small signal assumption breaks down and the bandwidth of the VCSEL, instead of being fixed, 
varies according to the data sequence. Thus the VCSEL ceases to be a linear time invariant (LTI) 
system. 
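
The bias dependence described in this section can be made concrete with a short numerical sketch. The Python snippet below evaluates (6.6), (6.7) and (6.8) with const = 1, using the typical parameter values listed in Table 6.2 of this chapter. The function names, the linear frequency scan used to find the 3dB point, and the optional parasitic pole argument f_p are illustrative assumptions rather than part of the original work, and thermal effects are not modelled.

```python
import math

# Typical parameter values from Table 6.2 of this chapter.
I_TH_MA = 0.6        # threshold current (mA)
D_FACTOR = 7.6e9     # D-factor (Hz per sqrt(mA))
K_FACTOR = 0.25e-9   # K-factor (s)
GAMMA_0 = 37e9       # damping factor offset (1/s)


def resonance_freq(i_ma):
    """Relaxation oscillation frequency f_r in Hz, from (6.6)."""
    return D_FACTOR * math.sqrt(i_ma - I_TH_MA)


def damping_factor(i_ma):
    """Damping factor gamma in 1/s, from (6.7)."""
    return K_FACTOR * resonance_freq(i_ma) ** 2 + GAMMA_0


def response(f_hz, i_ma, f_p=None):
    """Complex H(f) from (6.8) with const = 1.

    f_p is the parasitic pole frequency in Hz (a hypothetical input here);
    None omits the pole, which reduces the expression to (6.4).
    """
    fr = resonance_freq(i_ma)
    gamma = damping_factor(i_ma)
    h = fr ** 2 / complex(fr ** 2 - f_hz ** 2, f_hz * gamma / (2 * math.pi))
    if f_p is not None:
        h /= complex(1.0, f_hz / f_p)
    return h


def bandwidth_3db(i_ma, f_p=None, f_max=100e9, step=10e6):
    """First frequency (Hz) where |H| drops below |H(0)|/sqrt(2), or None."""
    target = abs(response(0.0, i_ma, f_p)) / math.sqrt(2)
    f = step
    while f <= f_max:
        if abs(response(f, i_ma, f_p)) < target:
            return f
        f += step
    return None


if __name__ == "__main__":
    # Compare two bias levels, e.g. the zero and one levels of a data signal.
    for i_ma in (2.0, 4.0):
        print(i_ma, resonance_freq(i_ma) / 1e9, bandwidth_3db(i_ma) / 1e9)
```

Evaluating the functions at the zero-level and one-level currents gives two different resonance frequencies and bandwidths, which is the sense in which a large-signal data pattern sees a response that changes with the data.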
 
 
Figure 6.3: VCSEL small signal AC characteristics [45]. 
 
  
6.3 VCSEL Modelling for Simulation 
In order to aid the design process, exact modelling of the non-linearity of the VCSEL’s response 
is essential. Previous approaches have used the small signal assumption, in which the 
modulation response for a particular bias current is used for both ones and zeros [69]. However, 
this linearization leads to inaccuracies for large extinction ratios (i.e. large I1/I0). At the other 
end of the spectrum, exact rate equation based VCSEL modelling [70], although accurate, is 
difficult to simulate. A dynamic model based on (6.4)-(6.8), which takes into account the variation 
in bias current, proves most efficient. 
6.3.1 Simplified Approach 
An intuitive (but not exact) approach to understanding the effect of non-linearity in the VCSEL 
response is shown in Figure 6.4. Suppose the VCSEL is modulated with a data sequence with I0 
and I1 being the bias currents at the zero and one levels, respectively. We also assume that rising 
and falling edges of the data sequence are infinitely fast. In this case, due to the finite response 
time of the VCSEL each rising edge will “see” a modulation response (H0(f)) given by (6.8) with 
I set to I0. Similarly, each falling edge will see a modulation response (H1(f)) given by (6.8) with 
I set to I1. With this assumption the responses to the rising step (R*(t)) and falling step (F*(t)) 
can be calculated. The incoming data stream (D(t)) can be expressed as a summation 
of rising (R(t)) and falling (F(t)) steps separated in time:  
 
\[
D(t) = \sum_n \left[ B(n)\,R(t - nT_b) + (1 - B(n))\,F(t - nT_b) \right] \tag{6.9}
\]
 
In (6.9), B(n) represents the value of the nth bit (0 or 1) and Tb is the bit period. Assuming the 
response of R(t) is R*(t) and that of F(t) is F*(t), the total VCSEL response to the input data 
sequence D(t) can be calculated simply as 
 
\[
D^*(t) = \sum_n \left[ B(n)\,R^*(t - nT_b) + (1 - B(n))\,F^*(t - nT_b) \right] \tag{6.10}
\]
 
This simplified approach, although intuitive, is not practical as actual data sequences have 
finite rise and fall times and the assumption of infinite slope does not hold. 
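
The superposition idea behind (6.10) can be illustrated with a minimal Python sketch. It is a simplified variant that places one edge response at each data transition rather than at every bit, and it substitutes hypothetical first-order step responses with time constants tau_rise and tau_fall for R*(t) and F*(t); in the approach described above these would instead be derived from H0(f) and H1(f). All function and parameter names are illustrative assumptions.

```python
import math


def step_response(t, tau):
    """Hypothetical first-order unit step response; stands in for R*(t) or F*(t)."""
    return 0.0 if t < 0 else 1.0 - math.exp(-t / tau)


def edge_superposition(bits, t, t_bit, tau_rise, tau_fall, level0=0.0, level1=1.0):
    """Output at time t, built by adding one edge response per data transition."""
    swing = level1 - level0
    out = level0
    prev = 0  # the line is assumed to idle at the zero level before the first bit
    for n, b in enumerate(bits):
        if b == 1 and prev == 0:      # rising edge at t = n * t_bit
            out += swing * step_response(t - n * t_bit, tau_rise)
        elif b == 0 and prev == 1:    # falling edge at t = n * t_bit
            out -= swing * step_response(t - n * t_bit, tau_fall)
        prev = b
    return out


if __name__ == "__main__":
    # Isolated one at 20Gb/s (50ps bit period), sampled at mid-bit.
    print(edge_superposition([1, 0], 25e-12, 50e-12, 10e-12, 20e-12))
```

Choosing tau_rise different from tau_fall makes the response to an isolated one differ from the mirrored response to an isolated zero, which is the asymmetry the simplified approach is meant to expose.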
  
 
 
Figure 6.4: Simplified, non-linear VCSEL modeling.  
 
6.3.2 Electrical Model 
For accurate modelling of VCSEL characteristics, we separate the intrinsic optical dynamics and 
extrinsic electrical parasitics. Figure 6.5 shows the electrical model of the VCSEL. Cj and Rj 
represent the junction capacitance and resistance, respectively. In addition to the junction 
resistance, there is also a significant series resistance due to the large number of distributed Bragg 
reflector (DBR) mirrors used for high reflectivity. This is represented by Rs in Figure 6.5. Cp and 
Rp represent the pad capacitance and resistance formed between the p-bond pad and the 
conducting n-side. In Figure 6.5, some of the total current (I) gets diverted to the parasitic 
capacitors Cj and Cp: the actual amount of useful current is represented by the current flowing 
into the junction resistance (IRj). The typical values of these parameters in modern VCSELs [44] 
are listed below: 
 
Parameter Value 
Junction Capacitance (Cj) 110-117fF 
Junction Resistance (Rj) 180-150Ω 
DBR Resistance (Rs) 50Ω 
Pad Capacitance (Cp) 10fF 
Pad Resistance (Rp) 1Ω 
 
Table 6.1: Typical VCSEL electrical parasitics values [44]. 
  
 
The values of the junction capacitance (Cj) and junction resistance (Rj) are bias dependent 
but due to their small variation range (Table 6.1), their average values are used in the model. 
 
 
Figure 6.5: VCSEL electrical parasitics. 
 
6.3.3 Optical Model 
The second order nature of the VCSEL optical dynamics (6.4) allows us to model them as a 
series RLC circuit [71]. However, unlike [71], we make our model dynamic such that it takes 
into account the non-linearity inherent in (6.6) and (6.7). Figure 6.6 shows the proposed optical 
model, consisting of a series RLC (RVL, LVL and CVL) circuit driven by a voltage source of value 
η(I-Ith), with η representing the slope efficiency and Ith the threshold current of the VCSEL. The 
voltage of the capacitor (CVL) is used as the output (Pout). The transfer function from the voltage 
source to the output can be easily calculated (Figure 6.6):  
 
\[
\frac{P_{out}(f)}{\eta (I - I_{th})(f)} = \frac{1}{1 - L_{VL} C_{VL} (2\pi f)^2 + j (2\pi f) R_{VL} C_{VL}} \tag{6.11}
\]
 
(6.4) has two independent parameters (fr and γ) whereas (6.11) has three (RVL, LVL and CVL), 
so we (arbitrarily) fix the value of CVL to 100fF and calculate the values of LVL and RVL based 
on (6.4)-(6.7). LVL and RVL can be shown to be equal to 1/{4π²·CVL·D²·(I-Ith)} and 
(K·fr²+γo)·LVL, respectively. As expected, the values of LVL and RVL are dependent on the 
bias current flowing through the VCSEL (I). This takes into account the inherent non-linearity 
of the VCSEL. VerilogA based dynamic models of LVL and RVL are used in the simulation. 
The typical values of the constants in the expressions of LVL and RVL, in modern VCSELs [45], 
are tabulated below. 
 
Parameter Value 
Threshold current (Ith) 0.6mA 
Slope efficiency (η) 0.78mW/mA 
D-factor (D) 7.6GHz/mA^0.5 
K-factor (K) 0.25ns 
Damping factor offset (γo) 37ns^-1 
 
Table 6.2: Typical VCSEL optical modelling parameters [45]. 
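
As a worked illustration of the expressions for LVL and RVL, the following Python sketch evaluates them for a given bias current using the values in Table 6.2 and CVL = 100fF. The function name and unit conventions (current in mA, SI units elsewhere) are assumptions made for this example; the VerilogA models used in the simulations are not reproduced here.

```python
import math

C_VL = 100e-15       # capacitor value fixed (arbitrarily) in the text (F)
I_TH_MA = 0.6        # threshold current (mA), Table 6.2
D_FACTOR = 7.6e9     # D-factor (Hz per sqrt(mA)), Table 6.2
K_FACTOR = 0.25e-9   # K-factor (s), Table 6.2
GAMMA_0 = 37e9       # damping factor offset (1/s), Table 6.2


def rlc_elements(i_ma):
    """Return (L_VL in henries, R_VL in ohms) for a bias current in mA."""
    fr_sq = D_FACTOR ** 2 * (i_ma - I_TH_MA)          # f_r^2 from (6.6)
    l_vl = 1.0 / (4 * math.pi ** 2 * C_VL * fr_sq)    # 1/(4 pi^2 C_VL D^2 (I - I_th))
    r_vl = (K_FACTOR * fr_sq + GAMMA_0) * l_vl        # (K f_r^2 + gamma_o) L_VL
    return l_vl, r_vl


if __name__ == "__main__":
    for i_ma in (2.0, 4.0, 11.5):
        l_vl, r_vl = rlc_elements(i_ma)
        print(i_ma, l_vl, r_vl)
```

At a 4mA bias this gives an inductance of roughly 1.3nH and a resistance of roughly 111Ω, and the resonance 1/(2π√(LVL·CVL)) reproduces fr from (6.6) by construction.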
 
 
Figure 6.6: Optical model of a VCSEL. 
 
6.3.4 Complete Model 
The complete dynamic model of the VCSEL is shown in Figure 6.7. The electrical and optical 
models are combined by changing the voltage source of the optical model to a current dependent 
voltage source and replacing I with IRj (the current flowing in the junction resistance). To use 
this model in a circuit simulator, the modulated current is provided to the input of the electrical 
model and the output of the optical part generates the effective optical power (Pout).  
 
  
 
Figure 6.7: Combined model for simulating a VCSEL. 
 
6.4 Model Evaluation 
We built our model based on the VCSEL parameters listed in Tables 6.1 and 6.2. To evaluate the 
accuracy of our VCSEL modelling we generated the modulation response (H(f)) for different 
bias currents and compared it against the measured modulation response from [45]. As shown 
in Figure 6.8, the simulated modulation response matches closely with the shape and bandwidth 
of the measured response for different bias currents. For example, the measured bandwidth for 
an 11.5mA bias current is 20GHz and that predicted by the model is 19.89GHz. 
In addition, the measured results in [45] suggest that the maximum bandwidth is achieved at 
11.5mA and that the bandwidth then diminishes as the current increases. To verify that the 
model predicts the same behavior, we plotted the bandwidth, based on simulation of the model, 
for different bias currents. As shown in Figure 6.9, the bandwidth reaches a maximum of 
19.89GHz at 11.5mA and then decreases as the current is increased further. This behavior is 
also in line with the discussion in Section 6.2. 
  
 
Figure 6.8: VCSEL modelling: comparing the measured (top) and simulated (bottom) modulation responses.  
 
 
  
 
Figure 6.9: Simulated modulation bandwidth variation with bias current. 
 
 
6.5 VCSEL Equalization Methodology 
Bandwidth limitations in the VCSEL’s optical response limit the speed of optical transmitters. 
In addition, better power efficiency (Figure 6.8) and mean time to failure (MTTF) [72] demand 
biasing the VCSEL at a lower current and thus at a lower bandwidth. As data rates scale, 
there is an increased need for equalization circuitry to compensate for the VCSEL 
bandwidth restrictions. Previous designers have relied on established electrical transmitter 
equalization techniques [63], [73]; for example, finite impulse response (FIR) based 
pre-emphasis. 
6.5.1 Conventional FIR-Based Pre-Emphasis 
Equalization addresses the problem of frequency-dependent attenuation by filtering the 
transmitted or received waveform so that the overall system exhibits a flat frequency response. 
For instance, in a transmitter equalizer, if the transfer characteristic of the channel is expressed 
by A(z), the transmitter equalization transfer function, P(z), should be designed such that 
A(z) x P(z) = 1 or P(z) = 1/A(z), as shown in Figure 6.10. Often it is not possible to 
implement the exact required P(z); however, there are techniques to closely approximate the 
target transfer function. Transversal filters (FIR filters) are mainly used to perform the 
transmitter equalization [74]. The transfer function, H(z), can be written as 
 
\[
H(z) = 1 + a_1 z^{-1} + \cdots + a_n z^{-n} \tag{6.12}
\]
 
where the ai are called the tap coefficients (or taps for short) and n is the total number of 
equalization taps. The value of n determines how well H(z) matches the target transfer function 
P(z): the larger the number of taps in the equalizer, the better the approximation of P(z). 
Figure 6.11 illustrates how an FIR-based transmitter reduces ISI. This technique is very well 
suited for digital communication techniques, in which generating a delay is very straightforward 
through use of latches and flip-flops as shown in Figure 6.12. 
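
A minimal Python sketch of the discrete-time filter in (6.12) is given below. It operates on one sample per bit, with the main tap fixed at 1 as in (6.12); the function name, the +1/-1 symbol mapping, and the example tap value are assumptions chosen for illustration.

```python
def fir_pre_emphasis(symbols, taps):
    """Apply y[k] = x[k] + a1*x[k-1] + ... + an*x[k-n], as in (6.12).

    symbols: transmitted levels, e.g. +1/-1 for NRZ data.
    taps: [a1, ..., an]; the main tap is fixed at 1.
    Samples before the start of the sequence are taken as 0.
    """
    out = []
    for k, x in enumerate(symbols):
        y = x
        for i, a in enumerate(taps, start=1):
            if k - i >= 0:
                y += a * symbols[k - i]   # delayed copy scaled by tap a_i
        out.append(y)
    return out


if __name__ == "__main__":
    # One post-cursor tap of -0.25 (hypothetical value).
    print(fir_pre_emphasis([1, 1, -1, -1], [-0.25]))
```

With a single negative post-cursor tap, a bit that follows a transition is driven with a larger amplitude (1.25) than a repeated bit (0.75), which is the high-frequency boost sketched in Figure 6.10.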
 
 
Figure 6.10: Transmitter equalization boosts the high frequency component to achieve a 
flat response. 
 
  
 
Figure 6.11: Pulse response of channel (right) before and after pre-emphasis. 
 
 
 
Figure 6.12: Block diagram of a transmitter with n-tap FIR-based equalization. 
 
  
6.5.2 Proposed Equalization Technique 
The conventional pre-emphasis technique is designed to efficiently equalize linear time invariant 
channels (e.g. electrical copper traces). However, a VCSEL does not have a linear frequency 
response. Figure 6.13 (a) and (b) show the responses to isolated one and zero pulses generated 
from our model. In Figure 6.13 (c) the responses are superimposed after flipping the zero 
response, showing that they are not equivalent. 
  
 
Figure 6.13: VCSEL pulse response for (a) isolated 1, (b) isolated 0, (c) responses 
superimposed. 
 
  
 
The asymmetry becomes more pronounced as the bias current is reduced. Figure 6.14 shows 
the pulse responses for an isolated one and an isolated zero, for two cases. In Figure 6.14 (a), I0 is 
set at a high value (4mA), whereas in Figure 6.14 (b) I0 is set at a lower value (2mA). For the 
same extinction ratio (ER), there is a greater difference between the one and the zero responses 
for the lower current case. Conventional FIR based transmitter equalization would be 
“blind” to this asymmetry, i.e. it would equalize an isolated one pulse in the same manner as 
an isolated zero, leading to sub-optimal performance.  
 
 
 
Figure 6.14: VCSEL pulse responses for different bias currents. 
 
Figure 6.15: Proposed equalization technique. 
  
The fundamental cause of the asymmetry between the isolated one and zero responses is that the 
non-linearity in the VCSEL causes it to respond differently to rising and falling edges of data. 
To take this effect into account, we propose a modification to the conventional pre-emphasis 
equalization. We detect the rising and falling edges and equalize them differently, based on the 
response of the VCSEL to an isolated zero and isolated one (Figure 6.13). Figure 6.15 shows the 
architecture of the proposed equalization technique. Input data (Din) is delayed by an 
equalization delay of teq. Unlike conventional digital FIR-transmitter pre-emphasis, teq is not 
set to be a multiple of the bit period. Simulations based on the VCSEL model show that the 
effect of the proposed equalization technique is to cancel the peaking in the typical second order 
response of the VCSEL. The minimum of this “anti-peak” occurs at a frequency of 1/(2teq). 
Thus, we set teq based on the position of the peak of the VCSEL’s modulation response. This 
response itself is dependent on the bias current (Figure 6.8) and independent of the data rate. 
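
The edge-selective scheme described above can be sketched behaviorally in Python. This is one plausible reading of the text, not a reproduction of the circuit in Figure 6.15: the data is oversampled, a copy delayed by teq is compared with the undelayed data to flag the window after each rising or falling edge, and a separate tap current is applied in each window. The tap currents are treated as signed quantities so that the sketch does not assume a particular polarity. All names, the oversampling approach, and the example values are assumptions.

```python
def edge_equalized_current(bits, samples_per_bit, delay_samples,
                           i0, i1, i_rise, i_fall):
    """Drive-current samples with separate rise and fall taps.

    bits: sequence of 0/1 data values (Din).
    delay_samples: equalization delay t_eq in samples; it need not be a
        multiple of samples_per_bit.
    i0, i1: zero and one current levels.
    i_rise, i_fall: signed tap currents added during the t_eq window that
        follows a rising or falling edge, respectively.
    The line is assumed to idle at 0 before the first bit.
    """
    din = [b for b in bits for _ in range(samples_per_bit)]
    current = []
    for k, d in enumerate(din):
        delayed = din[k - delay_samples] if k >= delay_samples else 0
        i = i1 if d else i0
        if d == 1 and delayed == 0:      # within t_eq after a rising edge
            i += i_rise
        elif d == 0 and delayed == 1:    # within t_eq after a falling edge
            i += i_fall
        current.append(i)
    return current


def t_eq_from_peak(f_peak_hz):
    """Delay that places the 'anti-peak' minimum, 1/(2*t_eq), at f_peak_hz."""
    return 1.0 / (2.0 * f_peak_hz)


if __name__ == "__main__":
    print(edge_equalized_current([0, 1, 1, 0], 4, 2, 1.0, 2.0, 0.5, -0.25))
    print(t_eq_from_peak(15e9))
```

Because the delay is specified in samples rather than bits, teq can be chosen from the position of the response peak independently of the data rate, as the text requires.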
 
 
Figure 6.16: Proposed method for selecting teq. 
 
 
6.6 Simulated Results 
To investigate the efficacy of the proposed equalization technique, we performed two sets of 
simulations to generate optical eye-diagrams using the VCSEL model (for a PRBS15 data 
sequence). In the first case no equalization was used and in the second case we used the proposed 
technique. 
  
 
Figure 6.17: Simulated optical eye-diagrams with and without equalization. (a) 20Gb/s 
high current, (b) 20Gb/s low current, (c) 30Gb/s. 
 
  
Figure 6.17 (a) and (b) show the simulated eye-diagrams for 20Gb/s with the VCSEL bias 
current (Ibias) set to 4mA and 2mA, respectively. Figure 6.17 (c) shows the simulated 
eye-diagrams for 30Gb/s with an Ibias of 4mA. The extinction ratio was fixed at 2dB for all three 
cases. The percentage improvement in the vertical and horizontal eye opening and the required 
tap strengths (Ir/I and If/I) and teq (Figure 6.15) are presented in Table 6.3. 
 
Data Rate   Ibias   Rise Tap (Ir/I)   Fall Tap (If/I)   teq    % vertical improvement   % horizontal improvement
20 Gb/s     4mA     0.25              0.19              33ps   16%                      22%
20 Gb/s     2mA     0.45              0.25              45ps   70%                      38%
30 Gb/s     4mA     0.19              0.28              33ps   10%                      33%
 
Table 6.3: Summary of simulated improvement by the proposed VCSEL equalization 
technique. 
 
Three important facts are suggested by Table 6.3. Firstly, for efficient VCSEL equalization 
the rise and fall taps must be asymmetric. Secondly the proposed technique is more effective 
when the VCSEL is biased at a low current. And finally, the teq delay is independent of the data 
rate and is dependent on the bias current (Ibias). 
 
6.7 Circuit Implementation  
Figure 6.18 shows the circuit architecture of the proposed VCSEL equalization scheme. In order 
to generate a (pseudo) random data sequence, an on-chip high-speed quarter-rate PRBS-15 
generator is used. A quarter-rate architecture is chosen to relax the speed requirement of the 
PRBS generator. A high-speed, 16-bit shift register is also integrated to enable the application 
of arbitrary patterns to the transmitter for testing and debugging purposes. A QLL based 
front-end is used for converting the low swing input clock (~100mV) to the rail-to-rail digital 
domain. The QLL also enables the generation of quadrature phase clocks for the quarter-rate 
PRBS generator. Conventional clock front-ends [75] use power hungry CML-to-CMOS 
converters (Figure 6.19). The quadrature phases are provided externally [75], which requires 
the use of two CML-to-CMOS converters, thereby doubling the power consumption.  
 
 
Figure 6.18: Circuit architecture. 
 
In contrast, the QLL based clocking (Figure 6.20) uses the inherent high voltage gain of 
injection locking [53] to generate a rail-to-rail clock from the low amplitude analog clock input 
at a low power overhead. The quadrature error tracking loop ensures a large locking range 
(3-8GHz) and accurate quadrature phase generation from a single clock phase. 
 
  
 
Figure 6.19: Conventional CML-to-CMOS structure used for digital clock generation from 
an analog input. 
 
 
Figure 6.20: QLL based CML-to-CMOS conversion and quadrature phase generation. 
 
  
The equalization delay (teq) is implemented by a four-stage differential delay line with analog 
and digital controls for fine and coarse delay adjustment, respectively. It has a total delay range 
of 25ps to 40ps. The rising and falling edge detectors are implemented with digital CMOS 
gates. A typical VCSEL output driver is used, with a differential stage steering current between 
the VCSEL and a dummy load, and an additional static current source (Ibias) to bias the VCSEL 
sufficiently above the threshold current. The rise and fall taps are implemented by adding 
differential pairs to the output driver. The tail current sources for all the differential pairs are 
implemented using the low voltage cascode structure. The tail currents are controlled externally 
to set the strength of the taps. The output stage is designed for a higher voltage supply (2.5V) 
because the typical VCSEL diode knee voltage (1.7V) exceeds normal CMOS supplies (1V). 
 
 
Figure 6.21: Chip micrograph and layout details. 
  
6.8 Experimental Results 
The test chip was fabricated in a 32nm SOI CMOS process. The die micrograph and core detail 
are presented in Figure 6.21. Core area is 100μm x 60μm, in a 1mm x 1mm die. The VCSEL is 
wire-bonded to the test chip. 
6.8.1 Optical Measurement Setup 
An essential part of the optical measurement setup is coupling the light from the VCSEL to the 
optical fiber. Light can be simply coupled by appropriately positioning a cleaved bare optical 
fiber near the surface of the VCSEL (butt coupling).  However, due to the divergence angle out 
of the VCSEL being larger than the acceptance angle into the fiber, there is a loss of about -3dB. 
In addition, vibrations (due to air) in the bare fiber translate to optical noise. Figure 6.22 shows 
the setup for bare fiber coupling.  
 
 
Figure 6.22:  Butt coupling proves too lossy and noisy for VCSEL measurements. 

Instead of butt coupling, the measurement setup shown in Figure 6.23 is used. The setup 
relays the image of the VCSEL onto the surface of the fiber with a magnification of 2x. With 
this magnification the 4μm diameter VCSEL spot is imaged to an 8μm diameter at the 
surface of the fiber, while at the same time the divergence angle going into the fiber is divided by 
two relative to the divergence angle directly out of the VCSEL, leading to more efficient 
coupling. Two lenses with a focal-length ratio of approximately two are used to achieve this. A 
6mm (A110TM-B) lens is used to collimate the light coming from the VCSEL and an 11mm 
(F220APC-780) lens is used to focus the collimated light into the fiber. An angle-polished 
multi-mode fiber is connected to the lens setup via a standard APC connector. The fiber is 
polished at an 8° angle to avoid optical feedback. This setup helped in reducing the coupling 
loss to -0.5dB.  
 
 
Figure 6.23: Optical measurement setup. 
 
6.8.2 Measured Eye-Diagrams 
An Anritsu clock generator is used to supply a single phase clock to the QLL frontend. The QLL 
frontend is used to generate the quadrature phase clocks for the high speed quarter-rate 
PRBS-15 generator. The QLL has a locking range of 3-8GHz; correspondingly the PRBS 
generator has a measured working range of 15-32Gb/s.  
In order to establish the efficacy of the proposed equalization technique, VCSEL outputs 
were measured for four cases at a data-rate of 16Gb/s at a low current bias (Ibias). As shown in 
Figure 6.24, without any equalization the eye is open but there is an asymmetry between the one 
and zero levels. The optical noise is greater for the zero level than for the one level. The rise tap 
proves more effective in countering this asymmetry than the fall tap (Figure 6.24). The optimum 
symmetrical eye is achieved when the rise and fall taps have a ratio of 2:1. The teq was set to its 
maximum value (45ps). Average optical DC power was fixed at 1.5mW for all four 
measurements.   
 
 
Figure 6.24: Measured VCSEL optical output at 16Gb/s (PRBS-15), with and without 
equalization.   
 
A maximum data rate of 20Gb/s is achieved (Figure 6.25 (b)). The optimum optical eye at 
20Gb/s is achieved with an extinction ratio of 2dB and 65% horizontal opening. To show the 
improvement achieved by the proposed equalization technique, the unequalized eye at 20Gb/s 
is also shown in Figure 6.25 (a). The ratio of the rise and fall taps is again 2:1 and teq is set to its 
lowest setting (25ps). Single-ended operation is used to save power. The VCSEL output stage 
draws 5.5mA from a 2.5V power supply. The rest of the equalization circuit consumes 1.6mW 
from a 1V supply. This translates to an ultra-low power efficiency of 0.77pJ/bit. The maximum 
data rate is essentially limited by the bandwidth of the VCSELs (Figure 6.25 (a)). 
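
The quoted efficiency follows directly from the measured supply figures above. The short Python check below reproduces the arithmetic; the function name is an assumption introduced for this example.

```python
def energy_per_bit_pj(i_out_ma, v_out, p_core_mw, data_rate_gbps):
    """Energy per bit in pJ: total power (mW) divided by data rate (Gb/s)."""
    p_total_mw = i_out_ma * v_out + p_core_mw   # mA * V = mW
    return p_total_mw / data_rate_gbps          # mW / (Gb/s) = pJ/bit


if __name__ == "__main__":
    # Output stage: 5.5mA from 2.5V; equalization circuit: 1.6mW; data rate: 20Gb/s.
    print(energy_per_bit_pj(5.5, 2.5, 1.6, 20.0))
```

The output stage contributes 13.75mW, so the total is 15.35mW, or approximately 0.77pJ/bit at 20Gb/s.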
 
 
Figure 6.25: Measured optical eye-diagram for PRBS-15 data at 20Gb/s. (a) Unequalized 
(b) Equalized. 
 
  
6.9 Summary 
We presented a novel modelling technique that takes into account the inherent non-linearity in 
the VCSEL’s frequency response. The time domain optical responses for a one and a zero were 
used to arrive at an optimum equalization strategy. The rising and falling edges were equalized 
separately and the equalization delay was selected based on the bias current of the VCSEL. The 
equalization technique was used to achieve an ultra-low power efficiency of 0.77pJ/bit at a 
data-rate of 20Gb/s. The ideas generated could easily be integrated into the next generation of 
VCSEL based optical transmitters. 
 
  
  
Chapter 7: Conclusion 
Over the past decade, wireline I/O has been instrumental in enabling the incredible scaling of 
computer systems, ranging from handheld electronics to supercomputers. In part, this increase 
in bandwidth is enabled by expanding the number of I/O pins per component. As a result, I/O 
circuitry consumes an increasing amount of area and power on today’s chips. Increasing 
bandwidth has also been enabled by rapidly accelerating the per-pin data-rate. As shown in 
Figure 7.1, this trend is anticipated to continue, according to the International Technology 
Roadmap for Semiconductors (ITRS) [76]. However, enabling this rather amazing trend for I/O 
scaling will require more than just Moore’s law scaling [77] of transistor sizes. Significant 
advances in both energy efficiency and signal integrity are required in order to enable the next 
generation of low-power and high-performance computing systems. To this effect, this 
dissertation presents high performance design techniques for the three fundamental components 
of a high-speed link, namely, transmitter, receiver, and clocking.  
 
 
Figure 7.1: Constant growth of the required I/O bandwidth according to ITRS. 
 
  
On the clocking front, injection locking is fast emerging as an ultra-low power alternative to 
the conventional PLL and DLL based approaches. As shown in Figure 7.2, the number of 
publications (in ISSCC) based on injection locked clocking for wireline applications has 
increased steadily throughout the past decade. However, there are still some challenges that have 
not been successfully tackled by previous publications. The most important among them is the 
limited locking range of injection locked oscillators. A small locking range makes injection 
locking less suitable for wideband applications (e.g. transceivers in modern FPGAs [54]). In 
addition, it makes injection locking based systems prone to PVT variations. In this 
dissertation we introduced two architectures that tackle this issue. In addition, we also used these 
architectures for low-power quadrature phase generation, a prerequisite for energy efficient 
quarter-rate clocking architectures. 
 
 
Figure 7.2: Number of injection locking based wireline publications in International 
Solid-State Circuits Conference (ISSCC) across a decade. 

In the first part of this dissertation we described a wideband injection locking scheme in an 
LC oscillator. PLL and injection locking elements were combined symbiotically to achieve wide 
locking range while retaining the simplicity of the latter. The method does not require a phase 
frequency detector or a loop filter to achieve phase lock. A mathematical analysis of the system 
was presented, which puts the technique on a firm theoretical footing. A locking range of 
13.4 GHz–17.2 GHz (25%) and an average jitter tracking bandwidth of up to 400 MHz were 
measured in a high-Q LC oscillator. This architecture was used to generate quadrature phases 
from a single clock without any frequency division. It also provides high frequency jitter 
filtering while retaining the low frequency correlated jitter essential for forwarded clock receivers.  
To improve the locking range of an injection locked ring oscillator, QLL (quadrature locked 
loop) was introduced. We mathematically proved that the phase mismatch in the outputs of a 
quadrature ring ILO contains information about the difference between its natural and injected 
frequencies, in both locked and unlocked states. The phase mismatch was measured and used 
to track the injected frequency dynamically and increase the effective locking range. The 
technique improves an ILO’s locking range from 5.5% (7-7.4GHz) to 90% (4-11GHz) without 
using a phase frequency detector (PFD). The dynamics of the system were derived and were 
shown to have first order characteristics. This guarantees stability without peaking, in contrast 
to a second order IL PLL.  The system was used to generate accurate quadrature phases, without 
any frequency division, from a single phase of reference clock input, supplied electrically or 
optically. A power efficient two stage ring oscillator combined with the low jitter performance 
of the ILO allows us to achieve the best (jitter and power) FOM. This technique could be easily 
extended to be used for wideband injection in injection locking based frequency multipliers [17] 
and CDR [39].  
As the bandwidth demand for traditional electrical wireline interconnects has accelerated, 
optics has become an increasingly attractive alternative for interconnects within computing 
systems. The negligible frequency dependent loss of optical channels provides the potential for 
optical link designs to fully utilize increased data rates provided through CMOS technology 
scaling without excessive equalization complexity. However, this can only be a viable solution 
if significant power benefits can be achieved. To address the power consumption requirements 
of future optical interconnects, we proposed a QLL based low power clocking circuit for a four 
channel quarter-rate optical receiver and a low-power VCSEL based optical transmitter. 
The QLL was used to generate accurate clock phases for a four channel optical receiver using 
a forwarded clock at quarter-rate. The QLL drives an ILO at each channel, without any 
repeaters, for local quadrature clock generation. Each local ILO has deskew capability for phase 
alignment. The optical receiver uses the inherent frequency-to-voltage conversion provided by 
the QLL to dynamically body bias its devices. The wide locking range of the QLL helps to 
achieve a reliable data-rate of 16-32Gb/s, and adaptive body biasing aids in maintaining an 
ultra-low power consumption of 153fJ/bit. Measured results validated the feasibility of the QLL and 
the optical receiver for ultra-low-power, high data-rate, and massively parallel optical links. 
  
We ended the dissertation by presenting a novel modelling technique that takes into account 
the inherent non-linearity in the VCSEL’s frequency response. The modelling provided an 
important insight. The conventional FIR-based pre-emphasis works well for LTI electrical 
channels but is not optimum for the non-linear optical response of the VCSEL. The time domain 
optical responses for a one and a zero were used to derive an optimum equalization strategy. 
The rising and falling edges were equalized separately, and the equalization delay was selected 
based on the bias current of the VCSEL. The equalization technique was used to achieve an 
energy efficiency of 0.77pJ/bit at a data-rate of 20Gb/s. The ideas generated in this 
dissertation could easily be integrated into the next generation of VCSEL based optical 
transmitters.  
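An energy-per-bit figure such as the 0.77pJ/bit quoted above converts to average power when multiplied by the data rate. The sketch below is only a units check; the helper name `power_mw` is hypothetical, and it assumes the efficiency figure applies at the stated 20Gb/s data rate, which implies a power of roughly 15.4mW.

```python
def power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Average power in mW from energy per bit (pJ/bit) and data rate (Gb/s).

    pJ/bit * Gb/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W, so the product is in mW.
    """
    if energy_pj_per_bit < 0 or data_rate_gbps < 0:
        raise ValueError("energy per bit and data rate must be non-negative")
    return energy_pj_per_bit * data_rate_gbps


# Transmitter figures quoted in the text: 0.77 pJ/bit at 20 Gb/s.
tx_power_mw = power_mw(0.77, 20.0)
print(f"implied transmitter power: {tx_power_mw:.1f} mW")
```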
  
  
List of Abbreviations 
AWGN                Additive white Gaussian noise 
BB                         Body biasing 
BER                      Bit error rate 
BOX                     Buried oxide layer 
CDR                     Clock-data recovery 
CMOS                  Complementary metal-oxide-semiconductor 
DJ                         Deterministic jitter 
DLL                      Delay-locked loop 
DFE                      Decision-feedback equalizer 
EC                         Embedded clock 
FC                         Forwarded clock 
FDSOI                  Fully depleted silicon on insulator 
FIR                        Finite impulse response 
FLL                       Frequency locked loop 
FOM                     Figure of merit 
FPGA                   Field-programmable gate array  
Gb/s                     Gigabit-per-second  
IC                          Integrated circuit 
I/O                        Input/Output 
IL                          Injection locking 
ILO                       Injection locked oscillator 
IL PLL                  Injection locked phase locked loop  
ISI                          Inter symbol interference 
LBW                     Low bandwidth  
LPF                       Low-pass filter 
LTI                        Linear time invariant 
MQPE                  Mean quadrature phase error 
PFD                      Phase frequency detector 
PI                          Phase interpolator 
  
PLL                       Phase-locked loop 
PRBS                    Pseudo-random bit sequence (usually appended with a number indicating its 
                              length; for example, a PRBS-7 is 2^7-1 = 127 bits long) 
PVT                      Process, voltage, temperature                  
QLL                      Quadrature locked loop 
RJ                          Random jitter 
RO                         Ring oscillator 
RMS                      Root mean square 
SNR                      Signal-to-noise ratio 
TIE                        Time interval error 
UI                          Unit interval (one bit-time in a data stream) 
VCDL                   Voltage-controlled delay line 
VCO                     Voltage-controlled oscillator 
VCSEL                 Vertical-cavity surface-emitting laser 
WDM                   Wavelength division multiplexing 
  
  
Bibliography 
[1]  W. Knight, "Two Heads Are Better Than One," IEEE Review, Sep. 2005.  
[2]  "ISSCC 2012 Trends," [Online]. Available: isscc.org/doc/2012/2012_Trends.pdf. 
[3]  "ISSCC 2014 Trends," [Online]. Available: isscc.org/doc/2014/2014_Trends.pdf. 
[4]  H. Sugita, K. Sunaga, K. Yamaguchi and M. Mizuno, "A 16Gb/s 1st-Tap FFE and 3-Tap 
DFE in 90nm CMOS," IEEE ISSCC Dig. Tech. Papers, pp. 162-163, Feb. 2010.  
[5]  X. Lin, S. Saw and J. Liu, "A CMOS 0.25-μm Continuous-Time FIR Filter With 125 ps 
Per Tap Delay as a Fractionally Spaced Receiver Equalizer for 1-Gb/s Data 
Transmission," IEEE J. Solid State Circuits, vol. 40, no. 3, pp. 593-602, Mar. 2005.  
[6]  T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, A. Prati, D. Gardellini, M. Brandli, M. 
Kossel, P. Buchmann, P. A. Francese and T. Morf, "A 2.6mW/Gbps 12.5Gbps RX with 
8-tap Switched-Cap DFE in 32nm CMOS," IEEE Symp. on VLSI Circuits Digest, pp. 210-
211, Jun. 2011.  
[7]  W. Dally and J. Poulton, "Transmitter Equalization for 4-Gbps Signaling," IEEE Micro, 
vol. 17, no. 1, pp. 48-56, Jan. 1997.  
[8]  R. A. Morgan, "Vertical-cavity surface-emitting lasers: present and future," Proc. SPIE, vol. 
14, 1997.  
[9]  K. Iga, "Surface-emitting laser-its birth and generation of new optoelectronics field," IEEE 
J. of Selected Topics in Quantum Electronics, vol. 6, no. 6, pp. 1201-1215, Dec. 2000.  
[10]  A. L. Lentine, K. W. Goossen, J. A. Walker, L. M. F. Chirovsky, L. A. D'Asaro, S. P. Hui, 
B. T. Tseng, R. E. Leibenguth, D. P. Kossives, D. W. Dahringer, D. D. Bacon, T. K. 
Woodward and D. A. B. Miller, "Array of optoelectronic switching nodes comprised of 
flip-chip bonded MQW modulators and detectors on silicon CMOS circuitry," IEEE 
Photon. Tech. Lett., vol. 8, no. 2, pp. 221-223, Feb. 1996.  
[11]  M. Cai, G. Hunziker and K. Vahala, "Fiber-optic add-drop device based on a silica 
microsphere-whispering gallery mode system," IEEE Photon. Tech. Lett., vol. 11, no. 6, pp. 
686-687, Jun. 1999.  
  
[12]  C. Kromer, G. Sialm, C. Berger, T. Morf, M. L. Schmatz, F. Ellinger, G. -L. Bona and H. 
Jackel, "A 100-mW 4x10Gb/s Transceiver in 80-nm CMOS for High-Density Optical 
Interconnects," IEEE J. of Solid-State Circuits, vol. 40, no. 12, pp. 2667-2679, Dec. 2005.  
[13]  A. Emami-Neyestanak, D. Liu, G. Keeler, N. Helman and M. Horowitz, "A 1.6Gb/s, 3mW 
CMOS Receiver for Optical Communication," IEEE Symp. on VLSI Circuits, Jun. 2002.  
[14]  M. Nazari and A. Emami-Neyestanak, "A 24-Gb/s Double-Sampling Receiver for Ultra-
Low-Power Optical Communication," J. of Solid State Circuits, vol. 48, no. 2, pp. 344-357, 
Feb. 2013.  
[15]  L. Zhang, B. Ciftcioglu, M. Huang and H. Wu, "Injection-Locked Clocking: A New GHz 
Clock Distribution Scheme," IEEE Custom Integrated Circuits Conference, pp. 785-788, 2006.  
[16]  J. Jaussi, B. Casper, M. Mansuri, F. O’Mahony, K. Canagasaby, J. Kennedy and R. 
Mooney, "A 20 Gb/s embedded clock transceiver in 90 nm CMOS," ISSCC Dig. Tech. 
Papers, pp. 340-341, Feb. 2006.  
[17]  M. Hossain and A. Carusone, "A 6.8mW 7.4Gb/s clock-forwarded receiver with up to 
300MHz jitter tracking in 65nm CMOS," IEEE Int. Solid-State Circuits Conf. Dig. Tech. 
Papers, pp. 158-159, Feb. 2010.  
[18]  JEDEC Standard, "DDR 3 SDRAM Specification," document no. JESD79-3E, Jul. 2010. 
[19]  HyperTransport Consortium, "HyperTransport I/O Link Specification," document no. 
HTC20051222–0046-0035, 2010. 
[20]  N. Kurd, J. Douglas, P. Mosalikanti and R. Kumar, "Next generation Intel® 
micro-architecture (Nehalem) clocking architecture," in IEEE Symp. VLSI Circuits Dig., Jun. 2008.  
[21]  K. Lee, S. Kim, Y. Shin, D.-K. Jeong, G. Lim, B. Kim, V. Da Costa and D. Lee, "A jitter-
tolerant 4.5 Gb/s CMOS interconnect for digital display," IEEE ISSCC Dig. Tech. Papers, 
pp. 310-311, Feb. 1998.  
[22]  G. Shanbag and N. Balamurugan, "Modeling and mitigation of jitter in multi-Gbps source-
synchronous I/O links," Proc. 21st Int. Conf. Computer Design, pp. 254-260, 2003.  
[23]  K. Hu, T. Jiang, J. Wang, F. O'Mahony and P. Chiang, "A 0.6 mW/Gb/s, 6.4–7.2 Gb/s 
Serial Link Receiver Using Local Injection-Locked Ring Oscillators in 90 nm CMOS," 
IEEE J. of Solid-State Circuits, vol. 45, no. 4, pp. 899-908, Apr. 2010.  
  
[24]  S. Shekhar, G. Balamurugan, D. Allstot, M. Mansuri, J. Jaussi, R. Mooney, J. Kennedy, 
B. Casper and F. O'Mahony, "Strong Injection Locking in Low-Q LC Oscillators: 
Modeling and Application in a Forwarded-Clock I/O Receiver," IEEE Trans. Circuits Syst. 
I, Reg. Papers, vol. 56, no. 8, pp. 1818-1829, Aug. 2009.  
[25]  T. Shibasaki, H. Tamura, K. Kanda, H. Yamaguchi, J. Ogawa and T. Kuroda, "20-GHz 
Quadrature Injection-Locked LC Dividers With Enhanced Locking Range," IEEE J. Solid-
State Circuits, vol. 43, no. 3, pp. 610-618, Mar. 2008.  
[26]  P. Kinget, R. Melville, D. Long and V. Gopinathan, "An injection-locking scheme for 
precision quadrature generation," IEEE J. Solid-State Circuits, vol. 37, no. 7, pp. 845-851, Jul. 
2002.  
[27]  C. K. K. Yang, "Design of High-Speed Serial Links in CMOS," Ph.D. Dissertation, 
Stanford University, Dec. 1998. 
[28]  J. G. Proakis and M. Salehi, Digital Communications, McGraw-Hill, 2007.  
[29]  M. Hossain and A. Carusone, "CMOS Oscillators for Clock Distribution and Injection-
Locked Deskew," IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2138-2153, Aug. 2009.  
[30]  A. Bhatt, Creating a PCI Express Interconnect, PCI-SIG White Paper, 2002.  
[31]  "International Telecommunication Union G-Series Specifications," [Online]. Available: 
www.itu.int. 
[32]  K. Odaka, "Method and apparatus for encoding binary data". United States of America 
Patent 4456905, Oct. 1984. 
[33]  "Telcordia Technologies, Synchronous Optical Network (SONET) Transport Systems: 
Common Generic Criteria (A Module of TSGR, FR-440) GR-253-CORE," vol. 3, Sep. 
200.  
[34]  "Fibre Channel—Methodologies for Jitter and Signal Quality Specification," Jun. 2004. 
[Online]. Available: www.t11.org/index.ht,T11/04-101v5. 
[35]  A. E. Siegman, Lasers, Mill Valley, CA: University Science Books, 1986.  
[36]  R. R. Ward, The Living Clocks, New York: Alfred Knopf, 1971.  
[37]  H. Rategh and T. Lee, "Superharmonic injection-locked frequency dividers," IEEE J. Solid-
State Circuits, vol. 34, no. 6, pp. 813-821, Jun. 1999.  
  
[38]  X. Zhang, X. Zhou, B. Aliener and A. S. Daryoush, "A study of sub-harmonic injection 
locking for local oscillators," IEEE Microw. Guided Wave Lett., vol. 2, no. 1, pp. 97-99, Mar. 
1992.  
[39]  J. Lee and M. Liu, "A 20-Gb/s Burst-Mode Clock and Data Recovery Circuit Using 
Injection-Locking Technique," IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 1818-1829, 
Mar. 2008.  
[40]  R. Adler, "A study of locking phenomena in oscillators," Proc. IEEE, vol. 61, pp. 1380-
1385, Oct. 1973.  
[41]  B. Razavi, "A study of injection locking and pulling in oscillators," IEEE J. Solid-State 
Circuits, vol. 39, no. 9, pp. 1415-1424, Sep. 2004.  
[42]  D. Bossert, D. Collins, I. Abey, J. B. Clevenger, C. J. Helms, W. Luo, C. X. Wang and H. 
Q. Hou, "Production of High-Speed Oxide Confined VCSEL Arrays for Datacom 
Applications," Proc. SPIE, vol. 4649, no. 142, pp. 142-151, Jun. 2002.  
[43]  D. Vez, S. Eitel, S. G. Hunziker, G. Knight, M. Moser, R. Hoevel, H.-P. Gauggel, M. 
Brunner, A. Hold and K. H. Gulden, "10 Gbit/s VCSELs for Datacom: Devices and 
Applications," Proc. SPIE, vol. 4942, pp. 29-43, Apr. 2003.  
[44]  Y. Ou, J. S. Gustavsson, P. Westbergh, A. Haglund, A. Larsson and A. Joel, "Impedance 
characteristics and parasitic speed limitations of high-speed 850-nm VCSELs," IEEE 
Photon. Technol. Lett., vol. 21, pp. 1840-1842, 2009.  
[45]  P. Westbergh, J. S. Gustavsson, Å. Haglund, M. Sköld, A. Joel and A. Larsson, "High 
speed, low current density 850 nm VCSELs," IEEE J. Sel. Top. Quantum Electron., vol. 15, 
no. 3, pp. 694-703, 2009.  
[46]  "Serdes Framer Interface Level 5 Phase 2 (SFI-5.2): Implementation Agreement for 
40Gb/s Interface for Physical Layer Devices," Inter Optical Internetworking Forum, Oct. 
2006.  
[47]  S. Levantino, C. Samori, A. Bonfanti, S. L. J. Gierkink, A. Lacaita and V. Boccuzzi, 
"Frequency dependence on bias current in 5 GHz CMOS VCOs: impact on tuning range 
and flicker noise upconversion," IEEE J. Solid-State Circuits, vol. 37, no. 8, pp. 1003-1011, 
Aug. 2002.  
  
[48]  B. Razavi, Design of Integrated Circuits for Optical Communications, New York: 
McGraw-Hill, 2002.  
[49]  J. Sewter and A. Carusone, "A 3-Tap FIR Filter With Cascaded Distributed Tap 
Amplifiers for Equalization Up to 40 Gb/s in 0.18-μm CMOS," IEEE J. Solid-State Circuits, 
vol. 41, no. 8, pp. 1919-1929, Aug. 2006.  
[50]  M. Raj and A. Emami-Neyestanak, "A wideband injection locking scheme and quadrature 
phase generation in 65nm CMOS," IEEE Radio Freq. Integr. Circuits Symp., pp. 261-264, Jun. 
2013.  
[51]  M. Raj and A. Emami, "A Wideband Injection-Locking Scheme and Quadrature Phase 
Generation in 65-nm CMOS," IEEE Transactions on Microwave Theory and Techniques, vol. 
62, no. 4, pp. 763-772, Apr. 2014.  
[52]  B. M. Helal, C.-M. Hsu, K. Johnson and M. Perrot, "A Low Jitter Programmable Clock 
Multiplier Based on a Pulse Injection-Locked Oscillator With a Highly-Digital Tuning 
Loop," IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1391-1400, May 2009.  
[53]  L. Zhang, A. Carpenter, B. Ciftcioglu, A. Garg, M. Huang and H. Wu, "Injection-Locked 
Clocking: A Low-Power Clock Distribution Scheme for High-Performance 
Microprocessors," Dig. Symp. VLSI Circuits, vol. 16, no. 9, pp. 1251-1256, Sep. 2008.  
[54]  J. Savoj, K. Hsieh, P. Upadhyaya, F.-T. An, J. Im, X. Jiang, J. Kamali, K. W. Lai, D. 
Wu, E. Alon and K. Chang, "Design of high-speed wireline transceivers for backplane 
communications in 28nm CMOS," Proc. CICC, pp. 1-4, Sep. 2012.  
[55]  W. Deng, A. Musa, T. Siriburanon, M. Miyahara, K. Okada and A. Matsuzawa, "A 
0.022mm² 970μW dual-loop injection-locked PLL with -243dB FOM using synthesizable 
all-digital PVT calibration circuits," ISSCC Dig. Tech. Papers, pp. 248-249, Feb. 2013.  
[56]  J. Chien, P. Upadhyaya, H. Jung, S. Chen, W. Fang, A. Niknejad, J. Savoj and K. Chang, 
"A pulse-position-modulation phase-noise-reduction technique for a 2-to-16GHz injection-
locked ring oscillator in 20nm CMOS," ISSCC Dig. Tech. Papers, pp. 52-53, Feb. 2014.  
[57]  T. Lee and A. Hajimiri, "Oscillator Phase Noise: A Tutorial," IEEE J. Solid-State Circuits, 
vol. 35, no. 3, pp. 326-336, Mar. 2000.  
  
[58]  M. Raj, S. Saeedi and A. Emami, "A 4GHz-11GHz Injection-Locked Quarter-Rate 
Clocking for an Adaptive 153fJ/bit Optical Receiver in 28nm FD SOI CMOS," ISSCC 
Dig. Tech. Papers, Feb. 2015.  
[59]  M. Raj and A. Emami, "Quadrature-based injection locking of ring oscillators". United 
States of America Patent 883439-01-US-REG (Open), 18 Sep 2014. 
[60]  C. L. Schow, F. E. Doany, C. Chen, A. V. Rylyakov, C. W. Baks, D. M. Kuchta, R. A. 
John and J. A. Kash, "Low-Power 16x10 Gb/s Bi-Directional Single Chip CMOS Optical 
Transceivers Operating at < 5 mW/Gb/s/link," IEEE J. of Solid-State Circuits, vol. 44, no. 
1, pp. 301-313, Jan. 2009.  
[61]  D. Kucharski, D. Guckenberger, G. Masini, S. Abdalla, J. Witzens and S. Sahani, 
"10Gb/s 15mW Optical Receiver with Integrated Germanium Photodetector and Hybrid 
Inductor Peaking in 0.13μm SOI CMOS Technology," IEEE ISSCC Dig. Tech. Papers, pp. 
235-248, Feb. 2010.  
[62]  S. Saeedi and A. Emami, "A 25Gb/s 170μW/Gb/s Optical Receiver in 28nm CMOS for 
Chip-to-Chip Optical Communication," IEEE Radio Freq. Integr. Circuits Symposium (RFIC), 
pp. 283-286, Jun. 2014.  
[63]  S. Palermo, A. Emami-Neyestanak and M. Horowitz, "A 90 nm CMOS 16 Gb/s 
Transceiver for Optical Interconnect," J. of Solid State Circuits, vol. 43, no. 5, May 2008.  
[64]  B. Casper and F. O'Mahony, "Clocking Analysis, Implementation and Measurement 
Techniques for High-Speed Data Links-A Tutorial," IEEE Trans. on Circuits and Systems-I 
Regular Papers, vol. 56, no. 1, pp. 17-39, 2009.  
[65]  T. Takemoto, H. Yamashita, T. Yazaki, N. Chujo, L. Yong and Y. Matsuoka, "A 4× 25-
to-28Gb/s 4.9mW/Gb/s -9.7dBm High-Sensitivity Optical Receiver Based on 65nm 
CMOS for Board-to-Board Interconnects," ISSCC Dig. Tech. Papers, pp. 118-119, Feb. 2013.  
[66]  T. Huang, T. Chung, C.-H. Chern, M.-C. Huang, C.-C. Lin and F.-L. Hsueh, "A 28Gb/s 
1pJ/b shared-inductor optical receiver with 56% chip-area reduction in 28nm CMOS," 
ISSCC Dig. Tech. Papers, pp. 144-145, Feb. 2014.  
[67]  L. A. Coldren and S. W. Corzine, Diode Lasers and Photonic Integrated Circuits, Wiley-
Interscience, 1995.  
  
[68]  R. Safaisini, J. Joseph, D. Louderback, X. Jin, A. Al-Omari and K. Lear, "Temperature 
dependence of 980-nm oxide-confined VCSEL dynamics," IEEE Photon. Technol. Lett., vol. 
20, no. 14, pp. 1272-1275, Jul. 2008.  
[69]  S. Palermo and M. Horowitz, "High-Speed Transmitters in 90nm CMOS for High-Density 
Optical Interconnects," IEEE European Solid-State Circuits Conference, Feb. 2006.  
[70]  P. V. Meena, J. J. Morikuni, S.-M. Kang, A. V. Harton and K. W. Wyatt, "A Simple Rate-
Equation-Based Thermal VCSEL Model," J. Lightwave Tech., vol. 17, no. 5, pp. 865-872, 
1999.  
[71]  S. Palermo, Design of High-Speed Optical Interconnect Transceivers, Ph.D. Dissertation, 
Stanford University, Sep. 2004.  
[72]  K. W. Goossen, "Fitting Optical Interconnects to an Electrical World: Packaging and 
Reliability Issues of Arrayed Optoelectronic Modules," IEEE Lasers and Electro-Optics 
Society Annual Meeting, Nov. 2004.  
[73]  A. Kern, A. Chandrakasan and I. Young, "18Gb/s Optical I/O: VCSEL Driver and TIA 
in 90nm CMOS," IEEE Symposium on VLSI Circuits, Jun. 2007.  
[74]  M. Tomlinson, "New Automatic Equalizer Employing Modulo Arithmetic," IEEE 
Electronics Lett., vol. 7, no. 5, pp. 138-139, Mar. 1971.  
[75]  M. H. Nazari, Electrical and Optical Interconnects for High-Performance Computing, 
Ph.D. Dissertation, California Institute of Technology, Apr. 2013.  
[76]  Semiconductor Industry Association (SIA), International Technology Roadmap for 
Semiconductors (ITRS) 2011 Update, 2011.  
[77]  G. E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, vol. 38, 
no. 8, Apr. 1965.  
[78]  I. Young, E. Mohammed, J. Liao, A. Kern, S. Palermo, B. Block, M. Reshotko and P. L. 
D. Chang, "Optical I/O Technology for Tera-Scale Computing," J. Solid-State Circuits, vol. 
45, no. 1, pp. 235-248, Jan. 2010.  
[79]  C. L. Schow, F. E. Doany, C. Chen, A. Rylyakov, C. W. Baks, D. M. Kuchta, R. A. John 
and J. A. Kash, "Low-Power 16 x 10 Gb/s Bi-Directional Single Chip CMOS Optical 
Transceivers Operating at < 5 mW/Gb/s/link," J. Solid-State Circuits, vol. 44, no. 1, pp. 
301-313, Jan. 2009.  
  
[80]  E. Prete, D. Scheideler and A. Sanders, "A 100 mW 9.6 Gb/s transceiver in 90 nm CMOS 
for next-generation memory interfaces," IEEE ISSCC Dig. Tech. Papers, 2006.  
[81]  S. H. Strogatz, Nonlinear Dynamics and Chaos, New York: Perseus Books, 1994.  
[82]  B. Razavi, Design of Integrated Circuits for Optical Communications, McGraw-Hill, 
2002.  
[83]  M. Monge, M. Raj, M. Nazari, H.-C. Chang, Y. Zhao, J. Weiland, M. Humayun, Y.-C. 
Tai and A. Emami-Neyestanak, "A fully intraocular high-density self-calibrating epiretinal 
prosthesis," IEEE Trans. on Biomedical Circuits and Systems, vol. 7, no. 6, pp. 747-760, Dec. 
2013.  
[84]  M. Monge, M. Raj, M. Nazari, H.-C. Chang, Y. Zhao, J. Weiland, M. Humayun, Y.-C. 
Tai and A. Emami-Neyestanak, "A fully intraocular 0.0169mm²/pixel 512-channel self-
calibrating epiretinal prosthesis in 65nm CMOS," ISSCC Dig. Tech. Papers, Feb. 2013.  
[85]  M. Loh, Dense, Efficient Chip-to-Chip Communication at the Extremes of Computing, 
Ph.D. Dissertation, California Institute of Technology, May 2013.  
[86]  M. Raj and A. Emami, "Non-linear vertical-cavity surface-emitting laser equalization". 
United States of America Patent 883438-01-US-REG (Open), 6 Oct. 2014. 
 
