#### Ph.D. Dissertation

# Design of High-Speed Optical Receiver with All-Digital Clock and Data Recovery

by Sang-Hyeok Chu

August, 2016

School of Electrical Engineering and Computer Science
College of Engineering
Seoul National University

**Abstract**

This thesis presents a 22- to 26.5-Gb/s optical receiver with an all-digital clock and data recovery (ADCDR) fabricated in a 65-nm CMOS process. The receiver consists of an optical front-end and a half-rate bang-bang clock and data recovery circuit. The optical front-end achieves low power consumption by using inverter-based amplifiers and realizes sufficient bandwidth by applying several bandwidth extension techniques. In addition, in order to minimize additional jitter at the front-end, not only magnitude and bandwidth but also phase delay responses are considered. The ADCDR employs an LC quadrature digitally-controlled oscillator (LC-QDCO) to achieve a high phase noise figure-of-merit at tens of gigahertz. The recovered clock jitter is 1.28 ps<sub>rms</sub> and the measured jitter tolerance exceeds the tolerance mask specified in IEEE 802.3ba. The receiver sensitivity is 106 and 184 $\mu A_{pk\text{-}pk}$ for a bit error rate of $10^{-12}$ at data rates of 25 and 26.5 Gb/s, respectively. The entire receiver chip occupies an active die area of 0.75 mm<sup>2</sup> and consumes 254 mW at a data rate of 26.5 Gb/s. The energy efficiencies of the front-end and entire receiver at 26.5 Gb/s are 1.35 and 9.58 pJ/bit, respectively.
**Keywords**: All-digital clock and data recovery (ADCDR), LC oscillator, limiting amplifier (LA), optical receiver, transimpedance amplifier (TIA), quadrature digitally-controlled oscillator (QDCO)

**Student Number: 2012-30235**

# **Contents**

- Abstract
- Contents
- List of Figures
- List of Tables
- Chapter 1 Introduction
  - 1.1 Motivation
  - 1.2 Thesis Organization
- Chapter 2 Design of Optical Front-End
  - 2.1 Overview
  - 2.2 Background on Optical Front-End
    - 2.2.1 Photodiode
    - 2.2.2 Transimpedance Amplifier
    - 2.2.3 Post Amplifier
    - 2.2.4 Shunt Inductive Peaking
  - 2.3 Circuit Implementation
    - 2.3.1 Overall Architecture
    - 2.3.2 Transimpedance Amplifier
    - 2.3.3 Post Amplifier
  - 2.4 Noise Analysis
    - 2.4.1 Photodiode
    - 2.4.2 Optical Front-End
    - 2.4.3 Sensitivity
- Chapter 3 Design of ADCDR for Optical Receiver
  - 3.1 Overview
  - 3.2 Background on PLL-Based ADCDR
    - 3.2.1 Phase Detector
    - 3.2.2 Digital Loop Filter
    - 3.2.3 Digitally-Controlled Oscillator
    - 3.2.4 Analysis of Bang-Bang ADCDR
  - 3.3 Circuit Implementation
    - 3.3.1 Overall Architecture
    - 3.3.2 Phase Detection Logic
    - 3.3.3 Digital Loop Filter
    - 3.3.4 LC Quadrature DCO
- Chapter 4 Experimental Results
- Chapter 5 Conclusion
- Bibliography
- Abstract in Korean

# **List of Figures**

- Fig. 1.1 Conventional optical interconnection.
- Fig. 1.2 Roadmap of various optical communication standards.
- Fig. 1.3 (a) Hybrid and (b) monolithic optical receiver front-end.
- Fig. 2.1 (a) Photodiode small-signal equivalent circuit and (b) simplified one.
- Fig. 2.2 Total capacitances of commercial photodiodes.
- Fig. 2.3 Simple resistor for current-to-voltage conversion.
- Fig. 2.4 Common-gate amplifier.
- Fig. 2.5 Regulated cascode.
- Fig. 2.6 Regulated cascode transimpedance amplifier.
- Fig. 2.7 Common-gate feedforward TIA.
- Fig. 2.8 Shunt resistive feedback CMOS TIA.
- Fig. 2.9 DC transfer function of (a) an LA and (b) an AGC amplifier.
- Fig. 2.10 Optical receiver (a) without and (b) with equalizers.
- Fig. 2.11 An n-stage amplifier.
- Fig. 2.12 Required cell $GBW$ as a function of the number of stages $n$ [31].
- Fig. 2.13 Conventional offset cancellation scheme.
- Fig. 2.14 Effect of offset cancellation in (a) frequency and (b) time domains [31].
- Fig. 2.15 Cherry-Hooper amplifier.
- Fig. 2.16 (a) Simplified Cherry-Hooper amplifier and (b) its equivalent small-signal model.
- Fig. 2.17 Common source amplifier (a) without and (b) with shunt peaking.
- Fig. 2.18 Normalized frequency responses of shunt-peaked amplifier: (a) amplitude and (b) phase [35].
- Fig. 2.19 Proposed optical front-end architecture.
- Fig. 2.20 Computed transimpedance gain versus supply current at target bandwidth of 20 GHz.
- Fig. 2.21 Implemented TIA with photodiode model.
- Fig. 2.22 Simulated frequency response of TIA with different $C_{PD}$ values.
- Fig. 2.23 Single-to-differential converter.
- Fig. 2.24 Limiting amplifier.
- Fig. 2.25 Simulated frequency response of single-to-differential converter: (a) amplitude and (b) phase delay.
- Fig. 2.26 Unit amplifier stage of LA.
- Fig. 2.27 Required cell $GBW$ as a function of stages $N$ for $A_{TOT} = 23$ dB and $BW_{TOT} = 20$ GHz.
- Fig. 2.28 Simulated frequency responses of limiting amplifier and entire optical front-end.
- Fig. 2.29 Simulated eye diagram of the optical front-end with (a) 100 $\mu A_{pk\text{-}pk}$ input and (b) 500 $\mu A_{pk\text{-}pk}$ input.
- Fig. 2.30 Noise sources in the optical front-end.
- Fig. 2.31 Computed input-referred noise of the TIA versus supply current at target bandwidth of 20 GHz.
- Fig. 2.32 Calculated BER versus input optical power.
- Fig. 3.1 Two types of I/O clocking architectures: (a) forwarded-clock and (b) embedded-clock architecture.
- Fig. 3.2 PLL-based ADCDR [43].
- Fig. 3.3 (a) Implementation of Hogge phase detector and (b) its timing diagram.
- Fig. 3.4 Transfer curve of Hogge phase detector.
- Fig. 3.5 (a) Implementation of Alexander phase detector and (b) its timing diagram.
- Fig. 3.6 Transfer curve of Alexander phase detector.
- Fig. 3.7 (a) Analog loop filter and (b) digital loop filter.
- Fig. 3.8 Transfer curve of (a) VCO and (b) DCO.
- Fig. 3.9 (a) Current controlled and (b) resistor controlled ring-DCO.
- Fig. 3.10 LC-DCO using varactor array.
- Fig. 3.11 (a) PMOS capacitor and (b) its gate capacitance versus gate-source voltage.
- Fig. 3.12 Glitch-less switching scheme.
- Fig. 3.13 Parallel-coupled quadrature VCO.
- Fig. 3.14 Linearized block diagram of parallel-coupled quadrature oscillator.
- Fig. 3.15 Magnitude and phase of resonator impedance: (a) lossless and (b) lossy resonator.
- Fig. 3.16 Parallel-coupled quadrature VCO with phase shifters.
- Fig. 3.17 (a) Top-series-coupled quadrature VCO and (b) bottom-series-coupled quadrature VCO.
- Fig. 3.18 (a) Small-signal circuit model for P-QVCO and BS-QVCO and (b) their phasor diagrams.
- Fig. 3.19 Block diagram of second-order bang-bang ADCDR.
- Fig. 3.20 Linearized model of bang-bang ADCDR.
- Fig. 3.21 Fan-out-of-two CML buffers.
- Fig. 3.22 Simulated large-signal gain and AC unity-gain frequency of fan-out-of-two CML buffers versus 1x transistor size.
- Fig. 3.23 (a) Four-stage and (b) two-stage ring oscillators.
- Fig. 3.24 Simulated oscillation frequencies and phase noises of ring oscillators versus supply current.
- Fig. 3.25 Proposed ADCDR architecture.
- Fig. 3.26 Samplers and phase detection logic.
- Fig. 3.27 Timing diagram of phase detection logic.
- Fig. 3.28 Digital loop filter.
- Fig. 3.29 LC quadrature digitally-controlled oscillator.
- Fig. 3.30 Integral varactor bank.
- Fig. 3.31 Proportional varactor bank.
- Fig. 3.32 Simulated transient response of CDR.
- Fig. 4.1 Chip micrograph.
- Fig. 4.2 Optical measurement setup.
- Fig. 4.3 Alternative measurement setup.
- Fig. 4.4 Measured LC-QDCO tuning characteristics.
- Fig. 4.5 Measured divided-by-32 recovered clock: (a) jitter histogram and (b) phase noise.
- Fig. 4.6 Measured jitter tolerance (12.5 Gb/s 2<sup>7</sup>–1 PRBS).
- Fig. 4.7 Measured sensitivity: (a) using 2<sup>7</sup>–1 PRBS and (b) using 2<sup>31</sup>–1 PRBS.

# **List of Tables**

- Table 2.1 Performance metrics for shunt peaking [35].
- Table 2.2 Required gain of limiting amplifier.
- Table 4.1 Front-end performance summary.
- Table 4.2 Receiver performance summary.
## Chapter 1

## Introduction

### 1.1 Motivation

The rapid growth in computing and storage capabilities demands ever-increasing data rates for chip-to-chip and board-to-board interconnections. Because of package pin-count constraints, the required per-pin data rate also steadily increases, which makes the signal attenuation, distortion, and cross-talk of copper-based interconnections more severe [1]-[3]. This can be mitigated to some extent with pre- and post-equalization techniques such as the feed-forward equalizer (FFE), continuous-time linear equalizer (CTLE), and higher-order decision-feedback equalizer (DFE) [4]-[5]. However, such solutions require considerable power consumption. For these reasons, optical interconnections, as shown in Fig. 1.1, are gaining more interest as candidates for next-generation board-to-board and chip-to-chip interconnections, which can be attributed to the much higher intrinsic bandwidth of optical fibers.

Fig. 1.1 Conventional optical interconnection.

Reflecting this trend, various optical specifications such as 40/100 Gigabit Ethernet (GbE), Infiniband Enhanced Data Rate (EDR), 32GFC, and 100G Coarse WDM Long Range 4 (CLR4) have been proposed for high-speed inter-rack/board interconnections. Fig. 1.2 summarizes the roadmap of those optical communication standards. In addition, it is expected that optical specifications for chip-to-chip applications will also appear in the near future.

Fig. 1.2 Roadmap of various optical communication standards.

Considering the cost and the vision of fully-integrated optical receivers with processors, CMOS processes are more suitable than III-V compound processes in spite of their inferior noise and transition speed characteristics [6], [7]. Accordingly, CMOS implementations of the 25 Gb/s-class optical receiver have been intensively studied recently [8]-[15]. From the cost and signal integrity point of view, the monolithic optical receiver shown in Fig. 1.3(b) is the best solution.
There have been several attempts to implement an optical receiver including a photodiode in a standard CMOS process [16], [17]. However, their speeds are limited to around 20 Gb/s per channel and they exhibit poor sensitivity because of the intrinsic and extrinsic limitations of the CMOS photodiode [7]. Consequently, most research on 25 Gb/s-class optical receivers focuses on the hybrid integration of optical and electrical devices from different processes, as shown in Fig. 1.3(a).

Fig. 1.3 (a) Hybrid and (b) monolithic optical receiver front-end.

As the technology scales down, the operating speed of the CMOS optical front-end has continuously increased because of the concomitant increase in the transit frequency of MOSFET devices. On the other hand, in deep-submicron technologies, the conventional clock and data recovery (CDR) circuit based on the charge-pump phase-locked loop (CP-PLL) encounters many challenges [18]-[20]. First, it is difficult to achieve high performance in the charge pump because of the reduced voltage headroom, low output impedance of the transistors, and large process variations. Second, the area of the loop filter does not scale down in step with the technology; a scaled transistor cannot be used as a capacitor in the loop filter because of its non-negligible gate leakage current. Third, the reduced tuning voltage range makes the voltage-controlled oscillator (VCO) more sensitive to the noise coupled from adjacent blocks.

In order to overcome the problems faced by conventional CDRs, many digital approaches to the phase-locked loop (PLL) and CDR have been proposed [19]-[22]. They obviate the use of large loop-filter capacitors and exhibit a loop characteristic that is immune to process, voltage, and temperature (PVT) variations. Additionally, they are noise tolerant because the signal amplitudes at most of the nodes are quantized.
Therefore, an all-digital CDR (ADCDR) is the optimal choice for an optical receiver implemented in deep-submicron technology.

In this thesis, a 22- to 26.5-Gb/s optical receiver implemented in a 65-nm CMOS technology is presented. The receiver includes both an optical front-end and a PLL-based bang-bang ADCDR, which correspond to the blocks enclosed by the dashed line in Fig. 1.1.

### 1.2 Thesis Organization

This thesis is organized as follows. In Chapter 2, the background on the building blocks of the optical front-end is explained. After that, the overall architecture and detailed circuit implementation of the implemented receiver are described. In addition, noise and sensitivity analyses of the optical front-end are presented. In Chapter 3, the basic theory of the ADCDR is introduced to explain its dynamics. In addition, the nonlinear and pseudo-linear analyses of the bang-bang ADCDR are introduced. Then the overall architecture and circuit implementation of the implemented receiver are described. Chapter 4 summarizes the experimental results obtained with the test chip, and Chapter 5 draws conclusions.

## Chapter 2

## **Design of Optical Front-End**

### 2.1 Overview

The objective of the optical front-end is to convert the incoming photocurrent from the photodiode to a voltage signal with sufficient amplitude and signal-to-noise ratio (SNR) for faithful zero/one detection. From this objective, the required building blocks and design considerations of the optical front-end can be inferred. First, the optical front-end requires a building block that converts the current signal to a voltage signal. Second, the optical front-end has to deliver the voltage signal with sufficient amplitude for zero/one detection. In order to avoid metastability in the sampling operation, the optical front-end requires amplifiers that provide additional gain. For faithful zero/one detection, both SNR and timing jitter should be considered.
The SNR can be maximized by minimizing the additional noise in the optical front-end. The Friis equation indicates that the noise contributed by each stage decreases as the gain preceding the stage increases [23]. Thus, the overall sensitivity of the receiver is mainly determined by that of the first stage in the front-end, which is the current-to-voltage converter. In addition, intersymbol interference (ISI), which is related to the bandwidth, is the dominant source of timing jitter in the optical front-end. The limited bandwidth causes the amplitude response and group delay to vary with frequency, which results in deterministic jitter. More precisely, the amount of jitter induced by the front-end is determined by the frequency response of the entire chain, as long as the system is not slew limited.

To sum up, the two major design targets of the optical front-end are to perform a current-to-voltage conversion with low additional noise and to make the signal amplitude detectable in the CDR with low additional jitter. In later sections, the background on the building blocks of the optical front-end will be explained in detail and the overall architecture and circuit implementation of the implemented receiver will be described. In addition, the noise and sensitivity of the optical front-end will be investigated.

### 2.2 Background on Optical Front-End

#### 2.2.1 Photodiode

The optical signal from the optical fiber is received by the photodiode, which generates a current signal proportional to the received optical power through its responsivity. The characteristics of the current-to-voltage converter of the optical front-end are closely related to the equivalent circuit of the photodiode. Fig. 2.1 shows the small-signal equivalent circuit of the photodiode and its simplified version. $C_{PD}$ includes the junction capacitance $C_j$ and the pad capacitance $C_{pad}$ of the photodiode.
In addition, $C_{PD}$ of commercial 25 Gb/s-class photodiodes ranges from 80 fF to 100 fF, as determined experimentally.

Fig. 2.1 (a) Photodiode small-signal equivalent circuit and (b) simplified one.

Fig. 2.2 Total capacitances of commercial photodiodes.

The total capacitances of the commercial photodiodes, as published on the website of each manufacturer, are summarized in Fig. 2.2. In later sections, the simplified photodiode model shown in Fig. 2.1 is used to investigate the characteristics of the optical front-end and to set design parameters.

In addition, we introduce some terminology for the optical signal and photodiode characteristics. Optical modulation amplitude (OMA) and extinction ratio (ER) are the difference and the ratio, respectively, between the optical power levels of the digital signals "1" and "0". Responsivity is the gain of the photodiode, usually expressed in amperes per watt. They are given by

$$OMA = P_1 - P_0 \tag{2.1}$$

$$ER = P_1/P_0 \tag{2.2}$$

$$\text{Responsivity} = I_{PD}/P_{opt} \tag{2.3}$$

where $P_1$ and $P_0$ are the optical power levels of the digital signals "1" and "0", respectively, $I_{PD}$ is the photodiode current, and $P_{opt}$ is the incident optical power.

#### 2.2.2 Transimpedance Amplifier

The simplest implementation that converts a current signal to a voltage signal is shown in Fig. 2.3.

Fig. 2.3 Simple resistor for current-to-voltage conversion.

The transimpedance gain $R_t$ and SNR are derived as follows:

$$R_t = \frac{V_{sig}}{I_{in}} = \frac{1}{2\pi \cdot BW \cdot C_T} \tag{2.4}$$

$$\begin{aligned} SNR &= \frac{\left(I_{in}R_{t}\right)^{2}}{kT/C_{T}} \\ &= \frac{I_{in}^{2}}{4\pi^{2} \cdot kT \cdot BW^{2} \cdot C_{T}} \end{aligned} \tag{2.5}$$

where $k$ and $T$ are the Boltzmann constant and the absolute temperature, respectively. $C_T$ is the total capacitance at the input, including the photodiode capacitance and parasitic capacitance. Equation (2.5) shows a tight tradeoff between the SNR and the bandwidth.
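As a rough numerical illustration of (2.4) and (2.5), the following Python sketch evaluates the single-resistor front-end. The function name `resistor_front_end` and all component values (a 100 fF total input capacitance, a photocurrent derived from a -10 dBm OMA at 0.7 A/W, and a temperature of 300 K) are assumptions chosen for illustration, not values measured in this work.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def resistor_front_end(i_in, bandwidth_hz, c_total, temperature_k=300.0):
    """Evaluate (2.4) and (2.5) for a single load resistor.

    Returns the transimpedance gain in ohms and the SNR as a power ratio.
    """
    r_t = 1.0 / (2.0 * math.pi * bandwidth_hz * c_total)  # (2.4)
    noise_power = K_BOLTZMANN * temperature_k / c_total   # kT/C noise in V^2
    snr = (i_in * r_t) ** 2 / noise_power                 # (2.5)
    return r_t, snr


if __name__ == "__main__":
    # Assumed example values, not taken from the measurements in this thesis.
    oma_w = 1e-3 * 10 ** (-10 / 10)  # -10 dBm optical modulation amplitude
    responsivity = 0.7               # A/W
    i_in = responsivity * oma_w      # photocurrent amplitude, per (2.3)
    for bw in (5e9, 10e9, 20e9):
        r_t, snr = resistor_front_end(i_in, bw, c_total=100e-15)
        print(f"BW = {bw / 1e9:4.0f} GHz  R_t = {r_t:6.1f} ohm  "
              f"V_sig = {i_in * r_t * 1e3:5.2f} mV  "
              f"SNR = {10 * math.log10(snr):5.1f} dB")
```

For a fixed $C_T$, doubling the bandwidth halves $R_t$ and reduces the SNR by a factor of four, which is the tradeoff expressed by (2.5).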
In most cases of high-speed optical interconnection, the required SNR and bandwidth cannot be met simultaneously with a single resistor, because the typical photocurrent ranges from tens of µA to hundreds of µA. Therefore, transimpedance amplifiers (TIAs), whose input impedance is small enough to satisfy the required bandwidth while maintaining the same gain, are commonly employed. Generally, the capacitance at the TIA input node is larger than those at other on-chip nodes, especially in hybrid integration. Thus, the dominant pole of the TIA is generated at its input. For those reasons, traditional TIAs are based on the common-gate (CG) amplifier shown in Fig. 2.4. Since the CG amplifier has a small input resistance, the dominant pole at the input of the TIA can be located at a relatively high frequency. In addition, because the transimpedance gain is determined by the load resistance, the transimpedance and bandwidth can be optimized separately without a tight tradeoff, in contrast to the single-resistor implementation.

Fig. 2.4 Common-gate amplifier.

The transimpedance gain at low frequency $R_T$ and the input resistance $R_{in}$ of the CG amplifier are derived as follows:

$$R_{in} = \frac{1}{g_m} \tag{2.6}$$

$$R_T = R_D \tag{2.7}$$

Improving on the CG amplifier, the regulated cascode (RGC) amplifier shown in Fig. 2.5 exhibits a much smaller input resistance [24]. A feedback amplifier added between the gate and source of the CG transistor lowers the input resistance as follows:

$$R_{in} = \frac{1}{g_{m1}(1 + g_{m2}R_2)} \tag{2.8}$$

where $g_{m1}$ and $g_{m2}$ are the transconductances of M1 and M2, respectively. Because of the greatly reduced input resistance, the dominant pole of the RGC amplifier is generated at the drain of M1. The approximate -3-dB frequency of the RGC amplifier is given by

$$f_{-3dB,RGC} = \frac{1}{2\pi R_1 C_{d1}} \tag{2.9}$$

where $C_{d1}$ is the total drain capacitance of M1.

Fig. 2.5 Regulated cascode.

However, it should be noted that a
zero appears due to the local feedback stage, and this zero causes peaking in the frequency response at the frequency of

$$f_{peak} = \frac{1}{2\pi R_{D2} \left( C_{gs1} + C_{gd2} \right)} \tag{2.10}$$

where $C_{gs1}$ and $C_{gd2}$ are the gate-source capacitance of M1 and the gate-drain capacitance of M2, respectively [25].

Using the RGC amplifier as an input stage, as shown in Fig. 2.6, several RGC TIAs achieve a small input resistance and hence a high bandwidth [26]. Because of the small input resistance of the RGC input stage, the large photodiode capacitance can be isolated from the bandwidth determination. The transimpedance gain at low frequency $Z_T$ and the dominant pole generated at the drain of M1 are given by

$$Z_T \approx -R_F \tag{2.11}$$

$$f_{-3dB,RGCTIA} \approx \frac{g_{m1}}{2\pi R_F \left(C_f + \frac{C_{gd1} + C_{g2}}{1 + \alpha_2 g_{m3} R_3 \alpha_4}\right)} \tag{2.12}$$

where $\alpha_i$ are the low-frequency gains of the source followers, $C_{gd1}$ is the gate-drain capacitance of M1, $C_{g2}$ is the capacitance at the gate of M2, and $C_f$ is the parasitic capacitance of the feedback resistor [26].

Fig. 2.6 Regulated cascode transimpedance amplifier.

However, in the conventional RGC amplifier shown in Fig. 2.5, the DC voltages at the drain of M2 and at $V_{out}$ are two gate-source voltages and one gate-source plus one drain-source voltage, respectively. Thus, the performance of the RGC amplifier degrades significantly at low supply voltages. In order to maintain the performance of the RGC amplifier even at lower supply voltages, a modified RGC TIA has been proposed, as shown in Fig. 2.7 [27]. By introducing M2, all the transistors employed can be biased with gate-source and drain-source voltages equal to or higher than 0.5 V. In addition, the bandwidth is further increased by inductive peaking.

Fig. 2.7 Common-gate feedforward TIA.
Nonetheless, considering the voltage drop across the passive load, which is closely related to the transimpedance gain, feedback loop gain, and stability, it is still hard to design a high-performance RGC-based TIA at a low supply voltage.

The TIAs explained so far are based on the CG amplifier with a feedback loop that lowers the input resistance. Another approach to TIA design is to use shunt-shunt feedback to lower the input resistance, as shown in Fig. 2.8 [28].

Fig. 2.8 Shunt resistive feedback CMOS TIA.

The transimpedance gain at low frequency $R_T$, the input resistance $R_{in}$, and the -3-dB frequency $f_{-3dB}$ are derived as follows:

$$R_{T} = \frac{r_{o} \left( g_{m} R_{F} - 1 \right)}{1 + g_{m} r_{o}} \approx R_{F} \tag{2.13}$$

$$R_{in} = \frac{R_F + r_o}{1 + g_m r_o} \tag{2.14}$$

$$\begin{aligned} f_{-3dB} &= \frac{1}{2\pi \cdot R_{in} C_{in,tot}} \\ &= \frac{1 + g_m r_o}{2\pi \cdot C_{in,tot} (R_F + r_o)} \end{aligned} \tag{2.15}$$

where $g_m$ and $r_o$ are the transconductance and output resistance of the transistors, respectively, and $R_F$ is the feedback resistance. $C_{in,tot}$ is the total capacitance at the TIA input. It is assumed that the dominant pole is generated at the TIA input. It is worth noticing that there is a tradeoff between the bandwidth and the transimpedance gain, as in the single-resistor implementation shown in Fig. 2.3. However, the bandwidth of the shunt-feedback (SF) TIA is extended by a factor on the order of the CMOS intrinsic gain $g_m r_o$; thus, the tradeoff between them is greatly loosened compared with the single-resistor implementation. In contrast to the RGC-based TIAs, the SF TIA is suitable for low supply voltages, because there are only two drain-source voltages and no voltage drop across a passive device between the power supply and ground. In addition, because the current is reused by the PMOS and NMOS transistors, it is power-efficient. For these reasons, the SF TIA is now widely used for high-performance designs in scaled technologies [8]-[10].
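To make (2.13) to (2.15) concrete, the short Python sketch below evaluates them and compares the resulting bandwidth with that of a plain resistor of the same value $R_F$ driving the same capacitance. The function names and the device values ($g_m$ = 40 mS, $r_o$ = 250 Ω, $R_F$ = 300 Ω, $C_{in,tot}$ = 150 fF) are hypothetical and chosen only to give round numbers; they are not the design values of the implemented TIA.

```python
import math


def sf_tia(g_m, r_o, r_f, c_in_tot):
    """Evaluate (2.13)-(2.15) for a shunt-feedback TIA.

    Returns (R_T in ohms, R_in in ohms, f_-3dB in Hz), assuming the
    dominant pole sits at the TIA input.
    """
    one_plus_gain = 1.0 + g_m * r_o
    r_t = r_o * (g_m * r_f - 1.0) / one_plus_gain     # (2.13)
    r_in = (r_f + r_o) / one_plus_gain                # (2.14)
    f_3db = 1.0 / (2.0 * math.pi * r_in * c_in_tot)   # (2.15)
    return r_t, r_in, f_3db


def resistor_bandwidth(r, c):
    """-3-dB bandwidth of a single resistor loaded by c, from (2.4)."""
    return 1.0 / (2.0 * math.pi * r * c)


if __name__ == "__main__":
    # Hypothetical device values for illustration only.
    g_m, r_o, r_f, c_in = 40e-3, 250.0, 300.0, 150e-15
    r_t, r_in, f_3db = sf_tia(g_m, r_o, r_f, c_in)
    f_res = resistor_bandwidth(r_f, c_in)
    print(f"R_T = {r_t:.0f} ohm, R_in = {r_in:.0f} ohm")
    print(f"SF TIA bandwidth   = {f_3db / 1e9:.2f} GHz")
    print(f"Resistor bandwidth = {f_res / 1e9:.2f} GHz")
    print(f"Extension factor   = {f_3db / f_res:.1f}")
```

In this sketch the extension factor equals $R_F/R_{in} = (1 + g_m r_o) R_F/(R_F + r_o)$, so it approaches the intrinsic gain only when $R_F$ is much larger than $r_o$.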
As explained earlier, the additional noise of the TIA is a key performance metric, as is its bandwidth. It will be investigated in Section 2.4 together with the noise analysis of the implemented receiver.

#### 2.2.3 Post Amplifier

With typical values of the received OMA, photodiode responsivity, and TIA gain of -10 dBm, 0.7 A/W, and 100 $\Omega$, respectively, the signal amplitude at the TIA output is 7 mV<sub>pk-pk</sub> (0.1 mW × 0.7 A/W × 100 $\Omega$). This is still insufficient for the reliable operation of the CDR. Therefore, intermediate stages providing additional gain are needed between the TIA and the CDR circuits. In later sections, those amplifier stages following the TIA will be referred to as the post amplifier.

We start by distinguishing two types of amplifiers for the post amplifier: the limiting amplifier (LA) and the automatic gain control (AGC) amplifier. An LA is an amplifier with no special provisions to avoid clipping or limiting of the output signal. For very small input signals, the LA operates in the linear regime, where the output amplitude is proportional to the input. For larger signals, it crosses into the limiting regime, where the output signal has a constant amplitude. On the other hand, the AGC amplifier reduces its gain for large input signals and thus manages to stay in the linear regime where the LA would start to limit. The DC transfer functions of an LA and an AGC amplifier are shown in Fig. 2.9.

Fig. 2.9 DC transfer function of (a) an LA and (b) an AGC amplifier.

Fig. 2.10 Optical receiver (a) without and (b) with equalizers.

If the post amplifier is followed directly by a decision circuit as shown in Fig. 2.10(a), linearity is of little concern and we may even use a limiting amplifier for the post amplifier. In this case, amplitude distortions do no harm as long as the crossover points of the signal with the decision threshold are preserved [29].
Nevertheless, we have to make sure that the nonlinearity does not introduce pulse-width distortions and jitter, which would reduce the horizontal eye opening. If the post amplifier is followed by some type of signal processor, such as the equalizer shown in Fig. 2.10(b), linearity becomes important [29]. In addition, if the optical front-end is part of a receiver for PAM-4, then linearity is of foremost importance. In this case, an AGC amplifier must be used. However, since optical front-ends with equalizers and PAM-4 receivers are not in the scope of this thesis, only the details of the LA will be introduced for the post amplifier.

In order to achieve sufficient gain, the LA for the post amplifier is generally composed of multiple stages. A cascade of $n$ identical gain cells as shown in Fig. 2.11, each having a bandwidth $BW_c$, exhibits an overall bandwidth of

$$BW_{tot} = BW_c \cdot \sqrt[m]{2^{1/n} - 1} \tag{2.16}$$

where $m$ is equal to 2 for first-order stages and 4 for second-order stages [30].

Fig. 2.11 An n-stage amplifier.

More generally, for a total gain of $A_{tot}$, the required cell gain-bandwidth product $GBW_c$ can be written as

$$GBW_{c} = \frac{GBW_{tot}}{A_{tot}^{1-1/n} \cdot \sqrt[m]{2^{1/n} - 1}} \tag{2.17}$$

where $GBW_{tot} = A_{tot} \cdot BW_{tot}$ and $GBW_c = A_{tot}^{1/n} \cdot BW_c$ [31]. Equation (2.17), the required $GBW_c$ for a cascade of $n$ first-order or second-order gain cells, can be plotted as in Fig. 2.12 for a given target $A_{tot}$ and $BW_{tot}$. The actual second-order plot in Fig. 2.12 shows approximated values that take circuit imperfections into account [31].

Fig. 2.12 Required cell *GBW* as a function of the number of stages *n* [31].

For a larger $n$, the lower gain per stage leads to a higher noise contribution from each stage to the overall input-referred noise. In addition, a larger number of stages results in higher power consumption.
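The dependence of the required cell gain-bandwidth product on the number of stages can be tabulated with a few lines of Python. This is a minimal sketch of (2.16) and (2.17), taking $m$ as the root index (2 for first-order cells, 4 for second-order cells) as stated in the text. The function names are hypothetical, and the targets of 23 dB and 20 GHz are used here only as example inputs.

```python
def bandwidth_shrinkage(n, m=2):
    """Ratio BW_tot / BW_c of (2.16) for a cascade of n identical cells.

    m = 2 for first-order cells and m = 4 for second-order cells.
    """
    return (2.0 ** (1.0 / n) - 1.0) ** (1.0 / m)


def required_cell_gbw(a_tot, bw_tot_hz, n, m=2):
    """Required gain-bandwidth product per cell in Hz, following (2.17)."""
    gbw_tot = a_tot * bw_tot_hz
    return gbw_tot / (a_tot ** (1.0 - 1.0 / n) * bandwidth_shrinkage(n, m))


if __name__ == "__main__":
    # Example targets (assumed inputs for this sketch).
    a_tot = 10 ** (23 / 20)  # 23 dB expressed as a voltage gain
    bw_tot = 20e9            # 20 GHz overall bandwidth
    print(" n  first-order GBW_c  second-order GBW_c")
    for n in range(1, 9):
        first = required_cell_gbw(a_tot, bw_tot, n, m=2)
        second = required_cell_gbw(a_tot, bw_tot, n, m=4)
        print(f"{n:2d}  {first / 1e9:12.1f} GHz  {second / 1e9:13.1f} GHz")
```

The ideal formulas ignore circuit imperfections such as inter-stage loading, so the numbers printed by this sketch should be read as a lower bound on what a real cell must deliver.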
Therefore, it is best to choose the minimum number of stages that meets the target $A_{tot}$ and $BW_{tot}$ in a given technology.

Any mismatch between the load resistors or transistors in a differential pair may lead to a disparity between the DC levels of the two outputs, and this DC difference is amplified by the subsequent stages [32]. In order to alleviate this problem, an offset cancellation circuit is needed in the post amplifier. A conventional offset cancellation circuit is shown in Fig. 2.13 [29]. The low-frequency components of the output are amplified by the error amplifier and the result is fed back to the input of the first stage. Continuous-time offset cancellation circuits introduce a high-pass cutoff frequency in the transfer function and "droop" in the time domain after long runs of consecutive identical bits, as shown in Fig. 2.14(a) and (b) [31]. At the end of the droop period, the signal is shifted with respect to the decision threshold. The lower cutoff frequency should be small enough to minimize this effect.

Fig. 2.13 Conventional offset cancellation scheme.

Fig. 2.14 Effect of offset cancellation in (a) frequency and (b) time domains [31].

In addition, because the photocurrent from the photodiode is single-ended, a single-to-differential converter (S2D) has to be included in the post amplifier unless the TIA itself converts the single-ended input to a differential output.

In the following, a traditional LA, the Cherry-Hooper amplifier, is introduced. It was first proposed by Cherry and Hooper in the early 1960s as a cascade of series-series feedback and shunt-shunt feedback amplifiers, as shown in Fig. 2.15 [33]. The technique was later modified in several works for the LA of the optical front-end. Intuitively, the series-series feedback of the first stage boosts the high-frequency gain. Therefore, by choosing proper values of $R_E$ and $C_E$, the bandwidth of the amplifier can be extended.
The effects of the shunt-shunt feedback of the second stage are analyzed with the simplified circuit shown in Fig. 2.16(a) and (b). Instead of analyzing the complex transfer function of the circuit, the effects of shunt-shunt feedback on the frequency response of the amplifier can be investigated using the method of open-circuit time constants.

Fig. 2.15 Cherry-Hooper amplifier.

Fig. 2.16 (a) Simplified Cherry-Hooper amplifier and (b) its equivalent small-signal model.

The low-frequency gain $A_v$ and the open-circuit time constants of $C_X$ and $C_Y$, $\tau_X$ and $\tau_Y$, are derived as follows:

$$\begin{aligned} A_{v} &\approx \left(g_{m2}R_{F} - 1\right) \cdot \left(\frac{g_{m1}R_{Y}}{1 + g_{m2}R_{Y}}\right) \\ &\approx g_{m1}R_{F} \end{aligned} \tag{2.18}$$

$$\begin{aligned} \tau_{X} &= C_{X} \cdot \frac{R_{X}}{1 + \frac{R_{X}}{R_{F}} \left( 1 + g_{m2} R_{Y} \right)} \\ &\approx C_{X} \cdot \frac{R_{F}}{1 + g_{m2} R_{Y}} \end{aligned} \tag{2.19}$$

$$\begin{aligned} \tau_{Y} &= C_{Y} \cdot \frac{R_{Y}}{1 + g_{m2}R_{Y}\left(\frac{R_{X} + 1/g_{m2}}{R_{X} + R_{F}}\right)} \\ &\approx C_{Y} \cdot \frac{R_{Y}}{1 + g_{m2}R_{Y}} \approx \frac{C_{Y}}{g_{m2}} \end{aligned} \tag{2.20}$$

where $g_{mi}$, $r_{oi}$, $C_{gi}$, and $C_{di}$ are the transconductance, output resistance, total gate capacitance, and total drain capacitance of M*i*, respectively. The time constants $\tau_X$ and $\tau_Y$ are reduced by a factor of the voltage gain of the second stage. In addition, if $\tau_Y$ is dominant, the bandwidth and gain of the amplifier can be tuned independently of each other. However, it should be noted that the voltage gain of the amplifier is reduced by the same factor. Therefore, the bandwidth of the amplifier is extended at the cost of reduced voltage gain.

#### 2.2.4 Shunt Inductive Peaking

Many techniques to extend the bandwidth of the amplifier have been proposed. In this section, one of the commonly used techniques, shunt inductive peaking, and its optimum response will be investigated.
An attractive feature of this technique is that the bandwidth enhancement comes without additional power dissipation. Fig. 2.17 shows a simple shunt-peaking implementation. The transfer function of the amplifier with shunt peaking is derived as follows:

$$A_{v} = \frac{g_{m}(R + j\omega L)}{1 + j\omega RC - \omega^{2}LC}$$ (2.21)

Fig. 2.17 Common source amplifier (a) without and (b) with shunt peaking.

With the gain normalized to $g_m R$ and the frequency normalized to $1/RC$, equation (2.21) can be rewritten in terms of the ratio of the L/R and RC time constants as

$$A_{v} = \frac{1 + jm\omega}{\left(1 - m\omega^{2}\right) + j\omega}$$ (2.22)

where

$$m = \frac{L/R}{RC}$$ (2.23)

The frequency response of the shunt-peaked amplifier varies with the coefficient m. The maximum bandwidth is obtained by finding the value of m that maximizes the bandwidth extension ratio,

$$\frac{\omega}{\omega_{1}} = \frac{1}{m} \sqrt{\left(m^{2} + m - \frac{1}{2}\right) + \sqrt{\left(m^{2} + m - \frac{1}{2}\right)^{2} + m^{2}}}$$ (2.24)

where $\omega_1$ is the uncompensated -3-dB frequency. In addition, the maximally flat response is obtained by choosing the value of m that maximizes the number of derivatives of the magnitude response that are zero at DC [34].

Another special case of shunt peaking is minimum phase distortion. If the spectral components of the input signal do not experience equal time delay, distortion can occur in the output signal. The minimum phase distortion is obtained by choosing the value of m that makes the group delay response maximally flat [34]. The group delay is defined as follows:

$$\tau_g \equiv -\frac{d\Phi(\omega)}{d\omega} \tag{2.25}$$

where $\Phi(\omega)$ is the phase response of the amplifier. The three special cases of shunt peaking explained above are summarized in Table 2.1, and their amplitude and phase responses are plotted in Fig. 2.18(a) and (b) [35].
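As a quick numerical check of (2.22) and (2.24), the following sketch evaluates the bandwidth extension ratio for a few values of m and confirms that the normalized gain magnitude at that frequency is $1/\sqrt{2}$. The function names are illustrative and not from the thesis, and the m values are simply the ones usually quoted for the three special cases.

```python
import math

def bandwidth_extension_ratio(m: float) -> float:
    """Normalized -3-dB frequency of the shunt-peaked stage, Eq. (2.24)."""
    if m == 0.0:
        return 1.0  # no inductor: the stage reduces to a single RC pole
    a = m * m + m - 0.5
    return math.sqrt(a + math.sqrt(a * a + m * m)) / m

def gain_magnitude(m: float, w: float) -> float:
    """|A_v| from the normalized transfer function, Eq. (2.22)."""
    return abs(complex(1.0, m * w) / complex(1.0 - m * w * w, w))

if __name__ == "__main__":
    for m in (0.0, 0.32, 0.41, 0.71):
        ratio = bandwidth_extension_ratio(m)
        # The magnitude at the computed frequency should be 1/sqrt(2).
        print(f"m = {m:.2f}: w/w1 = {ratio:.3f}, "
              f"|A_v| = {gain_magnitude(m, ratio):.4f}")
```

The closed form is only a statement about the magnitude response. It says nothing about peaking or delay flatness, which is why the choice of m still depends on which of the three responses is wanted.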
For digital signals, whose spectral components spread over a wide frequency range, the optimum group delay case is desirable for optimizing jitter performance. To be more exact, the time delay of each spectral component in a wideband system is not the group delay but the phase delay, defined as follows:

$$\tau_p \equiv -\frac{\Phi(\omega)}{\omega} \tag{2.26}$$

However, the phase delay variation is also minimized when the group delay variation is minimized, although the two amounts are not exactly the same.

| m    | Normalized -3-dB bandwidth | Response            |
|------|----------------------------|---------------------|
| 0    | 1.00                       | No shunt peaking    |
| 0.32 | 1.60                       | Optimum group delay |
| 0.41 | 1.72                       | Maximally flat      |
| 0.71 | 1.85                       | Maximum bandwidth   |

Table 2.1 Performance metrics for shunt peaking [35].

Fig. 2.18 Normalized frequency responses of shunt-peaked amplifier: (a) amplitude and (b) phase [35].

# 2.3 Circuit Implementation

#### 2.3.1 Overall Architecture

The overall architecture of the implemented optical front-end is shown in Fig. 2.19 [36]. The optical front-end consists of a TIA, an S2D, an LA, DC offset cancellation circuits, and an output buffer. The TIA converts the incoming photocurrent from a photodiode to a voltage signal. The S2D then transforms the single-ended TIA output into differential outputs. Following the S2D, the LA provides additional amplification to achieve a signal amplitude sufficient for reliable sampling in the CDR. In addition, offset cancellation circuits are inserted between the inputs and outputs of the LA, forming a negative feedback loop and thereby preventing offset-induced amplifier saturation. Finally, a current-mode logic (CML) buffer with shunt inductive peaking is employed to drive the large capacitive load of the samplers in the CDR.

Fig. 2.19 Proposed optical front-end architecture.
The optical front-end adopts an inverter-based amplifier as the basic building block of the TIA and the LA because this approach provides higher small-signal gain than NMOS-based amplifiers as a consequence of current reuse [37]. In addition, the front-end utilizes several bandwidth extension techniques that optimize not only the amplitude but also the phase delay response in order to minimize the additional jitter. The power spectrum of random non-return-to-zero (NRZ) data includes non-zero components that extend out to infinite frequency. However, the majority (90%) of the total power is concentrated in the spectrum between 0 and 0.7 times the data rate [38]. Therefore, the target bandwidth of the optical front-end is set to 0.7 times the maximum data rate, which is approximately 18.5 GHz. In order to meet the target bandwidth and minimum phase distortion simultaneously, both shunt inductive peaking and negative Miller capacitance are employed at the output node of the S2D. Additionally, to achieve sufficient gain and bandwidth in the LA, shunt feedback resistors are inserted in every other stage, which corresponds to a two-stage CMOS Cherry-Hooper amplifier, as presented in [39]. The implementation of the TIA, S2D, and LA is discussed in later sections.

## 2.3.2 Transimpedance Amplifier

As explained in section 2.2.2, there have been various approaches to TIA design, and their main target was to reduce the input resistance of the TIA, thereby moving the pole introduced at the TIA input node to a higher frequency. However, as the supply voltage scales down, conventional topologies such as CG and RGC amplifiers are becoming more difficult to design because of the reduced voltage headroom. In this situation, the compact but powerful TIA topology shown in Fig. 2.8, a CMOS inverter with shunt resistive feedback, is now widely used for high-performance optical receivers.
In the implemented receiver, the TIA is designed with the same topology presented in [28]. The input resistance $R_{in}$, transimpedance gain $Z_T$, and its low-frequency value $R_T$ of the implemented TIA are derived as follows:

$$R_{in} = \frac{R_F + r_o}{1 + g_{m}r_o} \tag{2.27}$$

$$Z_{T}(s) = -\frac{\frac{r_{o}(g_{m}R_{F}-1)}{1+g_{m}r_{o}}}{\left(1+s\frac{r_{o}C_{o}+(R_{F}+r_{o})C_{in,tot}}{1+g_{m}r_{o}}+s^{2}\frac{R_{F}C_{in,tot}r_{o}C_{o}}{1+g_{m}r_{o}}\right)}$$ (2.28)

$$R_T = -\frac{r_o \left(g_m R_F - 1\right)}{1 + g_m r_o} \approx -R_F \tag{2.29}$$

where $g_m$, $r_o$, and $C_o$ are the transconductance, output resistance, and output capacitance of the transistors, respectively, and $R_F$ is the feedback resistor. $C_{in,tot}$ is the total capacitance at the TIA input, consisting of the photodiode capacitance $C_{PD}$, the TIA input capacitance $C_{in,TIA}$, and the pad capacitance $C_{pad}$. Assuming that $C_{in,tot}$ is dominant and thus the two poles of the closed-loop transfer function are far apart, the -3-dB frequency of the implemented TIA is given by

$$f_{3dB} = \frac{1 + g_m r_o}{2\pi \cdot (r_o C_o + (R_F + r_o) C_{in,tot})}$$ (2.30)

To gain some design intuition, equation (2.30) can be rewritten under the assumption that $C_o$ is much smaller than $C_{in,tot}$:

$$R_F = \frac{(g_m + g_{ds})/2\pi f_{3dB} C_{in,tot} - 1}{g_{ds}}$$ (2.31)

$$C_{in,tot} = C_{PD} + C_{pad} + C_{gg}$$ (2.32)

where $g_{ds}$ and $C_{gg}$ are the drain-source conductance and intrinsic gate capacitance of the transistors. Equation (2.31) shows that there is a tradeoff between the power consumption and the maximum achievable transimpedance gain at given $f_{3dB}$, $C_{PD}$, $C_{pad}$, and technology. Fig. 2.20 plots the computed maximum achievable transimpedance gain versus supply current, where the parameter values are taken from SPICE simulation of the CMOS inverter with a 1-V supply in 65-nm technology. $C_{PD}$ and $C_{pad}$ are assumed to be 120 fF and 100 fF, respectively, and $f_{3dB}$ is set to 20 GHz.

Fig. 2.20 Computed transimpedance gain versus supply current at target bandwidth of 20 GHz.

From the power efficiency point of view, it is better to keep the supply current below 4 mA. On the other hand, from the noise point of view, a larger TIA gain is preferred. Taking all these considerations into account, the transimpedance gain, which is approximately the feedback resistance, is set first. The following amplifiers make up the remaining gain requirement of the optical front-end. In practice, several iterations are needed because of the limits of the assumptions and because the bonding wire inductance is not taken into account in the above analysis. The SPICE simulation, including the photodiode model and pad as shown in Fig. 2.21, indicates that the transimpedance gain and -3-dB bandwidth of the implemented TIA are 41 dB$\Omega$ and 30 GHz, respectively, with $C_{PD}$ of 120 fF, as shown in Fig. 2.22. The bonding wire inductance and pad capacitance are assumed to be 0.5 nH and 100 fF, respectively. The bandwidth is higher than the target value because of series peaking induced by the bonding wire. However, if the bonding wire inductance is more than 1 nH, the bandwidth and signal integrity deteriorate with in-band peaking.

Fig. 2.21 Implemented TIA with photodiode model.

Fig. 2.22 Simulated frequency response of TIA with different $C_{PD}$ values.

## 2.3.3 Post Amplifier

The post amplifier consists of the S2D and the LA, as shown in Fig. 2.19. The circuit implementations of the S2D and LA are shown in Fig. 2.23 and Fig. 2.24, respectively. For single-to-differential conversion with sufficient bandwidth, a differential amplifier with shunt inductive peaking is used. It provides both single-to-differential conversion and signal amplification.
One input of the differential pair is connected to the TIA output, and the other is connected to the output of the passive RC filter that extracts the common-mode voltage of the TIA output.

Fig. 2.23 Single-to-differential converter.

Fig. 2.24 Limiting amplifier.

In order to amplify a broadband signal such as a random binary signal, it is important to extend the bandwidth with the optimal phase delay response. As explained in section 2.2.4, the optimal phase delay performance is attained when m is 0.32, which makes the phase response the best approximation to a straight line up to the 3-dB bandwidth [35]. However, the theoretical bandwidth extension ratio of shunt inductive peaking with optimal group delay is fixed at 1.60, which is not sufficient to meet the target bandwidth of 18.5 GHz. Thus, negative capacitances are also employed at the output nodes of the S2D. These reduce the load capacitance, thereby enlarging the original bandwidth before shunt inductive peaking is applied. Therefore, the target -3-dB bandwidth and the optimal phase delay response are achieved simultaneously by using both shunt inductive peaking and negative capacitances.

The inductive loads are realized as on-chip spiral inductors in series with the biased NMOS loads, as shown in Fig. 2.23. The impedance seen at the output of the S2D is derived as

$$Z_L(s) = r_{o2} + sL_s(1 + g_{m2}r_{o2})$$ (2.33)

where $g_{m2}$ and $r_{o2}$ are the transconductance and output resistance of M2, respectively [40]. As indicated by (2.33), the equivalent circuit of the load is the output resistance of M2 in series with $L_{eq}$, which is derived as follows [40]:

$$L_{eq} = L_s(1 + g_{m2}r_{o2})$$ (2.34)

This greatly reduces the required inductor size, thus saving chip area. In addition, it makes the signal trace shorter, which results in smaller parasitics.
The zero frequency induced by $L_s$, which provides the bandwidth extension, is derived as

$$f_z = \frac{r_{o1}r_{o2}}{2\pi \cdot r_{o1} \cdot L_s(1 + g_{m2}r_{o2})} \approx \frac{1}{2\pi \cdot g_{m2}L_s}$$ (2.35)

where $r_{o1}$ is the output resistance of M1. Even though the equivalent inductance $L_{eq}$ is a function of $r_{o2}$, which is sensitive to process variations, the zero frequency is independent of $r_{o2}$ and is a function of $g_{m2}$, which is relatively well controlled by the bias current. In addition, including the gate-source capacitance of M2, (2.33) can be expressed as [40]

$$Z_L(s) = r_{o2} + \frac{1 + g_{m2}r_{o2}}{1 + s^2 C_{gs2}L_s} \cdot sL_s$$ (2.36)

where $C_{gs2}$ is the gate-source capacitance of M2. Therefore, the inductance $L_s$ and the size of M2 should be chosen with care so that the parasitic capacitance does not turn the load impedance capacitive at the frequency of interest.

The negative capacitances are implemented by cross-coupling the inputs and outputs of the inverters with series capacitors, as shown in Fig. 2.24 [31]. By the Miller effect, the effective capacitance has a negative value. Considering both the shunt inductive peaking and the negative capacitances, the coefficient *m* is expressed as

$$m = \frac{L_{eq}}{r_{o2}^{2}C_{eff}} = \frac{\left(1 + g_{m2}r_{o2}\right)L_{s}}{r_{o2}^{2}C_{eff}}$$ (2.37)

where $C_{eff}$ is the effective capacitance, including the negative Miller capacitance, at the output of the S2D. Fig. 2.25 shows the simulated amplitude and phase delay responses of the S2D, which indicate that the single-to-differential gain and -3-dB bandwidth are 10.1 dB and 19.1 GHz, respectively. The peak-to-peak phase delay variation is reduced from 6.14 ps to 0.91 ps in the range of 1-20 GHz.
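To make the relations in (2.34), (2.35), and (2.37) concrete, the sketch below evaluates the equivalent inductance, the zero frequency, and the peaking coefficient m for one set of device values. All numbers are hypothetical placeholders chosen only to land near m = 0.32; they are not the values used in the implemented S2D, and the function name is illustrative.

```python
import math

def active_inductive_load(l_s: float, g_m2: float, r_o2: float, c_eff: float):
    """Evaluate the S2D load relations for a spiral inductor in series
    with a biased NMOS load (all arguments in SI units)."""
    l_eq = l_s * (1.0 + g_m2 * r_o2)                      # Eq. (2.34)
    f_z = r_o2 / (2.0 * math.pi * l_eq)                   # zero of r_o2 + s*L_eq
    f_z_approx = 1.0 / (2.0 * math.pi * g_m2 * l_s)       # approximation in (2.35)
    m = l_eq / (r_o2 ** 2 * c_eff)                        # (L/R)/(RC), Eq. (2.23)
    return l_eq, f_z, f_z_approx, m

if __name__ == "__main__":
    # Hypothetical values: 200-pH spiral, 20-mS and 200-ohm load device,
    # 80-fF effective load capacitance after negative-capacitance cancellation.
    l_eq, f_z, f_z_approx, m = active_inductive_load(200e-12, 20e-3, 200.0, 80e-15)
    print(f"L_eq = {l_eq * 1e9:.2f} nH")
    print(f"f_z = {f_z / 1e9:.1f} GHz (approx. {f_z_approx / 1e9:.1f} GHz)")
    print(f"m = {m:.3f}")
```

With these placeholder numbers the approximation in (2.35) overestimates the zero frequency, because $g_{m2}r_{o2}$ is only 4. The approximation tightens as the intrinsic gain of M2 grows.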
In addition, because the first stage of the LA following the S2D is the inverter of the CMOS Cherry-Hooper amplifier, the output common-mode voltage is set to the inverter threshold voltage with a negative feedback loop and an inverter replica whose input and output are connected to each other.

Fig. 2.25 Simulated frequency response of single-to-differential converter: (a) amplitude and (b) phase delay.

In order to achieve low power consumption in the LA, the inverter-based amplifier is adopted as the unit amplifier stage. In addition, to meet the required bandwidth, a feedback resistor is inserted as shown in Fig. 2.26, which can be thought of as a CMOS Cherry-Hooper amplifier in which the NMOS devices are replaced with pairs of PMOS and NMOS devices [33], [39].

Fig. 2.26 Unit amplifier stage of LA.

At the initial design point, the sizes of the inverters are set to be the same. Thus, when the unit amplifier stages are cascaded, the ratio of the time constants $\tau_X$ and $\tau_Y$, which are introduced in section 2.2.3, is derived as

$$\frac{\tau_{Y}}{\tau_{X}} = \left(\frac{R_{Y}C_{Y}}{1 + g_{m2}R_{Y}}\right) / \left(\frac{R_{F}C_{X}}{1 + g_{m2}R_{Y}}\right) \approx \frac{R_{Y}}{R_{F}}$$ (2.38)

where $R_Y$ is the output resistance of the second inverter. If the output resistance of the inverter is large enough to separate the two poles, the unit amplifier stage can be approximated as a first-order system. However, since it is comparable to $R_F$ in the range of our design, the required cell GBW will be somewhere between the first-order and second-order lines in Fig. 2.12. In addition, the required gain of the LA can be obtained as shown in Table 2.2.
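The gain budget summarized in Table 2.2 can be reproduced with a few lines of arithmetic. The sketch below is a minimal illustration, assuming the budget is computed from the optical modulation amplitude alone, in which case the extinction ratio does not enter the result. The function name and argument names are illustrative.

```python
import math

def required_la_gain_db(oma_dbm: float, responsivity_a_per_w: float,
                        tia_gain_dbohm: float, s2d_gain_db: float,
                        driver_gain_db: float, vout_pp_v: float) -> float:
    """Limiting-amplifier gain needed to reach the target output swing."""
    oma_w = 1e-3 * 10.0 ** (oma_dbm / 10.0)          # dBm -> W
    i_pp = responsivity_a_per_w * oma_w              # peak-to-peak photocurrent
    # Gain of every block except the LA, converted from dB to a linear factor.
    fixed_gain = 10.0 ** ((tia_gain_dbohm + s2d_gain_db + driver_gain_db) / 20.0)
    v_pp_without_la = i_pp * fixed_gain
    return 20.0 * math.log10(vout_pp_v / v_pp_without_la)

if __name__ == "__main__":
    gain = required_la_gain_db(oma_dbm=-10.0, responsivity_a_per_w=0.7,
                               tia_gain_dbohm=41.0, s2d_gain_db=10.0,
                               driver_gain_db=3.0, vout_pp_v=0.5)
    print(f"Required LA gain: {gain:.1f} dB")  # about 23 dB
```

A -10-dBm OMA at 0.7 A/W corresponds to a 70-µA peak-to-peak photocurrent, so the remaining blocks must supply roughly a factor of 14 to reach 500 mV.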
| Parameter                        | Value |
|----------------------------------|-------|
| OMA [dBm]                        | -10   |
| Extinction Ratio [dB]            | 6     |
| Photodiode Responsivity [A/W]    | 0.7   |
| TIA Gain [dB$\Omega$]            | 41    |
| S2D Gain [dB]                    | 10    |
| Output Driver Gain [dB]          | 3     |
| Required Vout,diff,pk-pk [mV]    | 500   |
| Required Limiting Amp. Gain [dB] | 23    |

Table 2.2 Required gain of limiting amplifier.

With the target gain and bandwidth of 23 dB and 20 GHz, the required cell gain-bandwidth product $GBW_c$ can be plotted as in Fig. 2.27. As explained in section 2.2.3, a larger n leads to not only higher power consumption but also a higher noise contribution of each stage to the overall input-referred noise. Because the GBW of the unit cell ranges from 95 to 110 GHz and the required GBW lies somewhere between the first-order and second-order curves, the number of stages is set to 2.

Fig. 2.27 Required cell GBW as a function of stages n for $A_{tot} = 23$ dB and $BW_{tot} = 20$ GHz.

Fig. 2.28 shows the simulated frequency responses of the LA and the entire optical front-end with a $C_{PD}$ of 120 fF. The LA has a gain of 23 dB and a 3-dB bandwidth of 20.9 GHz, and the entire optical front-end has a transimpedance gain of 74 dB$\Omega$ and a 3-dB bandwidth of 23.2 GHz. In addition, the eye diagrams of the optical front-end from post-layout simulation are shown in Fig. 2.29(a) and (b).

Fig. 2.28 Simulated frequency responses of limiting amplifier and entire optical front-end.

Fig. 2.29 Simulated eye diagram of the optical front-end with (a) 100 $\mu A_{pk-pk}$ input and (b) 500 $\mu A_{pk-pk}$ input.

# 2.4 Noise Analysis

## 2.4.1 Photodiode

The noise of the photocurrent can be categorized into shot noise and thermal noise. Because the photodiode current consists of a flow of discrete electrons, fluctuation in the number of electrons creates a fluctuation in the signal current.
The variances of those fluctuations, $\sigma_{s1}^2$ and $\sigma_{s0}^2$, which represent the noise power when the data are "1" and "0", respectively, can be expressed as

$$\sigma_{s1}^2 = 2e \cdot BW \cdot i_{s1} \tag{2.39}$$

$$\sigma_{s0}^2 = 2e \cdot BW \cdot i_{s0} \tag{2.40}$$

where $i_{s1}$ and $i_{s0}$ represent the photocurrent when the data are "1" and "0", respectively, e is the electron charge, and BW is the noise measurement bandwidth [41]. The thermal noise is associated with the shunt resistance of the photodiode. It is due to the thermal generation of carriers, and its power is expressed as

$$\sigma_{th}^2 = \frac{4kT \cdot BW}{R_{SH}} \tag{2.41}$$

where k, T, and $R_{SH}$ are the Boltzmann constant, the absolute temperature, and the shunt resistance of the photodiode, respectively. By substituting a typical value of $R_{SH}$, 1 G$\Omega$, it can be shown that the shot noise is much larger than the thermal noise. Therefore, the thermal noise of the photodiode is ignored in our sensitivity calculation.

## 2.4.2 Optical Front-End

Fig. 2.30 shows the noise sources in the optical front-end, where all of the noise generated in the post amplifier is represented by an equivalent noise source at the input of the post amplifier. The total input-referred noise of the optical front-end is expressed as

$$\overline{I_{n,in}^2} = \overline{I_{n,in,nf}^2} + \overline{I_{n,in,na}^2} + \frac{\overline{V_{n,post}^2}}{\left|Z_T(s)\right|^2}$$ (2.42)

where $\overline{I_{n,in,nf}^2}$ and $\overline{I_{n,in,na}^2}$ are the input-referred noise contributions of the feedback resistor and the transistors of the TIA, respectively.

Fig. 2.30 Noise sources in the optical front-end.

A previous work finds it convenient to calculate the noise at the output and to convert it to input-referred noise simply by dividing the result by the mid-band transimpedance [42].
The first two terms of (2.42) are derived as follows [42]:

$$\overline{I_{n,in,nf}^2} = \frac{A+1}{(AR_F - r_o)^2} \cdot \frac{kT}{C_o} \cdot \frac{A^2 R_F C_o + (A+1)r_o C_T}{r_o C_o + (R_F + r_o)C_T}$$ (2.43)

$$\overline{I_{n,in,na}^2} = \frac{A(A+1)}{\left(AR_F - r_o\right)^2} \cdot \frac{\gamma kT}{C_o} \cdot \frac{r_o C_o + (A+1)R_F C_T}{r_o C_o + (R_F + r_o)C_T}$$ (2.44)

where A is the intrinsic gain of the CMOS inverter, and $C_T$ and $C_o$ are the total capacitances at the TIA input and output, respectively. Equations (2.43) and (2.44) can be computed as shown in Fig. 2.31 with the maximum achievable transimpedance gain at given $f_{3dB}$, $C_{PD}$, $C_{pad}$, and technology, which is plotted in Fig. 2.20. In addition, $C_o$ is assumed to be 70 fF. As explained in section 2.3.2, a larger TIA gain results in lower noise but lower power efficiency. Therefore, it is better to set the supply current in the range of 3-4 mA.

Fig. 2.31 Computed input referred noise of the TIA versus supply current at target bandwidth of 20 GHz.

Calculating all of the noise sources in the post amplifier is cumbersome and complicated. Moreover, because the noise contributed by the post amplifier is divided by the transimpedance gain of the TIA, it is much smaller than the noise contributed by the TIA. From SPICE simulations, the input-referred noise of the TIA and of the optical front-end are 3.5 $\mu A_{rms}$ and 3.9 $\mu A_{rms}$, respectively. These results are used for the sensitivity calculation in the next section.

## 2.4.3 Sensitivity

The sensitivity of the optical receiver can be calculated from the noise sources discussed in previous sections.
The total noise power at the receiver input can be expressed as

$$\sigma_1^2 = 2e \cdot BW \cdot i_{s1} + \overline{I_{n,in}^2}$$ (2.45)

$$\sigma_0^2 = 2e \cdot BW \cdot i_{s0} + \overline{I_{n,in}^2}$$ (2.46)

Using the Gaussian probability density functions obtained from (2.45) and (2.46), the bit error rate (BER) of the received signal for equally likely "1" and "0" bits can be derived as

$$BER = \frac{1}{4} \left[ erfc \left( \frac{i_D - i_{s0}}{\sqrt{2}\sigma_0} \right) + erfc \left( \frac{i_{s1} - i_D}{\sqrt{2}\sigma_1} \right) \right]$$ (2.47)

where $i_D$ is the detection threshold [32]. The sensitivity of the optical receiver can be predicted using the simulated input-referred noise current of the optical front-end and the assumed extinction ratio (ER) and photodiode responsivity. Fig. 2.32 shows the calculated BER with respect to the input optical power, assuming that the extinction ratio and PD responsivity are 6 dB and 0.7 A/W, respectively. It indicates that the input optical power for a BER of $10^{-12}$ is -10.3 dBm.

Fig. 2.32 Calculated BER versus input optical power.

# Chapter 3

# Design of ADCDR for Optical Receiver

# 3.1 Overview

In digital communications, the received signal must be sampled to be converted to digital bits. Therefore, the receiver needs to know when to sample it. In other words, a sampling clock with the proper frequency and phase is needed. We start by distinguishing two types of I/O clocking architectures, forwarded-clock and embedded-clock, shown in Fig. 3.1(a) and (b). In the forwarded-clock architecture, the received clock has the same frequency as the transmitter clock. However, because there is a phase difference due to delay mismatch between the channels, the receiver needs a phase recovery circuit to adjust the skew. In contrast, in the embedded-clock architecture, the clocks generated in the receiver and the transmitter have almost, but not exactly, the same frequency. This slight difference between the two frequencies leads to a drifting phase. Therefore, the embedded-clock receiver needs a clock recovery circuit that tracks out the phase drift as well as the phase offset.

Fig. 3.1 Two types of I/O clocking architectures: (a) forwarded-clock and (b) embedded-clock architecture.

In general, the clock recovery circuit in an embedded-clock receiver is more complex and power hungry than the phase recovery circuit in a forwarded-clock receiver. However, the forwarded-clock architecture requires an additional channel for the clock, including an optical fiber and a photodiode, which increases the cost significantly. Consequently, most optical communication links are implemented with the embedded-clock architecture, and this thesis focuses on it.

Commonly used CDR topologies can be divided into three major categories as follows [43]: 1) topologies using feedback control for phase tracking, including PLL, delay-locked loop (DLL), phase interpolator (PI), and injection-locked (IL) structures; 2) an oversampling-based topology without feedback control; and 3) topologies tracking phase without feedback control, including gated oscillator and high-Q bandpass filter architectures. An oversampling-based topology is not suited to high-speed applications such as optical communication because of the burden on the samplers and clock generation circuits. In addition, in topologies without feedback control, the phase alignment between the received data and the recovered clock is sensitive to PVT variations [44]. For those reasons, topologies using feedback control are commonly used in optical communications. In later sections, the PLL-based ADCDR shown in Fig. 3.2 is studied in detail and the circuit implementation of the proposed receiver is described.

Fig. 3.2 PLL-based ADCDR [43].
# 3.2 Background on PLL-based ADCDR

### 3.2.1 Phase Detector

In order to track the phase of the received signal, the CDR needs a phase detector that generates an output signal related to the phase difference of its inputs. Phase detectors can be divided into linear phase detectors and bang-bang phase detectors. The implementation of the Hogge phase detector, which is a linear phase detector, and its timing diagram are shown in Fig. 3.3(a) and (b), respectively [45]. The pulse width of "Y" is proportional to the phase difference between the data and the clock, and the pulse width of "X" is fixed to half the clock period. Thus, by subtracting X from Y, a phase error signal that depends linearly on the phase difference can be obtained. The transfer curve of the linear phase detector is shown in Fig. 3.4. Under the locked condition, the pulse widths of "X" and "Y" are the same, and thus the rising edge of the clock is aligned to the center of the data eye.

Fig. 3.3 (a) Implementation of Hogge phase detector and (b) its timing diagram.

Fig. 3.4 Transfer curve of Hogge phase detector.

The Hogge phase detector has the advantages of inherent data retiming and a wide frequency acquisition range. In addition, its linear behavior makes the loop dynamics easier to analyze than those of a bang-bang phase detector. However, in the locked condition, it causes transition-dependent jitter because of "triwaves" at the output of the following integrator. In addition, it is difficult to implement at high speeds, where the pulse widths are too small to be processed in practical circuit implementations.

Fig. 3.5(a) and (b) show the implementation of the Alexander phase detector, which is a bang-bang phase detector, and its timing diagram, respectively [46]. Data is sampled at three equidistant points "A", "B", and "C". In steady state, the clock is aligned to the middle of the data eye.
The Alexander phase detector can measure only the polarity of the phase difference, not its magnitude, as shown in the transfer curve of Fig. 3.6. This bang-bang characteristic leads to highly nonlinear dynamics that invalidate simple Laplace-domain analysis, as discussed in section 3.2.4. As with the Hogge phase detector, the retimed data is obtained inherently in the Alexander phase detector. However, this phase detector has a small frequency capture range. Therefore, in order to avoid false locks in a CDR incorporating this phase detector, additional frequency acquisition circuits are necessary. In addition, the output of the bang-bang phase detector can be only +1 or -1, which is already a digital signal. This property makes it possible to use a bang-bang phase detector in an ADCDR without an additional analog-to-digital converter. For that reason, ADCDRs generally incorporate a bang-bang phase detector.

Fig. 3.5 (a) Implementation of Alexander phase detector and (b) its timing diagram.

Fig. 3.6 Transfer curve of Alexander phase detector.

## 3.2.2 Digital Loop Filter

A digital loop filter (DLF) is a building block that generates the frequency control word (FCW) for the digitally-controlled oscillator (DCO) from the phase detector output. The DLF and its analog counterpart, the analog loop filter of a charge-pump PLL (CPPLL), are shown in Fig. 3.7(a) and (b). The DLF is composed of proportional and integral paths, which behave like the resistor and capacitor, respectively, of the analog loop filter. The transfer functions of the analog loop filter and the DLF are derived as

Fig. 3.7 (a) Analog loop filter and (b) digital loop filter.

$$H(s) = \frac{V_{ctrl}}{I_{CP}} = \left(R + \frac{1}{sC}\right) = \frac{sRC + 1}{sC}$$ (3.1)

$$H(z) = \left(\alpha + \frac{\beta}{1 - z^{-1}}\right). \tag{3.2}$$

By using the bilinear transformation, the transfer function of the DLF can be converted from the z-domain to the s-domain as follows:

$$z = \frac{2 \cdot f_s + s}{2 \cdot f_s - s} \tag{3.3}$$

$$H(s) = \left(\alpha + \frac{\beta}{2}\right) + \frac{f_s \cdot \beta}{s} \tag{3.4}$$

where $f_s$ is the sampling frequency. From (3.1) and (3.4), the proportional gain $\alpha$ and the integral gain $\beta$ are expressed as

$$\alpha = R - \frac{1}{2 \cdot C \cdot f_s} \tag{3.5}$$

$$\beta = \frac{1}{C \cdot f_s} \,. \tag{3.6}$$

Therefore, the loop dynamics of the ADCDR can be adjusted simply by changing the multiplication ratios in the digital domain. Because the multiplier in the DLF can be implemented with a bit-shifter and an adder, the proportional and integral gains can be changed easily over a wide range.

## 3.2.3 Digitally-Controlled Oscillator

A DCO is a clock generation circuit whose frequency is controlled by the FCW from the DLF. Fig. 3.8 shows the transfer curves of the VCO and DCO. The output frequency and the phase of the DCO are derived as

$$f_{DCO} = K_{DCO} \cdot FCW[N:1] \quad [Hz] \tag{3.7}$$

$$\Phi_{DCO} = \frac{2\pi}{s} \cdot K_{DCO} \cdot FCW[N:1] \quad [rad]$$ (3.8)

From the above two equations, the transfer function of the DCO in the s-domain is derived as

$$H_{DCO} = \frac{2\pi}{s} \cdot K_{DCO} \quad [\text{rad/LSB}]$$ (3.9)

Because of the quantized nature of the digital control, the DCO shows quantized frequency control as shown in Fig. 3.8, which causes limit cycling around the intended frequency. Therefore, fine frequency resolution is required for low quantization noise of the DCO. In addition, the DCO should cover a wide frequency range to support multi-band applications or large PVT variations. In order to meet both fine resolution and wide frequency range, many frequency control elements are required, and they increase the parasitics and power consumption.

Fig. 3.8 Transfer curve of (a) VCO and (b) DCO.
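Returning to the loop-filter mapping of section 3.2.2, the relations (3.5) and (3.6) can be checked numerically. The sketch below takes $\alpha$ as the proportional gain and $\beta$ as the integral gain, applies the bilinear substitution (3.3), and compares the digital filter response with the analog response (3.1). The component values and function names are hypothetical and serve only as an illustration.

```python
import math

def dlf_gains(r: float, c: float, f_s: float):
    """Proportional and integral gains from Eqs. (3.5) and (3.6)."""
    beta = 1.0 / (c * f_s)
    alpha = r - beta / 2.0
    return alpha, beta

def analog_loop_filter(s: complex, r: float, c: float) -> complex:
    """H(s) = R + 1/(sC), Eq. (3.1)."""
    return r + 1.0 / (s * c)

def digital_loop_filter(z: complex, alpha: float, beta: float) -> complex:
    """Proportional path plus accumulator: alpha + beta / (1 - z^-1)."""
    return alpha + beta / (1.0 - 1.0 / z)

if __name__ == "__main__":
    r, c, f_s = 1e3, 1e-9, 1e9        # hypothetical R, C, and sampling rate
    alpha, beta = dlf_gains(r, c, f_s)
    print(f"alpha = {alpha}, beta = {beta}")
    for f in (1e3, 1e5, 1e7):
        s = 2j * math.pi * f
        z = (2.0 * f_s + s) / (2.0 * f_s - s)     # bilinear map, Eq. (3.3)
        h_a = analog_loop_filter(s, r, c)
        h_d = digital_loop_filter(z, alpha, beta)
        print(f"f = {f:.0e} Hz: |H_analog| = {abs(h_a):.4e}, "
              f"|H_digital| = {abs(h_d):.4e}")
```

The two responses agree because the accumulator $1/(1 - z^{-1})$ maps to $f_s/s + 1/2$ under (3.3). The extra constant 1/2 is what produces the correction term in (3.5).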
Another metric for the DCO is step linearity. In general, a thermometer control scheme is used in the DCO to obtain good step linearity.

There are two types of commonly used DCO, the ring-DCO and the LC-DCO. The ring-DCO is composed of an odd number of identical inverting stages. The ring-DCO has a wide tuning range and occupies a small area compared to the LC-DCO. However, it shows a relatively low maximum frequency and poor phase noise performance. Consequently, the LC-DCO is preferred in high-frequency applications.

In general, the oscillation frequency of the ring-DCO is controlled by the supply current [47] or by resistors [48], [49], as shown in Fig. 3.9(a) and (b). The current-controlled ring-DCO adjusts the supply current of the oscillator core by switching the current source elements. The resistor-controlled ring-DCO adjusts the resistance between the supply and the supply node of the oscillator core by switching the PMOS resistors in parallel.

Fig. 3.9 (a) Current controlled and (b) resistor controlled ring-DCO.

The LC-DCO is composed of an LC-tank and the active circuits that compensate the loss in the tank. The oscillation frequency can be tuned by adjusting the capacitance or the inductance, and capacitance tuning as shown in Fig. 3.10 is the more common implementation [50]. The varactor array can be implemented with PMOS capacitors. Fig. 3.11(a) and (b) show the PMOS capacitor and its gate capacitance versus gate-source voltage, where the drain and source are tied together and the body is tied to the supply. The input digital code, FCW, changes the number of varactors operating in the large- and small-capacitance regions, and thus the frequency of the DCO. Because the phase noise performance of the LC-DCO is generally superior to that of the ring-DCO, the quantization noise is more severe in the LC-DCO.

Fig. 3.10 LC-DCO using varactor array.

Fig. 3.11 (a) PMOS capacitor and (b) its gate capacitance versus gate-source voltage.

Fig. 3.12 Glitch-less switching scheme.
However, for a given target tuning range, a smaller unit varactor requires more unit varactors, and thus the parasitic resistance and capacitance of the routing wires increase. The increased parasitics lower the Q-factor of the tank, which results in poor phase noise performance. For that reason, the varactor array may be implemented with a partial thermometer code scheme, which is a combination of binary and thermometer code schemes, at the cost of slightly reduced tuning step linearity [51].

Another design issue of both the ring-DCO and the LC-DCO is switching noise. Simultaneous switching of unit varactors can cause a glitch, which results in switching noise at the output. In order to solve this problem, [22] proposes a glitch-less switching scheme that guarantees only one unit varactor switches per LSB change in the FCW. Its switching operation and implementation are described in Fig. 3.12 [22].

In addition, two identical LC oscillators can be coupled to each other to generate quadrature clocks. Fig. 3.13 shows the parallel-coupled quadrature VCO (P-QVCO), which is the first and best known implementation [52]. In the P-QVCO, two VCOs are coupled by transistors $M_{cpl}$, which are in parallel with the switching transistors $M_{sw}$. The operation and the cause of the degraded phase noise performance are analyzed in [53]. Fig. 3.14 shows the block diagram of the P-QVCO, where the two oscillators are modeled by positive feedback loops with an open-loop gain of $G(j\omega)$ and are coupled by coefficients m and -m.

Fig. 3.13 Parallel-coupled quadrature VCO.

Fig. 3.14 Linearized block diagram of parallel-coupled quadrature oscillator.

In steady state, the relations between the two outputs, X and Y, are expressed as follows [53]:

$$(X - mY) \cdot G(j\omega) = X \tag{3.10}$$

$$(Y + mX) \cdot G(j\omega) = Y. \tag{3.11}$$

Using (3.10) and (3.11), it can be proved that X and Y are in quadrature as follows:

$$X^2 + Y^2 = 0 \tag{3.12}$$

$$X = \pm jY. \tag{3.13}$$

The new oscillation frequency $\omega$ can be found by substituting (3.13) into (3.10) or (3.11) as follows:

$$(1 \pm jm) \cdot G(j\omega) = 1 \tag{3.14}$$

$$\phi(G(j\omega_1)) = -\tan^{-1} m, \quad \phi(G(j\omega_2)) = \tan^{-1} m. \tag{3.15}$$

Fig. 3.15 Magnitude and phase of resonator impedance: (a) lossless and (b) lossy resonator.

There exist two possible oscillation frequencies, $\omega_1$ and $\omega_2$, as shown in (3.15) and Fig. 3.15(a). However, for a typical resonator with a lossy inductor, the magnitude of the impedance peaks at a frequency higher than the resonant frequency, as shown in Fig. 3.15(b), which is expressed as

$$\omega_0 = \frac{1}{\sqrt{LC}} \sqrt{1 - \frac{CR_s^2}{L}} \ . \tag{3.16}$$

As a consequence, there exists only one solution, $\omega_1$ in (3.15) [53]. However, as shown in Fig. 3.15(b), $\omega_1$ is located at a frequency that is not the peak of the resonator impedance magnitude. This leads to a degraded Q-factor of the resonator and thus degraded phase noise performance. To solve this problem, [54] inserts two phase shifters before the coupling circuits so that the oscillation occurs at the peak of the resonator impedance magnitude, as shown in Fig. 3.16. The phase shift $\theta$ is equal to 90° minus any parasitic phase shift that arises in the coupling circuits and $G(\omega)$ [54]. In this way, several works obtain superior phase noise performance [54]-[56]. However, phase shifters increase power consumption. Furthermore, the parasitic phase shift to be compensated is hard to predict exactly and can vary with PVT variations.

Fig. 3.16 Parallel-coupled quadrature VCO with phase shifters.

Fig. 3.17 (a) Top-series-coupled quadrature VCO and (b) bottom-series-coupled quadrature VCO.

Another mechanism degrading the phase noise performance of the P-QVCO is the additional noise generated by the coupling devices [57].
In order to minimize this additional noise, two different coupling schemes are proposed in [58] and [59]: the top-series-coupled quadrature VCO (TS-QVCO) and the bottom-series-coupled quadrature VCO (BS-QVCO), respectively. These VCOs insert the coupling transistors $M_{cpl}$ in series with the switching transistors $M_{sw}$ in a cascode fashion, as shown in Fig. 3.17(a) and (b). These coupling schemes greatly reduce the noise from the coupling devices.

Fig. 3.18 (a) Small signal circuit model for P-QVCO and BS-QVCO and (b) their phasor diagrams.

In addition, there has been another analysis comparing the P-QVCO and the BS-QVCO [60]. Fig. 3.18 shows the small-signal circuit models of those VCOs and their phasor diagrams. In this analysis, the phase noise comparison is based on the fact that the Q-factor of the LC-tank reaches its maximum value when the resonator phase shift between the current injected into the LC-tank and the output voltage of the LC-tank is zero [61]. In the case of the P-QVCO, $V_X$ is expressed as [60]

$$V_{X} = \frac{\left(g_{m,s} - \omega C_{gs,c}\right) + j\left(g_{m,c} + \omega C_{gs,s}\right)}{\left(g_{m,s} + g_{m,c} + \frac{1}{r_{X}}\right) + j\left(\omega C_{gs,s} + \omega C_{gs,c}\right)} \cdot V_{IM}. \tag{3.17}$$

Assuming that $(g_{m,s} + g_{m,c} + 1/r_X)$ is much larger than $(\omega C_{gs,s} + \omega C_{gs,c})$, the phase shift between $V_X$ and $V_{IM}$, $\theta_P$, can be expressed as

$$\theta_{P} = \angle \left(\frac{V_{X}}{V_{IM}}\right) \approx \tan^{-1} \left(\frac{g_{m,c} + \omega C_{gs,s}}{g_{m,s} - \omega C_{gs,c}}\right). \tag{3.18}$$

It can be shown that $\theta_P$ cannot be zero. In addition, the current injected into the LC-tank, $I_{LC}$, is expressed as

$$I_{LC} = -g_{m,s}(V_{IM} - V_X) - g_{m,c}(V_{OM} - V_X). \tag{3.19}$$

Because the phase shift between $I_{LC}$ and $V_{IP}$ cannot be zero, the Q-factor of the LC-tank also cannot be maximized.
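As a rough numerical illustration of why $\theta_P$ in (3.18) stays away from zero, the sketch below evaluates the expression for one set of device values. All numbers (transconductances, gate capacitances, and the 12.5-GHz frequency used as the operating point) are hypothetical placeholders chosen only for illustration, not values taken from the implemented circuit, and the function name is likewise an assumption.

```python
import math


def theta_p_deg(gm_s, gm_c, cgs_s, cgs_c, freq_hz):
    """Phase shift of (3.18) in degrees for the P-QVCO.

    gm_s, gm_c: transconductances of the switching and coupling devices [S]
    cgs_s, cgs_c: their gate-source capacitances [F]
    All argument values used below are hypothetical.
    """
    omega = 2.0 * math.pi * freq_hz
    numerator = gm_c + omega * cgs_s      # always positive
    denominator = gm_s - omega * cgs_c
    # atan2 equals tan^-1(numerator / denominator) when the denominator is positive
    return math.degrees(math.atan2(numerator, denominator))


if __name__ == "__main__":
    # Hypothetical operating point: 20 mS, 10 mS, 20 fF, 10 fF at 12.5 GHz
    print(theta_p_deg(20e-3, 10e-3, 20e-15, 10e-15, 12.5e9))
```

Because the numerator of (3.18) is a sum of two positive quantities, the angle stays positive for any physical device sizing.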
On the other hand, under a similar assumption, $V_X$ and $\theta_{BS}$ in the case of the BS-QVCO are expressed as follows [60]:

$$V_{X} = \frac{\left(g_{m,s} - \omega C_{gd,c}\right) + j\left(\omega C_{gs,s} - g_{m,c}\right)}{\left(g_{m,s} + \frac{1}{r_{ds,c}}\right) + j\left(\omega C_{gs,s} + \omega C_{gs,c}\right)} \cdot V_{IM} \tag{3.20}$$

$$\theta_{BS} = \angle \left(\frac{V_{X}}{V_{IM}}\right) \approx \tan^{-1} \left(\frac{\omega C_{gs,s} - g_{m,c}}{g_{m,s} - \omega C_{gd,c}}\right). \tag{3.21}$$

$\theta_{BS}$ can be made to converge to zero by appropriately sizing the switching and coupling transistors. In addition, the current injected into the LC-tank, $I_{LC}$, is expressed as

$$I_{LC} = -g_{m,s}(V_{IM} - V_X). \tag{3.22}$$

In contrast to the P-QVCO, $I_{LC}$ in the BS-QVCO is nearly in phase with $V_{IP}$, so that the Q-factor of the LC-tank can be maximized.

#### 3.2.4 Analysis of Bang-Bang ADCDR

As explained in Section 3.2.1, bang-bang phase detectors are widely used in ADCDRs because their circuit implementations are simple, fast, accurate, and amenable to digital implementation [62], [63]. However, because the relation between the input and output of the bang-bang phase detector varies greatly with the magnitude of the input, it is difficult to model it as a simple linear system. There have been several efforts to analyze bang-bang controlled loops, including bang-bang PLLs, DLLs, and CDRs. They can be classified into two categories [63]: those that analyze the loop directly as a nonlinear system [62] and those that model the system as an equivalent linear system [63]-[69]. When the random noise is negligible compared to the quantization noise, dithering and slewing determine the majority of the loop's steady-state characteristics. Therefore, the system is best modeled as a nonlinear one.
On the other hand, when the random noise is sufficient to eliminate the limit cycle, the system can be modeled effectively as a linear one in a stochastic sense. In this section, we introduce one nonlinear analysis and one linear analysis of the bang-bang ADCDR. For simplicity, the transition density of the data is assumed to be one in the nonlinear analysis.

Fig. 3.19 Block diagram of second-order bang-bang ADCDR.

In [62], in order to analyze the loop directly as a nonlinear system, all of the signals in the bang-bang controlled loop, shown in Fig. 3.19, are derived in the time domain rather than in the Laplace domain. The resulting equations are as follows [62]:

$$\tau \triangleq \frac{\Delta t}{\beta K_T} \tag{3.23}$$

$$x_0 \triangleq \frac{T_r - T_{v0}}{\beta K_T} \tag{3.24}$$

$$R \triangleq \frac{\alpha}{\beta} \tag{3.25}$$

$$\tau_{k+1} = \tau_k + x_0 - R \cdot \psi_{k-D} - \text{sgn}(\tau_{k-D}) \tag{3.26}$$

$$\psi_{k+1} = \psi_k + \text{sgn}(\tau_{k+1}) \tag{3.27}$$

where $\alpha$, $\beta$, $K_T$, and $D$ are the integral path gain, the proportional path gain, the period gain constant of the DCO, and the loop delay, respectively. $T_r$ and $T_{v0}$ are one unit interval of the data and the free-running period of the DCO, respectively. The other symbols are the signals annotated in Fig. 3.19, and the subscript $k$ indicates the value at the $k$th sampling instant. Based on (3.26) and (3.27), the peak-to-peak and rms jitter in the second-order bang-bang ADCDR are derived as [62]

$$\Delta t_{pp} = \beta K_T \left[ 2(1+D) + (1+D) \left( \frac{\alpha}{\beta} \right) + (1+D)^3 \left( \frac{\alpha}{\beta} \right)^2 + O\left( \frac{\alpha^3}{\beta^3} \right) \right] \tag{3.28}$$

$$\sigma_{\Delta t}^2 \approx \frac{\Delta t_{pp}^2}{12}. \tag{3.29}$$

The first term of (3.28) comes from the proportional control only and is the dominant source of jitter when the proportional path gain is much larger than the integral path gain.
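The recurrence in (3.26) and (3.27) can be iterated numerically to observe the limit cycle directly. The following is a minimal sketch rather than the analysis of [62]: it assumes sgn(0) = +1, assumes that samples before $k = 0$ equal the initial state, and uses hypothetical function names, initial conditions, and parameter values.

```python
def sgn(x):
    """Sign function; sgn(0) = +1 is an assumption of this sketch."""
    return 1.0 if x >= 0 else -1.0


def simulate_bang_bang(x0, R, D, n_steps, tau0=0.3, psi0=0.0):
    """Iterate (3.26) and (3.27) in normalized units.

    x0: normalized frequency offset, R: alpha / beta, D: loop delay in samples.
    tau0 and psi0 are hypothetical initial conditions.
    """
    tau = [tau0]
    psi = [psi0]
    for k in range(n_steps):
        kd = max(k - D, 0)  # delayed index; pre-history is clamped to sample 0
        tau_next = tau[k] + x0 - R * psi[kd] - sgn(tau[kd])
        psi_next = psi[k] + sgn(tau_next)
        tau.append(tau_next)
        psi.append(psi_next)
    return tau, psi


def peak_to_peak(samples, skip=0):
    """Peak-to-peak excursion after discarding the first `skip` samples."""
    tail = samples[skip:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    # Proportional path only (R = 0), no frequency offset, increasing loop delay
    for delay in (0, 1, 2):
        tau, _ = simulate_bang_bang(x0=0.0, R=0.0, D=delay, n_steps=200)
        print(delay, peak_to_peak(tau, skip=50))
```

With R = 0 and x0 = 0 the loop reduces to the proportional path alone, and the normalized limit cycle widens as D increases, which is qualitatively consistent with the growth with D of the first term of (3.28).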
Therefore, in order to minimize the output jitter, the loop delay, especially that of the proportional path, should be minimized. In addition, the optimum proportional path gain and the achievable minimum output jitter are derived as follows [62]:

$$\beta_{opt} = (1.3846 + 1.8846D)\alpha \tag{3.30}$$

$$\Delta t_{pp,opt} = 5.2191 \alpha K_T (1+D)^2. \tag{3.31}$$

The other analysis models the bang-bang phase detector as a linearized gain element followed by an additive quantization noise source, as shown in Fig. 3.20. In [63], the necessary condition for the bang-bang controlled system to be sufficiently linearized by the noise is derived as follows:

$$K_{PD} < \frac{\pi}{2} \frac{1}{\phi_{bb} \left( N_d + 1 \right)} \tag{3.32}$$

where $\phi_{bb}$ and $N_d$ are the proportional path gain and the loop delay, respectively. When (3.32) is satisfied, the effective linearized gain of the phase detector and the variance of the quantization noise are derived as [63]

$$K_{PD} = \sqrt{\frac{2}{\pi}} \frac{\alpha_T}{\sigma_e} \tag{3.33}$$

$$\sigma_q^2 = \alpha_T - \frac{2}{\pi} \alpha_T^2 \tag{3.34}$$

where $\sigma_e$ and $\alpha_T$ are the standard deviation of the phase error and the transition density of the input data, respectively.

Fig. 3.20 Linearized model of bang-bang ADCDR.

From (3.33) and (3.34), the jitter transfer (JTRAN) and the total output jitter can be computed from the following equations [63]:

$$JTRAN = \frac{K_{PD}G(\omega)}{1 + K_{PD}G(\omega)} \tag{3.35}$$

$$S_{\phi out}(\omega) = S_{\phi in}(\omega) \left| \frac{K_{PD}G(\omega)}{1 + K_{PD}G(\omega)} \right|^{2} + S_{\phi VCO,N}(\omega) \frac{1}{\left| 1 + K_{PD}G(\omega) \right|^{2}} + S_{\phi q}(\omega) \left| \frac{G(\omega)}{1 + K_{PD}G(\omega)} \right|^{2} \tag{3.36}$$

where $S_{\phi in}$, $S_{\phi VCO,N}$, and $S_{\phi q}$ are the power spectral densities of the input random jitter, the VCO's phase noise, and the bang-bang phase detector's quantization error, respectively.
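The quantities in (3.32) to (3.34) are simple closed forms and can be evaluated directly. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the symbol in the denominator of (3.33) is read as the rms phase error, and the numerical values in the usage example are placeholders rather than parameters of the implemented ADCDR.

```python
import math


def linearized_pd_gain(alpha_t, sigma_e):
    """Effective bang-bang phase detector gain, following (3.33).

    alpha_t: transition density of the data, sigma_e: rms phase error.
    """
    return math.sqrt(2.0 / math.pi) * alpha_t / sigma_e


def quantization_noise_variance(alpha_t):
    """Variance of the phase detector quantization noise, following (3.34)."""
    return alpha_t - (2.0 / math.pi) * alpha_t ** 2


def is_linearizable(k_pd, phi_bb, n_d):
    """Necessary condition (3.32) for the loop to be linearized by noise."""
    return k_pd < (math.pi / 2.0) / (phi_bb * (n_d + 1))


if __name__ == "__main__":
    # Hypothetical values: 50 % transition density, 0.05 rad rms phase error,
    # proportional step of 0.01 rad, loop delay of 4 samples
    k_pd = linearized_pd_gain(alpha_t=0.5, sigma_e=0.05)
    print(k_pd, quantization_noise_variance(0.5), is_linearizable(k_pd, 0.01, 4))
```

Note that (3.33) makes the gain inversely proportional to the rms phase error, so the loop bandwidth of a bang-bang ADCDR depends on the jitter it is tracking.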
### 3.3 Circuit Implementation

#### 3.3.1 Overall Architecture

Before implementing a CDR, some design choices should be considered, such as the degree of parallelism and the type of DCO to be used. First, parallelism can relax the stringent speed requirement of the CDR circuits. However, it also introduces other issues such as clock skew and jitter [70]. In general, the preferable choice depends on the transistor characteristics of the given process.

In order to deliver the DCO clocks and drive the samplers of the phase detector, the fan-out should be greater than one. A smaller fan-out ratio requires more stages to drive the loads, while a larger one results in a smaller bandwidth. For simplicity, we target the fan-out-of-two buffers shown in Fig. 3.21. For a given fan-out ratio, supply voltage, and target voltage swing, the only design parameter is the NMOS width of the 1x stage. The simulated large-signal gain from DC simulation and the unity-gain frequency from AC simulation are plotted in Fig. 3.22. In the simulations, device models that include gate resistance are used to obtain results similar to those of post-layout simulations. Because the buffers cannot reach a unity-gain frequency of 25 GHz, the clock buffers cannot deliver the DCO clocks without bandwidth extension techniques. Using bandwidth extension techniques such as inductive peaking in every buffer stage results in not only a large area but also layout inefficiencies. Therefore, a half-rate CDR is preferable.

Fig. 3.21 Fan-out-of-two CML buffers.

Fig. 3.22 Simulated large signal gain and AC unity gain frequency of fan-out-of-two CML buffers versus 1x transistor size.

In the half-rate CDR, quadrature clocks are required to obtain two data samples and two edge samples during one clock period. There are several ways to generate quadrature clocks.
First, ring oscillators can be used to generate the quadrature clocks without additional circuitry or power consumption, using evenly delayed signals through the positive feedback loop. Fig. 3.23(a) and (b) show four-stage and two-stage ring oscillators with commonly used ratios between the delay and latch cells. Their simulated oscillation frequencies and phase noise performance, obtained by sweeping the transistor sizes, are shown in Fig. 3.24. Assuming that the voltage drop across the controlled current elements or controlled resistors is about 0.4 V, the supply nodes of the cells are tied to a 0.8-V supply voltage. In addition, the wire capacitance of each oscillation node is assumed to be 30 fF. The simulation results show that only the two-stage ring oscillator can oscillate at a frequency of 12.5 GHz. However, its poor phase noise performance disqualifies this choice. A more attractive method in terms of power consumption and phase noise is to couple two symmetrical LC oscillators so that the outputs are in a quadrature relation [52]-[60]. In the implemented receiver, an LC quadrature digitally-controlled oscillator (LC-QDCO) is used for quadrature clock generation.

Fig. 3.23 (a) Four-stage and (b) two-stage ring oscillators.

Fig. 3.24 Simulated oscillation frequencies and phase noises of ring oscillators versus supply current.

Fig. 3.25 Proposed ADCDR architecture.

The implemented ADCDR is composed of four samplers, two 2-to-64 demultiplexers, phase detection logic, a digital logic block that includes a DLF and a frequency detector (FD), and an LC-QDCO, as shown in Fig. 3.25. The half-rate architecture adopted in the implemented receiver relaxes the required clock frequency. The incoming signal from the optical front-end is sampled by the four parallel samplers. The timing of the four parallel samplers is controlled by the four-phase clocks generated by the LC-QDCO.
Under lock conditions, two of the four-phase clocks are aligned to the middle of the data eye, and the other two are aligned to the edge of the data eye. The phases of the clocks are adjusted by proportional and integral controls. As explained in section 3.2.4, it is critical to minimize the feedback loop latency of the proportional control to achieve small dither jitter, good stability, and high jitter tolerance [44], [71]. Hence, the proportional control signals are generated without demultiplexing, separated from the integral control path. The integral control signal is generated in the DLF from the demultiplexed edge and data samples. Additionally, in order to avoid false locks, the FD is implemented with digital logic, which compares the frequencies of the reference clock and divided-by-32 LC-QDCO clock. #### 3.3.2 Phase Detection Logic The implementation of the samplers and phase detection logic is shown in Fig. 3.26. The input data stream from the optical front-end is sampled by four parallel samplers. Then, the XOR operations between consecutive edge and data samples generate proportional control signals, which adjust the phases of the LC-QDCO clocks. The outputs of four XOR gates are latched and delivered to the proportional varactor bank in the LC-QDCO. In principle, because there is a half-bit period timing difference between the consecutive edge and data samples, the outputs of the XOR gates are invalid for a half-bit period as shown in Fig. 3.27. In the implemented receiver, two buffers are added on the earlier path of the XOR gate to align the edge and data samples as described in [71]. In this simple way, we can save four latches, which are the most power-hungry circuits in phase detection logic, for retiming the inputs of the XOR gates. All the circuits for the samplers and phase detection logic are implemented in CML for high-speed operation. Fig. 3.26 Samplers and phase detection logic. Fig. 3.27 Timing diagram of phase detection logic. 
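The XOR-based decision described in this section can be illustrated with a short behavioral model. This is a sketch of a generic bang-bang decision on one data-edge-data triplet, not the CML implementation of Fig. 3.26; the function name and the mapping of the two XOR outputs to "early" and "late" are assumptions made for illustration.

```python
def bang_bang_decision(d_prev, edge, d_next):
    """Return 'early', 'late', or 'hold' for one data-edge-data triplet.

    Samples are 0/1 integers. The mapping of the two XOR outputs to
    'early' and 'late' is an assumption of this sketch.
    """
    xor_prev = d_prev ^ edge   # edge sample differs from the preceding data bit
    xor_next = edge ^ d_next   # edge sample differs from the following data bit
    if xor_prev and not xor_next:
        return "late"    # edge clock sampled after the data transition
    if xor_next and not xor_prev:
        return "early"   # edge clock sampled before the data transition
    return "hold"        # no transition, or an inconsistent triplet


if __name__ == "__main__":
    for triplet in ((0, 1, 1), (0, 0, 1), (1, 1, 1), (0, 1, 0)):
        print(triplet, bang_bang_decision(*triplet))
```

In a half-rate receiver, the same decision is applied to each of the four consecutive sample pairs per clock period, which corresponds to the four XOR gates mentioned above.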
#### 3.3.3 Digital Loop Filter

The DLF is implemented with digital logic as depicted in Fig. 3.28. The DLF performs XOR operations between the incoming demultiplexed samples and integrates the difference between the up and down bits using the divided-by-32 LC-QDCO clock. Taking advantage of the digital circuits, the integral gain of the loop filter can be adjusted flexibly without compromising the chip area, in contrast with its analog counterpart. A 10-bit frequency control word (FCW) is generated from the 24-bit integrated phase error. The eight MSBs of the FCW are thermometer-decoded to a 30-bit coarse-tuning code, and the two LSBs of the FCW are thermometer-decoded to a three-bit fine-tuning code. These coarse- and fine-tuning codes are fed to the integral varactor bank in the LC-QDCO.

Fig. 3.28 Digital loop filter.

The FD is also implemented in order to make up for the limited capture range of the CDR loop. The counter-based FD compares the frequencies of the divided-by-32 LC-QDCO clock and the reference clock, as presented in [20]. The FD counts the number of rising edges of the divided-by-32 LC-QDCO clock while the reference clock is logically "high" and subtracts this value from 512; the result represents the frequency error. The frequency-lock status is determined from the frequency error with a tolerance of $\pm 1$. When the CDR loop is not frequency-locked, the proportional gain is set to 0, and the FCW is adjusted in the direction that reduces the frequency error. Even after the CDR is frequency-locked, the FD continues to monitor whether the CDR loop remains frequency-locked. In addition, the DLF and FD are synthesized with standard logic cells.

#### 3.3.4 LC Quadrature DCO

In the implemented receiver, the four oscillation nodes of two symmetrical LC-DCOs are coupled via NMOS transistors located at the bottom of the oscillators, which is the bottom-series coupling scheme explained in Section 3.2.3 [59]. Fig. 3.29 shows the circuit diagram of the implemented LC-QDCO.
Two center-tapped 495-pH inductors are used. The oscillation frequency is tuned via two varactor banks. The integral varactor bank is composed of fine- and coarse-tuning cells, as shown in Fig. 3.30 [51]. The fine- and coarse-tuning cells consist of a single PMOS capacitor and four PMOS capacitors in parallel, respectively, which means that the tuning step of a coarse cell is four times that of a fine cell. In addition, each coarse-tuning cell is accompanied by a local decoder for a two-dimensional control scheme. As explained earlier, the coarse-tuning code and the fine-tuning code are generated from the eight MSBs and the two LSBs of the 10-bit FCW, respectively. In this way, compared to the method in which the FCW is thermometer-decoded to a 62-bit tuning code, the number of tuning cells is greatly reduced, which reduces parasitic effects at the expense of slightly decreased tuning linearity. The 30-bit coarse-tuning code, which is composed of a 15-bit row code and a 15-bit column code, is decoded in each coarse-tuning cell. To eliminate glitches during control code transitions at the row boundaries, the inverted column code is fed to the local decoders in odd rows, as described in [22], [51]. The three-bit fine-tuning code is delivered to the fine-tuning cells.

Fig. 3.29 LC quadrature digitally-controlled oscillator.

Fig. 3.30 Integral varactor bank.

Fig. 3.31 Proportional varactor bank.

The proportional varactor bank consists of binary-weighted PMOS capacitors in parallel, which are selectively activated for proportional gain control, as shown in Fig. 3.31. It is adjusted by the proportional control signals upb[1:0] and dn[1:0], which are the outputs of the phase detection logic. The center frequency of the LC-QDCO can be adjusted by an auxiliary two-bit code, which switches the metal-insulator-metal (MIM) capacitors.

Fig. 3.32 Simulated transient response of CDR.
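The splitting of the 10-bit FCW into coarse and fine thermometer codes can be sketched as follows. This is a behavioral illustration under assumptions that the text does not state: the eight MSBs are assumed to divide into a 4-bit row field and a 4-bit column field, the function names are hypothetical, and the inversion of the column code in odd rows used for glitch-less switching [22], [51] is deliberately left out.

```python
def thermometer(value, width):
    """Thermometer code with `value` ones followed by zeros."""
    return [1 if i < value else 0 for i in range(width)]


def decode_fcw(fcw):
    """Split a 10-bit FCW into row, column, and fine thermometer codes.

    Assumption: the eight MSBs are divided into a 4-bit row field and a
    4-bit column field; the odd-row column inversion is not modeled.
    """
    if not 0 <= fcw < 1024:
        raise ValueError("FCW must be a 10-bit value")
    coarse = fcw >> 2          # eight MSBs
    fine = fcw & 0b11          # two LSBs
    row = coarse >> 4          # number of completely filled rows
    col = coarse & 0xF         # cells switched on in the current row
    return thermometer(row, 15), thermometer(col, 15), thermometer(fine, 3)


def active_units(fcw):
    """Total switched capacitance in units of one fine-tuning cell."""
    row_code, col_code, fine_code = decode_fcw(fcw)
    coarse_cells = 16 * sum(row_code) + sum(col_code)
    return 4 * coarse_cells + sum(fine_code)  # a coarse cell weighs four fine cells


if __name__ == "__main__":
    print(decode_fcw(5))
    print(all(active_units(code) == code for code in range(1024)))
```

Under these assumptions, 15 + 15 + 3 control bits cover all 1024 codes, and the switched capacitance in fine-cell units equals the FCW value itself, which follows from the four-to-one weighting of the coarse and fine cells.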
In addition, by implementing the current source of the oscillator with PMOS devices, the common-mode voltage of the output clock can be set for proper operation of the subsequent CML clock buffers without additional bias circuitry.

The simulated transient response of the CDR is shown in Fig. 3.32. In the locked condition, the FCW dithers between two neighboring codes and no errors occur. When an error is intentionally generated in the transmitter, an error occurs in the receiver as well.

## **Chapter 4**

# **Experimental Results**

The optical receiver prototype was fabricated in a 65-nm CMOS technology. The active die area of the receiver chip is $1 \times 0.75$ mm<sup>2</sup>, and its chip micrograph is shown in Fig. 4.1. The TIA and LA dissipate 12.1 mW from the 1-V supply, and the S2D and output buffer dissipate 23.6 mW from the 1.8-V supply. The ADCDR dissipates 218 mW from the 1.2-V supply. From the measured power consumption, the energy efficiencies of the optical front-end and the entire receiver at an operating speed of 26.5 Gb/s are 1.35 pJ/bit and 9.6 pJ/bit, respectively.

Fig. 4.1 Chip micrograph.

The receiver performance can be tested in the measurement setup shown in Fig. 4.2. A commercial photodiode, the PDCS20T from Albis Optoelectronics, is used as the optoelectronic converter. However, due to vibrations in the bare fiber and imperfect alignment, it is hard to obtain meaningful results. For that reason, the receiver performance is measured using electrical signals instead of photocurrents from a photodiode. In order to emulate a photodiode current source, an off-chip series resistor is placed between the input of the receiver and a voltage signal generator. The measurement setup is depicted in Fig. 4.3. The parasitic capacitance at the TIA side of the series resistor is calculated to be about 40 fF, which is comparable to the output capacitance of the most advanced photodiodes. The simulated bandwidth of the optical front-end in the same setup as that used for the measurement is 20.4 GHz.

Fig. 4.2 Optical measurement setup.

Fig. 4.3 Alternative measurement setup.

Fig. 4.4 Measured LC-QDCO tuning characteristics.

Fig. 4.5 Measured divided-by-32 recovered clock: (a) jitter histogram and (b) phase noise.

The measured tuning range of the LC-QDCO is 11.0-13.4 GHz, as shown in Fig. 4.4. The rms and peak-to-peak jitter of the recovered clock are measured to be 1.28 ps<sub>rms</sub> and 8.9 ps<sub>pk-pk</sub>, respectively, as shown in Fig. 4.5(a). The measured phase noise of the same clock is -102, -112, -115, and -125 dBc/Hz at 0.01-, 0.1-, 1-, and 10-MHz offsets, respectively, as shown in Fig. 4.5(b). The CDR locking range is 22–26.8 Gb/s, which is limited by the LC-QDCO tuning range. The receiver operates error-free for over four hours while receiving a $2^7-1$ PRBS signal with an amplitude of 250 $\mu A_{pk\text{-}pk}$, corresponding to a BER of $10^{-14}$ with a 97.8 % confidence level.

Because of the lack of equipment to verify the jitter tolerance at 26.5 Gb/s, the measurement is performed with a 12.5-Gb/s 2<sup>7</sup>–1 PRBS signal, which is equivalent to a 25-Gb/s signal in which every bit is repeated twice. With this lower-rate signal, the high-frequency tolerance may be overestimated while the low-frequency tolerance may be underestimated due to the reduced number of data transitions [71]. Nonetheless, the measured jitter tolerance with a target BER of 10<sup>-12</sup> well exceeds the tolerance mask (dashed line) specified in IEEE 802.3ba for 40/100 GbE [72], as shown in Fig. 4.6.

Fig. 4.6 Measured jitter tolerance (12.5 Gb/s 2<sup>7</sup>–1 PRBS).

Fig. 4.7 Measured sensitivity: (a) using $2^7-1$ PRBS and (b) using $2^{31}-1$ PRBS.
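The confidence level quoted for the error-free run can be checked with the standard zero-error relation $CL = 1 - e^{-N \cdot BER}$, where $N$ is the number of received bits. The sketch below is only a consistency check: the function name is hypothetical, and the data rate of 26.5 Gb/s is an assumption, since the text does not state the rate used for the four-hour test.

```python
import math


def zero_error_confidence(bit_rate, duration_s, target_ber):
    """Confidence that the true BER is below target_ber after an error-free run.

    Uses CL = 1 - exp(-N * BER) with N = bit_rate * duration_s, which
    applies only when no bit errors were observed.
    """
    n_bits = bit_rate * duration_s
    return 1.0 - math.exp(-n_bits * target_ber)


if __name__ == "__main__":
    # Assumed test conditions: 26.5 Gb/s for four hours, target BER of 1e-14
    confidence = zero_error_confidence(26.5e9, 4 * 3600, 1e-14)
    print(f"{100 * confidence:.1f} %")
```

Under the assumed 26.5-Gb/s rate, the result agrees with the 97.8 % confidence level reported above.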
The receiver sensitivity can be measured using a pulse pattern generator and the PRBS verifier implemented in the receiver prototype. The measured values using the $2^7-1$ PRBS signal for a BER of $10^{-12}$ are 106 and 184 $\mu A_{pk\text{-}pk}$ at data rates of 25 and 26.5 Gb/s, respectively, as shown in Fig. 4.7(a). In the case of the $2^{31}-1$ PRBS signal, they are degraded to 192 and 225 $\mu A_{pk\text{-}pk}$ at data rates of 25 and 26.5 Gb/s, respectively, as shown in Fig. 4.7(b). It can be inferred that both the front-end performance and the phase drift in the CDR adversely affect the BER performance.

Table 4.1 and Table 4.2 summarize the performance of the optical front-end and the entire receiver, respectively, in comparison with recent works implemented in CMOS processes.

Table 4.1 Front-End Performance Summary.

| | This work | [8]<br>JSSC, 2009 | [9]*<br>ISSCC, 2012 | [10]<br>ISSCC, 2014 | [11]<br>JSSC, 2014 | [12]*<br>JSSC, 2015 | [13]*<br>OFC, 2013 | [14]<br>RFIC, 2014 | [15]<br>JSSC, 2015 |
|---|---|---|---|---|---|---|---|---|---|
| Data rate [Gb/s] | 22 – 26.5 | 25 | 25 | 28 | 25 – 28 | 25 | 28 | 25 | 25 |
| Power / ch [mW] | 35.7 | 93 | 44.4 | 28.8 | 137.5 | 68 | 30 | 4.25 | N/A |
| Energy-efficiency [pJ/bit] | 1.35 | 3.72 | 1.78 | 1.03 | 4.91 | 2.72 | 1.07 | 0.17 | N/A |
| Transimpedance [dBΩ] | 71 | 83 | 78.3 | 76.5 | 76.8 | 72.5 | 91.8 | N/A | N/A |
| Sensitivity (BER = 10<sup>-12</sup>) [μA<sub>pk-pk</sub>] | 106** @ 25.0 Gb/s<br>184** @ 26.5 Gb/s | 59*** @ 25 Gb/s | 219*** @ 22 Gb/s | 63** @ 28 Gb/s | 86*** @ 25 Gb/s | 98*** @ 25 Gb/s | 91*** @ 25 Gb/s<br>138*** @ 28 Gb/s | 55** @ 25 Gb/s | 41*** @ 25 Gb/s |
| CMOS technology | 65 nm | 65 nm | 90 nm | 28 nm | 65 nm | 65 nm | 32 nm SOI | 28 nm | 32 nm SOI |

<sup>\*</sup> Complete link including transmitter, \*\* Tested electrically, \*\*\* Tested optically and calculated from reported data.

Table 4.2 Receiver Performance Summary.

| | This work | [12]<br>JSSC, 2015 | [73]<br>JSSC, 2011 | [74]<br>JSSC, 2015 | [15]<br>JSSC, 2015 |
|---|---|---|---|---|---|
| Data rate [Gb/s] | 22 – 26.5 | 4 × 24.94 – 25.2 | 4 × 25 | 4 × 25 – 28.05 | 25 |
| Power / ch [mW] | 254* | 260** | 217** | 109** | 110 |
| Energy-efficiency [pJ/bit] | 9.58 | 10.4* | 8.68* | 3.9* | 4.4 |
| Recovered clock jitter [ps] | 1.28 ps<sub>rms</sub>, 8.9 ps<sub>pp</sub> | 1.01 ps<sub>rms</sub>, 6.22 ps<sub>pp</sub> | N/A | N/A | N/A |
| Jitter tolerance [UI<sub>pp</sub> @ 10 MHz] | 0.16 | 0.3 | N/A | > 0.05 | 0.1 |
| Chip area [mm²] | 0.75 | 2.47 | 23.3 | 0.13 | 0.06 |
| CMOS technology | 65 nm | 65 nm | 65 nm | | 32 nm |

## **Chapter 5**

# **Conclusion**

In this thesis, a 22- to 26.5-Gb/s optical receiver designed in CMOS technology has been presented. By adopting inverter-based amplifiers in the overall front-end architecture, the optical front-end achieves low power consumption. In order to optimize the noise performance and power efficiency of the TIA and the LA, the appropriate design region is computed from the simulated parameters of a CMOS inverter. In addition, the frequency response of the optical front-end is optimized in terms of the amplitude response and phase delay variation in order to minimize additional jitter. The ADCDR is realized at a high data rate (22–26.5 Gb/s), demonstrating the feasibility of all-digital implementations of PLLs and CDRs for future optical interface circuits.
The design choices are considered and the half-rate bang-bang CDR utilizing the LC-QDCO as a quadrature clock generator is proposed. In order to implement the ADCDR at a high data rate, several existing techniques are used such as direct proportional path control [22]. The prototype well satisfies the jitter tolerance specification for 40/100 GbE and exhibits high-sensitivity up to 106 and 184 $\mu A_{pk-pk}$ at data rates of 25 and 26.5 Gb/s, respectively. In addition, because of the inverter-based amplifiers in the optical frontend, the optical front-end achieves low power consumption, which corresponds to energy efficiency of 1.35 pJ/bit at a data rate of 26.5 Gb/s. ## **Bibliography** - [1] J. W. Goodman, F. J. Leonberger, S.-Y. Kung, and R. A. Athale, "Optical interconnections for VLSI systems," *Proc. IEEE*, vol. 72, no. 7, pp. 850–866, Jul. 1984. - [2] D. A. B. Miller, "Rationale and challenges for optical interconnects to electronics chips," *Proc. IEEE*, vol. 88, no. 6, pp. 728–749, Jun. 2000. - [3] I. A. Young, E. Mohammed, J. T. S. Liao, A. M. Kern, S. Palermo, B. A. Block, M. R. Reshotko, and P. L. D. Chang, "Optical I/O technology for terascale computing," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 235–248, Jan. 2010. - [4] T. Toifl, M. Ruegg, R. Inti, C. Menolfi, M. Brandli, M. Kossel, P. Buchmann, P. A. Francese, T. Morf, "A 3.1mW/Gbps 30Gbps quarter-rate triple-speculation 15-tap SC-DFE RX data path in 32nm CMOS," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2012, pp.102–103. - [5] J. F. Bulzacchelli et al., "A 28-Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32-nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3232–3248, Dec. 2012. - [6] M. Hochberg, N. C. Harris, R. Ding, Y. Zhang, A. Novack, Z. Xuan, and T. Baehr-Jones, "Silicon Photonics: The Next Fabless Semiconductor Industry," *IEEE Solid-State Circuits Magazine*, vol. 5, no. 1, pp. 48–58, winter 2013. - [7] A. C. Carusone, H. 
Yasotharan, and T. Kao, "CMOS Technology Scaling Considerations for Multi-Gbps Optical Receivers With Integrated Photodetectors," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1832–1842, Aug. 2011. - [8] D. Li, G. Minoia, M. Repossi, D. Baldi, E. Temporiti, A. Mazzanti, and F. Svelto, "A Low-Noise Design Technique for High-Speed CMOS Optical Receivers," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1437–1447, Jun. 2014. [9] J. Proesel, C. Schow, and A. Rylyakov, "25Gb/s 3.6pJ/b and 15Gb/s 1.37pJ/b VCSEL-based optical links in 90nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, 2012, pp. 418–420. - [10] T.-C. Huang, T.-W. Chung, C.-H. Chern, M.-C. Huang, C.-C. Lin, and F.-L. Hsueh, "A 28Gb/s 1pJ/b shared-inductor optical receiver with 56% chip-area reduction in 28nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, 2014, pp. 144–145. - [11] T. Takemoto, H. Yamashita, T. Yazaki, N. Chujo, Y. Lee, and Y. Matsuoka, "A 25-to-28 Gb/s High-Sensitivity (–9.7 dBm) 65 nm CMOS Optical Receiver for Board-to-Board Interconnects," *IEEE J. Solid-State Circuits*, vol. 49, no. 10, pp. 2259–2276, Oct. 2014. - [12] P.-C. Chiang, J.-Y. Jiang, H.-W. Hung, C.-Y. Wu, G.-S. Chen, and J. Lee, "4×25 Gb/s Transceiver With Optical Front-end for 100 GbE System in 65 nm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 573–585, Feb. 2015. - [13] J. E. Proesel, B. G. Lee, C. W. Baks, and C. L. Schow, "35-Gb/s VCSEL-Based optical link using 32-nm SOI CMOS circuits," in Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), 2013, pp. 1–3. - [14] S. Saeedi and A. Emami, "A 25Gb/s 170 μW/Gb/s optical receiver in 28nm CMOS for chip-to-chip optical communication," in 2014 IEEE Radio Frequency Integrated Circuits Symposium, 2014, pp. 283–286. - [15] A. Rylyakov, J. E. Proesel, S. Rylov, B. G. Lee, J. F. Bulzacchelli, A. Ardey, B. Parker, M. Beakes, C. W. Baks, C. L. Schow, and M. 
Meghelli, "A 25 Gb/s Burst-Mode Receiver for Low Latency Photonic Switch Networks," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3120–3132, Dec. 2015. - [16] D. Lee, J. Han, G. Han, and S. M. Park, "An 8.5-Gb/s Fully Integrated CMOS Optoelectronic Receiver Using Slope-Detection Adaptive Equalizer," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2861–2873, Dec. 2010. [17] S. H. Huang and W. Z. Chen, "A 20-Gb/s optical receiver with integrated photo detector in 40-nm CMOS," in *IEEE A-SSCC Dig. Tech. Papers*, 2013, pp. 225–228. - [18] Y. Frans, N. Nguyen, B. Daly, Y. Wang, D. Kim, T. Bystrom, D. Olarte, and K. Donnelly, "A 1-4 Gbps quad transceiver cell using PLL with gate-current leakage compensator in 90nm CMOS," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, 2004, pp. 134–137. - [19] P. K. Hanumolu, G.-Y. Wei, U.-K. Moon, and K. Mayaram, "Digitally-Enhanced Phase-Locking Circuits," in *Proc. IEEE CICC*, 2007, pp. 361–368. - [20] H. Song, D.-S. Kim, D.-H. Oh, S. Kim, and D.-K. Jeong, "A 1.0-4.0-Gb/s All-Digital CDR With 1.0-ps Period Resolution DCO and Adaptive Proportional Gain Control," *IEEE J. Solid-State Circuits*, vol. 46, no. 2, pp. 424–434, Feb. 2011. - [21] R. B. Staszewski, K. Muhammad, D. Leipold, C.-M. Hung, Y.-C. Ho, J. L. Wallberg, C. Fernando, K. Maggio, R. Staszewski, T. Jung, J. Koh, S. John, I. Y. Deng, V. Sarda, O. Moreira-Tamayo, V. Mayega, R. Katz, O. Friedman, O. E. Eliezer, E. de-Obaldia, and P. T. Balsara, "All-digital TX frequency synthesizer and discrete-time receiver for Bluetooth radio in 130-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2278–2291, Dec. 2004. - [22] D.-H. Oh, D.-S. Kim, S. Kim, D.-K. Jeong, and W. Kim, "A 2.8Gb/s All-Digital CDR with a 10b Monotonic DCO," in *IEEE ISSCC Dig. Tech. Papers*, 2007, pp. 222–598. - [23] H. T. Friis, "Noise Figures of Radio Receivers," *Proc. IEEE*, vol. 32, no. 7, pp. 419–422, Jul. 1944. - [24] E. Sackinger and W. 
Guggenbühl, "A high-swing, high-impedance MOS cascode circuit," *IEEE J. Solid-State Circuits*, vol. 25, no. 1, pp. 289–298, Feb. 1990.
- [25] S. M. Park and C. Toumazou, "A packaged low-noise high-speed regulated cascode transimpedance amplifier using a 0.6µm N-well CMOS technology," in *Proc. ESSCIRC*, 2000, pp. 431–434.
- [26] S. M. Park and H.-J. Yoo, "1.25-Gb/s regulated cascode CMOS transimpedance amplifier for Gigabit Ethernet applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 112–121, Jan. 2004.
- [27] C. Kromer, G. Sialm, T. Morf, M. L. Schmatz, F. Ellinger, D. Erni, and H. Jackel, "A low-power 20-GHz 52-dBΩ transimpedance amplifier in 80-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 6, pp. 885–894, Jun. 2004.
- [28] T. Nakahara, H. Tsuda, K. Tateno, N. Ishihara, and C. Amano, "High-sensitivity 1-Gb/s CMOS receiver integrated with a III–V photodiode by waferbonding," presented at the LEOS 2000 Spring Meeting.
- [29] E. Säckinger, *Broadband Circuits for Optical Fiber Communication*. Hoboken, NJ: Wiley, 2005.
- [30] R. P. Jindal, "Gigahertz-band high-gain low-noise AGC amplifiers in fine-line NMOS," *IEEE J. Solid-State Circuits*, vol. 22, no. 4, pp. 512–521, Aug. 1987.
- [31] S. Galal and B. Razavi, "10-Gb/s limiting amplifier and laser/modulator driver in 0.18-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2138–2146, Dec. 2003.
- [32] G.-S. Jeong, H. Chi, K. Kim, and D.-K. Jeong, "A 20-Gb/s 1.27pJ/b low-power optical receiver front-end in 65nm CMOS," in *Proc. IEEE ISCAS*, 2014, pp. 1492–1495.
- [33] E. M. Cherry and D. E. Hooper, "The design of wide-band transistor feedback amplifiers," *Proc. IEE*, vol. 110, no. 2, pp. 375–389, Feb. 1963.
- [34] T. H. Lee, *The Design of CMOS Radio-Frequency Integrated Circuits*. Cambridge, U.K.: Cambridge Univ. Press, 1998.
- [35] S. S. Mohan, M. D. M. Hershenson, S. P. Boyd, and T. H. Lee, "Bandwidth extension in CMOS with optimized on-chip inductors," *IEEE J.
Solid-State Circuits*, vol. 35, no. 3, pp. 346–355, Mar. 2000.
- [36] S.-H. Chu, W. Bae, G.-S. Jeong, S. Jang, S. Kim, J. Joo, G. Kim, and D.-K. Jeong, "A 22 to 26.5 Gb/s Optical Receiver With All-Digital Clock and Data Recovery in a 65 nm CMOS Process," *IEEE J. Solid-State Circuits*, vol. 50, no. 11, pp. 2603–2612, Nov. 2015.
- [37] J. Kim and J. F. Buckwalter, "A 40-Gb/s Optical Transceiver Front-End in 45 nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 3, pp. 615–626, Mar. 2012.
- [38] HFAN-09.0.1 NRZ Bandwidth-HF Cutoff vs. SNR, Maxim Integrated Products, Inc., 2002. [Online]. Available: http://pdfserv.maximintegrated.com/en/an/AN3455.pdf
- [39] T. Maekawa, S. Amakawa, N. Ishihara, and K. Masu, "Design of CMOS inverter-based output buffers adapting the Cherry-Hooper broadbanding technique," in *European Conference on Circuit Theory and Design*, 2009, pp. 511–514.
- [40] Y.-H. Oh and S.-G. Lee, "An inductance enhancement technique and its application to a shunt-peaked 2.5 Gb/s transimpedance amplifier design," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 51, no. 11, pp. 624–628, Nov. 2004.
- [41] B. E. A. Saleh and M. C. Teich, *Fundamentals of Photonics*. Hoboken, NJ: Wiley, 2007.
- [42] F. Y. Liu, D. Patil, J. Lexau, P. Amberg, M. Dayringer, J. Gainsley, H. F. Moghadam, X. Zheng, J. E. Cunningham, A. V. Krishnamoorthy, E. Alon, and R. Ho, "10-Gbps, 5.3-mW Optical Transmitter and Receiver Circuits in 40-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 9, pp. 2049–2067, Sep. 2012.
- [43] M.-T. Hsieh and G. E. Sobelman, "Architectures for multi-gigabit wire-linked clock and data recovery," *IEEE Circuits and Systems Magazine*, vol. 8, no. 4, pp. 45–57, Fourth Quarter 2008.
- [44] R. C. Walker, "Designing bang-bang PLLs for clock and data recovery in serial data transmission systems," in *Phase-Locking in High Performance Systems*, B. Razavi, Ed. New York: IEEE Press, 2003, pp. 34–45.
- [45] C. R. Hogge, "A self correcting clock recovery circuit," *IEEE J.
Lightwave Technology*, vol. 3, no. 6, pp. 1312–1314, Dec. 1985.
- [46] J. D. H. Alexander, "Clock recovery from random binary signals," *Electronics Letters*, vol. 11, no. 22, pp. 541–542, Oct. 1975.
- [47] W. Yin, R. Inti, A. Elshazly, and P. K. Hanumolu, "A TDC-less 7mW 2.5Gb/s digital CDR with linear loop dynamics and offset-free data recovery," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2011, pp. 440–442.
- [48] D.-S. Kim, H. Song, T. Kim, S. Kim, and D.-K. Jeong, "A 0.3–1.4 GHz All-Digital Fractional-N PLL With Adaptive Loop Gain Controller," *IEEE J. Solid-State Circuits*, vol. 45, no. 11, pp. 2300–2311, Nov. 2010.
- [49] H. Song, D.-S. Kim, D.-H. Oh, S. Kim, and D.-K. Jeong, "A 1.0–4.0-Gb/s All-Digital CDR With 1.0-ps Period Resolution DCO and Adaptive Proportional Gain Control," *IEEE J. Solid-State Circuits*, vol. 46, no. 2, pp. 424–434, Feb. 2011.
- [50] R. B. Staszewski, C.-M. Hung, D. Leipold, and P. T. Balsara, "A first multigigahertz digitally controlled oscillator for wireless applications," *IEEE Trans. Microwave Theory and Techniques*, vol. 51, no. 11, pp. 2154–2164, Nov. 2003.
- [51] N. Da Dalt, C. Kropf, M. Burian, T. Hartig, and H. Eul, "A 10b 10GHz digitally controlled LC oscillator in 65nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, 2006, pp. 669–678.
- [52] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, "A 900 MHz CMOS LC-oscillator with quadrature outputs," in *IEEE ISSCC Dig. Tech. Papers*, 1996, pp. 392–393.
- [53] T.-P. Liu, "A 6.5 GHz monolithic CMOS voltage-controlled oscillator," in *IEEE ISSCC Dig. Tech. Papers*, 1999, pp. 404–405.
- [54] P. van de Ven, J. van der Tang, D. Kasperkovitz, and A. van Roermund, "An optimally coupled 5 GHz quadrature LC oscillator," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, 2001, pp. 115–118.
- [55] P. Vancorenland and M. Steyaert, "A 1.57 GHz fully integrated very low phase noise quadrature VCO," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, 2001, pp. 111–114.
- [56] A. Ravi, K. Soumyanath, R. E.
Bishop, B. A. Bloechel, and L. R. Carley, "An optimally transformer coupled, 5 GHz quadrature VCO in a 0.18 μm digital CMOS process," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, 2003, pp. 141–144.
- [57] P. Andreani, A. Bonfanti, L. Romano, and C. Samori, "Analysis and design of a 1.8-GHz CMOS LC quadrature VCO," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1737–1747, Dec. 2002.
- [58] P. Andreani, "A low-phase-noise low-phase-error 1.8 GHz quadrature CMOS VCO," in *IEEE ISSCC Dig. Tech. Papers*, 2002, vol. 1, pp. 290–466.
- [59] P. Andreani, "A 2GHz, 17% tuning range quadrature CMOS VCO with high figure-of-merit and 0.6° phase error," in *Proc. ESSCIRC*, Sep. 2002, pp. 815–818.
- [60] H.-J. Chang, C. Lim, and T.-Y. Yun, "CMOS QVCO With Current-Reuse, Bottom-Series Coupling, and Forward Body Biasing Techniques," *IEEE Microwave and Wireless Components Letters*, vol. 24, no. 9, pp. 608–610, Sep. 2014.
- [61] B. Razavi, "A study of phase noise in CMOS oscillators," *IEEE J. Solid-State Circuits*, vol. 31, pp. 331–343, Mar. 1996.
- [62] N. Da Dalt, "A design-oriented study of the nonlinear dynamics of digital bang-bang PLLs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 1, pp. 21–31, Jan. 2005.
- [63] M. J. Park and J. Kim, "Pseudo-Linear Analysis of Bang-Bang Controlled Timing Circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 6, pp. 1381–1394, Jun. 2013.
- [64] Y. Choi, D.-K. Jeong, and W. Kim, "Jitter transfer analysis of tracked over-sampling techniques for multigigabit clock and data recovery," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 775–783, Nov. 2003.
- [65] B.-J. Lee, M.-S. Hwang, S.-H. Lee, and D.-K. Jeong, "A 2.5-10-Gb/s CMOS transceiver with alternating edge-sampling phase detection for loop characteristic stabilization," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1821–1829, Nov. 2003.
- [66] N. Da
Dalt, "Markov Chains-Based Derivation of the Phase Detector Gain in Bang-Bang PLLs," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 11, pp. 1195–1199, Nov. 2006.
- [67] B. Chun and M. P. Kennedy, "Statistical Properties of First-Order Bang-Bang PLL With Nonzero Loop Delay," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 10, pp. 1016–1020, Oct. 2008.
- [68] N. Da Dalt, "Linearized Analysis of a Digital Bang-Bang PLL and Its Validity Limits Applied to Jitter Transfer and Jitter Generation," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 11, pp. 3663–3675, Dec. 2008.
- [69] J. Lee, K. S. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1571–1580, Sep. 2004.
- [70] J. Lee and K.-C. Wu, "A 20Gb/s full-rate linear CDR circuit with automatic frequency acquisition," in *IEEE ISSCC Dig. Tech. Papers*, 2009, pp. 366–367.
- [71] J.-K. Kim, J. Kim, G. Kim, and D.-K. Jeong, "A Fully Integrated 0.13-μm CMOS 40-Gb/s Serial Link Transceiver," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1510–1521, May 2009.
- [72] *IEEE 802.3ba Standard*, 802.3ba, 2010. [Online]. Available: http://www.ieee802.org/3/ba/index.html
- [73] G. Ono, K. Watanabe, T. Muto, H. Yamashita, K. Fukuda, N. Masuda, R. Nemoto, E. Suzuki, T. Takemoto, F. Yuki, M. Yagyu, H. Toyoda, M. Kono, A. Kambe, S. Umai, T. Saito, and S. Nishimura, "A 10:4 MUX and 4:10 DEMUX Gearbox LSI for 100-Gigabit Ethernet Link," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 3101–3112, Dec. 2011.
- [74] H. Won, T. Yoon, J. Han, J.-Y. Lee, J.-H. Yoon, T. Kim, J.-S. Lee, S. Lee, K. Han, J. Lee, J. Park, and H.-M. Bae, "A 0.87 W Transceiver IC for 100 Gigabit Ethernet in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 399–413, Feb. 2015.

## Abstract in Korean

This thesis proposes a 22- to 26.5-Gb/s optical receiver with an all-digital clock and data recovery circuit in a 65-nm CMOS process. The receiver comprises an optical front-end and a half-rate bang-bang clock and data recovery circuit. The optical front-end consumes little power by using inverter-based amplifiers and secures sufficient bandwidth by applying several bandwidth extension techniques. In addition, to minimize the jitter added at the front-end, the phase response is considered as well as the magnitude response. The all-digital clock and data recovery circuit uses an LC quadrature oscillator to obtain low jitter at frequencies above 10 GHz. The recovered clock jitter is measured to be 1.28 ps<sub>rms</sub>, and the jitter tolerance satisfies the specification given in IEEE 802.3ba. The receiver sensitivity for a bit error rate of $10^{-12}$ is measured to be 106 and 184 $\mu A_{pk\text{-}pk}$ at 25 and 26.5 Gb/s, respectively. The fabricated receiver chip occupies an area of 0.75 mm<sup>2</sup> and consumes 254 mW at 26.5 Gb/s. The energy efficiencies of the optical front-end and the entire receiver at 26.5 Gb/s are 1.35 and 9.58 pJ/bit, respectively.

**Keywords**: Optical receiver, digital clock and data recovery circuit, transimpedance amplifier, limiting amplifier, bang-bang phase detector, LC quadrature oscillator

**Student Number : 2012-30235**