# DESIGN OF ENERGY-EFFICIENT A/D CONVERTERS WITH PARTIAL EMBEDDED EQUALIZATION FOR HIGH-SPEED WIRELINE RECEIVER APPLICATIONS

### A Dissertation by

### EHSAN ZHIAN TABASY

Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

### DOCTOR OF PHILOSOPHY

Chair of Committee, Samuel Palermo

Committee Members, Jose Silva-Martinez, Henry Pfister, Eun Kim

Head of Department, Miroslav M. Begovic

May 2015

Major Subject: Electrical Engineering

Copyright 2015 Ehsan Zhian Tabasy

#### ABSTRACT

As the data rates of wireline communication links increase, channel impairments such as skin effect, dielectric loss, fiber dispersion, reflections and cross-talk become more pronounced. This warrants more interest in analog-to-digital converter (ADC)-based serial link receivers, as they allow for more complex and flexible back-end digital signal processing (DSP) relative to binary or mixed-signal receivers. Utilizing this back-end DSP allows for complex digital equalization and more bandwidth-efficient modulation schemes, while also displaying reduced process/voltage/temperature (PVT) sensitivity. Furthermore, these architectures offer straightforward design translation and can directly leverage the area and power scaling offered by new CMOS technology nodes. However, the power consumption of the ADC front-end and subsequent digital signal processing is a major issue. Embedding partial equalization inside the front-end ADC can potentially lower the complexity of the back-end DSP and/or decrease the ADC resolution requirement, which results in a more energy-efficient receiver.

This dissertation presents efficient implementations for multi-GS/s time-interleaved ADCs with partial embedded equalization. The first prototype is a 6b 1.6GS/s ADC with a novel embedded redundant-cycle 1-tap DFE structure in 90nm CMOS. The other two prototypes are more complex 6b 10GS/s ADCs with efficiently embedded feed-forward equalization (FFE) and decision feedback equalization (DFE) in 65nm CMOS. Leveraging a time-interleaved successive approximation ADC architecture, new structures for embedded DFE and FFE are proposed with low power/area overhead. Measurement results over FR4 channels verify the effectiveness of the proposed embedded equalization schemes. A comparison of the fabricated prototypes against state-of-the-art general-purpose ADCs in a similar speed/resolution range shows comparable performance, even though the proposed architectures also include embedded equalization.

To Sousan, my mother, my hero

#### ACKNOWLEDGEMENTS

There are many people who have impacted my life, and so in many ways this dissertation. Unfortunately, I have probably forgotten to mention some people, with some, hopefully low, probability; we are engineers after all, and everything happens with some probability in this vast universe.

First and foremost, I'd like to thank my talented advisor Prof. Samuel Palermo. Thank you for believing in me and my research and keeping me on track. Thank you for showing me how serious research should be organized and carried out, and thank you for molding me into a better engineer.

I'd like to thank my friends and colleagues in Sam's group: Ayman, Cheng, Younghoon, Byungho, Osama, Noah, and Shengchang. Thank you Ayman. We started this research together, and it would never have been finished without your help and so many discussions that I will cherish forever and miss for sure.

I should thank Prof. Edgar Sanchez-Sinencio, Prof. Jose Silva-Martinez, Prof. Sebastian Hoyos, and Prof. Kamran Entesari in the Analog and Mixed-Signal Center for their teachings. I would also like to thank Prof. Henry Pfister and Prof. Eun Kim for serving on my Ph.D. committee. Also, I appreciate the support of the Semiconductor Research Corporation (SRC) and the National Science Foundation (NSF) for this research.
Many people have made my stay in the Analog and Mixed-Signal Center of Texas A&M University memorable. Thank you Hajir for, besides being a great friend, always letting me pick your brain with my nerdy discussions. Thank you Alireza, Masoud and Samira, Vahid, Shokoufeh, Negar, Mohammadhossein, Mohan, Saman, CJ, Shiva, and the list goes on and on. It's been a ride! There are so many other friends in College Station, Texas, that I should thank: Masoud, Kamyar, Sardar, Amirhossein, Ali, Morteza, Armin, and others. I may have forgotten to mention many others by name, but you know who you are. Thank you.

I'd like to thank my previous advisors during my Bachelor's and Master's programs as well. Thank you Prof. Lotfi for introducing me to the beautiful world of Analog Integrated Circuit Design many years ago at Ferdowsi University of Mashhad, and thank you Prof. Shoaei, Prof. Kamarei, and Prof. Ashtiani for your teachings at the University of Tehran.

I'd like to thank my other friends outside Texas A&M University, Ali, Saman and Mahsa. You were the sanctuary I ran to whenever I needed to escape for a short time from my research and everyday life in order to come back all refreshed and focused.

Last but not least, thank you Yaser, Omid and Soudabeh, my brothers and sister, for always being there for me, although physically you are thousands of miles away from me. Thank you mom for always believing in me, even at times that I didn't myself! With all my heart, I dedicate this dissertation to you.

# TABLE OF CONTENTS

| Section | Title |
|---------|-------|
| | ABSTRACT |
| | DEDICATION |
| | ACKNOWLEDGEMENTS |
| | TABLE OF CONTENTS |
| | LIST OF FIGURES |
| | LIST OF TABLES |
| 1. | INTRODUCTION |
| 1.1 | Dissertation Organization |
| 2. | BACKGROUND ON HIGH-SPEED ADC-BASED RECEIVERS |
| 2.1 | Time-Interleaving Challenges |
| 2.1.1 | Offset Mismatch |
| 2.1.2 | Gain Mismatch |
| 2.1.3 | Phase Mismatch |
| 2.1.4 | Phase Random Jitter |
| 2.2 | High-Speed Track-And-Holds |
| 2.2.1 | T/H Basics |
| 2.2.2 | Open-Loop T/H Architectures |
| 2.3 | High-Speed Sub-ADC Architectures |
| 2.4 | High-Speed Link Receivers |
| 3. | 6-BIT 1.6-GS/S ADC WITH EMBEDDED REDUNDANT CYCLE DFE |
| 3.1 | Embedded Feedback Equalization Modeling |
| 3.2 | |
| 3.2.1 | Loop-Unrolled 1-Tap Embedded DFE |
| 3.2.2 | Redundant-Cycle 1-Tap Embedded DFE |
| 3.2.3 | Critical Delay Path |
| 3.2.4 | Switched-Capacitor Implementation |
| 3.3 | ADC Design |
| 3.3.1 | Time-Interleaved Architecture |
| 3.3.2 | Unit ADC with Embedded 1-Tap DFE |
| 3.3.3 | Front-End Track-and-Hold (T/H) |
| 3.3.4 | On-Die Offset and Clock-Skew Calibration |
| 3.4 | Measurement Results |
| 3.4.1 | Core ADC Characterization |
| 3.4.2 | Embedded DFE Functionality |
| 3.5 | Conclusion |
| 4. | 6-BIT 10-GS/S ADC WITH EMBEDDED EQUALIZATION |
| 4.1 | A 6-Bit 10GS/s ADC with Embedded 2-Tap FFE and 1-Tap DFE |
| 4.1.1 | Embedded Equalization Modeling |
| 4.1.2 | SAR ADC with Low-Overhead Embedded FFE and DFE |
| 4.1.3 | ADC Design |
| 4.1.4 | Experimental Results |
| 4.1.5 | Performance Summary |
| 4.1.6 | Conclusion |
| 4.2 | A 6-Bit 10GS/s ADC with Extended-Range Embedded 3-Tap FFE |
| 4.2.1 | SAR ADC with Extended-Range 3-Tap Embedded FFE |
| 4.2.2 | ADC Design |
| 4.2.3 | Experimental Results |
| 4.2.4 | Performance Summary |
| 4.2.5 | 10Gb/s ADC-Based Receiver with Dynamically-Enabled Digital Equalization |
| 4.2.6 | Conclusion |
| 5. | CONCLUSION AND FUTURE WORK |
| 5.1 | Conclusion |
| 5.2 | Recommendations for Future Work |
| 5.2.1 | Hybrid RX with Dynamically-Enabled Front-End ADC |
| | REFERENCES |

# LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| 1.1 | A high-speed electrical link system with an ADC-based receiver |
| 2.1 | Simplified block diagram of an N-way time-interleaved ADC |
| 2.2 | Two-way time-interleaved ADC with offset mismatch |
| 2.3 | Simulated output spectrum of a two-way time-interleaved ADC with offset mismatch for two different sets of input frequencies and amplitudes |
| 2.4 | Two-way time-interleaved ADC with gain mismatch |
| 2.5 | Simulated output spectrum of a two-way time-interleaved ADC with gain mismatch for two different sets of input frequencies and amplitudes |
| 2.6 | Two-way time-interleaved ADC with phase mismatch |
| 2.7 | Simulated output spectrum of a two-way time-interleaved ADC with phase mismatch for two different sets of input frequencies and amplitudes |
| 2.8 | (a) Simple T/H, (b) practical open-loop T/H, and (c) a conventional implementation of closed-loop T/H |
| 2.9 | Source-follower buffer using (a) an ideal current source, and (b) a simple PMOS current source |
| 2.10 | Pseudo-differential source follower based T/H stage using simple NMOS switches |
| 2.11 | Simple switch architectures: (a) Single NMOS or PMOS switch, and (b) complementary MOS switch also known as transmission gate |
| 2.12 | On-resistance of NMOS, PMOS and transmission-gate switches versus input voltage amplitude ($W_n = 10\,\mu\mathrm{m}$, $W_p = 20\,\mu\mathrm{m}$, with minimum length $L = 100\,\mathrm{nm}$, and $V_{DD} = 1\,\mathrm{V}$ in 90nm CMOS technology) |
| 2.13 | strapped switch |
| 2.14 | Pseudo-differential source-follower based buffer using negative capacitance (a) in a positive feedback configuration, and (b) in a feed-forward configuration |
| 2.15 | Differential flipped-voltage follower based buffer architectures: (a) Conventional low-swing FVF, (b) folded FVF case 1, and (c) folded FVF case 2 |
| 2.16 | Pseudo differential flipped-voltage follower based buffer with feedback capacitors |
| 2.17 | Linearity performance of T/H with FVF-based buffer and $750\,\mathrm{mV_{pp}}$ output swing |
| 2.18 | Basic structure of a flash ADC |
| 2.19 | Basic structure of a SAR ADC with binary-weighted capacitive DAC |
| 2.20 | (a) SAR energy versus resolution, along with the individual components contribution. (b) Energy comparison between SAR and flash ADCs as a function of resolution |
| 2.21 | Schematic of a CML based CMOS comparator |
| 2.22 | Schematic of a StrongArm dynamic comparator. (a) Basic schematic, and (b) schematic with extra devices to discharge internal nodes during reset phase for reduced memory effects |
| 2.23 | Schematic of (a) the double-tail dynamic comparator proposed by Schinkel, and (b) the two-stage modified dynamic comparator proposed by Goll |
| 2.24 | Input-referred noise (a) transient simulation setup, and (b) CDF for a designed Goll two-stage comparator in 65nm CMOS technology |
| 2.25 | The simplified operation of a capacitive DAC with merged capacitor switching scheme in a 3-bit SAR ADC: (a) sampling phase, (b) first bit cycle, (c) second bit cycle, and (d) third bit cycle |
| 2.26 | A common implementation of the SAR control logic in a 6-bit ADC |
| 2.27 | Example of a backplane system cross-section |
| 2.28 | (a) Frequency response and pulse response of three channels. (b) Eye diagrams after channels without equalization |
| 2.29 | (a) Frequency response and equalized pulse response of three channels under study. (b) Eye diagrams after channels with equalization |
| 2.30 | Block diagram of a receiver feed-forward equalizer |
| 2.31 | Block diagram of a receiver decision feedback equalizer with direct feedback taps |
| 2.32 | Simplified block diagram of a 1-tap DFE using (a) direct feedback implementation, and (b) loop-unrolled technique to relax critical delay path |
| 2.33 | Common pulse amplitude modulation schemes in serial links: simple PAM-2 (1 bit/symbol) and PAM-4 (2 bits/symbol) |
| 3.1 | A high-speed link with an ADC-based receiver |
| 3.2 | |
| 3.3 | (a) Magnitude and (b) 1.6Gb/s pulse responses of three FR4 channels. (c) Impact of including one tap of embedded DFE equalization for different levels of TX-FIR equalization, and (d) impact of ADC resolution with embedded DFE and embedded IIR equalization with no TX FIR equalization over three FR4 channels |
| 3.4 | DFE implementations: (a) direct-feedback, and (b) loop-unrolled |
| 3.5 | Conceptual schematic of a unit SAR ADC with (a) loop-unrolled, and (b) proposed redundant cycle 1-tap embedded DFE |
| 3.6 | Conceptual schematic of a unit SAR ADC (a) with redundant cycle 2-tap embedded DFE, and (b) with loop-unrolled 2-tap embedded DFE |
| 3.7 | Critical delay path for the redundant cycle 1-tap embedded DFE. The instants when the summation and sampling in the 1-tap embedded DFE occur are shown |
| 3.8 | SAR ADC with embedded 1-tap DFE: (a) simplified block diagram, operation during the (b) sampling phase, (c) first MSB evaluation, and (d) second MSB evaluation |
| 3.9 | Block diagram of the 16-way time-interleaved SAR ADC with embedded 1-tap DFE |
| 3.10 | Unit SAR ADC schematic with redundant cycle embedded 1-tap DFE |
| 3.11 | Schematic of the 4-input comparator with offset calibration current DACs |
| 3.12 | Temperature dependency of residual unit ADC offset calibrated at $27^{\circ}\mathrm{C}$ room temperature |
| 3.13 | Front-end T/H: (a) schematic, and (b) bootstrapped switch structure |
| 3.14 | Simulated front-end T/H buffer frequency response |
| 3.15 | Simplified diagrams of the foreground (a) offset calibration, and (b) clock skew calibration setups |
| 3.16 | Prototype ADC implemented in an LP 90nm CMOS process: (a) chip micrograph, and (b) optimized order of unit ADCs with respect to spacing between each two consecutive ADCs |
| 3.17 | Custom test board for the prototype 1.6GS/s ADC implemented in an LP 90nm CMOS process |
| 3.18 | ADC SNDR/SFDR vs. input frequency at $f_s = 1.6$ GHz |
| 3.19 | The 1.6GS/s ADC normalized output spectrum for $f_{in} = 48.437$ MHz |
| 3.20 | DNL/INL plots with $f_{in} = 2.7$ MHz at $f_s = 1.6$ GHz |
| 3.21 | Measured DFE tap coefficient range and resolution using a DC input voltage |
| 3.22 | (a) 1.6Gb/s ADC input generated by $2^{23}-1$ PRBS after a 2-tap FIR with 15dB de-emphasis, and measured digitized 6b ADC output (b) without, and (c) with 1-tap DFE enabled |
| 3.23 | Measured bathtub curves for the (a) 30-inch smooth, (b) 28-inch notch, and (c) 46-inch higher-loss FR4 channels shown in Fig. 3.3, with and without 1-tap embedded DFE for a $2^{10}-1$ PRBS input with $1\,\mathrm{V_{pp}}$ TX swing and no TX equalization |
| 3.24 | Measured bathtub curves for the (a) 30-inch smooth, and (b) 28-inch notch FR4 channels shown in Fig. 3.3, with and without 1-tap embedded DFE for a $2^{10}-1$ PRBS input with $300\,\mathrm{mV_{pp}}$ TX swing and no TX equalization |
| 4.1 | Block diagrams of (a) digital versus embedded DFE, and (b) digital versus embedded FFE |
| 4.2 | |
| 4.3 | Simulated voltage margin versus ADC resolution with both digital and embedded implementations of a 2-tap FFE + 1-tap DFE equalization structure for channels 1-3 in Fig. 4.2 |
| 4.4 | Impact of including embedded DFE and FFE equalization on (a) voltage margin and (b) timing margin for channels 1-3 in Fig. 4.2, with tap coefficients shown for the embedded equalization. (c) Impact of including embedded DFE and FFE equalization on voltage margin and timing margin in the presence of a front-end CTLE for channel 4 in Fig. 4.2 |
| 4.5 | Conceptual schematic of a unit SAR ADC with the proposed sampled 2-tap embedded FFE and redundant cycle 1-tap embedded DFE |
| 4.6 | Simplified unit SAR ADC with embedded 2-tap FFE and 1-tap DFE: (a) single-ended schematic, and operation during the (b) sampling phase, (c) first MSB evaluation, and (d) second MSB evaluation assuming $B_1B_2B_3B_4B_5 = 10001$ for the FFE |
| 4.7 | Block diagram of the 64-way time-interleaved SAR ADC with embedded FFE and DFE |
| 4.8 | Fully differential schematic of the unit ADC with sampled 2-tap embedded FFE and redundant cycle 1-tap embedded DFE |
| 4.9 | (a) Custom layout of the capacitive DAC with $0.45\,\mathrm{fF}$ MOM unit capacitors. (b) CDAC worst-case 01111 to 11111 transition DNL simulation results using 1000 Monte Carlo iterations |
| 4.10 | Simplified diagram of the foreground offset and clock skew calibrations setup |
| 4.11 | Temperature dependency of residual unit ADC offset calibrated at $27^{\circ}\mathrm{C}$ room temperature |
| 4.12 | Simplified metastability detection and correction block diagram and algorithm |
| 4.13 | Front-end T/H schematic with dummy OFF switches for high-frequency input feed-through cancellation |
| 4.14 | Front-end T/Hs sampling clocks generation, distribution, and calibration network |
| 4.15 | Prototype ADC chip micrograph and core ADC floorplan |
| 4.16 | Custom test board for the prototype 10GS/s ADC implemented in a GP 65nm CMOS process |
| 4.17 | ADC SNDR and SFDR vs. input frequency at $f_s = 10$ GHz |
| 4.18 | 10-GS/s ADC normalized output spectrum for $f_{in} = 2.4994$ GHz using a 16k-point FFT: (a) before calibration, (b) after only offset calibration, and (c) after offset and clock skew calibration |
| 4.19 | DNL/INL plots with $f_{in} = 9.746$ MHz at $f_s = 10$ GHz |
| 4.20 | Measured tap coefficient range and resolution using DC input voltages for embedded (a) FFE 2nd tap, and (b) 1-tap DFE |
| 4.21 | Embedded equalization characterization test setup |
| 4.22 | Measured digitized 6b ADC output (a) without equalization, (b) with only 1-tap embedded DFE, (c) with only 2-tap embedded FFE, and (d) with both embedded FFE and DFE, for a 10-Gb/s $2^{10}-1$ PRBS input over a 10-inch FR4 channel |
| 4.23 | Measured bathtub curves without and with embedded equalization for a 10-Gb/s $2^{10}-1$ PRBS input over (a) 6-inch FR4, (b) 10-inch FR4, and (c) 15-inch FR4 channels, with channel frequency responses shown in Fig. 4.2(a) |
| 4.24 | 10 GS/s ADC power breakdown |
| 4.25 | Simplified unit SAR ADC with limited ISI cancellation range for the embedded FFE equalization due to undesired attenuation at the comparator input for the equalization tap coefficients relative to the main cursor |
| 4.26 | Simplified unit SAR ADC with embedded 3-tap FFE: (a) single-ended schematic, and operation during the (b) sampling phase, and (c) first MSB evaluation assuming $B_{1,-1}B_{2,-1}B_{3,-1}B_{4,-1}B_{5,-1} = 00010$ for the pre-cursor tap, and $B_{1,1}B_{2,1}B_{3,1}B_{4,1}B_{5,1} = 01001$ for the post-cursor tap |
| 4.27 | Block diagram of the 32-way time-interleaved asynchronous SAR ADC with embedded 3-tap FFE |
| 4.28 | Fully differential schematic of the unit asynchronous SAR ADC with sampled 3-tap embedded FFE |
| 4.29 | Custom layout of the differential capacitive DAC with $1\,\mathrm{fF}$ MOM unit capacitors and 4-bit embedded gain calibration |
| 4.30 | Embedded gain calibration range and resolution for each capacitive DAC |
| 4.31 | Prototype ADC chip micrograph |
| 4.32 | Custom test boards for the prototype 10GS/s ADC implemented in a GP 65nm CMOS process. Two separate boards are designed: bias board and high-frequency board connected with ribbon cables for transferring the bias signals, supply voltages, and scan chain control bits |
| 4.33 | ADC SNDR and SFDR vs. input frequency at $f_s = 10$ GHz |
| 4.34 | Measured tap coefficient range and resolution using DC input voltages for embedded (a) FFE pre-cursor tap, and (b) FFE post-cursor tap |
| 4.35 | Embedded equalization characterization test setup |
| 4.36 | (a) FR4 channels under study, and (b) measured bathtub curves with embedded 3-tap FFE for a 10-Gb/s $2^{23}-1$ PRBS input over the three FR4 channels |
| 4.37 | (a) Receiver voltage margin BER bathtub curves with low- and high-loss channels, and (b) simplified block diagram of the proposed hybrid ADC-based receiver |
| 4.38 | (a) FR4 channels frequency response. (b) Received BER bathtub curves after the front-end ADC using only the embedded 3-tap FFE. Receiver BER bathtub curves with only embedded equalization and combined embedded plus digital equalization for (c) a 35" FR4 channel, and (d) a 40" FR4 channel |
| 4.39 | (a) Hybrid ADC-based receiver digital equalizer power savings vs. channel attenuation (BER $< 10^{-10}$), and (b) receiver power breakdown |
| 5.1 | ADC performance comparison against previous general purpose ADCs with 10+GS/s sampling rate |
| 5.2 | Simplified block diagrams for (a) hybrid ADC-based RX with dynamically enabled digital equalizer, and (b) hybrid RX with dynamically enabled front-end ADC and digital equalizer |

# LIST OF TABLES

| Table | Title |
|-------|-------|
| 1.1 | Applications with $\geq$ 10Gb/s Data Rate |
| 3.1 | 16-Way 1.6GS/s 6-Bit ADC Performance Comparison |
| 4.1 | 64-Way 10GS/s 6-Bit ADC Performance Comparison |
| 4.2 | Proposed 10GS/s 6-Bit ADCs Performance Comparison |
| 4.3 | Proposed 10Gb/s ADC-Based Receiver Performance Comparison |

#### 1. INTRODUCTION

With the advance of CMOS technology, many applications have emerged for wireline communications, and new applications appear every year, while standards supporting higher data transmission rates are being proposed for existing applications. Most high-speed links serialize the parallel data for off-chip transmission due to the limited number of input/output (I/O) pads/pins and density constraints [1]. Examples of serial I/O links exist for interfacing processors to processors such as Intel QPI (6.4Gb/s) and AMD Hypertransport (6.4Gb/s), processors to peripherals such as PCIe (2.5, 5, 8Gb/s) and USB3 (4.8Gb/s), processors to memory such as RDRAM (1.6Gb/s) and XDR DRAM (7.2Gb/s), interfacing to storage units such as SATA (6Gb/s) and Fibre Channel (20Gb/s), and different networking standards such as Ethernet (1, 10Gb/s) for local area network (LAN), and SONET (2.5, 10, 40Gb/s) for wide area network (WAN).

As the data rates of wireline communication links increase, channel impairments such as skin effect, dielectric loss, fiber dispersion, reflections and cross-talk become more pronounced.
This warrants more interest in analog-to-digital converter (ADC)-based serial link receivers (Fig. 1.1), as they allow for more complex and flexible back-end digital signal processing (DSP) relative to binary or mixed-signal receivers [2–5]. Utilizing this back-end DSP allows for complex digital equalization and more bandwidth-efficient modulation schemes, while also displaying reduced process/voltage/temperature (PVT) sensitivity. Furthermore, these architectures offer straightforward design translation and can directly leverage the area and power scaling offered by new CMOS technology nodes.

Figure 1.1: A high-speed electrical link system with an ADC-based receiver.

One key issue with ADC-based receivers is the significant power consumption of both the front-end ADC and the subsequent digital equalization and symbol detection at high data rates. Previous works, such as [5], [6], and [7], present techniques to reduce the front-end ADC power by using optimal positioning of threshold voltages, configurable resolution based on the channel characteristics, and mixed-mode pre-equalization. Embedding analog equalization in the ADC is another promising approach to both reduce ADC resolution and digital equalization complexity [8], allowing for improvements in overall receiver power consumption with low-overhead implementations of the common feed-forward equalizer (FFE) and decision-feedback equalizer (DFE) topologies used in wireline receivers [9–12].

This research targets the design of efficient ADC-based receivers with 10Gb/s data rate; however, the ideas proposed in this work can be extended to higher data rates, and they are compatible with (and may even benefit from) CMOS technology scaling. Some of the available current and future application standards with data rates around 10Gb/s and above are listed in Table 1.1.
Table 1.1: Applications with $\geq$ 10Gb/s Data Rate

| Technology | Application | Data Rate (Gb/s) |
|----------------------------------|--------------------------|------------------|
| OC-192 | Wide Area Network (WAN) | 9.953 |
| OC-256 | Wide Area Network (WAN) | 13.271 |
| OC-768 | Wide Area Network (WAN) | 39.813 |
| OC-1536 | Wide Area Network (WAN) | 79.626 |
| OC-3072 | Wide Area Network (WAN) | 159.252 |
| 10 Gigabit Ethernet (10GBASE-X) | Local Area Network (LAN) | 10 |
| Infiniband FDR-10 1x | Local Area Network (LAN) | 10.31 |
| Infiniband FDR 1x | Local Area Network (LAN) | 13.64 |
| Infiniband EDR 1x | Local Area Network (LAN) | 25 |
| UPA | Computer Bus | 15.36 |
| PCI Express (PCIe) 4.0 (x1 link) | Computer Bus | 16 |
| Fibre Channel 16GFC | Storage | 12 |
| Serial Attached SCSI (SAS) 3 | Storage | 12 |
| SATA Express 3.2 | Storage | 16 |
| Serial Attached SCSI (SAS) 4 | Storage | 24 |
| USB 3.1 | Peripheral | 10 |
| Thunderbolt | Peripheral | 10 x2 |
| Thunderbolt 2 | Peripheral | 20 |

### 1.1 Dissertation Organization

The challenges in the design of time-interleaved data converters are covered in Chapter 2. The main high-speed ADC architectures are briefly introduced, and the successive approximation register (SAR) topology, which is the architecture used in the rest of this work, is explained in more detail. Also, a brief discussion of high-speed links and the receiver equalization techniques implemented in this work, namely feed-forward equalization (FFE) and decision feedback equalization (DFE), is given as background for the rest of this dissertation.

The remainder of this work focuses on the analysis, design and implementation of different techniques to efficiently embed partial equalization inside the front-end high-speed ADC, and hence, improve the efficiency of the full ADC-based receiver.
Embedded multi-level DFE, which can be treated as embedded quantized infinite impulse response (IIR) equalization, has previously been proposed for pipeline ADCs [13]. DFE is a very powerful equalization technique, as it can selectively reduce post-cursor ISI without amplifying noise or cross-talk. However, one important issue in any DFE implementation involves the critical feedback timing path from the decision comparator to the summation circuit that subtracts the post-cursor ISI. Loop unrolling can be employed to resolve this issue, where speculative comparison with a redundant comparator is used [14]. This approach, however, can incur significant hardware overhead [13].

Chapter 3 presents a time-interleaved (TI) SAR ADC architecture with a novel low-overhead 1-tap embedded DFE [15]. Statistical bit error rate (BER) simulation results are discussed, showing performance advantages with embedded DFE, and comparing it against embedded IIR equalization, for different FR4 channels. The novel embedded DFE technique proposed in this chapter, called redundant cycle DFE, introduces an additional cycle in the time-interleaved SAR ADC in order to perform the DFE loop-unrolling with minimal hardware overhead. Experimental results of a 6-bit 1.6GS/s ADC prototype with the proposed embedded 1-tap DFE, fabricated in a low power (LP) 90nm CMOS technology, verify the effectiveness of the embedded DFE.

Feed-forward equalizers are effective in canceling a large amount of inter-symbol interference (ISI) with a relatively small number of taps. A 2-tap version of this equalizer topology has been implemented in a time-interleaved (TI) flash ADC with additional CML input stages that follow the input track-and-holds (T/H) to realize the extra FFE tap [5]. While this approach is effective, significant linearity, speed, and power consumption trade-offs exist with this current-mode approach.
FFEs have also been embedded in successive approximation register (SAR) ADCs [16], [17], with charge-sharing in a capacitive digital-to-analog converter (CDAC) performing the signal scaling and summation of multiple input samples, followed by ADC conversion. However, a drawback of this single-CDAC approach is that the main cursor signal is attenuated such that the FFE tap sum is always fixed, similar to transmitter de-emphasis equalization [12].

Chapter 4 presents two 10GS/s 6-bit ADC solutions in 65nm CMOS that efficiently incorporate novel embedded equalization schemes. The first prototype is a 6-bit 10GS/s ADC with embedded 2-tap FFE and 1-tap DFE. The second prototype, which is utilized in a full 10Gb/s receiver, includes a 3-tap embedded FFE with one pre-cursor and one post-cursor tap, where the pre-cursor and post-cursor tap coefficients can each range up to $\sim 100\%$ of the main cursor amplitude. Statistical simulations of ADC-based receivers are carried out to quantify the performance advantages of these embedded equalization structures. The proposed embedded equalization techniques, which allow for flexibility in equalizer tap weighting at minimal hardware and power overhead, are analyzed in the same chapter, and experimental results from general purpose (GP) 65nm CMOS prototypes verify the effectiveness of the proposed embedded equalization structures.

Finally, in Chapter 5 the performance of the proposed 10GS/s ADCs is compared against state-of-the-art general-purpose ADCs with similar resolution and data rates, and concluding remarks are drawn. Some recommendations are also presented for curious researchers who wish to follow up on this work in the future.

#### 2. BACKGROUND ON HIGH-SPEED ADC-BASED RECEIVERS

This chapter briefly explains the details of the two main building blocks in a wireline ADC-based receiver, namely the front-end baud-rate ADC and receiver equalization.
The first section discusses the main building blocks and ADC architecture candidates in high-speed time-interleaved (TI) ADCs. The second section provides an introduction to high-speed link receivers. The main goal of this chapter is to prepare the reader for the remainder of this dissertation.

Fig. 2.1 shows the block diagram of a generic time-interleaved ADC with N parallel sub-ADCs, where each sub-ADC has a front-end track-and-hold (T/H). In this system, the sample rate of the full ADC is N times the sample rate of each sub-ADC [18]. This enables sampling rates higher than what a single converter can achieve in a given technology. In practice, however, non-idealities arising from differences among the interleaved channels can degrade the full ADC performance compared to the sub-ADCs. Jitter is another important source of performance degradation in high-speed ADCs. It is unrelated to time-interleaving and can affect the performance of any converter, since it is an inevitable result of noise in electronic circuits. As discussed later in this chapter, jitter degrades the ADC output signal-to-noise ratio, especially at high input frequencies, which is a problem in most Nyquist-rate time-interleaved ADCs targeting very high sampling rates.

Figure 2.1: Simplified block diagram of an N-way time-interleaved ADC.

#### 2.1 Time-Interleaving Challenges

The time-interleaved ADC performance is sensitive to any mismatch among the parallel converter channels, namely, offset, gain, and phase mismatches. Any of these mismatches can introduce distortion tones, which degrade the ADC performance, and should be calibrated to the desired resolution level. The following sections discuss each mismatch in detail.

#### 2.1.1 Offset Mismatch

Offset mismatches among the parallel sub-ADCs introduce a periodic additive pattern to the output of the full ADC. For simplicity, we consider two sub-ADCs in the calculations here as shown in Fig.
2.2; however, the analysis can be extended to a larger number of parallel channels in general. Assuming a single-tone input, $\cos(\omega t + \phi)$, the outputs of the two sub-ADCs, considering only the offset voltages, are [19], [20]

$$\text{ADC1}: \ y[n] = \cos(\omega nT + \phi) + V_{os1} \qquad n \ \text{even}, \tag{2.1}$$

$$\text{ADC2}: \ y[n] = \cos(\omega nT + \phi) + V_{os2} \qquad n \ \text{odd}, \tag{2.2}$$

where T is the sampling period of the overall ADC. The quantization noise is ignored for simplicity. Combining the two sub-ADC outputs, the overall ADC output can be expressed as

$$y[n] = \cos(\omega nT + \phi) + V_{os} + (-1)^n \frac{\Delta V_{os}}{2}, \tag{2.3}$$

where $V_{os} = (V_{os1} + V_{os2})/2$ and $\Delta V_{os} = V_{os1} - V_{os2}$. Also, $(-1)^n$ can be rewritten as $(-1)^n = \cos(n\pi) = \cos(\omega_S nT/2)$, where $\omega_S = 2\pi/T$ is the sampling frequency. Hence,

$$y[n] = \cos(\omega nT + \phi) + V_{os} + \frac{\Delta V_{os}}{2} \cos\left(\frac{\omega_S nT}{2}\right). \tag{2.4}$$

Figure 2.2: Two-way time-interleaved ADC with offset mismatch.

As shown by the second and third terms in Eq. 2.4, the offset mismatch results in two error terms in the output of a 2-way time-interleaved ADC: a DC term and a single tone at half the sampling frequency. Another important observation is that these error terms are independent of the input amplitude and frequency. The two-way TI ADC is simulated in MATLAB assuming a 6-bit resolution for each sub-ADC. The simulated output spectrum, considering only the impact of quantization noise and offset mismatch, is shown in Fig. 2.3 for two cases with different input amplitudes and frequencies but similar offset errors. As expected, the undesired tones due to offset mismatch are independent of the input amplitude and frequency.
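The behavior predicted by Eq. 2.4 can be reproduced with a short behavioral sketch. The following Python model is only an illustration and is not the MATLAB model used for Fig. 2.3: the function names, the ideal mid-rise quantizer, the record length, and the offset values are all assumptions chosen for the example. It builds the interleaved output of Eqs. 2.1 and 2.2 with coherent sampling and reads the DC bin and the $f_S/2$ bin of a direct DFT, which should approach $V_{os}$ and $\Delta V_{os}/2$ regardless of the input amplitude and frequency.

```python
import cmath
import math


def quantize(x, bits, full_scale=1.0):
    """Ideal mid-rise quantizer on [-full_scale, full_scale); bits=None bypasses it."""
    if bits is None:
        return x
    levels = 2 ** bits
    lsb = 2.0 * full_scale / levels
    code = math.floor(x / lsb)
    code = max(-levels // 2, min(levels // 2 - 1, code))  # clip to the code range
    return (code + 0.5) * lsb


def two_way_offset_adc(num_samples, cycles, amplitude, vos1, vos2, bits=6):
    """Eqs. 2.1-2.2: even samples come from ADC1, odd samples from ADC2."""
    out = []
    for n in range(num_samples):
        x = amplitude * math.cos(2.0 * math.pi * cycles * n / num_samples)
        vos = vos1 if n % 2 == 0 else vos2
        out.append(quantize(x + vos, bits))
    return out


def tone_amplitude(y, k):
    """Amplitude of DFT bin k; the DC and Nyquist bins are not doubled."""
    n = len(y)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(y))
    scale = 1.0 if k in (0, n // 2) else 2.0
    return scale * abs(acc) / n


if __name__ == "__main__":
    N = 256  # record length (assumed); input bins are chosen for coherent sampling
    # Two inputs with different amplitude and frequency but the same (assumed) offsets.
    for amplitude, cycles in ((0.8, 31), (0.4, 57)):
        y = two_way_offset_adc(N, cycles, amplitude, vos1=0.02, vos2=-0.01)
        print(f"A={amplitude}, input bin={cycles}: "
              f"DC={tone_amplitude(y, 0):.4f}, fs/2 tone={tone_amplitude(y, N // 2):.4f}")
```

With the quantizer bypassed (`bits=None`) the two error terms equal $V_{os}$ and $\Delta V_{os}/2$ exactly; with 6-bit quantization they deviate by at most half an LSB.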
A more general analysis shows that for an N-way time-interleaved ADC, the offset mismatch among the parallel channels results in distortion tones inside the Nyquist bandwidth of the ADC output spectrum at frequencies $(k/N)\omega_S$, where k = 0, 1, 2, ..., N/2 [21]. The DC offset term can easily be removed. Regarding the other undesired spectral terms, only the matching of the offset voltage among all unit ADCs is critical. This means that the offset voltages of the unit ADCs do not need to be removed individually. Instead, one ADC can be picked as the reference converter, and the offsets of all other unit ADCs matched to it.

Figure 2.3: Simulated output spectrum of a two-way time-interleaved ADC with offset mismatch for two different sets of input frequencies and amplitudes.

#### 2.1.2 Gain Mismatch

Gain mismatch among the time-interleaved unit ADCs can also degrade the overall ADC performance. Similar to the offset mismatch analysis, if only two parallel unit ADCs with gains $G_1$ and $G_2$ and no other error are considered for simplicity, as shown in Fig. 2.4, the unit ADC outputs for a single-tone sinewave input are

$$\text{ADC1}: \ y[n] = G_1 \cos(\omega nT + \phi) \qquad n \ \text{even}, \tag{2.5}$$

$$\text{ADC2}: \ y[n] = G_2 \cos(\omega nT + \phi) \qquad n \ \text{odd}. \tag{2.6}$$

Figure 2.4: Two-way time-interleaved ADC with gain mismatch.

By combining the two equations, the overall ADC output is

$$y[n] = \left[G + (-1)^n \frac{\Delta G}{2}\right] \cos(\omega nT + \phi), \tag{2.7}$$

where $G = (G_1 + G_2)/2$ and $\Delta G = G_1 - G_2$. By applying $(-1)^n = \cos(\omega_S nT/2)$ in the previous equation, the ADC output terms can be rearranged as

$$y[n] = \left[G + \frac{\Delta G}{2}\cos\left(\frac{\omega_S nT}{2}\right)\right]\cos(\omega nT + \phi) = G\cos(\omega nT + \phi) + \frac{\Delta G}{2}\cos\left(\frac{\omega_S nT}{2}\right)\cos(\omega nT + \phi). \tag{2.8}$$

By applying trigonometric identities while keeping only the terms inside the Nyquist band of the overall ADC, Eq.
2.8 is simplified to

$$y[n] = G\cos(\omega nT + \phi) + \frac{\Delta G}{2}\cos\left[\left(\omega - \frac{\omega_S}{2}\right)nT + \phi\right]. \tag{2.9}$$

The second term in the above equation is the undesired tone due to gain mismatch. The frequency of this tone depends on the input frequency, while its level relative to the desired tone, $\Delta G/2G$, is independent of the input amplitude. The MATLAB behavioral model of a 6-bit time-interleaved ADC is used to show the output spectrum in the presence of gain mismatch for two cases with different input frequencies and amplitudes, as shown in Fig. 2.5. In the general case of an N-way time-interleaved ADC with gain mismatch errors, the undesired distortion tones inside the Nyquist band appear at $\pm \omega + (k/N)\omega_S$, where k = 1, 2, ..., N/2 [21].

Figure 2.5: Simulated output spectrum of a two-way time-interleaved ADC with gain mismatch for two different sets of input frequencies and amplitudes.

#### 2.1.3 Phase Mismatch

Phase mismatch, also known as clock skew, is another challenging issue in the design of time-interleaved ADCs. If the analog input signal is sampled at exact multiples of the overall ADC sampling period $T = 1/f_S$ across the parallel unit ADCs, there is no phase mismatch. However, any deviation from the ideal sampling instants due to phase mismatch among the parallel unit ADCs results in non-uniform sampling [22].

Let's consider the simplified two-way time-interleaved ADC again, this time with only phase mismatch, as shown in Fig. 2.6. To model the phase mismatch, the ADC1 sampling instants are taken as the reference, and the ADC2 sampling instants deviate from the ideal multiples of T by dt.

Figure 2.6: Two-way time-interleaved ADC with phase mismatch.

The outputs of the two sub-ADCs can be expressed as

$$\text{ADC1}: \ y[n] = \cos(\omega nT + \phi) \qquad n \ \text{even}, \tag{2.10}$$

$$\text{ADC2}: \ y[n] = \cos(\omega(nT+dt)+\phi) \qquad n \ \text{odd}, \tag{2.11}$$

where the quantization error is ignored for simplicity.
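The two-channel models of Eqs. 2.5 and 2.6 and of Eqs. 2.10 and 2.11 can be checked numerically with a few lines of Python. The sketch below is an illustrative assumption rather than the behavioral model behind Figs. 2.5 and 2.7: the function names, the record length, the normalization $T = 1$, and the mismatch values are chosen only for the example, and quantization is omitted. It measures the image tone at $f_S/2 - f_{in}$, which for gain mismatch should equal $\Delta G/2$ (Eq. 2.9) and for clock skew should equal $\sin(\omega dt/2)$, the result obtained analytically in the derivation that follows.

```python
import cmath
import math


def two_way_adc(num_samples, cycles, g1=1.0, g2=1.0, skew=0.0, phase=0.0):
    """Two-way TI ADC with a unit-amplitude input and T normalized to 1.

    Even samples use gain g1 and ideal timing (ADC1); odd samples use gain g2
    and are taken `skew` sampling periods late (ADC2).
    """
    w = 2.0 * math.pi * cycles / num_samples  # input frequency in rad/sample
    out = []
    for n in range(num_samples):
        if n % 2 == 0:
            out.append(g1 * math.cos(w * n + phase))
        else:
            out.append(g2 * math.cos(w * (n + skew) + phase))
    return out


def tone_amplitude(y, k):
    """Amplitude of DFT bin k; the DC and Nyquist bins are not doubled."""
    n = len(y)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(y))
    scale = 1.0 if k in (0, n // 2) else 2.0
    return scale * abs(acc) / n


if __name__ == "__main__":
    N, m = 256, 31            # record length and input bin (assumed values)
    image_bin = N // 2 - m    # image tone at fs/2 - fin
    w = 2.0 * math.pi * m / N

    y_gain = two_way_adc(N, m, g1=1.02, g2=0.98)   # assumed 4% gain mismatch
    print("gain image tone:", tone_amplitude(y_gain, image_bin),
          "expected dG/2:", (1.02 - 0.98) / 2)

    dt = 0.02                                       # assumed skew of 2% of T
    y_skew = two_way_adc(N, m, skew=dt)
    print("skew image tone:", tone_amplitude(y_skew, image_bin),
          "expected sin(w*dt/2):", math.sin(w * dt / 2))
```

Both mismatches place their image tone in the same bin, so a spectrum alone cannot separate them; the relative phase of the tone is what distinguishes the two cases.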
Combining the unit ADC outputs, the overall ADC output is

$$y[n] = \cos\left[\omega\left(nT + \frac{dt}{2} - (-1)^n \frac{dt}{2}\right) + \phi\right]. \tag{2.12}$$

Using $(-1)^n = \cos(\omega_S nT/2)$ and the trigonometric identity $\cos(A-B) = \cos(A)\cos(B) + \sin(A)\sin(B)$, Eq. 2.12 can be expanded as

$$y[n] = \cos\left[\omega\left(nT + \frac{dt}{2}\right) + \phi\right] \cos\left(\frac{\omega dt}{2}\right) + \sin\left[\omega\left(nT + \frac{dt}{2}\right) + \phi\right] \cos\left(\frac{\omega_S nT}{2}\right) \sin\left(\frac{\omega dt}{2}\right). \tag{2.13}$$

Since $\omega_S nT/2 = n\pi$, the relation $\sin(A)\cos(\omega_S nT/2) = \sin[A - (\omega_S nT/2)]$ holds, and the previous expression can be rewritten as

$$y[n] = \cos\left(\frac{\omega dt}{2}\right) \cos\left[\omega\left(nT + \frac{dt}{2}\right) + \phi\right] + \sin\left(\frac{\omega dt}{2}\right) \sin\left[\omega\left(nT + \frac{dt}{2}\right) - \frac{\omega_S nT}{2} + \phi\right], \tag{2.14}$$

which after rearrangement can be expressed as

$$y[n] = \cos\left(\frac{\omega dt}{2}\right) \cos\left[\omega\left(nT + \frac{dt}{2}\right) + \phi\right] + \sin\left(\frac{\omega dt}{2}\right) \sin\left[\left(\omega - \frac{\omega_S}{2}\right)nT + \omega\frac{dt}{2} + \phi\right]. \tag{2.15}$$

The first term represents the desired input with a small amplitude change due to phase mismatch. The second term, however, represents the undesired tone at frequency $(\omega - \omega_S/2)$ due to phase mismatch. Interestingly, the unwanted tone appears at exactly the same frequency as the tone due to gain mismatch, but with a 90° phase shift. Also, note that the phase mismatch error depends on both input frequency and amplitude. Assuming that the sampling instant deviation dt is much smaller than the sampling period T, $\cos(\omega dt/2) \approx 1$ and $\sin(\omega dt/2) \approx \omega dt/2$. Hence, Eq.
2.15 can be simplified to a more intuitive form as

$$y[n] \approx \cos\left[\omega\left(nT + \frac{dt}{2}\right) + \phi\right] + \left(\frac{\omega dt}{2}\right)\sin\left[\left(\omega - \frac{\omega_S}{2}\right)nT + \omega\frac{dt}{2} + \phi\right]. \tag{2.16}$$

The output spectrum of a 6-bit 10GS/s two-way time-interleaved ADC in the presence of phase mismatch for two cases with different input frequencies and amplitudes is shown in Fig. 2.7, which verifies the previous analysis.

Figure 2.7: Simulated output spectrum of a two-way time-interleaved ADC with phase mismatch for two different sets of input frequencies and amplitudes.

#### 2.1.4 Phase Random Jitter

Another important challenge in high-speed data converters is the impact of random jitter on the front-end sampling clock before quantization, which introduces aperture uncertainty at the sampling instants. The effect of jitter is exacerbated at high input frequencies, which can in turn limit the achievable SNR of a high data rate ADC. Hence, the maximum tolerable jitter at the maximum input frequency should be calculated in order to derive the design requirements of the clock generator. It can be shown that the A/D converter's SNR in the presence of sampling clock jitter for a generic input is [23]

$$SNR = 10\log_{10}\left(\frac{R_x(0)}{-R_x''(0) \cdot R_{tj}(0)}\right) dB, \tag{2.17}$$

where $R_x$ and $R_{tj}$ are the autocorrelation functions of the input and the timing jitter, respectively, so that $R_{tj}(0) = \sigma_{tj}^2$ is the jitter variance. This equation can be simplified for two common cases: a sinusoidal input, and a random signal input, which is more applicable to this research.

For the case of a sinusoidal input, assuming $x(t) = A\sin(\omega t)$, the input autocorrelation function is

$$R_x(t) = \frac{A^2}{2}\cos(\omega t). \tag{2.18}$$

Substituting this into Eq. 2.17 yields the well-known equation for SNR as a function of aperture jitter on the sampling instant of a sinusoidal signal,

$$SNR = 20\log_{10}\left(\frac{1}{\omega\sigma_{tj}}\right) dB. \tag{2.19}$$

#### 2.1.4.2 Random Signal Input with Rectangular Spectrum

For the case of a random signal input with rectangular power spectrum, $S_x(f) = rect(f/2f_B)$, where $f_B$ is the signal bandwidth and $\omega_B = 2\pi f_B$, the input autocorrelation function can be derived as

$$R_x(t) = 2f_B \cdot \frac{\sin(\omega_B t)}{\omega_B t}. \tag{2.20}$$

Substituting this into Eq. 2.17 gives the SNR as a function of aperture jitter on the sampling instant of a random signal,

$$SNR = 20 \log_{10} \left( \frac{\sqrt{3}}{\omega_B \sigma_{tj}} \right) dB. \tag{2.21}$$

Comparing Eqs. 2.19 and 2.21 shows that the sampling jitter requirement is relaxed by a factor of about 1.7 ($\sqrt{3}$) for applications with random-type signals with rectangular power spectrum compared to applications with sinusoidal inputs and similar maximum input frequencies. This is particularly important in high data rate ADC-based wireline receivers, such as those in this research, where one of the main design challenges is meeting the jitter requirements at the full Nyquist bandwidth.

#### 2.2 High-Speed Track-And-Holds

Most analog-to-digital converters have a front-end sampler. In high-speed time-interleaved structures, a front-end sampler can relax the timing accuracy requirements in the following stages. Basically, a track-and-hold (T/H)<sup>1</sup> consists of a switch and a load capacitor, as shown in Fig. 2.8(a). However, in practice this structure can be used only for low-to-medium speed and/or low-resolution applications. The main issue with this simple structure is the kick-back from output to input.
Besides, achieving a high input bandwidth becomes challenging for a large load capacitance, especially in new CMOS technologies where the on-resistance of $SW_1$ can be as large as hundreds of Ohms and changes as a function of the input signal. This can result in non-linearity issues. Therefore, a closed-loop or open-loop active T/H topology is usually used to isolate the input and output terminals and achieve higher linearity, as shown in Fig. 2.8(b) and (c).

In the closed-loop T/H configuration, shown in Fig. 2.8(c), the sampling switch $SW_1$ is located inside the feedback loop. This switch therefore experiences a voltage swing much smaller than the input and output swings; hence, the nonlinearity of the sampling switch is reduced compared to open-loop topologies. The main limitation of closed-loop T/H circuits is speed [24]. In tracking mode, the circuit operates as a two-stage opamp with $C_H$ as a Miller capacitance. Another drawback of this structure is the signal path from $V_{in}$ to $V_{out}$ through the input capacitance of the $A_1$ opamp. This path introduces hold-mode high-frequency feed-through that affects the overall linearity. In summary, this structure is suitable for high-accuracy applications, but only at low-to-medium speeds [25].

<sup>1</sup>Also sometimes referred to as sample-and-hold (S/H) in the literature. However, in practice this structure usually tracks the input voltage during one operation phase and holds it during the next phase. Hence, track-and-hold seems a more suitable term and is used throughout this dissertation. Although beyond the scope of this research, there are other circuits that actually perform as a sample-and-hold.

Figure 2.8: (a) Simple T/H, (b) practical open-loop T/H, and (c) a conventional implementation of closed-loop T/H.

In contrast to closed-loop structures, an open-loop topology, as shown in Fig. 2.8(b), can potentially achieve the highest possible speed in a given technology.
Besides, by using a good buffer the kickback problem and input-to-output feed-through issues related to the simple structure of Fig. 2.8(a) can be alleviated.

Consider the basic T/H circuit shown in Fig. 2.8(a). In order to achieve a signal-to-noise ratio (SNR) >40dB (equivalent to 6.35 effective bits) using this simple circuit, the maximum input-referred noise and the minimum load capacitance (for $v_{in,pp} = 1V$) can be calculated from

$$SNR = \frac{v_{in,pp}^2/8}{v_{nrms,in}^2} = \frac{v_{in,pp}^2/8}{kT/C_L} > 40 \, dB, \tag{2.22}$$

which results in $v_{nrms,in} < 3.5 mV_{rms}$ and $C_L > 0.33 fF$. It can be concluded that the T/H stage performance is not limited by the sampling noise for the target applications of this research. For BW > 5GHz and $C_L = 200fF$, the maximum switch on-resistance can be found from

$$BW = \frac{1}{2\pi R_{on}C_L} > 5 GHz \quad \Rightarrow \quad R_{on} < 159 \,\Omega. \tag{2.23}$$

While this value for the switch on-resistance may seem easy to meet, it should be noted that, as shown later, for large input swings and low supply voltages this constraint becomes stringent, or even impossible to satisfy, for a simple NMOS or PMOS switch. Based on these results, the linearity of the T/H is clearly very important for the required specifications.

In this section, different blocks in a high-speed open-loop T/H are analyzed briefly. The simplest CMOS buffer can be realized by a source-follower (SF) stage. Since most of today's technologies are N-well processes, our discussion focuses on the PMOS source-follower buffer, in which the undesirable MOS body effect can be removed by connecting the transistor body terminal to its source. Fig. 2.9 shows two basic implementations of a single-ended source-follower based buffer with approximately unity gain. At first look, an ideal current source with high output impedance, for example a cascode current source, may seem a better implementation because of its larger output impedance and more constant current.
However, it is not the optimum choice for this particular application. The linearity of the source-follower buffer depends on the linearity of the $M_1$ transconductance $g_{m1}$, which can be approximated by the following equation assuming square-law MOS device behavior:

$$g_m \approx \frac{2I_D}{(V_{GS} - V_{th})}. \tag{2.24}$$

Figure 2.9: Source-follower buffer using (a) an ideal current source, and (b) a simple PMOS current source.

Since these buffers work in an open-loop configuration, the input transistor sees a large voltage swing, and hence $V_{GS}$ can change, while $I_D$ is almost constant for a cascode current source. This results in a $g_m$ that varies with the input voltage, and hence in output voltage distortion [26], [27]. However, for the simple buffer in Fig. 2.9(b), as $V_{GS}$ varies, $I_D$ changes in the same direction, which results in a more constant input transconductance and therefore better linearity. Besides, Fig. 2.9(b) usually has a larger output voltage swing compared to Fig. 2.9(a).

Based on the previous discussion, the basic pseudo-differential open-loop T/H structure is shown in Fig. 2.10. The main advantages of this structure are its simplicity and its large output swing compared to buffers with more stacked devices. Note that the dummy NMOS transistors in series with the input NMOS switches, whose source terminals are shorted to their drains, are used for clock feed-through and charge-injection cancellation.

Figure 2.10: Pseudo-differential source follower based T/H stage using simple NMOS switches.

#### 2.2.2.1 Switch Design

As mentioned before, for a high input voltage swing and/or low supply voltage, achieving a linear switch may be challenging. Fig. 2.11 shows three basic switch topologies: single NMOS, single PMOS, and transmission gate (TG), also known as CMOS switch. The on-resistance $R_{on}$ of each topology as a function of input voltage is shown in Fig. 2.12.
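The qualitative shape of these curves can be illustrated with a first-order triode-region model, $R_{on} \approx 1/[\mu C_{ox}(W/L)(V_{GS} - V_{th})]$. The Python sketch below is not the simulation behind Fig. 2.12: the transconductance factors and threshold voltages are hypothetical round numbers, only $V_{DD} = 1V$ is taken from the text, and the model ignores subthreshold conduction, body effect, and mobility degradation, so an off device is reported as an infinite resistance rather than a large finite one.

```python
import math

VDD = 1.0  # supply voltage in volts, as quoted for Fig. 2.12


def ron_nmos(vin, k_n=30e-3, vth_n=0.3):
    """NMOS switch with gate at VDD. k_n = mu_n*Cox*W/L in A/V^2 (hypothetical)."""
    vov = VDD - vin - vth_n  # overdrive V_GS - V_th with the source at vin
    return 1.0 / (k_n * vov) if vov > 0 else math.inf


def ron_pmos(vin, k_p=20e-3, vth_p=0.3):
    """PMOS switch with gate at ground. k_p = mu_p*Cox*W/L in A/V^2 (hypothetical)."""
    vov = vin - vth_p  # overdrive V_SG - |V_th| with the source at vin
    return 1.0 / (k_p * vov) if vov > 0 else math.inf


def ron_tg(vin):
    """Transmission gate: parallel combination of the NMOS and PMOS switches."""
    g = 0.0
    for r in (ron_nmos(vin), ron_pmos(vin)):
        if math.isfinite(r):
            g += 1.0 / r
    return 1.0 / g if g > 0 else math.inf


if __name__ == "__main__":
    print(" Vin   NMOS      PMOS      TG   (ohms)")
    for i in range(0, 21, 2):
        vin = i * 0.05
        print(f"{vin:4.2f}  {ron_nmos(vin):8.1f}  {ron_pmos(vin):8.1f}  {ron_tg(vin):8.1f}")
```

Under these assumptions the NMOS device conducts only for low input voltages and the PMOS device only for high ones, while the transmission gate stays finite across the whole range because the two conduction regions overlap when the sum of the threshold voltages is below $V_{DD}$.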
As expected, the NMOS switch works better for small input voltages, while the PMOS switch is suitable for large input voltages. The range of $R_{on}$ variation is approximately $100\Omega - 100M\Omega$. The on-resistance of a TG switch, on the other hand, is always equal to the parallel combination of the NMOS and PMOS switch resistances, which is less than a few kilo-Ohms over the whole input voltage range. Therefore, this topology is extensively used in high-speed applications.

Figure 2.11: Simple switch architectures: (a) single NMOS or PMOS switch, and (b) complementary MOS switch, also known as transmission gate.

In the past two decades, many other modified switch topologies have been proposed. Most of them can be categorized into two basic families: (1) clock-boosting switches, and (2) bootstrapped switches. A common approach for improving switch linearity is to boost the clock amplitude, as shown in Fig. 2.13(a) [28], [29]. This technique, generally known as "clock boosting", is fast and is usually implemented by boosting the nominal clock amplitude through a charge-pump stage. However, it introduces some reliability issues, since for small input voltages close to zero the gate-source voltage of the NMOS switch $M_{NSW}$ can become larger than $V_{DD}$, which in turn can cause breakdown of the switch transistor.

Another common switch modification is the bootstrapping technique, shown in Fig. 2.13(b) for a well-known implementation [30]. In this structure, when the clock signal CLK is low, the full supply voltage is placed across the capacitor $C_{os}$. When CLK goes high, $C_{os}$ is connected between the gate and source terminals of the main switch transistor $M_{NSW}$. This way, the gate-source voltage of $M_{NSW}$ is ideally always equal to the supply voltage $V_{DD}$, independent of the input voltage, which results in a small and constant $R_{on}$ over the whole range of operation. Therefore, it performs very linearly.
However, the main tradeoff is the large area required for $C_{os}$.

Figure 2.12: On-resistance of NMOS, PMOS and transmission-gate switches versus input voltage amplitude (Wn = $10\mu m$, Wp = $20\mu m$, with minimum length L = 100nm, and $V_{DD}=1V$ in 90nm CMOS technology).

In this section, several buffer topologies suitable for high-speed applications are reviewed. All of these architectures originate from the simple source-follower stage. The simplest source-follower (SF) buffer, as discussed earlier, is shown in Fig. 2.9(b). This structure has a large output swing. The main issue is that the output impedance of this topology is approximately $1/g_{m1}$. This means that a very large current is required to achieve a bandwidth in the gigahertz range, especially when driving a large load capacitance. Therefore, some modifications of this basic structure are analyzed in the following sections.

The bandwidth of an SF-based buffer depends on the time constant of the dominant pole at its output node as $BW \approx 1/(2\pi R_{out}C_L)$, where $R_{out} \approx 1/g_{m1}$. Hence, to achieve a large bandwidth, the output resistance and/or the output capacitance should be decreased. In a conventional SF-based buffer with a fixed load capacitance, this can be accomplished only by increasing the current consumption. Recently, negative impedance converter (NIC) topologies have been used to cancel part of the load capacitance using a negative capacitance, and therefore increase the bandwidth and/or power efficiency [31], [32]. A basic NIC structure, used at the output node of an SF-based buffer, is shown in Fig. 2.14(a) [32].

Figure 2.13: Modified switch topologies: (a) clock-boosting switch, and (b) bootstrapped switch.

The output impedance of the NIC circuit can be calculated as

$$Z_{out,NIC} = -\frac{1}{sC_C} \cdot \frac{g_{m7,8} + s(C_{gs7,8} + 2C_C)}{g_{m7,8}}, \quad s = j\omega, \ \omega \ll 2\pi f_T. \tag{2.25}$$

Although Fig.
2.14(a) topology has been used previously in many different applications, it introduces some reliability issues due to its positive feedback loop, and oscillation may occur under process/temperature variations. A more robust design for negative-impedance implementation in a fully-differential structure is shown in Fig. 2.14(b) [33]. In this structure, a replica source-follower stage is used in order to remove the undesirable feedback between the differential outputs. In other words, the capacitance cancellation is performed using feed-forward paths. Although the NIC-based structures work well at low frequencies, unfortunately their performance improvement fades away at high frequencies.

Figure 2.14: Pseudo-differential source-follower based buffer using negative capacitance (a) in a positive feedback configuration, and (b) in a feed-forward configuration.

As mentioned earlier, for a single-pole buffer $BW \approx 1/(2\pi R_{out}C_L)$. This means the buffer bandwidth can be increased either by decreasing the load capacitance, as performed in the NIC structures, or by decreasing the output resistance. The flipped-voltage follower (FVF) technique reduces the output resistance by a factor on the order of $g_m r_o$ compared to a conventional source-follower stage [34], [35]. The differential version of the conventional FVF-based buffer is shown in Fig. 2.15(a). The topology of Fig. 2.15(a), however, has a very limited input voltage swing range, as shown below:

$$V_{DD} - |V_{GS3}| - |V_{th1}| < V_{in} < |V_{th1}| + |V_{th3}| - |V_{GS1}|. \tag{2.26}$$

A modified version of this structure called folded FVF is shown in Fig. 2.15(b). In this circuit the input swing range is increased to

$$V_{GS7} - |V_{th1}| < V_{in} < V_{DD} - |V_{OV3}| - |V_{GS1}|. \tag{2.27}$$

This range is large enough for most applications. In order to use single NMOS switches before the buffer in the T/H stage, the minimum possible input CM voltage should be used. However, Fig.
2.15(b) topology requires an input voltage larger than $\sim V_{OV7}$ (for equal NMOS and PMOS threshold voltages). A modified folded FVF structure is shown in Fig. 2.15(c). This architecture improves the lower bound of the input voltage swing to

$$V_{OV5} - |V_{th1}| < V_{in} < V_{DD} - |V_{OV3}| - |V_{GS1}|. \tag{2.28}$$

However, note that since a common-gate folded branch is used, a sign inversion is required for negative feedback. Thanks to the symmetric differential structure, this sign inversion can be realized by cross-coupling the source terminals of $M_7 - M_8$ in Fig. 2.15(c). The output impedance and voltage gain of this structure can be calculated as

$$R_{out} = \frac{g_{o1} + g_{o2}}{(g_{m1} + g_{o1}) \cdot (g_{m7} + g_{mb7} + g_{o7} + g_{o5}) + (g_{o7} + g_{o3}) \cdot (g_{o5} + g_{o1})}, \tag{2.29}$$

and

$$A_V = \frac{V_{out}}{V_{in}} = \frac{g_{m1} \cdot (g_{o5} + g_{m7} + g_{mb7} + g_{o7})}{(g_{m1} + g_{o1}) \cdot (g_{m7} + g_{mb7} + g_{o7} + g_{o5}) + (g_{o7} + g_{o3}) \cdot (g_{o5} + g_{o1})}. \tag{2.30}$$

Assuming $g_o \ll g_m$, (2.30) can be simplified to $A_V \approx g_{m1}/(g_{m1} + g_{o1})$. Therefore, this topology can ideally achieve a lower output impedance and an improved voltage gain closer to unity.

Figure 2.15: Differential flipped-voltage follower based buffer architectures: (a) conventional low-swing FVF, (b) folded FVF case 1, and (c) folded FVF case 2.

The input voltage range of the conventional FVF-based buffer in Fig. 2.15(a) can also be expanded, as shown in Fig. 2.16, by introducing a feedback capacitor $C_F$. The DC bias of the top PMOS transistors is set by $V_{bp}$ through large bias resistors.

| Device | Size |
|-----------|------------|
| $M_{1,2}$ | 30µm/60nm |
| $M_{3,4}$ | 30µm/60nm |
| $M_{5,6}$ | 72µm/300nm |
| $C_F$ | 60fF |
| $R_B$ | 85kΩ |

Figure 2.16: Pseudo differential flipped-voltage follower based buffer with feedback capacitors.
The linearity performance of a T/H with bootstrapped sampling switches and this FVF-based buffer in 65nm CMOS is shown in Fig. 2.17. The T/H output total harmonic distortion (THD) remains better than -40dB for more than 5GHz input bandwidth and $750mV_{pp}$ output voltage swing.

Figure 2.17: Linearity performance of T/H with FVF-based buffer and $750mV_{pp}$ output swing.

#### 2.3 High-Speed Sub-ADC Architectures

There are many different ADC architectures available, ranging from integrating and discrete-time (DT) sigma-delta ADCs for low-bandwidth, very high-resolution applications, to successive approximation register (SAR) and cyclic ADCs for medium bandwidth and medium resolution, to flash and pipelined ADCs for low-to-medium resolution and high-bandwidth applications [36]. Traditionally, for high-speed applications, which are the focus of this research, pipelined and flash topologies have been the top choices. However, the hardware complexity of a flash ADC grows exponentially with its resolution, which makes it unattractive for many emerging applications. In addition, the advance of CMOS technology has made the design of the analog amplifiers and buffers required in a traditional pipelined ADC more challenging. Consequently, these issues have forced ADC designers to come up with advanced techniques and/or hybrid architectures to overcome the shortcomings of pipelined and flash architectures. Another direction over the past decade has been to time-interleave more efficient but lower-speed topologies that scale well with CMOS technologies, such as the successive approximation ADC, and to calibrate the issues arising from mismatches among the parallel unit ADCs in the digital domain, which also benefits from CMOS technology scaling.

#### 2.3.1 ADC Architecture Selection

The cost of an ADC architecture can be expressed in terms of its power consumption and area.
In most cases, comparing the energy per conversion of two systems, i.e., power divided by sampling frequency, gives the designer better insight. In order to roughly compare the energy of flash and SAR ADC architectures, we can develop simplified energy consumption models as follows. A more comprehensive intuitive model has been presented in [37].

#### 2.3.1.1 Flash Energy Model

An N-bit flash ADC [36] is basically composed of $2^N - 1$ comparators (neglecting the over-range detection comparators), a reference resistor ladder, and a thermometer-to-binary encoder that converts the $(2^N - 1)$-bit thermometer code at the comparator outputs to an N-bit binary output, as shown in Fig. 2.18. The reference resistor ladder and thermometer-to-binary encoder energies scale roughly as $2^N$, but are usually less than the total comparator energy, and therefore their contribution to the total flash ADC energy is neglected here. The comparator is usually composed of a linear pre-amplifier and a regenerative (dynamic) latch. Neglecting the pre-amplifier for simplicity, the energy per conversion for the comparator can be calculated as

$$E_{latch} = C_{latch} V_{DD}^2. \tag{2.31}$$

Hence, the energy per conversion for a flash ADC, considering only the comparators, is [37]

$$E_{Flash} = (2^N - 1) \cdot \left[C_{latch} V_{DD}^2\right]. \tag{2.32}$$

Figure 2.18: Basic structure of a flash ADC.

#### 2.3.1.2 SAR Energy Model

Time-interleaving of the SAR ADC should be used to achieve the same data rate as the flash ADC. For simplicity, we can assume that the speed and structure of the comparators used in the flash and SAR ADCs are the same. Hence, a SAR ADC needs N+1 periods of the comparator clock: one to sample the input and N following cycles to successively approximate the digital output. This means that a time-interleaving factor of N+1 should be used in an N-bit SAR ADC to achieve the same sampling rate as its N-bit flash ADC counterpart.
The basic SAR ADC consists of a comparator, a digital-to-analog converter (DAC), and SAR control logic, as shown in Fig. 2.19. The comparator energy per conversion can be calculated in the same way as for a flash ADC.

Figure 2.19: Basic structure of a SAR ADC with binary-weighted capacitive DAC.

The capacitive DAC is a set of N binary-scaled capacitors and an extra unit capacitor. The conventional switching method for a SAR ADC with a capacitive DAC is described in many references such as [36]. During bit-cycling, an amount of charge proportional to the size of the capacitive DAC and the full-scale input voltage $V_{FS}$ is switched onto the array. Assuming that this charge is supplied by a linear regulator or buffer connected to the analog supply voltage $V_{DDA}$, the total array energy per conversion is

$$E_{DAC} = 2\eta \left( 2^{N} C_{u} V_{DDA} V_{FS} \right), \tag{2.33}$$

where $C_u$ is the DAC's unit capacitor. The total energy consumption is input-signal dependent, which is modeled using the coefficient $\eta$ in the above equation. Setting $\eta = 0.7$ is a reasonable approximation [37]. The factor of 2 in Eq. 2.33 arises from the differential structure.

The unit capacitor $C_u$ is chosen to meet the required linearity specification of the ADC. The expected worst-case linearity error of the DAC occurs at the MSB transition, with a ratio error of

$$\frac{\Delta C}{C} = \frac{1}{\sqrt{2^{N-1}}} \cdot \frac{\Delta C_u}{C_u}, \tag{2.34}$$

where $\Delta C_u$ represents the standard deviation of the unit capacitor due to mismatch and process variation. In order to keep this error below the level of the least significant bit (LSB), $\Delta C_u/C_u$ must scale in proportion to $1/2^{N/2}$. Note that $\Delta C_u/C_u \approx aC_u^{-\zeta}$, where $\zeta$ equals 3/4 or 1/2 if the capacitance mismatch is dominated by edge effects or oxide variation, respectively [38], so the required unit capacitance scales as $C_u \propto 2^{N/2\zeta}$.
Hence the total array energy for one conversion is

$$E_{DAC} = 2\eta \, 2^{(1+1/(2\zeta))N} \, \frac{C_u'}{2^{N'/(2\zeta)}} \left( V_{DDA} V_{FS} \right), \tag{2.35}$$

where $C'_u$ is the process-dependent capacitance required for matching to the N'-bit level, and is assumed to be 5fF in the following simulations.

The control logic in a SAR ADC is based on a shift register of width N and consumes energy that grows approximately linearly with N. For a given logic style that does not draw static current, e.g., CMOS logic, the total energy consumed by the switching of the control logic over one conversion is

$$E_{logic} \approx NC_{SW,eq}V_{DD}^2, \tag{2.36}$$

where $C_{SW,eq}$ is the total switched capacitance in the SAR logic normalized to the 1-bit level. Note that in reality, the total energy is expected to grow faster than N. The SAR digital logic directly drives the switches in the capacitive DAC, and these switches must grow in size with the resolution to ensure sufficient settling time of the larger capacitive array. For the sake of simplicity, this effect is ignored here.

Summing the energy consumption of the different blocks, the total energy per sample conversion of the SAR ADC can be calculated as [37]

$$E_{SAR} = 2\eta \, 2^{(1+1/(2\zeta))N} \, \frac{C_u'}{2^{N'/(2\zeta)}} \left( V_{DDA} V_{FS} \right) + \left( N+1 \right) \left( C_{latch} + C_{SW,eq} \right) V_{DD}^2 \,. \tag{2.37}$$

Equation 2.37 is plotted versus resolution in Fig. 2.20(a), where two distinct regions are visible. At low resolutions, the digital energy in the dynamic comparators and SAR logic dominates, and the energy grows linearly with resolution N. At higher resolutions, the growing size and matching requirements of the capacitor array dominate, and the energy grows as $2^{(1+1/(2\zeta))N}$. However, the model is not very reliable at high resolutions, since it neglects the effects of noise and other non-idealities.
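The two energy models can be evaluated side by side with a few lines of code. The following is a minimal sketch of Eq. 2.32 and Eq. 2.37, not the simulation used to produce Fig. 2.20. Only $\eta = 0.7$ and $C'_u = 5$fF come from the text; the values of `c_latch`, `c_sw_eq`, `vdd`, `vdda`, `v_fs`, `zeta`, and `n_ref` (standing for N') are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
# Simplified energy-per-conversion models of Eq. 2.32 (flash) and Eq. 2.37 (SAR).
# Apart from eta = 0.7 and C'_u = 5 fF, every default value below is an
# assumption made for illustration and is not taken from the text.

def flash_energy(n_bits, c_latch=10e-15, vdd=1.2):
    """Eq. 2.32: (2^N - 1) comparators, each dissipating C_latch * VDD^2."""
    return (2 ** n_bits - 1) * c_latch * vdd ** 2


def sar_energy(n_bits, c_latch=10e-15, c_sw_eq=20e-15, vdd=1.2, vdda=1.2,
               v_fs=0.6, eta=0.7, zeta=0.5, c_u_ref=5e-15, n_ref=6):
    """Eq. 2.37: capacitive DAC energy plus comparator and SAR logic energy."""
    k = 1.0 / (2.0 * zeta)  # matching exponent 1/(2*zeta)
    e_dac = (2 * eta * 2 ** ((1 + k) * n_bits)
             * c_u_ref / 2 ** (n_ref * k) * vdda * v_fs)
    e_digital = (n_bits + 1) * (c_latch + c_sw_eq) * vdd ** 2
    return e_dac + e_digital


def crossover_resolution(max_bits=10):
    """Smallest resolution at which the SAR model is below the flash model."""
    for n in range(1, max_bits + 1):
        if sar_energy(n) < flash_energy(n):
            return n
    return None


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, f"flash {flash_energy(n) * 1e15:.1f} fJ",
              f"SAR {sar_energy(n) * 1e15:.1f} fJ")
    print("SAR first below flash at N =", crossover_resolution())
```

With these assumed values the SAR model first drops below the flash model at N = 5. That number should be read only as a property of the assumed parameters: different capacitance, supply, or matching assumptions shift the crossover.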
## 2.3.1.3 Energy Comparison

The total energy consumption of flash and SAR ADCs is compared over the 1-bit to 7-bit resolution range in Fig. 2.20(b), based on the previous analysis. The process-dependent values are set based on a low-power 90nm CMOS technology. At low resolutions, a flash ADC presents lower energy than a SAR ADC. However, as the resolution increases, the number of comparators in a flash ADC increases exponentially, while the number of comparator cycles in a SAR ADC increases only linearly. This makes it inevitable that, at some point, the SAR structure becomes more energy efficient than the flash architecture. Based on these simulations, this point lies between 4 and 5 bits. The results of a similar model presented in [37] for a $0.18\mu m$ CMOS technology agree well with this conclusion. At 5-bit resolution, the energy difference between the two architectures is still small, and the SAR logic and the DAC capacitor values must be chosen carefully to ensure that SAR achieves better efficiency. However, as the resolution increases, the energy advantage of the SAR architecture over the flash ADC becomes more apparent. Based on this study, a SAR ADC seems a better choice for 6-bit resolution and above, which is the target range for most wireline receiver applications.

Figure 2.20: (a) SAR energy versus resolution, along with the individual components contribution. (b) Energy comparison between SAR and flash ADCs as a function of resolution.

## 2.3.2 Successive Approximation ADC

Although the flash architecture has been the traditional choice for high-speed A/D converters, time-interleaving or parallelization, and consequently advancements in the calibration procedures required to resolve the issues related to time-interleaved structures, have led to the use of more efficient ADC architectures such as SAR in the multi-GHz bandwidth realm.
As shown earlier, beyond 4 bits of resolution a SAR ADC can achieve superior energy efficiency over a traditional flash architecture. Different factors should be considered while designing a SAR ADC. The main building blocks of a SAR ADC and their important design characteristics are briefly discussed in the following subsections.

## 2.3.2.1 Dynamic Comparators

Comparators<sup>2</sup> can be divided into two categories: static and dynamic. As suggested by its name, a static comparator consumes static or DC power. The current mode logic (CML) based structure, shown in Fig. 2.21, is one of the most common topologies of static comparators in wireline communications. Although this structure is very fast and suitable for high data rate applications, its main drawback is high power consumption, which makes it unsuitable for applications and technologies in which lower-power topologies without static power are also feasible.

Figure 2.21: Schematic of a CML based CMOS comparator.

Dynamic comparators, by contrast, do not consume any DC power, and their power scales almost linearly with the frequency of operation. Dynamic comparators are usually more energy efficient than static comparators unless they are used at a relatively high data rate compared to the maximum speed of a technology. As CMOS technology scales and the transition frequency $f_T$, a metric for the maximum theoretical speed of the design technology, improves, dynamic comparators become more popular in current high-speed link receivers due to their superior energy efficiency.

<sup>2</sup>A comparator is sometimes called a "sense amplifier" or a "slicer" based on the application.

Figure 2.22: Schematic of a StrongArm dynamic comparator. (a) Basic schematic, and (b) schematic with extra devices to discharge internal nodes during reset phase for reduced memory effects.
One of the well-known architectures for a CMOS dynamic comparator is the StrongArm topology [39], shown in Fig. 2.22 in two common variations, with and without internal node reset devices. In the past decade, several modifications of the traditional dynamic comparator have been proposed to achieve improved performance. The double-tail topology proposed in [40] by Schinkel, shown in Fig. 2.23(a), improves the delay and kick-back response compared to the traditional StrongArm architecture by employing a dynamic (charge-steering) first-stage amplifier. The two-stage structure proposed in [41] by Goll, shown in Fig. 2.23(b), provides improved response compared to the StrongArm topology at lower power supply levels. In addition, during the reset phase, when CLK is low, the output nodes are connected to $V_{DD}$ while the $V_{GS}$ of the $M_4$ devices equals $V_{DD}$. This makes the regeneration loop through $M_4$ and $M_5$ work faster as soon as the reset phase is over, since the $M_4$ transistors already start in the active region with a large gate-source voltage.

Figure 2.23: Schematic of (a) the double-tail dynamic comparator proposed by Schinkel, and (b) the two-stage modified dynamic comparator proposed by Goll.

## Comparator Noise

The thermal noise of a regenerative comparator can be measured using a time-domain simulation by including transient thermal noise in the transistor models. Fig. 2.24 shows the time-domain simulation methodology [42]. A DC voltage source, $V_{cm} + V_{in}$, is applied to the differential input of the clocked comparator, where $V_{cm}$ is the nominal input common-mode voltage and $V_{in}$ is the differential input offset. The output of the comparator is then sampled, and the average of all the output 1's and 0's is calculated over a specific period of time (the larger the number of cycles, the better the resolution). $V_{in}$ is then swept across a range of voltages around the input-referred offset to generate a noise cumulative distribution function (CDF).
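The sweep described above can be mimicked with a behavioral model. The sketch below is an assumption-laden stand-in for the transistor-level transient noise simulation: the comparator is reduced to an ideal sign decision with additive Gaussian input-referred noise and zero offset, and the names `comparator_decision` and `noise_cdf` as well as the 1 mV noise level are hypothetical.

```python
import random


def comparator_decision(v_in, sigma_noise, rng):
    """Ideal clocked comparator with additive Gaussian input-referred noise.

    Behavioral assumption: zero offset, noise redrawn on every clock cycle.
    """
    return 1 if v_in + rng.gauss(0.0, sigma_noise) > 0.0 else 0


def noise_cdf(v_in_values, sigma_noise, cycles=20000, seed=1):
    """Average the output 1's and 0's over many cycles for each DC input."""
    rng = random.Random(seed)
    cdf = []
    for v_in in v_in_values:
        ones = sum(comparator_decision(v_in, sigma_noise, rng)
                   for _ in range(cycles))
        cdf.append(ones / cycles)
    return cdf


if __name__ == "__main__":
    sigma = 1e-3  # assumed 1 mV rms input-referred noise
    sweep = [k * 0.5e-3 for k in range(-6, 7)]  # -3 mV to +3 mV
    for v, p in zip(sweep, noise_cdf(sweep, sigma)):
        print(f"{v * 1e3:+.1f} mV -> P(out = 1) = {p:.3f}")
```

Increasing `cycles` reduces the statistical scatter of each CDF point, which is the resolution trade-off noted in the text.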
A fit of this CDF assuming a Normal distribution yields the input-referred noise standard deviation of the comparator. It is obtained by subtracting the differential input that results in CDF = 50% (the input-referred offset, which is nominally zero) from the differential input that results in CDF = 84.134% (one sigma above the input-referred offset). The noise of the stages following the regenerative comparator is negligible due to the high gain provided by its positive feedback.

Figure 2.24: Input-referred noise (a) transient simulation setup, and (b) CDF for a designed Goll two-stage comparator in 65nm CMOS technology.

## Comparator Metastability

Metastability is an important and normally undesired phenomenon in any circuit with digital output levels. It is especially important in wireline receiver applications due to their very low bit error rate (BER) requirements, below $10^{-12}$ or $10^{-15}$ depending on the standard<sup>3</sup>. Assuming a comparator with a positive feedback load has a regeneration time constant of $\tau_{reg}$, the probability of a metastable state at the comparator output can be calculated as

$$P_{e,MET} = \frac{2V_{out,min}}{A_{lin}V_{LSB}} e^{\left(\frac{-T}{\tau_{reg}}\right)}, \tag{2.38}$$

where $V_{LSB}$ is the quantization step at the comparator input, $V_{out,min}$ is the minimum comparator output voltage that results in a valid logic level, $A_{lin}$ is the comparator unlatched gain, and T is the maximum time allocated for the comparator decision.

<sup>3</sup>One main reason for such low BER requirements in wireline standards compared to wireless standards is the lack of error correction coding in such applications, which is omitted to reduce link complexity and improve energy efficiency.

Assuming $V_{out,min} = 30V_{LSB}$, $A_{lin} = 4$, and the maximum allocated time for the comparator to make a decision is half a bit cycle using a regular synchronous
100MS/s 6-bit SAR ADC, $T \approx 10ns/14 = 714ps$, the regeneration time constant must satisfy $\tau_{reg} < 23.5ps$ to achieve a metastability error probability below $10^{-12}$. This criterion becomes even more stringent at higher conversion rates. Unfortunately, sizing a comparator based on this criterion usually degrades the power efficiency of the time-interleaved ADC, and consequently degrades the whole receiver efficiency. Different metastability detection and correction techniques [43,44] can be employed to relax the sizing requirements, and allow the comparator to be designed based only on the speed requirements of a high-speed system.

## 2.3.2.2 Digital-to-Analog Converter

In a SAR ADC, the digital-to-analog converter (DAC) in the feedback path estimates the sampled input after every comparison. Usually a capacitive DAC (CDAC) is used to save power compared to current-based DACs, and to achieve good linearity in new CMOS technologies compared to resistive DACs.

The conventional capacitive DAC switching scheme proposed in [45] is not energy efficient. Many modified switching schemes have been proposed for capacitive DACs during the past two decades in order to improve the energy efficiency of SAR ADCs, e.g., the split-capacitor scheme [46], the energy-saving scheme [47], and the set-and-down scheme [48]. However, these modified switching schemes achieve improved energy savings at the cost of increased switching complexity, DAC output common-mode variation, and matching requirements. A merged capacitor switching (MCS) scheme is proposed in [49], which reduces the switching energy by 94% (more than any other switching scheme previously reported) and decreases the area by 50% compared to the conventional CDAC scheme. Moreover, MCS keeps the common-mode voltage constant in all bit cycle phases of successive approximation<sup>4</sup>. Fig. 2.25 shows the detailed operation of a 3-bit SAR ADC with the MCS scheme<sup>5</sup>.
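Returning briefly to the metastability criterion of Eq. 2.38, the regeneration time constant quoted for the 100MS/s 6-bit example can be checked numerically. The sketch below simply solves Eq. 2.38 for $\tau_{reg}$; the function name `max_regeneration_tau` is hypothetical, and the inputs are the values given in the text ($V_{out,min} = 30V_{LSB}$, $A_{lin} = 4$, $T = 10ns/14$, target probability $10^{-12}$).

```python
import math


def max_regeneration_tau(t_decision, p_target, v_out_min_over_lsb, a_lin):
    """Largest tau_reg that keeps Eq. 2.38 at or below p_target.

    Solving P = (2*Vout_min / (A_lin*V_LSB)) * exp(-T / tau) for tau gives
    tau = T / ln(2*Vout_min / (A_lin*V_LSB*P)).
    """
    prefactor = 2.0 * v_out_min_over_lsb / a_lin
    return t_decision / math.log(prefactor / p_target)


if __name__ == "__main__":
    t = 10e-9 / 14  # half a bit cycle of a 100 MS/s, 6-bit synchronous SAR
    tau = max_regeneration_tau(t, 1e-12, 30.0, 4.0)
    print(f"T = {t * 1e12:.0f} ps, tau_reg < {tau * 1e12:.1f} ps")
```

Because $T$ shrinks in proportion to the conversion period while the logarithm changes slowly, the allowed $\tau_{reg}$ shrinks almost linearly with increasing conversion rate.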
<sup>4</sup>A switching scheme with variable common-mode voltage in different bit cycles makes the comparator offset calibration more complicated and/or requires a preamplifier stage, since usually the comparator offset varies with the input common-mode voltage.

<sup>5</sup>A single-ended case is shown for simplicity.

Figure 2.25: The simplified operation of a capacitive DAC with merged capacitor switching scheme in a 3-bit SAR ADC: (a) sampling phase, (b) first bit cycle, (c) second bit cycle, and (d) third bit cycle.

#### 2.3.2.3 Capacitive DAC Linearity

In order to calculate the linearity performance of a binary-weighted capacitive DAC, each capacitor in the DAC is considered as the sum of the nominal capacitance and an error term as

$$C_n = 2^{n-1}C_u + \epsilon_n, \quad n = 1, 2, ..., N, \tag{2.39}$$

where N is the number of bits, $C_u$ is the DAC unit capacitor value (written as $C_0$ in the equations that follow), and $\epsilon_n$ is the error term for capacitor $C_n$. The error terms are independent of each other, and each has zero mean and a Gaussian distribution. Hence, the error terms have the variance

$$E[\epsilon_n^2] = 2^{n-1}\epsilon_0^2, \quad n = 1, 2, ..., N, \tag{2.40}$$

where $\epsilon_0$ is the standard deviation of the unit capacitor. The analog output of a conventional N-bit binary-weighted CDAC can be calculated as [50]

$$V_{DAC,out}(x) = \frac{\sum_{n=1}^{N} (2^{n-1}C_0 + \epsilon_n) S_n}{2^N C_0 + \sum_{n=1}^{N} \epsilon_n} V_{REF}, \tag{2.41}$$

where $V_{REF}$ is the reference voltage of the DAC, equal to the full-scale voltage of the ADC, and $S_n$ equals 0 or 1, representing the SAR ADC decision for bit n, i.e., the DAC digital input is $D_{DAC,in} = x = \sum_{n=1}^{N} 2^{n-1} S_n$. Assuming that the term $\sum_{n=1}^{N} \epsilon_n$ in the denominator is negligible compared to $2^N C_0$, the error term in the DAC output can be found as

$$V_{DAC,err}(x) \approx \frac{\sum_{n=1}^{N} \epsilon_n S_n}{2^N C_0} V_{REF}. \tag{2.42}$$

Hence, the variance of the error is

$$E\left[V_{DAC,err}^{2}(x)\right] = \frac{\sum_{n=1}^{N} 2^{n-1} \epsilon_{0}^{2} S_{n}}{2^{2N} C_{0}^{2}} V_{REF}^{2} = \frac{x}{2^{2N}} \frac{\epsilon_{0}^{2}}{C_{0}^{2}} V_{REF}^{2}. \tag{2.43}$$

Since only the total error in the DAC output voltage matters here, and all errors in the capacitor values are considered independent and identically distributed (i.i.d.), only the number of unit capacitors connected to $V_{REF}$ is important. Based on the previous analysis, the differential nonlinearity (DNL) of the capacitive DAC can be calculated for each DAC input code by subtracting the previous code error from the current code error as

$$DNL(x) = \Delta V_{DAC,err}(x) = V_{DAC,err}(x) - V_{DAC,err}(x-1), \tag{2.44}$$

where x is the DAC input code. The worst DNL in a binary-weighted capacitor array occurs at the MSB transition, where all DAC input bits except the MSB transition from 1 to 0, and the MSB transitions from 0 to 1. The variance of the worst-case DNL can be written as

$$E\left[\Delta V_{DAC,err}^{2}(2^{N-1})\right] = E\left[\left(\frac{\epsilon_{N} - \sum_{n=1}^{N-1} \epsilon_{n}}{2^{N}C_{0}} V_{REF}\right)^{2}\right] \approx \frac{\epsilon_{0}^{2}}{2^{N}C_{0}^{2}} V_{REF}^{2}. \tag{2.45}$$

Therefore, achieving $DNL_{rms,max}$ below 0.5 LSB requires

$$DNL_{rms,max} \approx \left(\frac{\epsilon_0}{C_0}\right) \frac{V_{REF}}{2^{N/2}} < \left(\frac{1}{2}\right) \frac{V_{REF}}{2^N} \Rightarrow \frac{\epsilon_0}{C_0} < \frac{1}{2^{N/2+1}}. \tag{2.46}$$

This means that for a 6-bit ADC, as long as the unit capacitor matching satisfies $\epsilon_0/C_0 < 1/16 = 6.25\%$, the DNL of the capacitive DAC is not going to limit the ADC performance. This limit specifies the minimum capacitor area based on the linearity requirement. This result is verified with Monte Carlo simulations using a behavioral model for the 6-bit SAR ADC with a binary-weighted DAC and $\epsilon_0/C_0 = 5\%$ unit capacitor mismatch.
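A Monte Carlo check of this kind can be sketched in a few lines. The code below is not the behavioral model used in this work; it is a minimal stand-in that draws random capacitor errors according to Eq. 2.40, evaluates the DAC step at the MSB transition from Eq. 2.41, and reports the rms DNL in LSB. The function names, trial count, and seed are assumptions.

```python
import math
import random


def msb_transition_dnl(n_bits, sigma_unit, rng):
    """DNL (in LSB) at code 2^(N-1) for one random draw of the capacitor array.

    Capacitor n is 2^(n-1) unit capacitors plus a Gaussian error whose
    standard deviation is sqrt(2^(n-1)) * sigma_unit (Eqs. 2.39 and 2.40).
    All capacitances are expressed in units of the nominal unit capacitor.
    """
    caps = [2 ** (n - 1) + rng.gauss(0.0, math.sqrt(2 ** (n - 1)) * sigma_unit)
            for n in range(1, n_bits + 1)]
    # Denominator of Eq. 2.41: 2^N unit capacitors plus the summed errors.
    total = 2 ** n_bits + (sum(caps) - (2 ** n_bits - 1))
    # Output step between codes 2^(N-1) - 1 and 2^(N-1), in units of V_REF.
    step = (caps[-1] - sum(caps[:-1])) / total
    return step * 2 ** n_bits - 1.0  # deviation from the ideal 1-LSB step


def rms_msb_dnl(n_bits=6, sigma_unit=0.05, trials=20000, seed=1):
    """rms of the MSB-transition DNL over many random capacitor arrays."""
    rng = random.Random(seed)
    acc = sum(msb_transition_dnl(n_bits, sigma_unit, rng) ** 2
              for _ in range(trials))
    return math.sqrt(acc / trials)


if __name__ == "__main__":
    print(f"rms DNL at the MSB transition: {rms_msb_dnl():.3f} LSB")
```

For N = 6 and 5% unit mismatch, Eq. 2.45 predicts roughly $2^{N/2} \cdot 0.05 = 0.4$ LSB rms, and the simulated value should land close to that figure and below the 0.5 LSB target.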
The SAR control logic is the central part of a SAR ADC, which controls the operation. In fact, the whole architecture is named after this block, which uses shift registers and latches to control the next move in each bit cycle. Fig. 2.26 shows a common implementation of this part using digital blocks [51]. Although this block is entirely digital, using custom design instead of standard cells in a CMOS technology can save a considerable percentage of the total power consumption in a SAR ADC [37].

Figure 2.26: A common implementation of the SAR control logic in a 6-bit ADC.

#### 2.4 High-Speed Link Receivers

Electrical inter-chip communication bandwidth is limited by high-frequency loss of electrical traces, reflections caused by impedance discontinuities, and adjacent signal crosstalk, as shown in Fig. 2.27 for an example backplane channel. The relative magnitudes of these channel characteristics depend on the length and quality of the electrical channel, which is a function of the application. Common applications range from processor-to-memory interconnections, which typically have short (<10-inch) top-level microstrip traces with relatively uniform loss slopes, to server/router and multiprocessor systems, which employ either long (~30-inch) multilayer backplanes or (~10 m) cables, which can both possess large impedance discontinuities and loss.

Figure 2.27: Example of a backplane system cross-section.

PCB traces suffer from high-frequency attenuation caused by dielectric loss and wire skin effect. Dielectric loss describes the process where energy is absorbed from the signal trace and transferred into heat due to the rotation of the board's dielectric atoms in an alternating electric field. This results in a dielectric loss term that increases in proportion to the signal frequency [52]. The skin effect, which describes the process of high-frequency signal current crowding near the conductor surface, impacts the resistive loss term as frequency increases.
This results in a resistive loss term that is proportional to the square root of frequency [53].

Fig. 2.28(a) shows how these frequency-dependent loss terms result in low-pass channels where the attenuation increases with distance. The high-frequency content of pulses sent across these channels is filtered, resulting in an attenuated received pulse with energy that has been dispersed over several bit periods, as shown in Fig. 2.28(a) for three example channels with different profiles. When transmitting data across the channel, energy from individual bits will now interfere with adjacent bits and make them more difficult to detect. This undesired phenomenon is called intersymbol interference (ISI). The ISI increases with channel loss and can completely close the received data eye diagram, as shown in Fig. 2.28(b). While the eye is fairly open for the short desktop channel, and a slicer (comparator) with its threshold level at zero can detect the received '0' and '1' signals reliably, the eye is completely closed for the longer backplane (BP) channels, which causes errors in the detected signal.

Figure 2.28: (a) Frequency response and pulse response of three channels. (b) Eye diagrams after channels without equalization.

Signal interference also results from reflections caused by impedance discontinuities. If a signal propagating across a transmission line experiences a change in impedance $Z_r$ relative to the line's characteristic impedance $Z_0$, a fraction of that signal equal to [53]

$$\frac{V_r}{V_i} = \frac{Z_r - Z_0}{Z_r + Z_0} \tag{2.47}$$

will reflect back to the transmitter, where $V_i$ is the incident voltage amplitude, and $V_r$ is the reflected voltage amplitude. This results in an attenuated or, in the case of multiple reflections, a time-delayed version of the signal arriving at the receiver. The most common sources of impedance discontinuities are on-chip termination mismatches and via stubs that stem from signaling over multiple PCB layers.
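To give a feel for the magnitudes involved in Eq. 2.47, the short sketch below evaluates the reflected fraction for a few impedance steps. The 50 Ω characteristic impedance and the example discontinuity values are assumptions for illustration; they are not measurements of the channels discussed here.

```python
def reflection_coefficient(z_r, z_0=50.0):
    """Eq. 2.47: ratio of reflected to incident voltage at an impedance step."""
    return (z_r - z_0) / (z_r + z_0)


if __name__ == "__main__":
    # Assumed example discontinuities on an assumed 50-ohm line.
    for z_r in (50.0, 55.0, 40.0, 75.0):
        gamma = reflection_coefficient(z_r)
        print(f"Z_r = {z_r:5.1f} ohm -> V_r/V_i = {gamma:+.3f}")
```

A matched termination reflects nothing, while a discontinuity below $Z_0$, such as a capacitive via stub at high frequency, produces a reflection of negative polarity.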
The frequency response of the 17" legacy backplane channel shown in Fig. 2.28(a) shows that the capacitive discontinuity formed by the thick backplane via stubs can cause severe nulls in the channel frequency response.

Another form of interference comes from crosstalk, which occurs due to both capacitive and inductive coupling between neighboring signal lines. As a signal propagates across the channel, it experiences the most crosstalk in the backplane connectors and chip packages, where the signal spacing is smallest compared to the distance to a shield (Fig. 2.27). Crosstalk is classified either as near-end crosstalk (NEXT), where energy from an aggressor (transmitter) couples and is reflected back to the victim (receiver) on the same chip, or far-end crosstalk (FEXT), where the aggressor energy couples and propagates along the channel to a victim on another chip. NEXT is commonly the most detrimental crosstalk, as energy from a strong transmitter (on the order of $\sim 1V_{pp}$) can couple onto a received signal at the same chip, which has been attenuated ($\sim 20 mV_{pp}$) from propagating on the lossy channel. Crosstalk, though beyond the scope of this research, is potentially a major limiter to high-speed electrical link scaling, since in common backplane channels the crosstalk energy can actually exceed the through-channel signal energy at frequencies near 5GHz, and in practice crosstalk cancellation circuitry should be used to alleviate this issue.

Figure 2.29: (a) Frequency response and equalized pulse response of three channels under study. (b) Eye diagrams after channels with equalization.

Fig. 2.29 shows pulse responses and eye diagrams of the same channels in Fig. 2.28 after pre-cursor and post-cursor ISI terms are reduced using equalization. As expected, the eye diagram for the 7" channel shows improved opening, while the previously closed eye for the 17" refined channel is now open. However, for the 17"
legacy channel with a deep null at 4GHz, the eye stays closed even after this simple equalization. For such complex profiles, more complicated equalization schemes are required to ensure reasonable performance. ADC-based receivers can provide much more complex and flexible equalization in the digital domain than their mixed-mode receiver counterparts.

#### 2.4.1 Receiver Equalization Techniques

In order to extend a given channel's maximum data rate, many communication systems use equalization techniques to cancel the inter-symbol interference caused by channel distortion. Equalizers are implemented either as linear filters (both discrete and continuous-time) that attempt to flatten the channel frequency response, or as nonlinear filters that directly cancel ISI based on the received data sequence. Depending on the system data rate requirements relative to the channel bandwidth and the severity of potential noise sources, different combinations of transmitter and/or receiver equalization are employed.

Transmit equalization, implemented with a finite impulse response (FIR) filter, is the most common technique used in high-speed links. This TX "pre-emphasis" (or more accurately "de-emphasis") filter attempts to invert the channel distortion that a data bit experiences by pre-distorting or shaping the pulse over several bit times. While this filtering could also be implemented at the receiver, the main advantage of implementing the equalization at the transmitter is that it is generally easier to build high-speed digital-to-analog converters than receive-side analog-to-digital converters. However, because the transmitter is limited in the amount of peak power that it can send across the channel due to driver voltage headroom constraints, the net result is that the low-frequency signal content is attenuated down to the high-frequency level.

Figure 2.30 shows a block diagram of receiver-side FIR equalization, also called feed-forward equalization (FFE).
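The FIR operation itself is simple to state in code. The following sketch applies a 2-tap FFE to a sampled pulse response; the pulse response `[1.0, 0.5, 0.0, 0.0]` and the tap weights `[1.0, -0.5]` are assumed values chosen so that the arithmetic is easy to follow, and the helper names are hypothetical.

```python
def convolve(x, h):
    """Discrete linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def ffe(samples, taps):
    """FIR feed-forward equalizer: y[k] = sum_i taps[i] * samples[k - i]."""
    return convolve(samples, taps)[:len(samples)]


if __name__ == "__main__":
    pulse = [1.0, 0.5, 0.0, 0.0]  # assumed: main cursor and one post-cursor
    taps = [1.0, -0.5]            # assumed 2-tap FFE weights
    print(ffe(pulse, taps))       # equalized pulse response
```

In this example the first post-cursor is cancelled, but a smaller residual term of opposite sign appears one bit later, which is why longer channels generally call for more taps.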
A common problem faced by linear receiver-side equalization is that high-frequency noise content and crosstalk are amplified along with the incoming signal. Also, the implementation of the analog delay elements is challenging at high data rates. They are often implemented through pure analog delay stages with large-area passives, or by using time-interleaved sample-and-hold stages, an approach also called sampled FFE. Nonetheless, one of the major advantages of receiver equalization is that the filter tap coefficients can be adaptively tuned to the specific channel, which is not possible with transmitter equalization unless a "back-channel" is employed for this purpose. Another very common type of receiver-side equalizer is a high-pass filter (HPF) at the receiver front-end, usually referred to as a continuous-time linear equalizer (CTLE), which is beyond the scope of this research. In general, a CTLE can be implemented either as a fully passive high-pass filter stage or, more commonly, by embedding the HPF inside an active amplifier.

Figure 2.30: Block diagram of a receiver feed-forward equalizer.

The other equalization topology commonly implemented in high-speed links is the receiver decision feedback equalizer (DFE). A DFE directly subtracts ISI from the incoming signal by feeding back the digital data resolved by a slicer to control the polarity of the equalization taps, as shown in Fig. 2.31. Unlike linear receiver equalizers, a DFE does not amplify the input signal noise or crosstalk, since it uses the quantized input values. However, there is the potential for error propagation in a DFE if the noise is large enough for a quantized output to be wrong. Also, due to the feedback structure, a DFE cannot cancel precursor ISI<sup>6</sup>, which is the reason a DFE structure is almost always combined with some sort of linear equalization.
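The feedback subtraction can be illustrated with a behavioral sketch of a direct-feedback 1-tap DFE. The two-tap channel (main cursor 1.0, first post-cursor 0.5), the noise-free setting, and all function names are assumptions made for illustration only.

```python
def slicer(v):
    """Binary decision with the threshold at zero."""
    return 1.0 if v >= 0.0 else -1.0


def channel(symbols, h0=1.0, h1=0.5, first_prev=-1.0):
    """Assumed channel: main cursor h0 plus a single post-cursor h1."""
    out, prev = [], first_prev
    for d in symbols:
        out.append(h0 * d + h1 * prev)
        prev = d
    return out


def dfe_1tap(samples, alpha, first_prev=-1.0):
    """Direct-feedback 1-tap DFE: subtract alpha times the previous decision."""
    prev = first_prev
    corrected, decisions = [], []
    for y in samples:
        v = y - alpha * prev  # cancel the first post-cursor ISI
        prev = slicer(v)
        corrected.append(v)
        decisions.append(prev)
    return corrected, decisions


if __name__ == "__main__":
    tx = [1.0, 1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
    rx = channel(tx)
    eq, decisions = dfe_1tap(rx, alpha=0.5)
    print("margin without DFE:", min(abs(v) for v in rx))
    print("margin with DFE:   ", min(abs(v) for v in eq))
    print("decisions correct: ", decisions == tx)
```

Because the correction uses the previous decision rather than the previous analog sample, a wrong decision would subtract ISI with the wrong sign on the next bit, which is the error propagation mechanism mentioned above.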
<sup>6</sup>Otherwise this would result in a non-causal filter, which is not practical.

Figure 2.31: Block diagram of a receiver decision feedback equalizer with direct feedback taps.

The major challenge in DFE implementation is closing timing on the first-tap feedback, since this must be done in one bit period or unit interval (UI), as shown in Fig. 2.32(a) for a 1-tap DFE system. Direct feedback implementations require this critical timing path to be highly optimized. The loop-unrolling architecture (also known as speculation) relaxes the critical delay path [14,54], as shown in Fig. 2.32(b) for a 1-tap DFE. In this technique, two decisions are made for the only two possible 1-tap DFE coefficients by employing two parallel summers and slicers, and the correct decision is selected by a 2:1 multiplexer (MUX) using the previous symbol decision. This way, the critical delay path is relaxed by roughly replacing the summer plus slicer delays with the MUX delay.

Figure 2.32: Simplified block diagram of a 1-tap DFE using (a) direct feedback implementation, and (b) loop-unrolled technique to relax critical delay path.

#### 2.4.2 Modulation Schemes In High-Speed Link Applications

Recently, modulation techniques that provide spectral efficiencies higher than simple binary signaling<sup>7</sup> have also been implemented in order to increase data rates over band-limited wireline channels. Multilevel pulse amplitude modulation (PAM), most commonly PAM-4, is a popular modulation scheme that has been implemented both in academia and in industry. Shown in Fig. 2.33, PAM-4 modulation consists of two bits per symbol, which allows transmission of an equivalent amount of data in half the channel bandwidth.

<sup>7</sup>Also known as non-return to zero (NRZ) or PAM-2 signaling.

Figure 2.33: Common pulse amplitude modulation schemes in serial links: simple PAM-2 (1 bit/symbol) and PAM-4 (2 bits/symbol).
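The level spacing of the two schemes under a fixed peak swing can be worked out with a short calculation. The sketch below assumes equally spaced levels between $-1$ and $+1$ (a normalized peak swing, which is an assumption) and uses hypothetical helper names.

```python
import math


def pam_levels(m, peak=1.0):
    """M equally spaced levels spanning -peak to +peak."""
    return [peak * (2 * i - (m - 1)) / (m - 1) for i in range(m)]


def level_spacing(m, peak=1.0):
    """Distance between adjacent levels for a fixed peak swing."""
    return 2.0 * peak / (m - 1)


def spacing_penalty_db(m):
    """Spacing reduction of PAM-M relative to PAM-2, in dB."""
    return 20.0 * math.log10(level_spacing(2) / level_spacing(m))


if __name__ == "__main__":
    print("PAM-2 levels:", pam_levels(2))
    print("PAM-4 levels:", [round(v, 3) for v in pam_levels(4)])
    print("bits/symbol for PAM-4:", int(math.log2(4)))
    print(f"PAM-4 spacing penalty: {spacing_penalty_db(4):.2f} dB")
```

The factor of three between the PAM-2 and PAM-4 level spacings corresponds to about 9.5 dB, which is the price paid for halving the symbol rate at the same data rate.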
However, due to the transmitter's peak-power limit, the voltage margin between symbols is $3\times$ (9.5 dB) lower with PAM-4 than with simple binary PAM-2 signaling. Thus, a general rule of thumb exists that if the channel loss at the PAM-2 Nyquist frequency is greater than $\sim 10 \text{ dB}$ relative to the previous octave, then PAM-4 can potentially offer a higher signal-to-noise ratio (SNR) at the receiver. However, this rule can be somewhat optimistic due to the differing ISI and jitter distribution present with PAM-4 signaling. Also, PAM-2 signaling with a nonlinear DFE at the receiver further bridges the performance gap due to the DFE's ability to cancel the dominant first post-cursor ISI without the inherent signal attenuation associated with transmitter equalization.

## 3. 6-BIT 1.6-GS/S ADC WITH EMBEDDED REDUNDANT CYCLE DFE\*

ADC-based serial link receivers are being proposed in order to enable operation at high data rates over high-loss channels [2], [55], [56]. Fig. 3.1 shows a block diagram of an ADC-based high-speed link receiver, which employs an ADC as the receiver front-end followed by a digital signal processing (DSP) block. The use of an ADC-based receiver enables signal equalization to be performed in the digital domain, gaining the advantages of area and power scaling with improved CMOS technology. This allows for the efficient implementation of complex equalization and the ability to support bandwidth-efficient modulation schemes, such as PAM4 and duobinary [57].

Despite these advantages, ADC-based receivers are generally more complex and consume more power than binary receivers. ADC resolutions in the range of 4 to 6 bits are typically used, with flash or successive-approximation register (SAR) architectures as the dominant choices. For many systems where link power efficiency is the key metric, multi-GS/s ADC implementations [56], [29], [58] often display prohibitive power.
The digital equalization that follows the ADC can also consume significant power, comparable to the power of the ADC.

Embedding partial analog equalization in the front-end ADC allows for both a lower ADC resolution and reduced digital equalization complexity at a target bit-error rate (BER) [8], which could translate into an overall lower-power ADC-based receiver implementation. Previously, finite-impulse response (FIR) and infinite-impulse response (IIR) filtering has been embedded in the capacitive DAC of a SAR ADC, at the cost of increased DAC complexity and reduced ADC conversion rate [17]. Embedded multi-level decision-feedback equalization (DFE), which can be treated as embedded quantized infinite impulse response (IIR) equalization, has also been previously proposed for pipeline ADCs [13].

Figure 3.1: A high-speed link with an ADC-based receiver.

<sup>\*</sup>© 2013 IEEE. Part of this chapter is reprinted, with permission, from E. Zhian Tabasy, A. Shafik, S. Huang, N.-W. Yang, S. Hoyos, and S. Palermo, "A 6-b 1.6-GS/s ADC with redundant cycle one-tap embedded DFE in 90-nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1885–1897, Aug. 2013.

DFE is a very powerful equalization technique, as it can selectively reduce post-cursor ISI without amplifying noise or cross-talk. However, one important issue in any DFE implementation involves the critical feedback timing path from the decision comparator to the summation circuit that subtracts the post-cursor ISI. Loop unrolling can be employed to resolve this issue, where speculative comparison with a redundant comparator is used [14]. This approach, however, can incur significant hardware overhead [13].

This chapter presents a time-interleaved (TI) SAR ADC architecture with a novel low-overhead 1-tap embedded DFE [15].
In Section 3.1, statistical BER simulation results are discussed, showing the performance advantages of embedded DFE and comparing it against embedded IIR equalization for three FR4 channels with differing loss profiles. The novel embedded DFE technique, which introduces an additional cycle in the time-interleaved SAR ADC in order to perform the DFE loop-unrolling with minimal hardware overhead, is proposed in Section 3.2. Section 3.3 details the ADC architecture and the main circuit blocks. Experimental results of the ADC with embedded 1-tap DFE, fabricated in an LP 90nm CMOS technology, are shown in Section 3.4. Finally, Section 3.5 concludes this chapter.

#### 3.1 Embedded Feedback Equalization Modeling

In this section, the performance impact of embedding two types of feedback equalization, DFE and IIR, inside the ADC is analyzed. Utilizing a statistical simulation model, the embedded equalization approaches are compared for different operating conditions such as channel profile, transmitter equalization, and ADC resolution.

Fig. 3.2(a) shows a block diagram comparing post-ADC digital DFE and an ADC with an embedded DFE tap. In both cases, the output MSB, which is considered the decision in a conventional 1-tap DFE with binary signaling, is fed back, weighted by the DFE coefficient, and subtracted. The advantage of ADC embedded equalization is that, unlike digital equalization, where the resolution is limited by the ADC, embedded equalization applies the equalization taps to the un-quantized analog input, allowing for both a lower ADC resolution and reduced digital equalization complexity at a target bit-error rate (BER) [8]. Similarly, Fig. 3.2(b) compares digital and embedded IIR equalization realizations.
In either case, the full ADC output word is scaled by the equalization coefficient and subtracted from the input, where the subtraction is performed with the analog input for the case of embedded equalization and with the quantized input in the case of digital IIR. The embedded IIR offers a potential advantage over embedded DFE, in that the IIR can be optimized to cancel multiple ISI terms, rather than a single post-cursor for the DFE case. However, while an analog value can still be used for the full-scale value, the embedded IIR suffers from the ADC quantization in the feedback, which implies a minimum ADC resolution is necessary to avoid the quantization noise propagating in the feedback system.

Figure 3.2: Block diagrams of (a) digital vs. embedded DFE, and (b) digital vs. embedded IIR.

High-speed link simulation tools often use statistical modeling approaches to predict performance metrics such as BER without the need for lengthy bit-by-bit transient simulations [59], [60]. This work uses such a statistical framework for ADC-based receivers [8] in order to model the effect of embedded equalization on system performance, with 1.6Gb/s operation assumed over the three FR4 channels shown in Fig. 3.3(a). While the first two channels display a similar 11dB channel loss at the 0.8GHz Nyquist frequency, the first channel has a smooth attenuation profile, in contrast to the second channel, which has a frequency notch near 2GHz. In the time-domain 1.6Gb/s pulse response, shown in Fig. 3.3(b), this translates to a reduced main-cursor to first post-cursor ratio for the second channel and also some noticeable reflections near the fifth and sixth post-cursors. The third channel has a higher attenuation of about 14dB at the Nyquist frequency. This again is reflected in the time-domain pulse response, where the main cursor for the third channel is almost half that for the other two channels.
The presented results assume 1Vppd transmit swing, 2.5mVrms receiver input-referred thermal noise and 10mV uniform supply noise, and receiver sampling jitter with a 0.02 unit interval (UI) deterministic component (DJ) in the form of duty cycle distortion and a 0.02 UIrms random component (RJ). The impact of including one tap of embedded DFE for each of the channels is shown in Fig. 3.3(c), quantified in terms of receiver voltage margin at 1.6Gb/s and a BER of $10^{-12}$ for a given number of TX-FIR equalization taps. Without any TX equalization (1 tap), the embedded DFE offers significant performance improvements in all three channels, with the voltage margin in channels 1 and 2 improving by 100mV and 115mV, respectively, and the higher-loss channel 3 displaying a 50mV margin from a previously closed eye. While the losses of channels 1 and 2 are similar, a higher percentage improvement with embedded DFE is observed for the notch-shaped channel 2, because the cancelled first post-cursor is a higher percentage of the main cursor value.

The embedded DFE allows the optimization of the TX FIR taps to ignore the first post-cursor ISI term, which translates into more flexibility in FIR tap weighting to match a specific channel profile with additional taps. In order to have a fair comparison, the values of the TX-FIR taps are optimized separately with and without embedded DFE. Continued margin improvement is observed when TX equalization is introduced, with the embedded DFE offering a relatively constant additional 45 to 50mV for channels 1 and 2 from 2 to 4 TX FIR taps, while for channel 3 this margin increases from 20 to 30mV. Note that for these channels the voltage margin roughly plateaus when TX equalization is introduced, due to the majority of the residual ISI being cancelled and the 1Vpp TX peak swing constraint.
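The benefit of cancelling the first post-cursor can be illustrated with a simple worst-case (peak-distortion) eye estimate. This is only a sketch: it is not the statistical BER framework of [8], it ignores noise and jitter, and the pulse-response samples below are hypothetical values chosen for illustration, not the measured FR4 channels.

```python
def worst_case_eye(pulse, main_idx, dfe_taps=0):
    """Worst-case single-sided eye height for binary signaling.

    pulse    : baud-spaced pulse-response samples (hypothetical values)
    main_idx : index of the main cursor within `pulse`
    dfe_taps : number of leading post-cursors assumed perfectly cancelled
    """
    main = pulse[main_idx]
    isi = 0.0
    for k, cursor in enumerate(pulse):
        if k == main_idx:
            continue  # the main cursor is signal, not ISI
        if main_idx < k <= main_idx + dfe_taps:
            continue  # post-cursor removed by an ideal DFE tap
        isi += abs(cursor)  # every other cursor closes the eye in the worst case
    return main - isi


# Hypothetical pulse response: one pre-cursor, main cursor, three post-cursors.
pulse = [0.02, 0.30, 0.15, 0.05, 0.02]
eye_no_dfe = worst_case_eye(pulse, main_idx=1)
eye_1tap_dfe = worst_case_eye(pulse, main_idx=1, dfe_taps=1)
print(f"no DFE: {eye_no_dfe:.3f} V, 1-tap DFE: {eye_1tap_dfe:.3f} V")
```

In this toy case the first post-cursor is half of the main cursor, so removing it accounts for most of the eye improvement, which mirrors the qualitative argument made for the notch-shaped channel.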
These three channels are also utilized to compare the performance of embedded IIR with embedded DFE. Fig. 3.3(d) shows the achievable 1.6Gb/s voltage margin as the ADC resolution is varied, assuming no transmit equalization. While the performance of the embedded DFE is independent of the ADC resolution, the embedded IIR equalization requires at least 4 to 5 bits of resolution to approach the performance of the embedded DFE equalization for all three channels. As the hardware overhead of embedded IIR increases with ADC resolution, due to all the output bits being used for ISI cancellation, these results suggest that for the typical high-speed link ADC resolutions embedded DFE offers potential performance and efficiency advantages.

Figure 3.3: (a) Magnitude and (b) 1.6Gb/s pulse responses of three FR4 channels. (c) Impact of including one tap of embedded DFE equalization for different levels of TX-FIR equalization, and (d) impact of ADC resolution with embedded DFE and embedded IIR equalization with no TX FIR equalization over three FR4 channels.

### 3.2 Redundant-Cycle 1-Tap Embedded DFE

While DFE is a very powerful equalization technique, as it can selectively reduce post-cursor ISI without amplifying noise or cross-talk, the feedback structure introduces some challenges in the implementation of this technique in high data rate systems. This section reviews a common loop-unrolling approach to improve the DFE speed and proposes a novel redundant-cycle technique to efficiently embed a DFE tap in a multi-bit SAR ADC.

A receiver block diagram with a direct-feedback 1-tap DFE is shown in Fig. 3.4(a).
One of the main challenges in a DFE structure involves meeting the 1UI critical feedback delay path

$$t_{clk\to QSA} + t_{sum} < T_b = 1 UI, \tag{3.1}$$

where $t_{clk\to QSA}$ is the clock-to-Q delay of the sense-amplifier comparator, $t_{sum}$ is the summer delay, which also includes the delay of DFE coefficient generation [12], and $T_b$ is the bit period, equal to $1/f_{CLK}$ in a full-rate architecture. The combination of the time required for the summer to settle to a required accuracy level and the comparator delay, which can have a long regeneration time with small input levels, often makes this critical timing path difficult to meet at high data rates.

In order to relax the critical delay path of the DFE feedback, loop unrolling or speculation with a redundant comparator may be used to calculate both positive and negative post-cursor cancellation coefficient possibilities simultaneously [14]. As shown in Fig. 3.4(b), a decision is made for both possible options of the DFE tap, $+\alpha$ and $-\alpha$, and the correct decision is chosen using a 2:1 multiplexer (MUX) controlled by the previous detected symbol decision. Now the critical feedback delay path is

$$t_{clk\to Q} + t_{mux} < T_b = 1 UI, \tag{3.2}$$

where $t_{clk\to Q}$ is the flip-flop clock-to-Q delay and $t_{mux}$ is the MUX delay. This is generally easier to meet, as all of the signals are operating at full logic levels. However, the primary disadvantage of this technique is that the number of comparators and summers is doubled.

Figure 3.4: DFE implementations: (a) direct-feedback, and (b) loop-unrolled.

Fig. 3.5(a) shows a sequential block diagram of this approach with a time-interleaved SAR ADC. After an initial track-and-hold (T/H) cycle, the MSB computation cycle computes both the positive and negative ISI combinations, $V_{in}+\alpha$ and $V_{in}-\alpha$, in parallel with the two comparators. The MSB of the previous symbol is then used to select the appropriate comparator output.
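The equivalence between direct feedback and loop unrolling can be checked with a small behavioral model. This is a sketch under stated assumptions: binary decisions are represented as ±1, the DFE is assumed to subtract $\alpha$ times the previous decision, and the function names and sample values are hypothetical. Circuit delays, which are the actual motivation for unrolling, are not modeled.

```python
import random


def sign(x):
    """Binary slicer: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1


def dfe_direct(samples, alpha, d_init=1):
    """Direct feedback: the previous decision is needed before slicing."""
    d_prev, decisions = d_init, []
    for v in samples:
        d = sign(v - alpha * d_prev)
        decisions.append(d)
        d_prev = d
    return decisions


def dfe_unrolled(samples, alpha, d_init=1):
    """Loop-unrolled: slice both tap polarities, then select with a 2:1 MUX."""
    d_prev, decisions = d_init, []
    for v in samples:
        d_if_prev_pos = sign(v - alpha)  # speculates previous decision = +1
        d_if_prev_neg = sign(v + alpha)  # speculates previous decision = -1
        d = d_if_prev_pos if d_prev > 0 else d_if_prev_neg  # MUX select
        decisions.append(d)
        d_prev = d
    return decisions


random.seed(0)
samples = [random.uniform(-0.5, 0.5) for _ in range(1000)]  # arbitrary test inputs
direct = dfe_direct(samples, alpha=0.1)
unrolled = dfe_unrolled(samples, alpha=0.1)
print("identical decisions:", direct == unrolled)
```

Both functions produce the same decision sequence; the difference in hardware is that the unrolled version only has a flip-flop and a MUX in the feedback loop, at the cost of two slicers per sample.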
This approach results in a significant circuit area penalty, as the number of comparators and digital-to-analog converters (DACs) present in the SAR ADC is doubled. Two significant power overheads are also incurred with this approach. The first is associated with clocking the extra comparator and DAC. However, this overhead can be minimized by disabling the incorrect DFE tap polarity comparator and DAC after the MSB computation. The second involves the increased capacitive loading from the additional capacitive DACs, assuming a conventional SAR architecture, that the ADC T/H circuit must drive and the reference voltage buffers must charge, resulting in increased T/H and reference buffer power. Moreover, doubling the comparators and DACs results in mismatch between the two paths, which may necessitate additional calibration.

## 3.2.2 Redundant-Cycle 1-Tap Embedded DFE

A new technique to more efficiently embed the DFE tap in a time-interleaved SAR ADC is shown in Fig. 3.5(b). Here, instead of a redundant comparator and DAC, a redundant ADC conversion cycle is added to the normal SAR operation. During the first cycle after the T/H cycle, the MSB value is computed with a $+\alpha$ value and latched, followed by the MSB computation with a $-\alpha$ value in the next cycle. This allows the use of only one comparator and DAC, as in a conventional SAR ADC. Both of the MSB computations are stored, and the previous symbol MSB is used to select the correct computation. For a 6-bit ADC, including the sampling cycle and the redundant cycle, eight equal cycles are used for each sample conversion. The decrease in the ADC sampling rate due to the additional cycle can be compensated by increasing the ADC time-interleaving factor. In this work, the proposed redundant cycle method results in an (8/7)X increase in the time-interleaving factor and the conversion latency, and almost the same increase in the core ADC area of the 6-bit prototype ADC.
However, the increase in the total power is even smaller, since only the power of the time-interleaved SAR ADCs has increased, while the power consumption of the front-end T/Hs and the reference voltage buffers remains approximately the same.

Figure 3.5: Conceptual schematic of a unit SAR ADC with (a) loop-unrolled, and (b) proposed redundant cycle 1-tap embedded DFE.

Although this implementation requires eight equal cycles, similar to a typical 7-bit SAR ADC, the power and area overhead is less. A 7-bit SAR ADC requires 1-bit higher resolution front-end T/Hs, capacitive DACs, and lower offset, gain, and phase mismatches among the time-interleaved channels, which increases its overhead more than (8/7)X compared to a 6-bit ADC without embedded equalization. It should also be noted that the overhead due to the redundant cycle 1-tap DFE decreases as the ADC resolution increases, as one extra cycle is always required for this method independent of the resolution.

It is worth mentioning that the redundant cycle technique can be expanded to allow for a multi-tap DFE by adding additional cycles for extra taps. For example, a redundant cycle 2-tap embedded DFE requires three extra cycles relative to a normal SAR ADC in order to relax the critical path delay for both DFE taps, as shown in Fig. 3.6(a). This implies a (10/7)X increase in the time-interleaving factor, latency, and area. However, this overhead is much less than a SAR ADC with fully loop-unrolled 2-tap embedded DFE realization, where the number of comparators and DACs should be quadrupled, as shown in Fig. 3.6(b).

While the redundant cycle 1-tap embedded DFE adds some latency to the data conversion process, the critical delay path is similar to a loop-unrolled 1-tap DFE. Fig. 3.7 details the critical delay path for two consecutive ADC channels, ADC(n-1) and ADC(n).
Here the critical timing path is governed by the $\Phi(n)$ clocks, which operate at the sample frequency $f_s$ divided by the time-interleaving factor ($f_s/16$ for the prototype discussed in Section 3.3) and are spaced by one unit interval. At the end of the second bit cycle, the MSB from ADC(n-1) is resolved and sampled by a flip-flop clocked by $\Phi(n-1)$ to produce the select MUX signal for the correct MSB of ADC(n). This ADC(n) MUX output must resolve before being sampled by a flip-flop clocked by $\Phi(n)$ to produce the select MUX signal for ADC(n+1). Thus, the critical delay is

$$t_{clk\to Q} + t_{mux} < t_{\Phi(n)} - t_{\Phi(n-1)} = T_b = 1 UI, \tag{3.3}$$

which is the same as the conventional loop-unrolled approach.

Figure 3.6: Conceptual schematic of a unit SAR ADC (a) with redundant cycle 2-tap embedded DFE, and (b) with loop-unrolled 2-tap embedded DFE.

A second critical timing path exists for the $\pm\alpha$ MUX, summer, and comparator in the DFE operation, which should finish before the sampling instant. As shown in Fig. 3.7, this delay should be less than the duration of one bit cycle, which is equal to 2UI. However, this criterion is generally satisfied, because normal SAR ADC operation already requires that the combined delay of the SAR logic, capacitive DAC settling, and comparator, a path similar to the DFE MUX plus summer and comparator, be less than the duration of one bit cycle.

Figure 3.7: Critical delay path for the redundant cycle 1-tap embedded DFE. The instants when the summation and sampling in the 1-tap embedded DFE occur are shown.

### 3.2.4 Switched-Capacitor Implementation

A switched-capacitor topology has previously been shown as an efficient DFE approach for binary receivers [61]. This work modifies this structure to allow for embedding a 1-tap DFE in a conventional SAR ADC. A switched-capacitor network, shown in Fig.
3.8(a), provides an efficient implementation of the MUX for choosing between $+\alpha$ and $-\alpha$ and the summer connected to $V_{in}$ for performing the redundant cycle 1-tap embedded DFE. Here a simplified single-ended schematic is utilized to illustrate operation during the first three phases of the SAR conversion cycle: the sampling phase and the two redundant-cycle MSB computations. During the first cycle, the input voltage is sampled on the $C_S$ capacitor, and the differential voltage at the input of the comparator, $V_X$, is zero, as shown in Fig. 3.8(b). In the next cycle, the S switches are OFF and the left side of $C_S$ is connected to $-\alpha$, as shown in Fig. 3.8(c). Hence, the differential voltage at the input of the comparator is $V_{in} + \alpha$, and the MSB is resolved for this tap polarity. In the next phase, shown in Fig. 3.8(d), the MSB is re-evaluated for the opposite tap polarity, as the left terminal of $C_S$ is now connected to $+\alpha$, resulting in a differential voltage at the comparator input of $V_{in} - \alpha$. The correct MSB decision is then made based on the MSB of the previous ADC channel. For the remaining ADC bit cycles, the correct DFE coefficient is known a priori, and the required switch for selecting $+\alpha$ or $-\alpha$ is fixed until the end of this SAR conversion period.

### 3.3 ADC Design

### 3.3.1 Time-Interleaved Architecture

The redundant cycle embedded DFE is implemented in a 1.6GS/s 6-bit ADC, shown in Fig. 3.9, consisting of two time-interleaved sub-ADCs which operate at 0.8GS/s. Each sub-ADC is formed by eight parallel unit ADCs which have eight operation cycles: one for input sampling, six for bit conversion, and one extra cycle for the equalization.
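The eight-cycle sequence of a unit ADC can be summarized with an idealized behavioral model. This is a sketch, not the circuit: it assumes an ideal comparator and DAC, an offset-binary output code over a differential range of ±0.5V (1Vpp), and a previous MSB of 1 selecting the $V_{in}-\alpha$ result. The function name and argument names are hypothetical.

```python
def redundant_cycle_sar(vin, prev_msb, alpha, n_bits=6, full_scale=1.0):
    """Idealized unit SAR conversion with a redundant-cycle 1-tap DFE.

    Returns (offset-binary code, number of cycles used).
    """
    cycles = 1                                  # cycle 1: sample vin on C_S

    msb_plus = 1 if vin + alpha >= 0 else 0     # cycle 2: MSB for Vin + alpha
    cycles += 1
    msb_minus = 1 if vin - alpha >= 0 else 0    # cycle 3: MSB for Vin - alpha
    cycles += 1

    # Select the stored MSB using the previous channel's MSB (assumed polarity).
    if prev_msb == 1:
        msb, v_eq = msb_minus, vin - alpha
    else:
        msb, v_eq = msb_plus, vin + alpha

    # Remaining bits: ordinary successive approximation on the equalized value,
    # with the tap polarity now fixed.
    code = msb << (n_bits - 1)
    for bit in range(n_bits - 2, -1, -1):
        trial = code | (1 << bit)
        threshold = (trial / 2 ** n_bits - 0.5) * full_scale
        if v_eq >= threshold:
            code = trial
        cycles += 1
    return code, cycles


code_a, cycles_a = redundant_cycle_sar(vin=0.25, prev_msb=1, alpha=0.125)
code_b, cycles_b = redundant_cycle_sar(vin=0.25, prev_msb=0, alpha=0.125)
print(code_a, code_b, cycles_a)
```

With these assumed inputs the same sampled voltage converts to two different codes depending on the previous MSB, and each conversion uses eight cycles, matching the one sampling, one redundant, and six bit cycles described above.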
While the total time-interleaving factor is 16, two front-end track-and-holds are used, one for each sub-ADC, allowing for the use of only two critical sampling phases at 0.8GHz. The ADC includes calibration DACs for comparator offset and sampling clock skew cancellation.

Figure 3.8: SAR ADC with embedded 1-tap DFE: (a) simplified block diagram, operation during the (b) sampling phase, (c) first MSB evaluation, and (d) second MSB evaluation.

#### 3.3.2 Unit ADC with Embedded 1-Tap DFE

Fig. 3.10 shows the fully-differential schematic of the 6-bit unit SAR ADC with embedded redundant cycle 1-tap DFE. A 4-input comparator with two differential input pairs allows separation of the input sampling and ISI cancellation path from the successive approximated value at the output of the reference DAC. One input pair is connected to the DAC output, while the other pair forms the input sampling network which also implements the embedded DFE tap. This allows the main DAC to remain similar to a conventional ADC without embedded DFE.

The DAC employs a merged capacitor switching (MCS) scheme [49], which allows for very low switching energy compared to the conventional capacitor DAC switching proposed in [45] and also saves 50% of the DAC area by removing the MSB capacitor. In this fully-differential structure, the MSB calculation is performed by comparing the sign of the input while all DAC capacitors are connected to the common-mode voltage. Hence, there is no need for MSB capacitors, and a 5-bit capacitive DAC can be used for the 6-bit SAR ADC.

Figure 3.9: Block diagram of the 16-way time-interleaved SAR ADC with embedded 1-tap DFE.

Figure 3.10: Unit SAR ADC schematic with redundant cycle embedded 1-tap DFE.

A 4fF unit capacitor, which is the default minimum metal-oxide-metal (MOM) capacitor in the 90nm CMOS technology, is employed. In selecting this unit capacitor, both matching and noise performance are considered.
Based on Monte Carlo simulations, this value provides <0.05LSB maximum DNL error at a 6-bit resolution. Also, assuming a $1V_{pp}$ maximum swing, it is much larger than the 34aF capacitor size required for an additive noise power less than 0.5LSB.

Fig. 3.11 shows the 4-input two-stage dynamic comparator [62] with current-based offset calibration. This comparator has a shorter regeneration time constant compared to a conventional StrongArm dynamic comparator, which results in superior metastability performance. The comparator size is scaled to satisfy a target metastability error better than $10^{-12}$. Two 5-bit current-steering DACs are used to calibrate comparator offsets at 3mV resolution by sinking a current from the comparator internal nodes. This calibration scheme adds small loading to the comparator nodes, which is relatively code-independent and results in negligible speed impact. While the current-steering DAC used for the offset calibration is generally more sensitive to supply and temperature variations compared to other approaches, such as a capacitive DAC, simulations show that the impact of temperature variation is $+50\mu V/^{\circ}C$ for the worst calibration code, as shown in Fig. 3.12. This is less than $V_{LSB}/2$ for the 6-bit ADC with $1V_{pp}$ input range over the $-40^{\circ}C$ to $100^{\circ}C$ temperature range, and hence tolerable.

Figure 3.11: Schematic of the 4-input comparator with offset calibration current DACs.

Figure 3.12: Temperature dependency of residual unit ADC offset calibrated at $27^{\circ}C$ room temperature.

The differential DFE tap coefficients $V_{cmi} + \alpha/2$ and $V_{cmi} - \alpha/2$ in Fig. 3.10 are generated using off-chip tunable voltage regulators, and buffered on-chip before driving the unit ADCs.
During normal ADC operation, where $\alpha$ is set to zero, any offset mismatch equal to $\beta$ volts between the two DFE tap coefficient buffer outputs results in a $+\beta$ volt or $-\beta$ volt offset error during the current A/D conversion, for a positive or negative previous input sample, respectively. This error translates to nonlinear harmonic distortion in the ADC performance. However, this mismatch can be simply calibrated out during measurement. After the offset calibration of all unit ADCs is performed, a positive DC input voltage $+V_1$, larger than $|\beta|$, is applied to the ADC. Since the input is always positive, the ADC output code $D_{OUT1}$ will be the 6-bit representation of $V_1 + \beta$. Then the same procedure is repeated for a $+2V_1$ DC input voltage. In this case, the ADC output code $D_{OUT2}$ is the 6-bit representation of $2V_1 + \beta$. If $\beta$ is zero, $2D_{OUT1} - D_{OUT2} = 0$, assuming the ADC digital output is expressed in a signed format. In practice, $2D_{OUT1} - D_{OUT2}$ is non-zero and equal to the 6-bit representation of $\beta$. In this implementation, one of the off-chip regulators is tuned to make the term $2D_{OUT1} - D_{OUT2}$ equal to zero. This procedure can be repeated for multiple voltage pairs to make sure the offset mismatch between the DFE tap coefficients is canceled out completely.

# 3.3.3 Front-End Track-and-Hold (T/H)

A switched-capacitor sampling network using a bootstrapped switch followed by an active buffer is used as the front-end T/H in each sub-ADC, as shown in Fig. 3.13 [63]. Bootstrapping improves the bandwidth and high-swing linearity of the sampling network, especially for the low-power CMOS technology with high MOSFET threshold voltages used in this work, and makes the charge-injection error input independent. A simple pseudo-differential PMOS source-follower is employed as the buffer to isolate the input sampling network from the unit ADCs.
These buffers have a low-frequency gain of -2.3dB and an 8GHz bandwidth, as shown in Fig. 3.14. The gain remains fairly constant up to 800MHz, the Nyquist bandwidth of the 1.6GS/s ADC, and the phase varies from 0 to -4.7 degrees in this range. Similar PMOS source-follower stages with equal attenuation are also used for on-chip buffering of the reference and common-mode voltages, which are generated off-chip. Simulation results show that with a 300mV input common-mode voltage and a $1V_{pp}$ input swing, a linearity better than 6 bits is achieved up to a 4GHz input bandwidth with a 0.8GHz sub-ADC sample clock.

This front-end T/H architecture allows a very large input sampling bandwidth, as the sampling capacitor is the ~30fF parasitic capacitance at the source-follower input, which is significantly smaller than the 120fF $C_S$ in the unit ADC and the added loading due to the routing to all of the time-interleaved unit ADCs in each sub-ADC. Here the $370\mu V_{rms}$ kT/C noise from the 30fF input sampling network is not a limiting factor for the 6-bit ADC with $1V_{pp}$ input range.

## 3.3.4 On-Die Offset and Clock-Skew Calibration

In this work, on-die offset and sampling clock skew calibration schemes are implemented to alleviate the mismatches among the parallel unit ADCs and improve overall performance.

## 3.3.4.1 Foreground Offset Calibration

As the proposed ADC employs 16 parallel unit SAR ADCs, any offset mismatch among them can limit the performance of the overall time-interleaved architecture. The offset voltage in each unit ADC has two main sources: the front-end T/H and the unit SAR ADC's comparator. Monte Carlo simulations show that the total output-referred offset of the front-end T/H is $\sigma = 8.2mV$ and the four-input comparator input-referred offset is $\sigma = 11.2mV$, yielding a total offset at the comparator input in each unit ADC of $\sigma \approx 13.9mV$.
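The offset and noise figures quoted above can be reproduced with a short calculation. This is a sketch that assumes the two offset contributions are independent (so they add in a root-sum-square sense), a temperature of 300K for the kT/C estimate, and an LSB defined as the $1V_{pp}$ range divided by $2^6$; these assumptions are not stated explicitly in the text.

```python
import math

# Offset sigmas reported from Monte Carlo simulation (volts).
sigma_th = 8.2e-3     # front-end T/H, output-referred
sigma_cmp = 11.2e-3   # four-input comparator, input-referred

# Independent contributions combine as a root-sum-square (assumption).
sigma_total = math.hypot(sigma_th, sigma_cmp)

# kT/C sampling noise on the ~30fF parasitic sampling capacitance.
K_BOLTZMANN = 1.380649e-23   # J/K
TEMPERATURE = 300.0          # K, assumed
c_sample = 30e-15            # F
v_ktc_rms = math.sqrt(K_BOLTZMANN * TEMPERATURE / c_sample)

# LSB size for a 6-bit ADC with a 1 Vpp input range (assumed definition).
v_lsb = 1.0 / 2 ** 6

print(f"total offset sigma: {sigma_total * 1e3:.1f} mV")
print(f"kT/C noise:         {v_ktc_rms * 1e6:.0f} uVrms")
print(f"LSB:                {v_lsb * 1e3:.2f} mV")
```

Under these assumptions the combined offset sigma comes out close to 13.9mV and the sampling noise close to $370\mu V_{rms}$, a small fraction of the roughly 15.6mV LSB.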
Using the differential offset calibration current-steering DAC shown in Fig. 3.11, a correction resolution of 3mV and a maximum range of $\pm 90$mV are achieved, which covers more than a $\pm 5\sigma$ range of the total offset voltage.

Figure 3.13: Front-end T/H: (a) schematic, and (b) bootstrapped switch structure.

Fig. 3.15(a) shows the setup for foreground offset calibration. The ADC differential input is set to zero by connecting both positive and negative inputs to the 300mV input common-mode voltage. A 16-to-1 MUX is then used to choose the MSB of the unit ADC under calibration, and two 5b calibration codes set the correct current in the comparator calibration DAC (Fig. 3.11). The optimum calibration code is determined when the MSB of the unit ADC under test toggles between 0 and 1 with near 50% probability. This procedure is then repeated for all unit ADCs.

Figure 3.14: Simulated front-end T/H buffer frequency response.

## 3.3.4.2 Foreground Clock Skew Calibration

The phase mismatch calibration of the proposed 16-way time-interleaved ADC is relaxed by utilizing the two front-end T/Hs sampling at $f_s/2$. Since the T/H outputs are ideally held constant during the hold phase, any small phase mismatch in the unit ADC sampling clock following the T/H will not result in any overall ADC performance degradation. Thus, it is only necessary to calibrate these two critical T/H sampling phases. Monte Carlo simulations show that the clock buffer and distribution network adds a phase mismatch with $\sigma \sim 3.5ps$ between the two front-end T/H complementary sampling phases. The digitally-controlled delay lines in the clock distribution path allow any phase mismatch to be calibrated to less than 1ps with a $\pm 11.5$ps tuning range, which covers about $\pm 3\sigma$ variation.

Figure 3.15: Simplified diagrams of the foreground (a) offset calibration, and (b) clock skew calibration setups.

A foreground calibration procedure is used for cancelling the phase mismatch,
as shown in Fig. 3.15(b). The ADC output FFT is measured with a sinewave input with frequency $f_{in}$, and the main spur in the frequency response due to the phase mismatch between the two T/H sampling phases, which occurs at $f_s/2 - f_{in}$, is observed. The digitally-controlled MOS capacitor arrays in the clock distribution network are then tuned, and the optimum calibration code is the one that minimizes this spur amplitude and yields the best ADC output THD.

#### 3.4 Measurement Results

Fig. 3.16 shows the chip micrograph of the prototype 6b ADC, which was fabricated in an LP 90nm CMOS process and occupies a total active area of $0.24mm^2$. The core time-interleaved ADC consists of two sub-ADCs, where each sub-ADC is constructed from 8 parallel unit SAR ADCs. In order to optimize the critical MSB delay path for DFE operation, the unit ADCs are placed in a way that balances the distance between every two consecutive ADCs. Emphasis is placed on maintaining symmetry between the two sub-ADCs by placing both the reference and common-mode voltage buffers and the start generator in the middle. Also, the two front-end T/Hs are distributed symmetrically, with the sampling phases routed from the central phase generation and distribution block. The characterization of the core ADC and the embedded redundant cycle 1-tap DFE is discussed next.

The custom-designed board for testing the 1.6GS/s 6-bit ADC is shown in Fig. 3.17. The 90nm CMOS die is packaged in a $7mm \times 7mm$ open-cavity 48-pin QFN package. The chip is soldered on the bottom side of the PCB to directly route the 1.6Gb/s output traces to the vertical SMAs without creating an open stub, hence decreasing the undesired reflections at the interface between the PCB trace and the SMA connector.

## 3.4.1 Core ADC Characterization

The DFE coefficient $\alpha$ is set to zero to characterize the general performance of the 6-bit ADC.
For ADC testing, the gain and offset errors are calibrated among the 16 time-interleaved unit ADCs, while the two complementary sampling clocks at $f_s/2$ are calibrated for phase mismatch. The dynamic performance of the full time-interleaved ADC at 1.6GHz sampling frequency is shown in Fig. 3.18 as a function of the input frequency, with a maximum effective number of bits (ENOB) of 4.75 bits. By using the front-end active T/Hs, an ADC effective resolution bandwidth (ERBW) of 1.5GHz is achieved, which is almost twice the Nyquist bandwidth of the 1.6GS/s ADC, i.e. 800MHz. Note that the SNDR/SFDR curves have a local minimum at around 50MHz input frequency, as this is the Nyquist bandwidth of each unit ADC in the time-interleaved structure. At this frequency each unit SAR ADC will experience maximum low-frequency nonlinearity.

Figure 3.16: Prototype ADC implemented in an LP 90nm CMOS process: (a) chip micrograph, and (b) optimized order of unit ADCs with respect to spacing between each two consecutive ADCs.

The frequency spectrum of the 1.6GS/s ADC at 48.437 MHz input frequency after calibration is shown in Fig. 3.19. Here the second and third harmonics are dominant, while the distortion due to the phase mismatch between the two T/H sampling phases, located at $f_s/2 - f_{in}$, is non-dominant. Although the whole ADC is differential, the large second-order harmonic distortion arises from the phase imbalance in the balun used for single-ended to differential translation of the input signal in the test setup, and from the pseudo-differential topology of the front-end T/Hs. At high input frequencies the sampling clock jitter limits the overall ADC performance, and the SNDR in Fig. 3.18 drops quickly with increasing input frequency.

Figure 3.17: Custom test board for the prototype 1.6GS/s ADC implemented in an LP 90nm CMOS process.

Static characterization of the ADC is performed using a sinewave histogram technique [64] and a 2.7MHz input at 1.6GS/s.
Maximum DNL and INL values for the 6-bit ADC are +0.67/−0.48 LSB and +1.6/−1.7 LSB, respectively, as shown in Fig. 3.20.

### 3.4.2 Embedded DFE Functionality

In order to extract the range and resolution of the embedded DFE, Fig. 3.21 shows the average time-interleaved ADC output as a function of DFE tap coefficient voltage for two DC input cases of $V_{in} = +0.5V$ and $V_{in} = -0.5V$, i.e. the extremes of the $1V_{pp}$ input range. For $V_{in} = +0.5V$, the MSB should resolve to one and the DFE coefficient should subtract from the input voltage. As shown in the right half of Fig. 3.21, as the DFE coefficient is increased the averaged ADC output code linearly decreases. A similar process occurs for $V_{in} = -0.5V$, where the DFE coefficient should effectively add to the input voltage, and in the left half of Fig. 3.21 the averaged ADC output code linearly increases as the absolute value of the DFE coefficient is increased. This linear transfer characteristic confirms that the embedded DFE coefficient achieves a resolution better than the 6-bit ADC, and has a range as large as the ADC maximum input range.

Figure 3.18: ADC SNDR/SFDR vs. input frequency at $f_s = 1.6$ GHz.

In order to verify the functionality of the embedded 1-tap DFE, a 1.6Gb/s $2^{23}-1$ PRBS input is passed through a two-tap FIR filter $(1-\alpha Z^{-1})$ from a Centellax PCB12500 transmit module to emulate a controlled ISI amount. The ADC input eye diagram with 15dB de-emphasis is shown in Fig. 3.22(a). Using a 1-tap DFE with the same coefficient, this de-emphasis ISI can ideally be completely removed. The mid-point eye opening at the ADC output after reconstruction of the digital output word is shown in Fig. 3.22(b) and (c), without and with the embedded DFE enabled, respectively. With the DFE activated, ISI subtraction improves the eye opening from 4 LSBs to 27 LSBs.
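The statement that a 1-tap DFE with the same coefficient ideally removes the ISI of a $(1-\alpha Z^{-1})$ filter can be checked with a noiseless model. This is only a sketch: symbols are ±1, the data is pseudo-random rather than a true $2^{23}-1$ PRBS, the value $\alpha = 0.7$ is an arbitrary illustrative choice rather than the measured 15dB setting, and the symbol preceding the sequence is assumed to be +1.

```python
import math
import random


def tx_fir(symbols, alpha, x_init=1):
    """Two-tap FIR (1 - alpha*z^-1): y[n] = x[n] - alpha * x[n-1]."""
    out, x_prev = [], x_init
    for x in symbols:
        out.append(x - alpha * x_prev)
        x_prev = x
    return out


def dfe_1tap(received, alpha, d_init=1):
    """1-tap DFE: add back alpha * previous decision, then slice."""
    equalized, decisions, d_prev = [], [], d_init
    for y in received:
        z = y + alpha * d_prev
        d = 1 if z >= 0 else -1
        equalized.append(z)
        decisions.append(d)
        d_prev = d
    return equalized, decisions


def inner_eye(values, symbols):
    """Vertical opening: lowest '+1' sample minus highest '-1' sample."""
    lowest_high = min(v for v, s in zip(values, symbols) if s > 0)
    highest_low = max(v for v, s in zip(values, symbols) if s < 0)
    return lowest_high - highest_low


random.seed(1)
ALPHA = 0.7  # illustrative tap weight, not a measured value
data = [random.choice((-1, 1)) for _ in range(2000)]
rx = tx_fir(data, ALPHA)
eq, decided = dfe_1tap(rx, ALPHA)

eye_before = inner_eye(rx, data)   # about 2 * (1 - ALPHA)
eye_after = inner_eye(eq, data)    # about 2, ISI fully cancelled
print(f"eye before DFE: {eye_before:.2f}, after DFE: {eye_after:.2f}")
```

In this idealized setting the decisions match the transmitted data exactly and the eye returns to its full height; the measured improvement is smaller because of noise, quantization, and other non-idealities that the model leaves out.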
The embedded DFE operation is also verified by measuring the bit error rate (BER) on the three FR4 channels shown in Fig. 3.3: a 30" channel with a smooth attenuation profile, a 28" channel with a notch-shaped frequency response, and a 46" channel with a higher loss profile compared to the other two channels. Here the MSB output of the ADC is fed back to the Centellax PCB12500 in order to produce BER bathtub curves with a $1V_{ppd}$ $2^{10} - 1$ PRBS input without any transmit equalization, as shown in Fig. 3.23. While the eye is already open without embedded equalization at a BER= $10^{-9}$ for the first two channels, the horizontal eye opening improves after applying the 1-tap embedded DFE, with the improvement being more significant for the notch channel. For channel 3, with ~14dB loss at the Nyquist bandwidth, the embedded DFE opens the previously closed eye and results in 0.2UI timing margin at a BER= $10^{-9}$. To further investigate the effectiveness of the proposed embedded DFE, the BER performance of the two lower-loss channels is measured for a $300mV_{ppd}$ swing at the transmitter, as shown in Fig. 3.24, which forces the notch channel to have a very poor BER performance without any equalization. In the smooth-loss channel, the horizontal opening is improved by more than 0.1UI at a BER= $10^{-9}$ relative to the case without any DFE, while for the notch channel enabling the embedded DFE allows a dramatic increase in horizontal eye opening to near 0.25UI.

Figure 3.19: The 1.6GS/s ADC normalized output spectrum for $f_{in} = 48.437$ MHz.

Figure 3.20: DNL/INL plots with $f_{in}=2.7$ MHz at $f_s=1.6$ GHz.

Figure 3.21: Measured DFE tap coefficient range and resolution using a DC input voltage.

Figure 3.22: (a) 1.6 Gb/s ADC input generated by $2^{23}-1$ PRBS after a 2-tap FIR with 15dB de-emphasis, and measured digitized 6b ADC output (b) without, and (c) with 1-tap DFE enabled.

Figure 3.23: Measured bathtub curves for the (a) 30-inch smooth, (b) 28-inch notch, and (c) 46-inch higher-loss FR4 channels shown in Fig. 3.3, with and without 1-tap embedded DFE for a $2^{10}-1$ PRBS input with $1V_{pp}$ TX swing and no TX equalization.

Figure 3.24: Measured bathtub curves for the (a) 30-inch smooth, and (b) 28-inch notch FR4 channels shown in Fig. 3.3, with and without 1-tap embedded DFE for a $2^{10}-1$ PRBS input with $300mV_{pp}$ TX swing and no TX equalization.

The main specifications of the designed ADC are summarized in Table 3.1. The figure of merit (FOM) for the prototype ADC is calculated as

$$FOM = \frac{Power}{\min\{f_s, 2 \cdot ERBW\} \cdot 2^{ENOB}} \ \text{(J/conv.-step)}, \tag{3.4}$$

where $f_s$ is the sampling frequency, and ERBW is the input frequency at which the SNDR degrades by 3dB compared to its low-frequency value. This equation results in a FOM of 0.46 and 0.58 pJ/conv.-step considering the ENOB at low frequency and at the Nyquist bandwidth (800MHz), respectively.

The ADC performance is also compared to previously reported similar works. Note that the traditional DFE implementation of this work, which utilizes a symbol decision, differs from the multi-level embedded DFE implementation of [13], which does not make a hard symbol decision. To the best of our knowledge, this is the first ADC with a true embedded DFE implementation. The proposed design has a significantly better FOM relative to the pipeline design with embedded DFE of [13]. Although the sampling frequency of this work is lower than that of [13] and [65], this can be improved by increasing the time-interleaving factor further without compromising the overall ADC FOM. This work also shows performance comparable to the designs of [44,65–67], which do not include any equalization functionality.

#### 3.5 Conclusion

A 1.6GS/s 16-way time-interleaved SAR ADC with embedded 1-tap DFE suitable for high-speed link applications is presented in this chapter.
The proposed redundant cycle technique allows embedding DFE with low power and area overheads. Embedding this partial equalization inside the front-end ADC can result in lowering the complexity of back-end DSP and/or decreasing the ADC resolution requirement. The 1.6GS/s 6-bit prototype ADC with redundant cycle 1-tap embedded DFE is fabricated in an LP 90nm CMOS process in $0.24mm^2$ area, and consumes 20.1mW total power while achieving a FOM = 0.58pJ/conv.-step.

Table 3.1: 16-Way 1.6GS/s 6-Bit ADC Performance Comparison

| Specification | Varzaghani'09 [13] | Cao'09 [66] | Alpman'09 [65] | Yang'10 [67] | Jiang'12 [44] | This Work [68] |
|---|---|---|---|---|---|---|
| CMOS Technology | 130-nm | 130-nm | 45-nm | 65-nm | 40-nm | 90-nm |
| Supply Voltage (V) | 1.2 | 1.2 | 1.1 | 1.2 | 1.0 | 1.3 |
| ADC Architecture | TI-Pipeline | TI-SAR | TI-SAR | Async. TI-SAR | Async. SAR | TI-SAR |
| Embedded Equalization | $\mathrm{DFE}^{\dagger}$ | No | No | No | No | DFE |
| Input Capacitance (fF) | 104 | <100 | N/A | 84 | N/A | 60 |
| Input Range ($V_{pp}$) | N/A | 1.2 | 1.0 | N/A | 2.0 | 1.0 |
| Resolution (bit) | 5 | 6 | 7 | 6 | 6 | 6 |
| Sampling Rate (GS/s) | 4.8 | 1.25 | 2.5 | 1.0 | 1.25 | 1.6 |
| ERBW (GHz) | 4 | 0.45 | 1.25 | 0.5 | 0.6 | 1.5 |
| Max ENOB (bit) | 4.76 | 5.5 | 5.9 | 4.94 | 4.77 | 4.75 |
| Power (mW) | 300 | 32 | 50 | 6.27 | $6.08^{\dagger\dagger}$ | 20.1 |
| FOM (pJ/conv.-step) | 2.3 | 0.8 | 0.47 | 0.27 | 0.25 | 0.58 |
| Active Area (mm<sup>2</sup>) | 1.69 | 2.32 | 1.0 | 0.11 | 0.014 | 0.24 |

<sup>†</sup>The embedded equalization is referred to as multi-level DFE in [13], which differs from a normal 1-tap DFE.

<sup>††</sup>There is no front-end active T/H in [44], and this structure does not need reference or common-mode voltage buffers.

### 4. 6-BIT 10-GS/S ADC WITH EMBEDDED EQUALIZATION\*

In this chapter two high-speed time-interleaved ADC prototypes with partial embedded equalization are analyzed for improving the efficiency of ADC-based receivers in wireline communications. The first section details a 6-bit 10-GS/s time-interleaved SAR ADC with an embedded 2-tap sampled FFE and the 1-tap redundant cycle DFE introduced in the previous chapter. The embedded equalization implementation in this work has a limited ISI cancellation range. The second section describes a 6-bit 10-GS/s ADC with embedded 3-tap FFE, which resolves the limited range issue of the previous design and achieves a maximum ISI cancellation range as large as the main cursor value. Moreover, an asynchronous SAR architecture is used for the unit ADCs in this work in order to halve the total number of time-interleaved unit ADCs compared to the first 10GS/s prototype and, hence, simplify the design and calibration process. The proposed time-interleaved ADC with embedded 3-tap FFE is used as the front-end of a hybrid ADC-based receiver followed by further linear and nonlinear digital equalization in order to accommodate operation over 30+dB attenuation channels [69].

<sup>\*</sup>© 2014 IEEE. Part of section 4.1 is reprinted, with permission, from E. Zhian Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, "A 6 bit 10 GS/s TI-SAR ADC with low-overhead embedded FFE/DFE equalization for wireline receiver applications," IEEE J. Solid-State Circuits, vol. 49, no. 11, pp. 2560–2574, Nov. 2014. © 2015 IEEE. Part of section 4.2 is reprinted, with permission, from A. Shafik, E. Zhian Tabasy, S. Cai, K. Lee, S. Hoyos, and S. Palermo, "A 10Gb/s hybrid ADC-based receiver with embedded 3-tap analog FFE and dynamically-enabled digital equalization in 65nm CMOS," in ISSCC Dig. Tech. Papers, Feb. 2015, pp. 1–3.

#### 4.1 A 6-Bit 10GS/s ADC with Embedded 2-Tap FFE and 1-Tap DFE

Feed-forward equalizers are effective in canceling a large amount of inter-symbol interference (ISI) with a relatively small number of taps. A 2-tap version of this equalizer topology has been implemented in a time-interleaved (TI) flash ADC with additional CML input stages that follow the input track-and-holds (T/H) to realize the extra FFE tap [5]. While this approach is effective, significant linearity, speed, and power consumption trade-offs exist with this current-mode approach. FFEs have also been embedded in successive approximation register (SAR) ADCs [16], [17], with charge-sharing in a capacitive digital-to-analog converter (CDAC) performing the signal scaling and summation of multiple input samples, followed by ADC conversion. However, a drawback of this single-CDAC approach is that the main cursor signal is attenuated such that the FFE tap sum is always fixed, similar to transmitter de-emphasis equalization [12].

Decision-feedback equalizers offer the ability to cancel post-cursor ISI without amplifying noise or cross-talk. Embedded multi-level decision-feedback equalization (DFE) has been previously proposed for pipeline ADCs [13]. As satisfying the DFE feedback critical timing path is not trivial at high data rates, [13] employs loop unrolling or speculative-summing [14] with additional comparators, resulting in significant hardware overhead. A more efficient implementation in a SAR ADC is proposed which uses a redundant conversion cycle [68], [70], rather than redundant comparators and DACs, to perform the loop unrolling operation. While this does increase the number of required conversion cycles, the overhead is only $(8/7) \times$ relative to a conventional 6-bit SAR converter.

This work presents a 10GS/s 6-bit ADC which efficiently incorporates both a novel 2-tap embedded FFE and a 1-tap embedded DFE directly into the capacitive DAC of a time-interleaved SAR ADC [70].
A key goal of this design was to demonstrate the viability of the embedded equalizer approach for wireline receiver ADCs through the implementation of a 10GS/s concept prototype. Section 4.1.1 presents statistical bit error rate (BER) modeling results of ADC-based receivers that quantify the performance advantages of embedded equalization. The proposed embedded equalization techniques, which allow for flexibility in equalizer tap weighting at minimal hardware and power overhead, are analyzed in Section 4.1.2. Section 4.1.3 details the ADC architecture and the main circuit blocks, where power is further optimized through the use of dual voltage supplies. Experimental results from a general purpose (GP) 65nm CMOS prototype are presented in Section 4.1.4.

## 4.1.1 Embedded Equalization Modeling

Statistical link modeling [59] allows for both system voltage and timing margins to be efficiently estimated. This section first highlights the differences between a conventional architecture, consisting of an ADC and subsequent digital equalization, and a system with an ADC with embedded DFE and FFE. Results from an ADC-based serial link statistical modeling tool [8] are then presented that show the system performance impact of embedded DFE and FFE equalization for 10-Gb/s operation over four different FR4 channels.

A conventional architecture, consisting of an ADC and subsequent digital equalization, and a system with an ADC with embedded DFE and FFE are shown in Fig. 4.1. In order to implement a 1-tap DFE with NRZ signaling (Fig. 4.1(a)), the MSB of either the digital equalizer output or the ADC with embedded DFE is fed back, weighted by the DFE coefficient, and subtracted. Quantization noise is reduced in the system with an ADC with embedded DFE, as the equalization tap is subtracted from the un-quantized analog input. In order to implement a 2-tap FFE (Fig. 4.1(b)), the input signal is delayed, weighted by the FFE coefficient, and then summed.
Again, quantization noise is reduced in the system with an ADC with embedded FFE, as the full analog resolution is preserved for the input, delayed signal, and the final summation value. Our previous statistical modeling studies [8], [68] have shown that the quantization noise reduction offered by both the embedded DFE and FFE equalization allows for both a lower ADC resolution and reduced digital equalization complexity at a target BER.

Figure 4.1: Block diagrams of (a) digital versus embedded DFE, and (b) digital versus embedded FFE.

In order to quantify the relative performance impact of embedded DFE and FFE equalization, the four FR4 channels of Fig. 4.2 are utilized. As shown in Fig. 4.2(a), the loss at the 5-GHz Nyquist frequency increases with channel length, with the longest 30" channel having 23.8 dB attenuation. This is reflected in the time domain 10-Gb/s pulse responses (Fig. 4.2(b)), where the ratio of the main cursor to the ISI cursor values degrades with channel length. 10-Gb/s operation is modeled with the statistical link tool, assuming a $500mV_{ppd}$ transmit swing, 1mVrms receiver input-referred thermal noise, 5mV uniform supply noise, and receiver sampling jitter with a 0.02 unit interval (UI) deterministic component (DJ) in the form of duty cycle distortion and a 0.02 $\mathrm{UI}_{rms}$ random component (RJ).

Figure 4.2: (a) Magnitude and (b) 10Gb/s pulse responses of four FR4 channels.

Fig. 4.3 shows the advantage of embedded equalization over its digital counterpart for channels 1–3, with the receiver voltage margin (BER = $10^{-12}$) obtained versus front-end ADC resolution for both digital and embedded implementations of a 2-tap FFE plus 1-tap DFE equalization structure. Similar to the prototype discussed later, here the embedded 2-tap FFE consists of an un-attenuated main cursor and an adjustable second FFE tap with $V_{LSB}/4$ maximum coefficient resolution, while the embedded DFE has an un-quantized analog resolution.
Due to the quantization error, the digital equalization implementation requires more than 6 bits of effective ADC resolution to achieve performance similar to that of the embedded equalization architecture.

Figure 4.3: Simulated voltage margin versus ADC resolution with both digital and embedded implementations of a 2-tap FFE + 1-tap DFE equalization structure for channels 1-3 in Fig. 4.2.

The impact of the various embedded equalization schemes is shown in the 10-Gb/s voltage and timing margins of Fig. 4.4(a) and (b), respectively. For the case when no equalization is embedded in the ADC, only the relatively low-loss 6" channel displays an open eye. Including a 1-tap DFE allows cancellation of the first post-cursor ISI term, which improves the 6" channel margins and opens the previously-closed eye for the 10" channel. However, operation is still not possible for the 15" channel due to excessive residual ISI. As a 2-tap FFE can cancel significant long-tail ISI, better margins are obtained relative to the DFE-only scenario, with all three channels displaying open eyes. Combining both the 2-tap FFE and 1-tap DFE yields the best margins, with the 15" channel having the largest 6× increase in voltage margin relative to the FFE-only case.

Finally, it is interesting to consider the potential impact that adding a front-end continuous-time linear equalizer (CTLE) can have, particularly with the highest-loss 30" channel. As shown in the voltage and timing margins of Fig. 4.4(c), combining embedded equalization with a front-end CTLE allows for opening a previously closed eye, with the embedded DFE providing a higher relative improvement versus embedded FFE.

These modeling results show that embedded equalization can be useful for both reducing the required ADC resolution and providing a better input signal for subsequent digital equalization, translating into a simpler digital back-end.
Although it is beyond the scope of the presented work, the embedded DFE can also be used to enable a hybrid receiver mode [4]. For low ISI channels, only the embedded equalization is used with a reduced re-configurable ADC resolution, while for high ISI channels where the embedded equalization alone does not provide the target BER, the embedded DFE can be disabled to avoid potential error propagation and the front-end ADC with embedded FFE allows for a reduced complexity digital equalizer relative to a separate dual-path front-end implementation [4].

Figure 4.4: Impact of including embedded DFE and FFE equalization on (a) voltage margin and (b) timing margin for channels 1-3 in Fig. 4.2, with tap coefficients shown for the embedded equalization. (c) Impact of including embedded DFE and FFE equalization on voltage margin and timing margin in the presence of a front-end CTLE for channel 4 in Fig. 4.2.

### 4.1.2 SAR ADC with Low-Overhead Embedded FFE and DFE

In order to leverage the potential performance improvements predicted by the modeling results of the previous section, low-overhead implementations of embedded FFE and DFE are necessary. This section describes a novel approach to efficiently embed both a 2-tap FFE and 1-tap DFE into a time-interleaved SAR ADC, with the conceptual operation first explained, followed by the switched-capacitor implementation details.

### 4.1.2.1 Unit ADC with Embedded 2-Tap FFE and 1-Tap DFE

A sequential block diagram detailing the different operation phases of the proposed unit SAR ADC with embedded 2-tap FFE and 1-tap DFE is shown in Fig. 4.5. In order to realize the 2-tap FFE, this implementation uses the output of two consecutive track-and-holds (T/Hs) found in a time-interleaved (TI) architecture. Both the current input voltage $V_{in,n}$ and the previous input voltage $V_{in,n-1}$ are sampled during the first cycle, with a weighting factor of $\beta$ applied to $V_{in,n-1}$ via charge sharing in a CDAC.
These two voltages are subtracted at the input of the comparator during the subsequent conversion periods to create the transfer function of a 2-tap FFE. The redundant cycle 1-tap DFE is realized in the second and third cycles, with the MSB value first computed with a $+\alpha$ DFE coefficient value and latched, followed by the MSB computation with a $-\alpha$ value in the next cycle [68]. This allows the use of only one comparator and DAC, as in a conventional SAR ADC. At the end of the second MSB cycle the previous symbol MSB is used to select the correct computation and the $\alpha$ polarity to use in all the remaining SAR conversion cycles. While the redundant cycle 1-tap embedded DFE adds some latency to the data conversion process, the critical delay path is similar to that of a loop-unrolled 1-tap DFE, as detailed in [68]. Overall, eight equal cycles are used for each sample conversion in a 6-bit ADC, including the sampling cycle and the redundant cycle for the embedded 1-tap DFE. For a given total ADC sample rate, the proposed redundant cycle method results in an $(8/7)\times$ increase in time-interleaving factor and conversion latency, and almost the same increase in the core ADC area.

Figure 4.5: Conceptual schematic of a unit SAR ADC with the proposed sampled 2-tap embedded FFE and redundant cycle 1-tap embedded DFE.

# 4.1.2.2 Switched-Capacitor Implementation

Fig. 4.6(a) shows a simplified single-ended unit ADC schematic to illustrate the switched-capacitor implementation of the 2-tap FFE and 1-tap DFE during the first three phases of the SAR conversion: the sampling phase and the two redundant-cycle MSB computations. An efficient implementation of the redundant cycle 1-tap embedded DFE MUX is realized with the current input sampling capacitor $C_S$ and switches between $+\alpha$, $-\alpha$, and GND. The sampled input on $C_S$ also acts as the un-attenuated main cursor tap for the embedded FFE.
Embedding the second FFE tap inside the negative-input capacitive DAC structure is achieved with the $B_1 - B_5$ switches that select between the previous input or GND to provide the $\beta$ coefficient weighting without impacting the main cursor value.

During the sampling cycle $V_{in,n}$ is sampled on the $C_S$ capacitor using top-plate sampling, while $V_{in,n-1}$ is sampled on a portion of the negative-input DAC capacitors using bottom-plate sampling, as shown in Fig. 4.6(b). The FFE coefficient $\beta$ is defined by a 5-bit word $B_1B_2B_3B_4B_5$, set to 10001 in this example to charge only the $16C_u$ and $C_u$ capacitors with $V_{in,n-1}$ and discharge the other DAC capacitors. In the next cycle (Fig. 4.6(c)) the $\Phi_S$ switches are OFF and the bottom plates of all the negative-input DAC capacitors are connected to ground. The resultant charge sharing induces a $\beta V_{in,n-1}$ value at the comparator negative input. With the main cursor value $V_{in,n}$ at the comparator positive input, and assuming the DFE coefficient $\alpha = 0$ for now, the voltage $V_{in,n} - \beta V_{in,n-1}$ appears at the comparator differential input to emulate the 2-tap FFE, where only the post-cursor tap coefficient is adjustable. Note that while a negative version of the previous input voltage $V_{in,n-1}$ is required in this technique, this is easily available in a fully-differential architecture.

Considering a non-zero DFE coefficient for this first MSB cycle, the comparator differential input voltage is $V_{in,n} - \beta V_{in,n-1} + \alpha$ due to the top side of $C_S$ being connected to $+\alpha$. The MSB value for this DFE tap polarity is then stored in a latch. In the next phase (Fig. 4.6(d)), the MSB is re-evaluated for the opposite DFE tap polarity, as the top side of $C_S$ is now connected to $-\alpha$, resulting in a differential voltage at the comparator input of $V_{in,n} - \beta V_{in,n-1} - \alpha$.
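The two redundant MSB evaluations described above can be illustrated with a short behavioral sketch. It is only a voltage-domain model under stated assumptions, not the switched-capacitor circuit: the function names are hypothetical, the comparator is ideal, and the selection rule (subtract $\alpha$ when the previous MSB was one, add it otherwise) follows the previous-symbol selection described in Section 4.1.2.1.

```python
def redundant_cycle_msb(v_now, v_prev, prev_msb, alpha, beta):
    """Behavioral model of the sampling cycle plus the two MSB cycles.

    v_now, v_prev : current and previous sampled inputs (differential volts)
    prev_msb      : MSB decision of the previous sample (0 or 1)
    alpha         : DFE tap magnitude in volts (assumed >= 0)
    beta          : FFE second-tap weight relative to the main cursor
    Returns (selected_msb, equalized_voltage).
    """
    # 2-tap FFE: un-attenuated main cursor minus the weighted previous sample.
    ffe_out = v_now - beta * v_prev

    # First MSB cycle: evaluate with +alpha and latch the result.
    msb_plus = int(ffe_out + alpha > 0)
    # Second (redundant) MSB cycle: re-evaluate with -alpha.
    msb_minus = int(ffe_out - alpha > 0)

    # The previous symbol's MSB picks the valid evaluation; the chosen alpha
    # polarity is then held for the remaining bit cycles.
    if prev_msb:
        return msb_minus, ffe_out - alpha
    return msb_plus, ffe_out + alpha


def equalize(samples, alpha, beta, init_prev=0.0, init_msb=0):
    """Run the model over a sample stream, feeding each MSB to the next sample."""
    msbs, values = [], []
    v_prev, prev_msb = init_prev, init_msb
    for v_now in samples:
        msb, v_eq = redundant_cycle_msb(v_now, v_prev, prev_msb, alpha, beta)
        msbs.append(msb)
        values.append(v_eq)
        v_prev, prev_msb = v_now, msb
    return msbs, values


if __name__ == "__main__":
    # Hypothetical channel: 0.1 V main cursor with a 40% first post-cursor.
    symbols = [+1, -1, -1, +1, +1, -1]
    amp, post = 0.1, 0.4
    rx, prev = [], -1
    for s in symbols:
        rx.append(amp * (s + post * prev))
        prev = s
    print(equalize(rx, alpha=amp * post, beta=0.0))
```

In this toy case the DFE tap is set equal to the post-cursor amplitude, so the equalized values return to the transmitted levels; in the prototype the tap would instead be tuned per channel.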
The correct MSB decision is then made based on the MSB of the previous ADC channel, and for the remaining ADC bit cycles the corresponding switch for selecting $+\alpha$ or $-\alpha$ is held fixed until the end of the SAR conversion period.

Figure 4.6: Simplified unit SAR ADC with embedded 2-tap FFE and 1-tap DFE: (a) single-ended schematic, and operation during the (b) sampling phase, (c) first MSB evaluation, and (d) second MSB evaluation assuming $B_1B_2B_3B_4B_5 = 10001$ for the FFE.

According to Fig. 4.6, the FFE second tap coefficient $\beta$ normalized to the main cursor tap is ideally equal to $(B_1B_2B_3B_4B_5)_2/32$, where $(\cdot)_2$ represents the binary-to-decimal conversion operator. However, since the main cursor is sampled directly on the top-plate of $C_S$, while bottom-plate sampling is employed for the second tap, some attenuation is introduced at the DAC output due to capacitive division between the DAC capacitors and the comparator input capacitance. In practice $\beta$ can be calculated as

$$\beta = \frac{(B_1 B_2 B_3 B_4 B_5)_2}{32} \times \frac{C_{DAC}}{C_{DAC} + C_{ip}},\tag{4.1}$$

where $C_{DAC}$ is the total CDAC capacitance, and $C_{ip}$ is the comparator input capacitance. Although not included in the current prototype, extra digitally controlled capacitors can be added to the capacitive DAC in order to control the FFE tap coefficient with one more degree of freedom.

### 4.1.3 ADC Design

# 4.1.3.1 Time-Interleaved Architecture

Fig. 4.7 shows the implementation of the SAR ADC with embedded FFE and DFE in a 10-GS/s 6-bit converter with 64 time-interleaved unit ADCs. The entire 64-way time-interleaved structure consists of eight time-interleaved sub-ADCs, where each sub-ADC operates at $f_s/8 = 1.25$ GS/s and is formed by eight parallel unit ADCs. Each unit ADC has eight operation cycles: one for input/2-tap FFE sampling, six for bit conversion, and one extra cycle for the embedded 1-tap DFE.
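The interleaving arithmetic above can be checked with a few lines of Python. This is a sketch under the assumption that each unit ADC spends its whole sample period on equal-length cycles, which is how the text describes the eight cycles; the function and key names are illustrative only.

```python
from fractions import Fraction


def interleave_budget(fs_total_hz, n_sub_adcs, units_per_sub, n_bits, dfe_cycles=1):
    """Derive per-unit rates and cycle time for a hierarchical TI-SAR ADC."""
    n_units = n_sub_adcs * units_per_sub
    cycles = 1 + n_bits + dfe_cycles          # sampling + bit cycles + redundant DFE cycle
    baseline_cycles = 1 + n_bits              # conventional SAR without the DFE cycle

    sub_adc_rate = Fraction(fs_total_hz, n_sub_adcs)
    unit_rate = Fraction(fs_total_hz, n_units)
    # Time available per cycle if the unit period is split into equal cycles.
    cycle_time_ps = Fraction(10**12) / (unit_rate * cycles)

    return {
        "n_units": n_units,
        "cycles_per_sample": cycles,
        "sub_adc_rate_hz": sub_adc_rate,
        "unit_rate_hz": unit_rate,
        "cycle_time_ps": cycle_time_ps,
        "interleave_overhead": Fraction(cycles, baseline_cycles),
    }


if __name__ == "__main__":
    budget = interleave_budget(10_000_000_000, n_sub_adcs=8, units_per_sub=8, n_bits=6)
    for key, value in budget.items():
        print(f"{key}: {value}")
```

With the numbers quoted in this section the sketch gives 64 unit ADCs at 156.25 MS/s each, an 800 ps cycle, and the $(8/7)\times$ interleaving overhead of the redundant cycle.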
Eight front-end track-and-holds, one per sub-ADC, are employed to allow for the use of only eight critical sampling phases at 1.25 GHz. Calibration DACs are included for both comparator offset correction in all 64 unit SAR ADCs and sampling clock skew correction for the eight front-end T/H sampling phases.

Figure 4.7: Block diagram of the 64-way time-interleaved SAR ADC with embedded FFE and DFE.

# 4.1.3.2 Unit ADC with Embedded 2-Tap FFE and 1-Tap DFE

The fully-differential schematic of the 6-bit unit SAR ADC with embedded 2-tap sampled FFE and redundant cycle 1-tap embedded DFE is shown in Fig. 4.8. A modified StrongArm comparator with two differential input pairs is used. One input pair is connected to the sampling capacitor, which samples the main cursor and implements the embedded 1-tap DFE functionality. The other input pair is connected to the DAC output, which also implements the FFE second tap. Since part of the DAC capacitors are connected to the $T/H_{(n-1)}$ output, whose hold phase ends 1 UI = 100 ps sooner than that of $T/H_{(n)}$, a modified version of the sampling phase $\Phi_{SAn,j}$, which falls to zero 100ps in advance of the normal sampling phase $\Phi_{Sn,j}$ (Fig. 4.7), is used for connecting the top-plate of the DAC capacitors to the input common-mode voltage $V_{cmi}$ during the sampling phase.

A merged capacitor switching (MCS) scheme [49], which allows for very low switching energy and reduced area through removing the MSB capacitor, is employed in the DAC of each 6-bit unit SAR ADC. To further reduce DAC area, a custom layout with a 0.45fF metal-oxide-metal (MOM) unit capacitor ($C_u$) is employed, as shown in Fig. 4.9(a). Minimum width metal 4 (MET4) and metal 5 (MET5) layers with minimum spacing are used, resulting in the optimum desired capacitance value with respect to the bottom-plate parasitic capacitance to the substrate. Both matching and noise performance are considered in the selection of the unit capacitor value.
Monte Carlo simulations of the worst-case DNL error due to DAC capacitive mismatch, which happens in the transition from 01111 to 10000 in the utilized 5-bit CDAC, are shown in Fig. 4.9(b). These results consider both process and local mismatch variations, with the Monte Carlo parameters extrapolated beyond the 4fF minimum MOM capacitor offered by the design kit [71]. Since the spacing of the metal fingers in the MOM capacitor is always equal to the minimum 100nm, the unit capacitor mismatch $\sigma_{C_u}$ is approximately scaled by the square root of the capacitor area controlled by the finger length and number of fingers. The 0.45fF unit capacitor value results in this maximum DNL error having $3\sigma < 0.5$ LSB at 6-bit resolution. This value is also larger than the 0.136fF capacitor size required for an additive noise voltage less than $0.5V_{LSB}$ with a $500mV_{pp}$ maximum swing.

Figure 4.8: Fully differential schematic of the unit ADC with sampled 2-tap embedded FFE and redundant cycle 1-tap embedded DFE.

Figure 4.9: (a) Custom layout of the capacitive DAC with 0.45fF MOM unit capacitors. (b) CDAC worst-case 01111 to 10000 transition DNL simulation results using 1000 Monte Carlo iterations.

As the two-stage dynamic comparator allows for high performance at low supply voltages [62], a lower $V_{DDL} = 0.9V$ is used for the comparator and SAR logic to reduce the core ADC power, while the nominal $V_{DD} = 1.1V$ is used for the DAC switches.

A foreground technique [68] is employed to control the pseudo-differential 6-bit current-steering DACs that perform offset calibration of the 64 comparators in the time-interleaved ADC. By injecting this calibration current into the internal comparator nodes, an offset correction resolution < 3mV is achieved. Fig. 4.10 shows the simplified setup for foreground offset calibration. The ADC differential input is set to zero by connecting both positive and negative inputs to the 300-mV input common-mode voltage.
A 64-to-1 MUX is then used to choose the MSB of the unit ADC under calibration. The optimum calibration code, applied using the serial scan chain, is determined when the MSB of the unit ADC under test toggles between 0 and 1 with near 50% probability. Fig. 4.11 considers two different cases, assuming that the offset is initially calibrated at room temperature ($27^{\circ}C$) in both cases. For the first case the minimum calibration code is applied, and the residual offsets from $0^{\circ}C$ to $85^{\circ}C$ are extracted. The second case utilizes the maximum offset calibration code required for the 64 time-interleaved unit ADCs based on the measurements in the same temperature range. Here the worst temperature sensitivity is observed, with the residual offset equal to -1.5mV at $0^{\circ}C$ and +3.8mV at $85^{\circ}C$, which translates to $+62\mu V/^{\circ}C$. Note that for the current 6b ADC with $500mV_{pp}$ input range, this maximum variation is about $0.5V_{LSB}$ and hence tolerable. Furthermore, the comparator input pairs sharing the same source connection are swapped as $V_{in+}/V_{R+}$ and $V_{in-}/V_{R-}$ (Fig. 4.8) in order to decrease the sensitivity to common-mode variations between the differential input and reference terminals. This configuration also helps with the comparator sensitivity near a large DAC differential output.

In order to relax the comparator device sizing constraints and also maintain low metastability error impact, the metastability detection and correction algorithm detailed in Fig. 4.12 is utilized. Metastability is detected by sampling the XOR of the comparator differential outputs using a version of the comparator clock delayed by half a bit cycle period (400ps). If the sampled XOR output is ZERO, the comparator input is not large enough to force the outputs into distinguishable logic levels after half a clock cycle and metastability has occurred.
The MT signal is then set to ONE and a metastable-then-set (MTS) algorithm [43] is used to assign the current bit to ONE and the remaining bits to ZERO. With the MTS algorithm, the comparator sizing is no longer dictated by a very low metastability error specification; instead, it can be relaxed to just resolve digital output levels for a $0.5V_{LSB}$ input in less than half a bit cycle period. This way metastability only happens for inputs less than $0.5V_{LSB}$ away from the digital output assigned by the MTS algorithm, and the maximum output error due to metastability is only one LSB. In order to reduce the probability of the XOR detector going into a metastable state, it should be verified that the combination of comparator and XOR achieves the target metastability error rate. However, since these two stages are cascaded, this error is exponentially reduced, and it is usually not critical.

Figure 4.10: Simplified diagram of the foreground offset and clock skew calibrations setup.

Figure 4.11: Temperature dependency of residual unit ADC offset calibrated at $27^{\circ}C$ room temperature.

Fig. 4.13 shows the front-end T/H in each sub-ADC, consisting of a switched capacitor sampling network using a bootstrapped switch [63] followed by an active source-follower based buffer. Based on simulation results, the bootstrapped switch structure proves necessary for not limiting the linearity of the $500mV_{pp}$ swing 6-bit core ADC over the entire 5 GHz input frequency range. Extra cross-coupled OFF dummy transistors are used at the input pair, with the same size as the main bootstrapped NMOS switches, to partially cancel the feed-through path between source and drain of the sampling switch. These dummy transistors improve the front-end T/H linearity, specifically at high input frequencies.
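Returning briefly to the metastability handling of Fig. 4.12 described earlier in this subsection, the detection and metastable-then-set steps can be sketched behaviorally as follows. This is a simplified model with several assumptions that are not from the source: a single-ended unipolar SAR, a hard input threshold `v_meta` standing in for the comparator's finite regeneration time, unresolved outputs modeled as both sitting at the same logic level, and hypothetical function names.

```python
def comparator_half_cycle(v_diff, v_meta):
    """Comparator outputs as sampled half a bit cycle after the clock edge.

    Inputs smaller than v_meta are treated as unresolved, modeled here as
    both outputs still at the same level, so their XOR is 0.
    """
    if abs(v_diff) < v_meta:
        return 1, 1
    return (1, 0) if v_diff > 0 else (0, 1)


def sar_convert_mts(v_in, v_ref=1.0, n_bits=6, v_meta=None):
    """SAR conversion with metastability detection and metastable-then-set.

    Returns (code, mt_flag). v_meta defaults to 0.1 LSB (an arbitrary choice
    for illustration; the text only bounds the unresolved window by 0.5 LSB).
    """
    lsb = v_ref / 2 ** n_bits
    if v_meta is None:
        v_meta = 0.1 * lsb

    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)
        out_p, out_n = comparator_half_cycle(v_in - trial * lsb, v_meta)
        if (out_p ^ out_n) == 0:
            # XOR sampled as ZERO: metastability detected, so MT is raised,
            # the current bit is set to ONE and the remaining bits to ZERO.
            return trial, True
        if out_p:
            code = trial
    return code, False


if __name__ == "__main__":
    lsb = 1.0 / 64
    print(sar_convert_mts(20.6 * lsb))    # resolves normally in every cycle
    print(sar_convert_mts(31.95 * lsb))   # unresolved at the MSB decision
```

Because a flagged input lies within the unresolved window of the trial threshold, the code assigned by the metastable-then-set step differs from the ideal code by at most one LSB in this model.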
The front-end T/H architecture allows for a large input sampling bandwidth, as the sampling capacitor is just the input capacitance of the pseudo-differential PMOS source-follower buffer stage. This buffer drives the core ADC input capacitance and provides isolation from kick-back noise. Simulation results show a low-frequency gain of -1.9 dB and a 5-GHz -3dB bandwidth for the buffers. Transient simulations also verify that with a 300mV input common-mode voltage and a $500mV_{pp}$ input swing, a linearity better than 6 bits is achieved up to a 5-GHz input bandwidth with a 1.25-GHz sample clock. On-chip buffering of the reference and common-mode voltages, generated off-chip, is also performed with similar PMOS source follower stages.

Figure 4.12: Simplified metastability detection and correction block diagram and algorithm.

Figure 4.13: Front-end T/H schematic with dummy OFF switches for high-frequency input feed-through cancellation.

# 4.1.3.4 Multi-Phase Sampling Clock Generation and Calibration

Eight equally spaced sampling phases for the front-end T/Hs are generated from an input 5-GHz differential clock, as shown in Fig. 4.14. A pseudo-differential self-biased input stage buffers the 5-GHz differential clock to drive a divide-by-4 stage. Utilizing four symmetric clocked SR latches [72] in a loop creates eight 1.25-GHz clock phases spaced at 100ps. A sinewave-input FFT-based foreground method [68] is used to digitally control MOS capacitor arrays in the per-phase distribution network to calibrate the phase mismatches between the eight critical sampling phases. Fig. 4.10 shows the clock skew calibration setup, where the optimum calibration code for each sampling phase is obtained using a successive approximation algorithm. Measurement results verify that the clock skew calibration has a resolution of about 0.4ps and allows for a maximum tuning range of 39ps per phase. This is sufficient to compensate for the mismatch $\sigma \sim 6$ ps between consecutive sampling phases observed in Monte Carlo simulations of the clock input buffer, divider, and distribution network.

Figure 4.14: Front-end T/Hs sampling clocks generation, distribution, and calibration network.

# 4.1.4 Experimental Results

A chip micrograph of the prototype 6b ADC, which was fabricated in a GP 65-nm CMOS process and occupies a total active area of $0.52mm^2$, is shown in Fig. 4.15. The core time-interleaved ADC, consisting of eight sub-ADCs that each have eight parallel unit SAR ADCs, occupies $0.33mm^2$. In order to minimize the critical MSB delay path for DFE operation at 10-Gb/s, the order of the unit ADCs in each sub-ADC is optimized to decrease the maximum distance between consecutive ADCs. This maximum distance is about 400µm, which adds a $\sim 70fF$ capacitive load due to routing. An inverter chain drives this load while meeting the 100ps critical delay path including the 1-tap DFE MUX. Routing from the sampling clock phase generator and the parasitic capacitance on the input lines are minimized by placing the eight front-end T/Hs close together in the vicinity of the differential input pads. Also, splitting the global reference and common-mode voltage buffers equally on the top and bottom of the core ADC layout improves the symmetry among the unit ADCs. Local decoupling capacitors in each unit ADC reduce the impact of kickback noise on the reference and common-mode voltages, routed from the two sets of on-die global source-follower based buffers, to an acceptable level for a 6-bit ADC.

Figure 4.15: Prototype ADC chip micrograph and core ADC floorplan.

The custom designed board for testing the 10GS/s 6-bit 64-way time-interleaved ADC is shown in Fig. 4.16. The 65nm CMOS die is packaged in a $10mm \times 10mm$ open cavity 72-pin QFN package. The chip is soldered on the bottom side of the PCB to directly route the 5Gb/s half-rate ADC output traces to the vertical SMAs without creating an open stub, hence decreasing the undesired reflections at the interface between PCB traces and vertical SMA connectors.

Figure 4.16: Custom test board for the prototype 10GS/s ADC implemented in a GP 65nm CMOS process.

#### 4.1.4.1 Core ADC Characterization

In characterizing the general performance of the 6-bit ADC, both the DFE coefficient $\alpha$ and FFE coefficient $\beta$ are set to zero. After calibrating the offset errors among the 64 time-interleaved unit ADCs and the phase errors of the eight sampling clocks, the dynamic performance of the full time-interleaved ADC at 10-GHz sampling frequency is shown in Fig. 4.17. A low input frequency maximum SNDR of 29.19dB is achieved, primarily limited by nonlinearity in the unit ADCs, which translates to an effective number of bits (ENOB) of 4.56 bits. The ADC achieves an effective resolution bandwidth (ERBW) of 4.53 GHz, with a 4.03-bit ENOB at this ERBW.

Figure 4.17: ADC SNDR and SFDR vs. input frequency at $f_s = 10$ GHz.

Fig. 4.18 shows the frequency spectrum of the 10-GS/s ADC output using a $\sim 2.4994$ GHz input frequency for three cases: before calibration, after only offset calibration, and after both offset and clock skew calibrations. Before calibration, both the distortion harmonics due to offset mismatch, located at $kf_s/64$, and phase mismatch, located at $kf_s/64\pm f_{in}$ ($k=1,2,...,32$), limit the performance. Performing only offset calibration provides a marginal 1.9dB improvement in SNDR. However, after calibrating for both offset and sampling clock skew, the distortion harmonics due to offset and phase mismatches are non-dominant, and the ADC performance is limited by the nonlinearity of the core ADC and the raised uniform noise floor due to the equipment-limited sampling clock jitter.

Figure 4.18: 10-GS/s ADC normalized output spectrum for $f_{in} = 2.4994$ GHz using a 16k-point FFT: (a) before calibration, (b) after only offset calibration, and (c) after offset and clock skew calibration.

A sinewave histogram technique [64] is utilized for static characterization. Fig. 4.19 shows that, with a 9.746 MHz input at 10-GS/s, the maximum DNL and INL values for the 6-bit ADC are +0.19/-0.15 LSB and +0.65/-0.23 LSB, respectively.

Figure 4.19: DNL/INL plots with $f_{in} = 9.746$ MHz at $f_s = 10$ GHz.

# 4.1.4.2 Embedded Equalization Characterization

The range and resolution of the embedded FFE are extracted by averaging the ADC output variation as a function of the 5-bit FFE second tap coefficient $\gamma = B_1B_2B_3B_4B_5$ with a maximum DC input voltage $V_{in} = 0.25V$ for the $500mV_{pp}$ input range. As shown in Fig. 4.20(a), since the second FFE tap is hardwired to subtract from the main cursor as a high-pass filter, the ADC output variation starts from 0 for $\gamma = (00000)_2 = 0$ and linearly decreases to more negative values as the coefficient reaches its maximum $\gamma = (11111)_2 = 31$. The maximum ADC output variation is about 8 LSB, for a maximum 25% range for the second FFE tap relative to the main cursor. While the coefficient maximum range is limited by the $\sim 40$ fF $C_{ip}$, consisting of the comparator input devices, DAC capacitance to substrate, and wire capacitance, the linear transfer characteristic allows the 5-bit FFE tap coefficient to achieve a resolution about four times smaller than the core 6-bit ADC.

A similar procedure is utilized to extract the range and resolution of the embedded 1-tap DFE, but with two DC input cases of $V_{in} = 0.25V$ and $V_{in} = -0.25V$, i.e. the extremes of the $500mV_{pp}$ input range. As shown in the right-half of Fig. 4.20(b), for $V_{in} = 0.25V$, the MSB should resolve to one and the DFE coefficient should subtract from the input voltage, resulting in the averaged ADC output code linearly decreasing as the DFE coefficient is increased.
With $V_{in} = -0.25V$ the DFE coefficient should effectively add to the input voltage, and in the left-half of Fig. 4.20(b) the averaged ADC output code linearly increases as the absolute value of the DFE coefficient is increased. A similar range of $\sim 25\%$ of the ADC maximum input range is observed for the embedded DFE coefficient, with the linear transfer characteristic also displaying a resolution better than the 6-bit ADC.

Figure 4.20: Measured tap coefficient range and resolution using DC input voltages for embedded (a) FFE 2nd tap, and (b) 1-tap DFE.

In order to verify the functionality of the embedded equalization schemes, a 10-Gb/s $2^{10} - 1$ PRBS input from a Centellax PCB12500 transmit module is passed through a 10" FR4 channel (channel 2 from Fig. 4.2) and the output of the prototype 6-bit ADC is measured using the test setup shown in Fig. 4.21. The mid-point digitized eye diagram at the ADC output after reconstruction of the digital 6-bit output word is shown in Fig. 4.22 without and with embedded equalization enabled. Due to ISI, disabling the ADC embedded equalization results in a closed eye and all 64 codes being present. Independently activating the 1-tap DFE and 2-tap FFE results in an eye opening of 9 LSB and 15 LSB, respectively. Enabling both embedded FFE and DFE improves the eye opening to 19 LSB, which verifies the effectiveness of the proposed implementation.

Figure 4.21: Embedded equalization characterization test setup.

BER measurements are also performed on the three 6", 10" and 15" FR4 channels from Fig. 4.2 in order to further verify the embedded equalization operation. The BER bathtub curves of Fig. 4.23 are produced with a $500mV_{ppd}$ $2^{10}-1$ PRBS input without any transmit equalization applied to the channel and the MSB output of the ADC fed back to the Centellax PCB12500.
For the case when no equalization is embedded in the ADC, only the relatively low-loss 6" channel displays an open eye with $\sim 0.3$-UI timing margin at a BER $< 10^{-9}$. Activating only the 1-tap DFE improves the 6" channel margins and opens the previously-closed eye for the 10" channel. However, operation is still not possible for the 15" channel due to excessive residual ISI. Activating only the 2-tap FFE allows a more significant improvement, with all three channels displaying open eyes. Enabling both the 2-tap FFE and 1-tap DFE yields the best margins, with a 0.37-UI timing margin achieved with the highest-loss 15" channel.

Note that the 25% maximum range of the embedded equalization tap coefficients limits the stand-alone system operation to channels with less than 20dB Nyquist attenuation, where mixed-signal receivers, such as a CTLE followed by a DFE, are generally more energy efficient. While utilizing a subsequent digital equalizer with the presented front-end ADC with embedded FFE should allow for the support of higher loss channels, this was beyond the scope of the presented work.

Figure 4.22: Measured digitized 6b ADC output (a) without equalization, (b) with only 1-tap embedded DFE, (c) with only 2-tap embedded FFE, and (d) with both embedded FFE and DFE, for a 10-Gb/s $2^{10} - 1$ PRBS input over a 10-inch FR4 channel.

In order to allow the stand-alone ADC with embedded equalization to support higher-loss channels, a solution to increase the equalization taps' range relative to the main cursor is to sample the main cursor on the bottom plate of the switched-capacitor sampling network in each unit ADC. Due to the parasitic capacitance at the comparator input, this attenuates the main cursor in a similar manner as the DFE tap and second FFE tap, which can ideally increase the maximum achievable tap coefficient range to near 100% of the main cursor. The authors are currently implementing this solution in a future ADC-based receiver prototype.
Figure 4.23: Measured bathtub curves without and with embedded equalization for a 10-Gb/s $2^{10}-1$ PRBS input over (a) 6-inch FR4, (b) 10-inch FR4, and (c) 15-inch FR4 channels, with channel frequency responses shown in Fig. 4.2(a).

Figure 4.24: 10 GS/s ADC power breakdown.

## 4.1.5 Performance Summary

The 10-GS/s ADC with embedded equalization consumes 79.1mW, with the power breakdown shown in Fig. 4.24. The core TI-ADC consumes the majority of the power, followed by the front-end T/Hs and reference/common-mode buffers, and the phase-generation power, which covers the input clock buffer, phase generator block, and distribution network.

Table 4.1 summarizes the main specifications and compares this work with previously reported CMOS ADCs with sampling rates around 10 GHz. To the best of our knowledge, this is the first 10-GS/s ADC with combined embedded FFE and DFE functionality. The Walden figure of merit (FOM) [73] of the prototype ADC is 0.48 pJ/conv.-step, considering the ENOB at ERBW. Performance comparable to the ADCs in [74–78], which do not include any equalization functionality, is obtained. While the advanced flash-ADC architecture of [78] achieves a better FOM, the presented dual-supply design offers the potential for lower-voltage operation. Compared to the designs in [4] and [5], which are examples of state-of-the-art ADC-based receivers, the proposed ADC with embedded 2-tap FFE and 1-tap DFE achieves a better ADC FOM while also including the low-overhead embedded equalization schemes.

### 4.1.6 Conclusion

This section presented a 10-GS/s 6-bit ADC which efficiently incorporates both a novel 2-tap embedded FFE and a 1-tap embedded DFE. Statistical bit error rate (BER) modeling results of ADC-based receivers show that an ADC with embedded equalization can provide both voltage and timing margin improvements for FR4 channels.
These equalization functions are embedded in the capacitive DAC of a time-interleaved SAR ADC, with the FFE post-cursor tap efficiently implemented in the reference DAC, and a redundant cycle technique employed to relax the DFE critical feedback timing path. Measurements verify that the embedded equalization circuitry provides improved timing margins over several FR4 channels. While the maximum embedded equalization coefficient range limits system operation to channels with less than 20dB Nyquist attenuation, the authors are currently investigating alternative unit ADC sampling schemes for support of 30+dB attenuation channels. Leveraging the proposed ADC with embedded equalization design techniques in wireline receivers has the potential to allow for reductions in ADC resolution and digital equalization complexity.

Table 4.1: 64-Way 10GS/s 6-Bit ADC Performance Comparison

| Specification | Nazemi'08 [74] | Verma'13 [75] | Chung'09 [76] | Chammas'11 [77] | Yang'13 [78] | Zhang'13 [4] | Chen'12 [5] | This Work [70] |
|---|---|---|---|---|---|---|---|---|
| CMOS Technology | 90-nm | 40-nm | 65-nm | 65-nm | 65-nm | 40-nm | 65-nm | 65-nm |
| Supply Voltage (V) | N/A | 0.9 | 1.1 | 1.1 | 1.2 | N/A | 1.1 | 1.1/0.9 |
| ADC Structure | TI-Pipelined | TI-Flash | Flash | TI-Flash | TI PA-Flash | TI-Flash | TI-Flash | TI-SAR |
| Equalization | No | No | No | No | No | No | HPF+FFE | Embedded FFE+DFE |
| Input Range ($mV_{pp}$) | N/A | N/A | 800 | 590 | N/A | N/A | 600 | 500 |
| Resolution (bit) | 6 | 6 | 4.5 | 5 | 6 | 6 | 4 | 6 |
| Sampling Rate (GS/s) | 10.3 | 10.3 | 7.5 | 12 | 10 | 8.5 – 11.5 | 10 | 10 |
| ERBW (GHz) | 4 | >6 | >6 | 6.5 | 5 | 5 | N/A | 4.53 |
| ENOB @ERBW (bit) | 5.1 | 5.1 | 3.8 | 3.88 | 5 | 4.56 | N/A | 4.03 |
| Power (mW) | 1600 | 240 | 52 | 81<sup>†</sup> | 83 | 195 | 93<sup>††</sup> | 79 |
| FOM (pJ/conv.-step) | 4.52 | 0.68 | 0.497 | 0.46 | 0.26 | 0.59 | N/A | 0.48 |
| Active Area (mm<sup>2</sup>) | N/A | 0.27 | 0.01 | 0.44 | 0.2 | 0.82<sup>†††</sup> | 0.29 | 0.52 |

<sup>†</sup>Excluding input clock buffers.
<sup>††</sup>This value includes the analog front-end power.
<sup>†††</sup>This is the whole dual-path receiver area including the front-end CTLE and slicer for the second path.

### 4.2 A 6-Bit 10GS/s ADC with Extended-Range Embedded 3-Tap FFE

## 4.2.1 SAR ADC with Extended-Range 3-Tap Embedded FFE

The embedded equalization proposed in the previous section has a limited ISI cancellation range of 25% of the main cursor value for the embedded FFE post-cursor tap coefficient, and $\pm 25\%$ of the full-scale voltage for the embedded 1-tap DFE coefficient. This limited range is a result of undesired signal attenuation at the comparator input for the equalization tap coefficients relative to the main cursor, which is sampled unattenuated. Fig. 4.25 shows the simplified block diagram of a unit SAR ADC with embedded 2-tap FFE detailed in the previous 10GS/s prototype. Since the second FFE tap is sampled on the bottom plates of the DAC capacitors during the sampling phase, the sampled signal value will be attenuated during the conversion cycles due to charge sharing between the total DAC capacitance $C_{DAC,tot}$ and the parasitic capacitance at the comparator input $C_{ip-}$, with the factor $C_{DAC,tot}/[C_{DAC,tot}+C_{ip-}]$. Since the total DAC capacitance is only $\sim 14.4 fF$ using the custom capacitive DAC with $C_u = 0.45 fF$, this attenuation factor can be small due to comparator input capacitance, DAC parasitic capacitance to substrate, and routing. On the other hand, the main cursor $V_{in,n}$ is sampled on the top plate of $C_S$ without any attenuation. This means that the second FFE tap experiences much more attenuation than the main cursor, which results in a limited ISI cancellation range.
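As a rough numerical illustration of the charge-sharing factor above, the following sketch evaluates $C_{DAC,tot}/[C_{DAC,tot}+C_{ip-}]$ for the $\sim 14.4fF$ DAC. The parasitic value is an assumption: it reuses the $\sim 40fF$ $C_{ip}$ quoted for the first prototype in Section 4.1.4.2, and the function name `charge_sharing_gain` is hypothetical.

```python
def charge_sharing_gain(c_dac_tot: float, c_par: float) -> float:
    """Fraction of a bottom-plate-sampled voltage that survives charge sharing
    between the DAC array and the parasitic capacitance at the comparator input."""
    if c_dac_tot <= 0 or c_par < 0:
        raise ValueError("invalid capacitance value")
    return c_dac_tot / (c_dac_tot + c_par)


C_DAC_TOT = 14.4e-15  # total DAC capacitance quoted in the text (~14.4 fF)
C_IP = 40e-15         # ASSUMED: ~40 fF parasitic quoted in Section 4.1.4.2

tap_gain = charge_sharing_gain(C_DAC_TOT, C_IP)
main_cursor_gain = 1.0  # main cursor is sampled on C_S without attenuation

print(f"tap attenuation factor: {tap_gain:.3f}")
print(f"maximum tap range relative to main cursor: {tap_gain / main_cursor_gain:.1%}")
```

Under these assumed values the factor evaluates to roughly 0.26, which is in line with the $\sim 25\%$ tap range measured for the first prototype.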
The next section explains how this limitation can be resolved with a simple modification.

# 4.2.1.1 Embedded 3-Tap FFE Switched-Capacitor Implementation

Fig. 4.26(a) shows a simplified single-ended unit ADC schematic to illustrate the switched-capacitor implementation of the 3-tap FFE during the first two phases of the SAR conversion, the sampling phase and the MSB computation. The sampled input on $C_S$ acts as the un-attenuated main cursor tap for the embedded FFE. Embedding the pre- and post-cursor FFE taps inside the capacitive DAC structure is achieved with the $B_{1,1}$ to $B_{5,1}$ switches and the $B_{1,-1}$ to $B_{5,-1}$ switches that select between the previous input, next input or GND to provide the $\beta_1$ post-cursor and $\beta_{-1}$ pre-cursor coefficients without impacting the main cursor value.

Figure 4.25: Simplified unit SAR ADC with limited ISI cancellation range for the embedded FFE equalization due to undesired attenuation at the comparator input for the equalization tap coefficients relative to the main cursor.

During the sampling cycle $V_{in,n}$ is sampled on the $C_S$ capacitor using bottom-plate sampling, while $V_{in,n-1}$ and $V_{in,n+1}$ are sampled on a portion of the DAC capacitors, also using bottom-plate sampling, as shown in Fig. 4.26(b). The FFE coefficients $\beta_1$ and $\beta_{-1}$ are defined by 5-bit words $B_{1,1}B_{2,1}B_{3,1}B_{4,1}B_{5,1} = 01001$ and $B_{1,-1}B_{2,-1}B_{3,-1}B_{4,-1}B_{5,-1} = 00010$ in this example to charge the corresponding capacitors with $V_{in,n-1}$ and $V_{in,n+1}$, respectively, and discharge the remaining DAC capacitors. In the next cycle (Fig. 4.26(c)) the $\Phi_S$ switches are OFF and the bottom plates of all the DAC capacitors are connected to ground. The resultant charge sharing induces a $\beta_{-1}V_{in,n+1} + \beta_1V_{in,n-1}$ value at the comparator negative input.
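To make the example words concrete, the sketch below decodes the two 5-bit words and forms the weighted sum that charge sharing places on the comparator negative input. It assumes ideal binary weighting normalized to 32, as in Eq. (4.2), and ignores parasitic attenuation. The function names and the sample voltages are hypothetical and chosen only for illustration.

```python
def tap_coefficient(word: str) -> float:
    """Decode a 5-bit tap word B1..B5 (B1 is the MSB) into a fraction of the main cursor."""
    if len(word) != 5 or any(bit not in "01" for bit in word):
        raise ValueError("expected a 5-character binary string")
    return int(word, 2) / 32


def negative_input_sum(v_prev: float, v_next: float,
                       beta_post: float, beta_pre: float) -> float:
    """Ideal value induced at the comparator negative input:
    beta_pre * V[n+1] + beta_post * V[n-1]."""
    return beta_pre * v_next + beta_post * v_prev


beta_post = tap_coefficient("01001")  # post-cursor word from the example
beta_pre = tap_coefficient("00010")   # pre-cursor word from the example

# ASSUMED sample voltages in volts, not taken from the measurements.
v_prev, v_next = 0.20, -0.10
tap_sum = negative_input_sum(v_prev, v_next, beta_post, beta_pre)
print(f"beta_1 = {beta_post}, beta_-1 = {beta_pre}, tap sum = {tap_sum:.4f} V")
```

With these words the post-cursor and pre-cursor coefficients come out as 9/32 and 2/32 of the main cursor, respectively.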
By having the main cursor value $V_{in,n}$ at the comparator positive input, the voltage $\beta_{-1}V_{in,n+1} + V_{in,n} - \beta_1V_{in,n-1}$ appears at the comparator differential input to emulate the 3-tap FFE, where the pre-cursor and post-cursor tap coefficients are adjustable. Note that while a negative version of the previous input voltage $V_{in,n-1}$ and next input voltage $V_{in,n+1}$ are required in this technique, this is easily available in a fully-differential architecture. According to Fig. 4.26, since the main cursor is also sampled on the bottom-plate of $C_S = C_{DAC,tot}$ like pre-/post-cursor taps, they all experience the same attenuation at the comparator inputs due to parasitic capacitances. Hence, the 3-tap FFE pre-cursor tap coefficient $\beta_{-1}$ and the post-cursor tap coefficient $\beta_1$ normalized to the main cursor tap can be calculated as $$\beta_{-1} = \frac{(B_{1,-1}B_{2,-1}B_{3,-1}B_{4,-1}B_{5,-1})_2}{32} \quad , \quad \beta_1 = \frac{(B_{1,1}B_{2,1}B_{3,1}B_{4,1}B_{5,1})_2}{32} \, , \quad (4.2)$$ where $(.)_2$ represents the binary-to-decimal conversion operator. Extra digitally controlled capacitors are added to the capacitive DAC in order to control the FFE tap coefficient with one more degree of freedom. ### 4.2.2.1 Time-Interleaved Architecture Fig. 4.27 shows the implementation of the SAR ADC with embedded 3-tap FFE in a 10-GS/s 6-bit converter with 32 time-interleaved unit ADCs. The entire 32-way time-interleaved structure consists of eight parallel sub-ADCs, where each sub-ADC operates at $f_s/8 = 1.25GS/s$ and is formed by four parallel unit asynchronous SAR ADCs working at $f_{s,unit} = f_s/32 = 312.5MS/s$ . Each unit ADC has seven operation cycles: one for input/3-tap FFE sampling, and six for asynchronous bit conversions. 
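The interleaving numbers above follow from simple division, as the sketch below shows. The average per-cycle time is an assumption added for illustration only: an asynchronous SAR does not split its conversion period evenly, so this figure is just a rough budget. The function name `interleaving_budget` is hypothetical.

```python
def interleaving_budget(fs_hz: float, n_sub: int, units_per_sub: int, cycles: int) -> dict:
    """Per-channel rates and time budget for a hierarchical time-interleaved ADC."""
    n_units = n_sub * units_per_sub
    unit_rate = fs_hz / n_units
    unit_period = 1.0 / unit_rate
    return {
        "n_units": n_units,
        "sub_adc_rate_hz": fs_hz / n_sub,
        "unit_rate_hz": unit_rate,
        "unit_period_s": unit_period,
        # ASSUMPTION: even split across cycles, only a rough average budget.
        "avg_cycle_s": unit_period / cycles,
    }


# 10 GS/s, eight sub-ADCs, four unit ADCs each, seven operation cycles (from the text).
budget = interleaving_budget(10e9, 8, 4, 7)
print(f"unit ADCs: {budget['n_units']}")
print(f"sub-ADC rate: {budget['sub_adc_rate_hz'] / 1e9:.2f} GS/s")
print(f"unit ADC rate: {budget['unit_rate_hz'] / 1e6:.1f} MS/s")
print(f"unit conversion period: {budget['unit_period_s'] * 1e9:.2f} ns")
print(f"average time per cycle: {budget['avg_cycle_s'] * 1e12:.0f} ps")
```

The 3.2ns unit period is what the sampling cycle and the six asynchronous bit decisions must share.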
Eight front-end track-and-holds, one per sub-ADC, are employed to allow for the use of only eight critical sampling phases at 1.25-GHz. A differential divide-by-four circuit is used with 5-GHz complementary input clocks to generate the eight phases spaced at 100ps that clock the sub-ADC T/Hs. Digitally-controlled capacitor banks, with a <0.4ps resolution and $\sim30$ ps range, are employed to calibrate timing mismatches in the clock distribution to the T/H blocks. Calibration DACs are included for comparator offset correction, and linear gain calibration in all 32 unit SAR ADCs.

Figure 4.26: Simplified unit SAR ADC with embedded 3-tap FFE: (a) single-ended schematic, and operation during the (b) sampling phase, and (c) first MSB evaluation assuming $B_{1,-1}B_{2,-1}B_{3,-1}B_{4,-1}B_{5,-1} = 00010$ for the pre-cursor tap, and $B_{1,1}B_{2,1}B_{3,1}B_{4,1}B_{5,1} = 01001$ for the post-cursor tap.

Figure 4.27: Block diagram of the 32-way time-interleaved asynchronous SAR ADC with embedded 3-tap FFE.

## 4.2.2.2 Unit Asynchronous SAR ADC with Embedded 3-Tap FFE

The fully-differential schematic of the 6-bit unit asynchronous SAR ADC with embedded 3-tap sampled FFE is shown in Fig. 4.28. A modified StrongArm comparator with two differential input pairs is used. One input pair is connected to the sampling capacitor, which samples the main cursor. The other input pair is connected to the DAC output, which also implements the FFE pre-cursor and post-cursor taps. The asynchronous operation can be explained as follows. (1) As soon as the comparator's complementary outputs resolve, the asynchronous logic sets the ready signal RDY to '1' and passes it to the SAR logic [67], which starts the DAC operation. (2) The RDY signal resets the comparator clock $\Phi_{CMP}$ to '0'. (3) A low $\Phi_{CMP}$ resets the latch outputs to $V_{DD}$.
(4) After a specific time assigned for the DAC settling, set by a tunable delay element, RDY goes down to '0', which signals $\Phi_{CMP}$ to transition to '1'. (5) Finally, a high $\Phi_{CMP}$ starts the next decision cycle of the comparator, and the whole process repeats again until the LSB is resolved. A merged capacitor switching (MCS) scheme [49], which allows for very low switching energy and reduced area through removing the MSB capacitor, is employed in the DAC of each 6-bit unit SAR ADC. To further reduce DAC area, a custom layout with a 1fF metal-oxide-metal (MOM) unit capacitor $(C_u)$ is employed, as shown in Fig. 4.29. Four stacked minimum width metal layers, metal 4 (MET4) to metal 7 (MET7), with minimum spacing are used, resulting in the optimum desired capacitance value with respect to the bottom-plate parasitic capacitance to the substrate. Both matching and noise performance are considered in the selection of the unit capacitor value. Half size dummy capacitors ( $C_{dum,unit} = 0.5 fF$ ) are added between the DAC's main capacitor fingers by halving the finger length. The top plate of all dummy capacitors are connected to the DAC's output node, while the bottom plates are controlled in a binary weighted fashion by switches: floated for switch OFF and connected to comparator common-mode voltage for switch ON. Fig. 4.30 shows the embedded gain calibration range and resolution. A similar structure is also embedded in the input sampling capacitor network, which doubles the gain calibration range for each unit SAR ADC. Figure 4.28: Fully differential schematic of the unit asynchronous SAR ADC with sampled 3-tap embedded FFE. # 4.2.3 Experimental Results A chip micrograph of the prototype 6b 10GS/s ADC, which was fabricated in a GP 65-nm CMOS process, is shown in Fig. 4.31. The core time-interleaved ADC, consisting of eight sub-ADCs that each have four parallel unit asynchronous SAR ADCs, occupies $0.38mm^2$ . 
Routing from the sampling clocks phase generator and the parasitic capacitance on the input lines are minimized by placing the eight front-end T/Hs close together in the vicinity of the differential input pads. Also, splitting the global reference and common-mode voltage buffers equally on the top and bottom of the core ADC layout improves the symmetry among the unit ADCs. Local decoupling capacitors in each unit ADC reduce the impact of kickback noise on the reference and common-mode voltages, routed from the two sets of on-die global source-follower based buffers, to an acceptable level for a 6-bit ADC.

Figure 4.29: Custom layout of the differential capacitive DAC with 1fF MOM unit capacitors and 4-bit embedded gain calibration.

The custom designed board for testing the 10GS/s 6-bit 32-way time-interleaved ADC is shown in Fig. 4.32. The 65nm CMOS die is packaged in an open-cavity 72-pin QFN package.

Figure 4.30: Embedded gain calibration range and resolution for each capacitive DAC.

Figure 4.31: Prototype ADC chip micrograph.

Figure 4.32: Custom test boards for the prototype 10GS/s ADC implemented in a GP 65nm CMOS process. Two separate boards are designed: bias board and high-frequency board connected with ribbon cables for transferring the bias signals, supply voltages, and scan chain control bits.

Figure 4.33: ADC SNDR and SFDR vs. input frequency at $f_s = 10$ GHz.

# 4.2.3.1 Core ADC Characterization

In characterizing the general performance of the 6-bit ADC, the FFE coefficients $\beta_1$ and $\beta_{-1}$ are set to zero. After calibrating the offset errors among the 32 time-interleaved unit ADCs and the phase errors of the eight sampling clocks, the dynamic performance of the full time-interleaved ADC at 10-GHz sampling frequency is shown in Fig. 4.33. A low input frequency maximum SNDR of 30.4dB is achieved, which translates to an effective number of bits (ENOB) of 4.75 bits.
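The SNDR-to-ENOB conversion used here can be checked with the standard full-scale sine-wave relation, ENOB = (SNDR - 1.76)/6.02. The sketch below applies it to the two maximum SNDR values reported in this chapter; the function name `enob_from_sndr` is hypothetical.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR (dB), standard full-scale sine-wave relation."""
    return (sndr_db - 1.76) / 6.02


# Maximum low-frequency SNDR values reported for the two 10GS/s prototypes.
for label, sndr_db in (("first prototype", 29.19), ("second prototype", 30.4)):
    print(f"{label}: SNDR = {sndr_db} dB -> ENOB = {enob_from_sndr(sndr_db):.2f} bits")
```

The relation reproduces the 4.56 bits quoted for the first prototype and gives about 4.76 bits for 30.4dB, close to the reported 4.75 bits; the small difference is presumably due to rounding of the quoted SNDR.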
The ADC achieves an effective resolution bandwidth (ERBW) of $\sim 5$ GHz.

## 4.2.3.2 Embedded Equalization Characterization

The range and resolution of the embedded FFE pre-cursor and post-cursor taps are extracted by averaging the ADC output variation as a function of the 5-bit FFE tap coefficients $\beta_1$ and $\beta_{-1}$ with a maximum DC input voltage $V_{in} = 0.5V$ for the $1V_{pp}$ input range, as shown in Fig. 4.34. Since the second FFE tap is hardwired to subtract from the main cursor as a high-pass filter, the ADC output variation starts from 0 for $\beta_1 = (00000)_2 = 0$ and linearly decreases to more negative values as the coefficient reaches its maximum $\beta_1 = (11111)_2 = 31$. The maximum ADC output variation is about 32 LSB, for a maximum $\sim 100\%$ range for the pre-cursor and post-cursor FFE taps relative to the main cursor.

Figure 4.34: Measured tap coefficient range and resolution using DC input voltages for embedded (a) FFE pre-cursor tap, and (b) FFE post-cursor tap.

The BER measurement setup for verification of the embedded 3-tap FFE over different FR4 channels is shown in Fig. 4.35. BER measurements are performed on three 15", 25" and 30" FR4 channels with frequency profiles shown in Fig. 4.10(a) in order to further verify the embedded equalization operation. The BER bathtub curves of Fig. 4.10(b) are produced with a $1V_{ppd}$ $2^{23} - 1$ PRBS input without any transmit equalization applied to the channel and the MSB output of the ADC fed back to the Centellax PCB12500. Activating the 3-tap embedded FFE allows a 0.37-UI timing margin over a previously closed eye on the highest-loss 30" channel with 24dB loss at Nyquist.

Figure 4.35: Embedded equalization characterization test setup.

## 4.2.4 Performance Summary

The 10-GS/s ADC with embedded equalization consumes 76mW. The core TI-ADC consumes the majority of the power.
Table 4.2 summarizes the main specifications and compares this work with previously reported CMOS ADCs with sampling rates around 10 GHz. The Walden figure of merit (FOM) [73] of the prototype ADC is 0.41 pJ/conv.-step, considering the ENOB at ERBW.

Figure 4.36: (a) FR4 channels under study, and (b) measured bathtub curves with embedded 3-tap FFE for a 10-Gb/s $2^{23}-1$ PRBS input over the three FR4 channels.

Table 4.2: Proposed 10GS/s 6-Bit ADCs Performance Comparison

| Specification | This Work (1) [70] | This Work (2) [69] |
|---|---|---|
| CMOS Technology | 65-nm | 65-nm |
| Supply Voltage (V) | 1.1/0.9 | 1.0 |
| ADC Structure | TI SAR | TI Async. SAR |
| Embedded Equalization | 2-Tap FFE + 1-Tap DFE | 3-Tap FFE |
| Input Range ($V_{pp}$) | 0.5 | 1.0 |
| Resolution (bit) | 6 | 6 |
| Sampling Rate (GS/s) | 10 | 10 |
| ERBW (GHz) | 4.53 | $\sim 5$ |
| Max ENOB (bit) | 4.56 | 4.75 |
| Power (mW) | 79 | 76 |
| FOM (pJ/conv.-step) | 0.48 | 0.41 |
| Core ADC Area (mm<sup>2</sup>) | 0.33 | 0.38 |

## 4.2.5 10Gb/s ADC-Based Receiver with Dynamically-Enabled Digital Equalization

Fig. 4.37(a) shows PAM-2 BER bathtub curves for two backplane channels with different attenuations. The low-loss channel has an open eye with a voltage region over which a two-level slicer can reliably detect both 0 and 1 symbols at the required BER. Increased ISI from the high-loss channel causes the received eye to close, where with a slicer threshold set at the nominally-optimal zero level, significant errors are observed. In this case, typical receivers employ equalization on all received symbols to reduce ISI and open the eye to achieve the target BER. However, certain received signal levels have a very low probability of generating an error for a given symbol and do not necessarily require additional equalization.
The proposed hybrid ADC-based receiver shown in Fig. 4.37(b) takes advantage of this to save power by employing a three-level detector with programmable thresholds that allows for reliable detection of both 0 and 1 symbols when the received signal falls outside the ambiguous region and dynamically disables the digital equalizer on a per-symbol basis. For symbols which exist in the ambiguous region and cannot be reliably detected, the digital equalizer is dynamically enabled to further remove ISI and achieve the target BER. Combining this technique with embedded FFE in the ADC allows for a significant reduction in digital equalizer power, as the embedded FFE allows for a reduced percentage of symbols in the ambiguous region [8].

Figure 4.37: (a) Receiver voltage margin BER bathtub curves with low- and high-loss channels, and (b) simplified block diagram of the proposed hybrid ADC-based receiver.

The proposed hybrid ADC-based receiver utilizes the 32-way time-interleaved 6-bit SAR ADC with extended ISI cancellation range 3-tap embedded FFE explained before (Fig. 4.27). Following the front-end ADC is a dynamically-enabled digital equalizer, consisting of a 4-tap FFE and 3-tap DFE, which further equalizes any unreliable symbols. The die micrograph of the proposed hybrid ADC-based receiver, fabricated in a GP 65nm CMOS process, was previously shown in Fig. 4.31. The core time-interleaved ADC and digital equalizer occupy $0.38mm^2$ and $0.39mm^2$, respectively, with other circuitry, such as the T/Hs, clock phase generation, reference buffers, and interface re-timing blocks, bringing the total area to $0.81mm^2$.

10Gb/s PRBS data is passed through various FR4 channels from a Centellax PCB12500 transmit module and the proposed receiver's digital equalizer output is fed back to the BERT for performance characterization (Fig. 4.35).
Here no transmit equalization is used, with the embedded FFE in the ADC and the dynamically-enabled digital equalizer making up all the equalization in the system. Fig. 4.38 shows timing margin bathtub curves for four FR4 channels with attenuations ranging from 20.9 to 36.4dB at the 5GHz Nyquist frequency, when the additional 1.5dB loss from the receiver board and package is considered. First considered is the performance with only embedded ADC equalization activated, with both the embedded pre- and post-cursor FFE taps having a range of $\sim 32$ LSB and a resolution of 1LSB (Fig. 4.34). For this case, open eyes with timing margins exceeding 0.3UI are observed for the two lowest-loss channels. However, the two highest-loss channels require collaborative use of both the embedded and digital equalizers in order to obtain an open eye. When the digital equalizer is dynamically enabled on a per-symbol basis, timing margins of 0.2UI and 0.1UI are obtained for the 31.7dB and 36.4dB channels, respectively, at a BER $< 10^{-10}$.

Figure 4.38: (a) FR4 channels frequency response. (b) Received BER bathtub curves after the front-end ADC using only the embedded 3-tap FFE. Receiver BER bathtub curves with only embedded equalization and combined embedded plus digital equalization for (c) a 35" FR4 channel, and (d) a 40" FR4 channel.

Fig. 4.39(a) shows how digital equalizer power is saved with the hybrid ADC-based receiver architecture for seven FR4 channels with attenuation ranging from 12.1dB to 36.4dB. For channels with up to 25dB attenuation, the embedded equalizer alone opens the eye, translating into the digital equalizer being disabled 100% of the time and ideally all the digital equalizer power saved. When the power overhead due to the enable latches and threshold detector switching and leakage currents is considered, this slightly degrades to more than 80% power savings. For higher attenuation channels when the digital equalizer is enabled, the hybrid architecture achieves digital equalizer power savings of around 75% for up to 36.4dB channel attenuation. The ADC, T/Hs, and clock phase generation dissipate 79mW, and the digital equalizer consumes 38mW as shown in Fig. 4.39(b), out of which more than 30mW can be saved by the dynamic enabling of the hybrid architecture.

Table 4.3 compares this work with other ADC-based receivers near 10Gb/s [2,4,5]. The presented receiver is able to support operation over the highest loss channel among these designs, while also providing significant power savings in the digital equalizer.

Figure 4.39: (a) Hybrid ADC-based receiver digital equalizer power savings vs. channel attenuation (BER $< 10^{-10}$), and (b) receiver power breakdown.

## 4.2.6 Conclusion

This section presented a 10-GS/s 6-bit ADC which efficiently incorporates a novel 3-tap embedded FFE. The 3-tap FFE pre-cursor and post-cursor tap coefficients are embedded in the capacitive DAC of a time-interleaved SAR ADC. Measurements verify that the embedded equalization circuitry provides improved timing margins over several FR4 channels. Compared to the previous work in [79], which had a limited ISI cancellation range, the maximum embedded equalization coefficient range is extended in this work to be as large as the main cursor. This modification allows the 3-tap embedded FFE to compensate for channels with 24dB Nyquist attenuation. Leveraging the proposed ADC with embedded equalization in a 10Gb/s PAM-2 receiver in 65nm CMOS extends the compensation range up to 36dB Nyquist attenuation, while achieving a state-of-the-art energy efficiency of 8.9 pJ/bit by using a novel dynamic digital equalization enable technique.
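The quoted energy efficiency appears to follow from dividing the total receiver power by the data rate. The sketch below is a minimal check under that assumption, using the 79mW ADC power and the 8mW and 10mW DSP power entries listed in Table 4.3; the function name `energy_per_bit_pj` is hypothetical.

```python
def energy_per_bit_pj(total_power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/bit; mW divided by Gb/s yields pJ/bit directly."""
    if data_rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    return total_power_mw / data_rate_gbps


ADC_POWER_MW = 79.0      # ADC, T/Hs, and clock phase generation (from the text)
DATA_RATE_GBPS = 10.0    # 10Gb/s PAM-2 link

# DSP power entries from Table 4.3 for the two compensated channel losses.
for dsp_power_mw in (8.0, 10.0):
    efficiency = energy_per_bit_pj(ADC_POWER_MW + dsp_power_mw, DATA_RATE_GBPS)
    print(f"DSP power {dsp_power_mw:.0f} mW -> {efficiency:.1f} pJ/bit")
```

Under this assumption the two cases give 8.7 pJ/bit and 8.9 pJ/bit, matching the energy efficiency entries in Table 4.3.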
Table 4.3: Proposed 10Gb/s ADC-Based Receiver Performance Comparison

| Specification | Harwood'07 [2] | Chen'12 [5] | Zhang'13 [4] | This Work [69] |
|---|---|---|---|---|
| CMOS Technology | 65-nm | 65-nm | 40-nm | 65-nm |
| Supply Voltage (V) | N/A | 1.1 | N/A | 1.0 |
| ADC Structure | Flash | Variable VREF Flash | Rectifier Flash | TI Async. SAR |
| Pre-Equalization | 4-Tap FIR @ TX | HPF + 2-Tap FFE | N/A | Embedded 3-Tap FFE |
| Post-Equalization | 2-Tap FFE + 5-Tap DFE | 5-Tap DFE | Adaptive FFE + DFE | 4-Tap FFE + 3-Tap DFE |
| Input Range ($V_{pp}$) | N/A | 0.6 | N/A | 1.0 |
| Resolution (bit) | 4.5 | 4 | 6 | 6 |
| Sampling Rate (GS/s) | 12.5 | 10 | 8.5–11.5 | 10 |
| Max ENOB (bit) | N/A | N/A | 4.86 | 4.75 |
| Area (mm<sup>2</sup>) | 0.45 | 0.29 | 0.82 | 0.81 |
| Compensated Channel Loss | -24dB @ 12.5Gb/s | -29dB @ 10Gb/s | -34dB @ 10.3Gb/s | -25.3dB @ 10Gb/s / -36.4dB @ 10Gb/s |
| ADC Power (mW) | 150 | 93 | 195 | 79 |
| DSP Power (mW) | 85 | 37 | N/A | 8 / 10 |
| Energy Efficiency (pJ/bit) | 30.7 | 13 | 19 | 8.7 / 8.9 |

For this work, entries separated by a slash correspond, in order, to the two compensated channel losses listed.

### 5. CONCLUSION AND FUTURE WORK

Fig. 5.1 shows the comparison of the proposed 10GS/s 6-bit ADCs in this work against previously reported 10+GS/s general-purpose ADCs in the top three conferences of the IEEE Solid-State Circuits Society [4, 28, 29, 74, 75, 77, 78, 80–87]. The two proposed 10GS/s time-interleaved ADC prototypes show competitive performance compared to the previous generic ADCs while including embedded equalization schemes as well.

Figure 5.1: ADC performance comparison against previous general purpose ADCs with 10+GS/s sampling rate.

#### 5.1 Conclusion

ADC-based wireline receivers allow for more complex and flexible digital equalization and DSP relative to mixed-signal receivers.
Moreover, digital circuits are less sensitive to PVT variations. However, the main drawback of ADC-based receivers is the large power consumption of the front-end high-speed ADC, whose conversion rate must be at least double the required data bandwidth, as well as the large power of the subsequent digital equalization and symbol detection at high data rates. Embedding analog equalization in the ADC with low power overhead is introduced in this work as a promising approach to reduce both the ADC resolution requirement and the digital equalization complexity, allowing for improvements in the overall receiver energy efficiency.

Efficient implementations of linear and nonlinear equalization schemes in the high-speed A/D converter were the main focus of this research. In order to achieve the target 10GS/s conversion rate, time-interleaving of multiple unit ADCs must be employed. The challenges of a time-interleaved architecture, mainly offset, gain and phase mismatches, are studied carefully, and the correction resolution requirements for each mismatch are derived based on behavioral simulations. In parallel, a successive approximation register (SAR) based unit ADC topology is chosen over pipelined and flash topologies in order to achieve the best energy efficiency. In addition, the SAR architecture lends itself more readily than these alternatives to embedding linear and nonlinear partial equalization, which is the main goal of this research.

Three prototypes with different data rates and equalization complexities are designed in this work to prove the effectiveness of partial embedded equalization in high data rate wireline receivers.

The first prototype is a 1.6GS/s 16-way time-interleaved SAR ADC with an embedded 1-tap DFE suitable for high-speed link applications.
The proposed redundant cycle technique allows embedding the DFE inside a SAR ADC with low power and area overheads, while providing the same relaxed critical delay path for the 1-tap DFE as a loop-unrolled DFE structure. The 1.6GS/s 6-bit ADC with redundant cycle 1-tap embedded DFE is fabricated in an LP 90nm CMOS process in $0.24\,\mathrm{mm}^2$ area, and consumes 20.1mW total power while achieving a FOM = 0.58pJ/conv.-step.

The second prototype is a 10-GS/s 6-bit 64-way time-interleaved SAR ADC in 65nm CMOS, which efficiently incorporates both a novel 2-tap embedded FFE and a 1-tap embedded DFE. Statistical bit error rate (BER) modeling results of ADC-based receivers show that an ADC with embedded equalization can provide both voltage and timing margin improvements for different FR4 channels. These equalization functions are embedded in the capacitive DAC of a time-interleaved SAR ADC, with the FFE post-cursor tap efficiently implemented in the reference DAC, and a redundant cycle technique employed to relax the DFE critical feedback timing path. Measurements verify that the embedded equalization circuitry provides improved timing margins over several FR4 channels. The maximum embedded equalization coefficient range limits system operation in this prototype to channels with $\sim 16\,\mathrm{dB}$ Nyquist attenuation. As demonstrated in the next prototype, this issue can be resolved using a modified sampling scheme in every unit ADC.

The third and last prototype is a 10-GS/s 6-bit time-interleaved ADC with 32 parallel asynchronous unit SAR ADCs incorporating a 3-tap embedded FFE in 65nm CMOS. The main limitation of the previous prototype stems from the large signal attenuation at the comparator input, which is caused by the large parasitic capacitance at the capacitive DAC output relative to the total DAC capacitance after sampling on the DAC capacitors' bottom-plates during the sampling phase.
However, the main cursor value is sampled unattenuated on the sampling capacitor's top-plate in the previous prototype. This results in a maximum FFE post-cursor tap coefficient range of only $\sim 25\%$ relative to the main cursor magnitude. This issue is resolved in the current prototype with 3-tap embedded FFE by also sampling the main cursor on the bottom-plate of the sampling capacitor, so that it experiences a similar attenuation to the FFE pre-cursor and post-cursor tap coefficients embedded inside the capacitive DAC in each unit ADC. This way, the maximum range of the pre-/post-cursor ISI tap coefficients extends to $\sim 100\%$ of the main cursor magnitude, which in turn extends the maximum ISI cancellation capability of the embedded equalization.

The performance of the proposed 10GS/s ADC with embedded extended-range 3-tap FFE is verified over multiple FR4 channels; as a stand-alone system, it compensates for channel loss up to 24dB at Nyquist with a 10Gb/s NRZ pseudo-random input. Furthermore, this ADC is used as the front-end stage in a hybrid 10Gb/s ADC-based receiver. This receiver dynamically enables the digital equalizer on a per-symbol basis if the signal after the ADC with 3-tap embedded FFE is still not reliable. The dynamic power saving technique saves more than 30mW of the digital equalization power consumption. The extra digital equalization extends the compensation range up to 36dB Nyquist attenuation, while achieving a state-of-the-art 8.9 pJ/bit energy efficiency.

The proposed ADCs with low-overhead embedded linear and nonlinear equalization are thus shown to be potentially effective for next-generation high data rate wireline receivers targeting channels with 30+dB attenuation.

#### 5.2 Recommendations for Future Work

Fig. 5.2(a) shows the simplified diagram of the proposed hybrid ADC-based receiver [8], [69].
The proposed ADC-based receiver saves power by employing a three-level digital detector with programmable thresholds. When the received signal falls outside the ambiguous region, the detector reliably detects both '0' and '1' symbols, and the digital equalizer is dynamically disabled on a per-symbol basis. For symbols which fall in the ambiguous region and cannot be reliably detected, the digital equalizer is dynamically enabled to further remove ISI and achieve the target BER. Combining this technique with embedded FFE in the ADC allows for a significant reduction in digital equalizer power, as the embedded FFE reduces the percentage of symbols in the ambiguous region [69].

## 5.2.1 Hybrid RX with Dynamically-Enabled Front-End ADC

As future follow-up work, a potential modification of this architecture to further improve the energy efficiency of the receiver is shown in Fig. 5.2(b). In this modification, the three-level threshold detector is moved to the front-end to process the analog input signal. The dynamic enabling and disabling of the digital equalizer can still be performed as before. Although threshold detection in the analog domain, especially for a time-interleaved structure, may consume more power than its fully digital counterpart, the detector output can here be used to adaptively decrease the resolution of the front-end quantizer. When the received signal falls outside the ambiguous region, the detector sets the quantizer resolution to 1 bit and disables the digital equalization, since the embedded equalization alone is sufficient for making a reliable symbol decision. For symbols that fall in the ambiguous region, the full resolution of the quantizer is used, and the digital back-end equalizer is enabled to achieve the required performance.
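The per-symbol decision logic described above can be sketched in a few lines. This is a behavioral illustration only, assuming symmetric thresholds around zero for a differential PAM-2 signal and a 6-bit full resolution. The names `Decision`, `three_level_detect`, and `ambiguous_fraction` are hypothetical and do not come from the receiver implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Decision:
    symbol: Optional[int]    # 1 or 0 when reliable, None when ambiguous
    enable_digital_eq: bool  # digital equalizer enabled only when ambiguous
    quantizer_bits: int      # resolution requested from the front-end quantizer


def three_level_detect(sample: float, threshold: float,
                       full_bits: int = 6) -> Decision:
    """Classify one sample against symmetric thresholds (an assumption)."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if sample > threshold:
        # Reliable '1': 1-bit decision is enough, equalizer stays off.
        return Decision(symbol=1, enable_digital_eq=False, quantizer_bits=1)
    if sample < -threshold:
        # Reliable '0': 1-bit decision is enough, equalizer stays off.
        return Decision(symbol=0, enable_digital_eq=False, quantizer_bits=1)
    # Ambiguous region: use full resolution and enable the digital equalizer.
    return Decision(symbol=None, enable_digital_eq=True, quantizer_bits=full_bits)


def ambiguous_fraction(samples: Sequence[float], threshold: float) -> float:
    """Fraction of symbols that would need the digital equalizer."""
    if not samples:
        return 0.0
    flagged = sum(three_level_detect(s, threshold).enable_digital_eq
                  for s in samples)
    return flagged / len(samples)
```

For example, `ambiguous_fraction([0.4, -0.3, 0.05, -0.02], 0.1)` evaluates to 0.5, meaning half of those hypothetical samples would take the full-resolution path. The smaller this fraction, the more often both the quantizer resolution and the digital equalizer can be scaled back.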
Since, for common channels, the decision is reliable most of the time after the embedded equalization alone, a very high energy efficiency can potentially be achieved by reducing the quantizer resolution with a reconfigurable-resolution front-end ADC. As data rates increase, this modification can save considerable power, since the front-end ADC will account for a large share of the total receiver power.

Figure 5.2: Simplified block diagrams for (a) hybrid ADC-based RX with dynamically enabled digital equalizer, and (b) hybrid RX with dynamically enabled front-end ADC and digital equalizer.

Decreasing the ADC resolution right after the detection may not be trivial in a flash ADC architecture, since the full resolution is resolved simultaneously, and the detector itself may have almost the same delay as the full quantizer. However, a SAR ADC or an algorithmic ADC can readily include this feature, since by nature these architectures resolve only part of the resolution in each cycle, and the quantizer can be shut down after detecting a reliable symbol.

### REFERENCES

- [1] V. Stojanovic, "Channel-limited high-speed links: Modeling, analysis and design," Ph.D. dissertation, Stanford University, Stanford, CA, Sep. 2004.
- [2] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth, "A 12.5Gb/s SerDes in 65nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2007, pp. 436–437.
- [3] D. Crivelli, M. Hueda, H. Carrer, M. del Barco, R. Lopez, P. Gianni, J. Finochietto, N. Swenson, P. Voois, and O.
Agazzi, "Architecture of a single-chip 50 Gb/s DP-QPSK/BPSK transceiver with electronic dispersion compensation for coherent optical channels," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 4, pp. 1012–1025, Apr. 2014.
- [4] B. Zhang, A. Nazemi, A. Garg, N. Kocaman, M. Ahmadi, M. Khanpour, H. Zhang, J. Cao, and A. Momtaz, "A 195mW / 55mW dual-path receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2013, pp. 34–35.
- [5] E.-H. Chen, R. Yousry, and C.-K. Yang, "Power optimized ADC-based serial link receiver," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 938–951, Apr. 2012.
- [6] J. Kim, E.-H. Chen, J. Ren, B. Leibowitz, P. Satarzadeh, J. Zerbe, and C.-K. Yang, "Equalizer design and performance trade-offs in ADC-based serial links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 9, pp. 2096–2107, Sep. 2011.
- [7] R. Narasimha, M. Lu, N. Shanbhag, and A. Singer, "BER-optimal analog-to-digital converters for communication links," *IEEE Trans. Sig. Proc.*, vol. 60, no. 7, pp. 3683–3691, Jul. 2012.
- [8] A. Shafik, K. Lee, E. Zhian Tabasy, and S. Palermo, "Embedded equalization for ADC-based serial I/O receivers," in *IEEE Electr. Perform. Electron. Packaging Syst.*, San Jose, CA, Oct. 2011, pp. 139–142.
- [9] T.-C. Lee and B. Razavi, "A 125-MHz CMOS mixed-signal equalizer for gigabit Ethernet on copper wire," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, May 2001, pp. 131–134.
- [10] J. Liu and X. Lin, "Equalization in high-speed communication systems," *IEEE Circuits Syst. Mag.*, vol. 4, no. 2, pp. 4–17, Sep. 2004.
- [11] S. Hoyos, J. Garcia, and G. Arce, "Mixed-signal equalization architectures for printed circuit board channels," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 2, pp. 264–274, Feb. 2004.
- [12] R. Payne, P. Landman, B. Bhakta, S. Ramaswamy, S. Wu, J. Powers, M. Erdogan, A.-L. Yee, R. Gu, L. Wu, Y. Xie, B.
Parthasarathy, K. Brouse, W. Mohammed, K. Heragu, V. Gupta, L. Dyson, and W. Lee, "A 6.25-Gb/s binary transceiver in 0.13-μm CMOS for serial data transmission across high loss legacy backplane channels," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2646–2657, Dec. 2005.
- [13] A. Varzaghani and C.-K. Yang, "A 4.8 GS/s 5-bit ADC-based receiver with embedded DFE for signal equalization," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 901–915, Mar. 2009.
- [14] S. Kasturia and J. Winters, "Techniques for high-speed implementation of non-linear cancellation," *IEEE J. Sel. Areas Commun.*, vol. 9, no. 5, pp. 711–717, Jun. 1991.
- [15] E. Zhian Tabasy, A. Shafik, S. Huang, N. Yang, S. Hoyos, and S. Palermo, "A 6b 1.6GS/s ADC with redundant cycle 1-tap embedded DFE in 90nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, Sep. 2012, pp. 1–4.
- [16] J. Kang, D. Lin, L. Li, and M. Flynn, "A reconfigurable FIR filter embedded in a 9b successive approximation ADC," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, Sep. 2008, pp. 711–714.
- [17] D. Lin, L. Li, S. Farahani, and M. Flynn, "A flexible 500 MHz to 3.6 GHz wireless receiver with configurable DT FIR and IIR filter embedded in a 7b 21 MS/s SAR ADC," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 12, pp. 2846–2857, Dec. 2012.
- [18] W. Black and D. Hodges, "Time interleaved converter arrays," *IEEE J. Solid-State Circuits*, vol. 15, no. 6, pp. 929–938, Dec. 1980.
- [19] S. Jamal, D. Fu, M. Singh, P. Hurst, and S. Lewis, "Calibration of sample-time error in a two-channel time-interleaved analog-to-digital converter," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 1, pp. 130–139, Jan. 2004.
- [20] E. Alpman, "A 7-bit 2.5GS/sec time-interleaved C-2C SAR ADC for 60GHz multi-band OFDM-based receivers," Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA, Aug. 2009.
- [21] N. Kurosawa, H. Kobayashi, K. Maruyama, H. Sugawara, and K.
Kobayashi, "Explicit analysis of channel mismatch effects in time-interleaved ADC systems," *IEEE Trans. Circuits Syst. I, Fundamental Theory and Applications*, vol. 48, no. 3, pp. 261–271, Mar. 2001.
- [22] Y.-C. Jenq, "Digital spectra of nonuniformly sampled signals: Fundamentals and high-speed waveform digitizers," *IEEE Trans. Instrum. Meas.*, vol. 37, no. 2, pp. 245–251, Jun. 1988.
- [23] N. Da Dalt, M. Harteneck, C. Sandner, and A. Wiesbauer, "On the jitter requirements of the sampling clock for analog-to-digital converters," *IEEE Trans. Circuits Syst. I, Fundam. Theory Appl.*, vol. 49, no. 9, pp. 1354–1360, Sep. 2002.
- [24] B. Razavi, *Principles of Data Conversion System Design*. New York: IEEE Press, 1995.
- [25] M. Mousazadeh, K. Hadidi, and A. Khoei, "A novel open-loop high-speed CMOS sample-and-hold," *ELSEVIER Int. J. Electron. Comm.*, vol. 62, no. 8, pp. 588–596, Sep. 2008.
- [26] K. Hadidi, M. Sasaki, T. Watanabe, D. Muramatsu, and T. Matsumoto, "An open-loop full CMOS 103 MHz -61 dB THD S/H circuit," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, May 1998, pp. 381–383.
- [27] K. Hadidi, D. Muramatsu, T. Oue, and T. Matsumoto, "A 500MS/sec -54dB THD S/H circuit in a 0.5 μm CMOS process," in *Proc. European Solid-State Circuits Conference*, Duisburg, Germany, Sep. 1999, pp. 158–161.
- [28] P. Schvan, J. Bach, C. Falt, P. Flemke, R. Gibbins, Y. Greshishchev, N. Ben-Hamida, D. Pollex, J. Sitch, S.-C. Wang, and J. Wolczanski, "A 24GS/s 6b ADC in 90nm CMOS," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2008, pp. 544–634.
- [29] Y. Greshishchev, J. Aguirre, M. Besson, R. Gibbins, C. Falt, P. Flemke, N. Ben-Hamida, D. Pollex, P. Schvan, and S.-C. Wang, "A 40GS/s 6b ADC in 65nm CMOS," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2010, pp. 390–391.
- [30] M. Dessouky and A. Kaiser, "Input switch configuration suitable for rail-to-rail operation of switched opamp circuits," *Electron. Lett.*, vol. 35, no. 1, pp. 8–10, Jan. 1999.
- [31] H. Hagiwara, M. Kumazawa, S. Takagi, M. Furihata, M. Nagata, and T. Yanagisawa, "A monolithic video frequency filter using NIC-based gyrators," *IEEE J. Solid-State Circuits*, vol. 23, no. 1, pp. 175–182, Feb. 1988.
- [32] T. Sato, S. Takagi, N. Fuji, Y. Hashimoto, K. Sakata, and H. Okada, "4-Gb/s track and hold circuit using parasitic capacitance canceller," in *Proc. European Solid-State Circuits Conference*, Leuven, Belgium, Sep. 2004, pp. 347–350.
- [33] T. Sato, S. Takagi, M. Fujii, Y. Hashimoto, K. Sakata, and H. Okada, "Feedforward-type parasitic capacitance canceler and its application to 4 Gb/s T/H circuit," in *Proc. IEEE Int. Symp. Circuits Syst.*, vol. 6, Kobe, Japan, May 2005, pp. 5561–5564.
- [34] J. Ramirez-Angulo, R. Carvajal, and A. Torralba, "Low supply voltage high-performance CMOS current mirror with low input and output voltage requirements," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 51, no. 3, pp. 124–129, Mar. 2004.
- [35] R. Carvajal, J. Ramirez-Angulo, A. Lopez-Martin, A. Torralba, J. Galan, A. Carlosena, and F. Chavero, "The flipped voltage follower: a useful cell for low-voltage low-power circuit design," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 7, pp. 1276–1291, Jul. 2005.
- [36] D. A. Johns and K. Martin, *Analog Integrated Circuit Design*. New York: Wiley, 1997.
- [37] B. Ginsburg and A. Chandrakasan, "Dual time-interleaved successive approximation register ADCs for an ultra-wideband receiver," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 247–257, Feb. 2007.
- [38] J.-B. Shyu, G. Temes, and K. Yao, "Random errors in MOS capacitors," *IEEE J. Solid-State Circuits*, vol. 17, no. 6, pp. 1070–1076, Dec. 1982.
- [39] J. Montanaro, R. Witek, K. Anne, A. Black, E. Cooper, D. Dobberpuhl, P. Donahue, J. Eno, W. Hoeppner, D. Kruckemyer, T. Lee, P. Lin, L. Madden, D. Murray, M. Pearce, S. Santhanam, K. Snyder, R. Stehpany, and S. Thierauf, "A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor," *IEEE J.
Solid-State Circuits*, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
- [40] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "A double-tail latch-type voltage sense amplifier with 18ps setup+hold time," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2007, pp. 314–315.
- [41] B. Goll and H. Zimmermann, "A 65nm CMOS comparator with modified latch to achieve 7GHz/1.3mW at 1.2V and 700MHz/47μW at 0.6V," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2009, pp. 328–329.
- [42] S. Shekhar, J. Jaussi, F. O'Mahony, M. Mansuri, and B. Casper, "Design considerations for low-power receiver front-end in high-speed data links," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, Sep. 2013, pp. 1–8.
- [43] S.-H. Cho, C.-K. Lee, S.-G. Lee, and S.-T. Ryu, "A two-channel asynchronous SAR ADC with metastable-then-set algorithm," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 4, pp. 765–769, Apr. 2012.
- [44] T. Jiang, W. Liu, F. Zhong, C. Zhong, K. Hu, and P. Chiang, "A single-channel, 1.25-GS/s, 6-bit, 6.08-mW asynchronous successive-approximation ADC with improved feedback delay in 40-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2444–2453, Oct. 2012.
- [45] J. McCreary and P. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques—Part I," *IEEE J. Solid-State Circuits*, vol. SC-10, no. 6, pp. 371–379, Dec. 1975.
- [46] B. Ginsburg and A. Chandrakasan, "500-MS/s 5-bit ADC in 65-nm CMOS with split capacitor array DAC," *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 739–747, Apr. 2007.
- [47] Y.-K. Chang, C.-S. Wang, and C.-K. Wang, "A 8-bit 500-KS/s low power SAR ADC for bio-medical applications," in *IEEE ASSCC Dig. Tech. Papers*, Jeju, South Korea, Nov. 2007, pp. 228–231.
- [48] C.-C. Liu, S.-J. Chang, G.-Y. Huang, and Y.-Z. Lin, "A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 731–740, Apr. 2010.
- [49] V. Hariprasath, J. Guerber, S.-H. Lee, and U.-K. Moon, "Merged capacitor switching based SAR ADC with highest switching energy-efficiency," *Electron. Lett.*, vol. 46, no. 9, pp. 620–621, Apr. 2010.
- [50] B. P. Ginsburg, "Energy-efficient analog-to-digital conversion for ultra-wideband radio," Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA, Jul. 2007.
- [51] M. Scott, B. Boser, and K. Pister, "An ultralow-energy ADC for smart dust," *IEEE J. Solid-State Circuits*, vol. 38, no. 7, pp. 1123–1129, Jul. 2003.
- [52] H. Johnson and M. Graham, *High-Speed Digital Design: A Handbook of Black Magic*. Upper Saddle River, NJ: Prentice Hall, 1993.
- [53] S. Palermo, *CMOS Nanoelectronics Analog and RF VLSI Circuits*. Chapter 9: High-Speed Serial I/O Design for Channel-Limited and Power-Constrained Systems. New York: McGraw Hill, 2011.
- [54] K. Parhi, "Pipelining in algorithms with quantizer loops," *IEEE Trans. Circuits Syst.*, vol. 38, no. 7, pp. 745–754, Jul. 1991.
- [55] O. Agazzi, M. Hueda, D. Crivelli, H. Carrer, A. Nazemi, G. Luna, F. Ramos, R. Lopez, C. Grace, B. Kobeissy, C. Abidin, M. Kazemi, M. Kargar, C. Marquez, S. Ramprasad, F. Bollo, V. Posse, S. Wang, G. Asmanis, G. Eaton, N. Swenson, T. Lindsay, and P. Voois, "A 90 nm CMOS DSP MLSD transceiver with integrated AFE for electronic dispersion compensation of multimode optical fibers at 10 Gb/s," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2939–2957, Dec. 2008.
- [56] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman, D. Pi, B. Raghavan, H. Pan, I. Fujimori, and A. Momtaz, "A 500 mW ADC-based CMOS AFE with digital calibration for 10 Gb/s serial links over KR-backplane and multimode fiber," *IEEE J. Solid-State Circuits*, vol. 45, no. 6, pp. 1172–1185, Jun. 2010.
- [57] J. Lee, M.-S. Chen, and H.-D. Wang, "Design and comparison of three 20-Gb/s backplane transceivers for duobinary, PAM4, and NRZ data," *IEEE J. Solid-State Circuits*, vol. 43, no.
9, pp. 2120–2133, Sep. 2008.
- [58] I. Dedic, "56 GS/s ADC: Enabling 100GbE," in *Proc. IEEE Optical Fiber Commun. Collocated Nat. Fiber Optic Eng. Conf.* San Diego, CA: Optical Society of America, Mar. 2010, pp. 1–3.
- [59] G. Balamurugan, B. Casper, J. Jaussi, M. Mansuri, F. O'Mahony, and J. Kennedy, "Modeling and analysis of high-speed I/O links," *IEEE Trans. Adv. Pack.*, vol. 32, no. 2, pp. 237–247, May 2009.
- [60] Stateye. [Online]. Available: http://www.stateye.org
- [61] A. Emami-Neyestanak, A. Varzaghani, J. Bulzacchelli, A. Rylyakov, C.-K. Yang, and D. Friedman, "A 6.0-mW 10.0-Gb/s receiver with switched-capacitor summation DFE," *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 889–896, Apr. 2007.
- [62] B. Goll and H. Zimmermann, "A comparator with reduced delay time in 65-nm CMOS for supply voltages down to 0.65 V," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 56, no. 11, pp. 810–814, Nov. 2009.
- [63] M. Dessouky and A. Kaiser, "Very low-voltage digital-audio $\Delta\Sigma$ modulator with 88-dB dynamic range using local switch bootstrapping," *IEEE J. Solid-State Circuits*, vol. 36, no. 3, pp. 349–355, Mar. 2001.
- [64] W. Kester, *The Data Conversion Handbook*. Burlington, MA, USA: Newnes, 2005.
- [65] E. Alpman, H. Lakdawala, L. Carley, and K. Soumyanath, "A 1.1V 50mW 2.5GS/s 7b time-interleaved C-2C SAR ADC in 45nm LP digital CMOS," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2009, pp. 76–77.
- [66] Z. Cao, S. Yan, and Y. Li, "A 32 mW 1.25 GS/s 6b 2b/Step SAR ADC in 0.13 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 862–873, Mar. 2009.
- [67] J. Yang, T. Naing, and R. Brodersen, "A 1 GS/s 6 bit 6.7 mW successive approximation ADC using asynchronous processing," *IEEE J. Solid-State Circuits*, vol. 45, no. 8, pp. 1469–1478, Aug. 2010.
- [68] E. Zhian Tabasy, A. Shafik, S. Huang, N.-W. Yang, S. Hoyos, and S. Palermo, "A 6-b 1.6-GS/s ADC with redundant cycle one-tap embedded DFE in 90-nm CMOS," *IEEE J.
Solid-State Circuits*, vol. 48, no. 8, pp. 1885–1897, Aug. 2013.
- [69] A. Shafik, E. Zhian Tabasy, S. Cai, K. Lee, S. Hoyos, and S. Palermo, "A 10Gb/s hybrid ADC-based receiver with embedded 3-tap analog FFE and dynamically-enabled digital equalization in 65nm CMOS," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2015, pp. 1–3.
- [70] E. Zhian Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, "A 6b 10GS/s TI-SAR ADC with embedded 2-tap FFE/1-tap DFE in 65nm CMOS," in *Proc. IEEE Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2013, pp. C274–C275.
- [71] P. Harpe, C. Zhou, Y. Bi, N. van der Meijs, X. Wang, K. Philips, G. Dolmans, and H. de Groot, "A 26 μW 8 bit 10 MS/s asynchronous SAR ADC for low energy radios," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1585–1595, Jul. 2011.
- [72] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. Ming-Tak Leung, "Improved sense-amplifier-based flip-flop: design and measurements," *IEEE J. Solid-State Circuits*, vol. 35, no. 6, pp. 876–884, Jun. 2000.
- [73] R. Walden, "Analog-to-digital converter survey and analysis," *IEEE J. Selected Areas in Comm.*, vol. 17, no. 4, pp. 539–550, Apr. 1999.
- [74] A. Nazemi, C. Grace, L. Lewyn, B. Kobeissy, O. Agazzi, P. Voois, C. Abidin, G. Eaton, M. Kargar, C. Marquez, S. Ramprasad, F. Bollo, V. Posse, S. Wang, and G. Asmanis, "A 10.3GS/s 6bit (5.1 ENOB at Nyquist) time-interleaved/pipelined ADC using open-loop amplifiers and digital calibration in 90nm CMOS," in *Proc. IEEE Symp. VLSI Circuits*, Honolulu, Hawaii, Jun. 2008, pp. 18–19.
- [75] S. Verma, A. Kasapi, L. min Lee, D. Liu, D. Loizos, S.-H. Paik, A. Varzaghani, S. Zogopoulos, and S. Sidiropoulos, "A 10.3GS/s 6b flash ADC for 10G Ethernet applications," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2013, pp. 462–463.
- [76] H. Chung, A. Rylyakov, Z. T. Deniz, J. Bulzacchelli, G.-Y. Wei, and D.
Friedman, "A 7.5-GS/s 3.8-ENOB 52-mW flash ADC with clock duty cycle control in 65nm CMOS," in *Proc. IEEE Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2009, pp. 268–269.
- [77] M. El-Chammas and B. Murmann, "A 12-GS/s 81-mW 5-bit time-interleaved flash ADC with background timing skew calibration," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 838–847, Apr. 2011.
- [78] X. Yang, R. Payne, and J. Liu, "A 10GS/s 6b time-interleaved ADC with partially active flash sub-ADCs," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, Sep. 2013, pp. 1–4.
- [79] E. Zhian Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, "A 6 bit 10 GS/s TI-SAR ADC with low-overhead embedded FFE/DFE equalization for wireline receiver applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2560–2574, Nov. 2014.
- [80] C.-C. Huang, C.-Y. Wang, and J.-T. Wu, "A CMOS 6-bit 16-GS/s time-interleaved ADC with digital background calibration," in *Proc. IEEE Symp. VLSI Circuits*, Honolulu, Hawaii, Jun. 2010, pp. 159–160.
- [81] K. Poulton, R. Neff, B. Setterberg, B. Wuppermann, T. Kopley, R. Jewett, J. Pernillo, C. Tan, and A. Montijo, "A 20GS/s 8b ADC with a 1MB memory in 0.18μm CMOS," in *ISSCC Dig. Tech. Papers*, vol. 1, San Francisco, CA, Feb. 2003, pp. 318–496.
- [82] W. Cheng, W. Ali, M.-J. Choi, K. Liu, T. Tat, D. Devendorf, L. Linder, and R. Stevens, "A 3b 40GS/s ADC-DAC in 0.12μm SiGe," in *ISSCC Dig. Tech. Papers*, vol. 1, San Francisco, CA, Feb. 2004, pp. 262–263.
- [83] P. Schvan, D. Pollex, S.-C. Wang, C. Falt, and N. Ben-Hamida, "A 22GS/s 5b ADC in 0.13μm SiGe BiCMOS," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2006, pp. 2340–2349.
- [84] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman, D. Pi, B. Raghavan, H. Pan, I. Fujimori, and A. Momtaz, "A 500mW digitally calibrated AFE in 65nm CMOS for 10Gb/s serial links over backplane and multimode fiber," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2009, pp. 370–371.
- [85] D. Crivelli, M. Hueda, H. Carrer, J. Zachan, V. Gutnik, M. Del Barco, R. Lopez, G. Hatcher, J. Finochietto, M. Yeo, A. Chartrand, N. Swenson, P. Voois, and O. Agazzi, "A 40nm CMOS single-chip 50Gb/s DP-QPSK/BPSK transceiver with electronic dispersion compensation for coherent optical channels," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2012, pp. 328–330.
- [86] V.-C. Chen and L. Pileggi, "A 69.5mW 20GS/s 6b time-interleaved ADC with embedded time-to-digital calibration in 32nm CMOS SOI," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2014, pp. 380–381.
- [87] S. Le Tual, P. Singh, C. Curis, and P. Dautriche, "A 20GHz-BW 6b 10GS/s 32mW time-interleaved SAR ADC with master T&H in 28nm UTBB FDSOI technology," in *ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2014, pp. 382–383.