## DESIGN OF HIGH-SPEED POWER-EFFICIENT A/D CONVERTERS FOR WIRELINE ADC-BASED RECEIVER APPLICATIONS

A Dissertation

by

## SHENGCHANG CAI

Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

## DOCTOR OF PHILOSOPHY

| Samuel Palermo      |
|---------------------|
| Sebastian Hoyos     |
| Krishna Narayanan   |
| Duncan Walker       |
| Miroslav M. Begovic |
|                     |

May 2018

Major Subject: Electrical Engineering

Copyright 2018 Shengchang Cai

### ABSTRACT

Serial input/output (I/O) data rates are increasing in order to support the explosion in network traffic driven by big data applications such as the Internet of Things (IoT) and cloud computing. As high-speed data symbol times shrink, inter-symbol interference (ISI) increases for transmission over both severe low-pass electrical channels and dispersive optical channels. This necessitates increased equalization complexity and consideration of advanced modulation schemes, such as four-level pulse-amplitude modulation (PAM-4). Serial links which utilize an analog-to-digital converter (ADC) receiver front-end offer a potential solution, as they enable more powerful and flexible digital signal processing (DSP) for equalization and symbol detection and can easily support advanced modulation schemes. Moreover, the DSP back-end provides robustness to process, voltage, and temperature (PVT) variations, benefits from improved area and power with CMOS technology scaling, and offers easy design transfer between different technology nodes and thus improved time-to-market. However, ADC-based receivers generally consume higher power relative to their mixed-signal counterparts because of the significant power consumed by conventional multi-GS/s ADC implementations. This motivates exploration of energy-efficient ADC designs with moderate resolution and very high sampling rates to support data rates at or above 50Gb/s.

This dissertation presents two power-efficient designs of $\geq$25GS/s time-interleaved ADCs for ADC-based wireline receivers. The first prototype is a 6b 25GS/s time-interleaved multi-bit search ADC in 65nm CMOS with a soft-decision selection algorithm that provides redundancy for relaxed track-and-hold (T/H) settling and improved metastability tolerance, achieving a figure-of-merit (FoM) of 143fJ/conversion-step and 1.76pJ/bit for a PAM-4 receiver design. The second prototype is a 52Gb/s PAM-4 ADC-based receiver in 65nm CMOS, where the front-end consists of a 4-stage continuous-time linear equalizer (CTLE)/variable gain amplifier (VGA) and a 6b 26GS/s time-interleaved SAR ADC with a comparator-assisted 2b/stage structure for reduced digital-to-analog converter (DAC) complexity and a 3-tap embedded feed-forward equalizer (FFE) for a relaxed ADC resolution requirement. The receiver front-end achieves an efficiency of 4.53pJ/bit, while compensating for up to 31dB loss with DSP and no transmitter (TX) equalization.

#### ACKNOWLEDGEMENTS

It has been a long journey since I started my Ph.D. back in 2012, when I had very little knowledge about what my Ph.D. life was going to be like. The past six years of my Ph.D. life have been tough but successful in many ways, and there are a lot of people to whom I would like to give credit for my success.

First and foremost, I would like to thank my advisor Prof. Palermo for guiding me through my Ph.D. research with exceedingly high caliber and putting faith in me in difficult times. Also, my thanks go to Prof. Hoyos and Prof. Entesari, who have offered me much invaluable advice and many insights during the course of my graduate study. I thank my other committee members, Prof. Narayanan and Prof. Walker, for all their critical input into my projects.

I would like to thank all the senior students in Sam's group. Thank you Ehsan and Ayman for all your guidance during your time at A&M, especially when I first joined the group. Thank you Younghoon and Byungho for hosting me at Freescale during summer 2015. Thank you Osama for bringing me to the Iftar dinner when we were traveling in Kyoto in 2015. Thank you Cheng for accommodating me during my internship at HP lab in summer 2017. Thank you Noah for sharing with me all the interesting Korean facts. My sincere thanks also go to my other fellow students Ashkan, Gihoon and Kunzhi for all our talks and discussions about work and life.

I want to extend my special thanks to my talented project partner Shiva. Our project would never have been as successful if it were not for all your outstanding work and the support from you during the worst time in our Ph.D. life. Thank you for making all the tape-out and testing nights memorable with fun Indian facts, GeoGuessr, boomerang, and street food videos, which I will cherish forever.

Last but not least, it would not have been possible for me to finish my degree without the unwavering support from my parents, to whom I dedicate this dissertation. Thank you for all your sacrifices and dedication that made me what I am today!

### CONTRIBUTORS AND FUNDING SOURCES

#### Contributors

This work was supported by a dissertation committee consisting of Professors Samuel Palermo, Sebastian Hoyos and Krishna Narayanan of the Department of Electrical and Computer Engineering and Professor Duncan Walker of the Department of Computer Science.

All the work conducted for the dissertation was completed by the student independently.

#### Funding Sources

Graduate study was supported in part by SRC and NSF under grant numbers 1836.111, 2583.001 and EECS-1202508.

## TABLE OF CONTENTS

| Section | Title |
|---------|-------|
|         | ABSTRACT |
|         | ACKNOWLEDGEMENTS |
|         | CONTRIBUTORS AND FUNDING SOURCES |
|         | TABLE OF CONTENTS |
|         | LIST OF FIGURES |
|         | LIST OF TABLES |
| 1       | INTRODUCTION |
| 2       | BACKGROUND |
| 2.1     | High-Speed Link Receivers |
| 3       | 6-BIT 25-GS/S TIME-INTERLEAVED MULTI-BIT SEARCH ADC WITH SOFT DECISION SEARCH ALGORITHM |
| 3.1     | Soft-Decision Selection Algorithm |
| 3.1.1   | Track-and-Holds Settling Error |
| 3.1.2   | Metastability Error |
| 3.2     | ADC Architecture |
| 3.2.1   | Time-Interleaved Architecture |
| 3.2.2   | Unit ADC Structure |
| 3.2.3   | Front-End Track-and-Holds |
| 3.2.4   | Clock Generation and Skew Calibration |
| 3.2.5   | Shared-Input Double-Tail Latch |
| 3.4     | Conclusion |
| 4       | 52-GB/S PAM-4 ADC-BASED RECEIVER WITH A 6-BIT 26GS/S TIME-INTERLEAVED SAR ADC |
| 4.1     | Receiver Architecture |
| 4.2     | 6-BIT 26GS/s Time-Interleaved SAR ADC with 3-Tap Embedded FFE |
| 4.2.1   | ADC Architecture |
| 4.2.2   | Gain-Boosted T/H with FVF |
| 4.2.3   | Unit 2b/Stage Comparator-Assisted SAR ADC |
| 4.2.4   | Embedded 3-Tap FFE with Non-Binary DAC |
| 4.2.5   | Reference-Scaling Double-Tail Latch with Shared Input Stage |
| 4.3     | Measurement Results |
| 4.3.1   | ADC Characterization |
| 4.3.2   | Receiver Characterization |
| 4.4     | Conclusions |
| 5       | CONCLUSION AND FUTURE WORK |
| 5.1     | Conclusion |
| 5.2     | Future Work |
|         | REFERENCES |

## LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| 1.1  | A high-speed electrical link transceiver with an ADC-DSP receiver |
| 1.2  | Performance of state-of-the-art ADCs with $f_s > 5$ GS/s |
| 2.1  | Wireline channels (a) backplane channel model in data center applications (b) $S_{21}$ (frequency dependent loss) of FR4 channels with various lengths |
| 2.2  | Equivalent model of a wireline transceiver |
| 2.3  | Example of a sampled pulse response |
| 2.4  | Time & frequency domain representation of NRZ and PAM-4 modulation |
| 2.5  | Schematic of (a) passive CTLE and (b) active CTLE |
| 2.6  | Block diagram of an NRZ DFE with (a) direct feedback and (b) loop-unrolling |
| 2.7  | Block diagram of a 1-tap PAM-4 DFE with loop-unrolling |
| 2.8  | IEEE 400Gb Ethernet taskforce for 802.3bs |
| 2.9  | Block diagram of PAM-4 receivers (a) mixed-signal receiver architecture (b) ADC-based receiver architecture |
| 2.10 | Performance summary of state-of-the-art PAM-4 receivers (a) channel loss vs. baud rate (b) power efficiency vs. baud rate |
| 2.11 | Block diagram of an M-way TI-ADC and TI-error model |
| 2.12 | Impact of offset error on ADC output spectrum |
| 2.13 | Impact of gain error on ADC output spectrum |
| 2.14 | Impact of bandwidth error on ADC output spectrum |
| 2.15 | Impact of skew error on ADC output spectrum |
| 2.16 | Block diagram of a flash ADC |
| 2.17 | Block diagram and critical timing path of a SAR ADC |
| 2.18 | Block diagram of a binary search ADC |
| 3.1  | (a) Conventional SAR ADC, (b) multi-bit/stage SAR ADC, (c) conventional binary search ADC, and (d) multi-bit/stage binary search ADC |
| 3.2  | High-speed T/H structure and settling error |
| 3.3  | T/H settling scenario for a conventional binary search algorithm and the proposed soft-decision selection algorithm |
| 3.4  | Metastability scenario for a conventional binary search algorithm and the proposed soft-decision selection algorithm |
| 3.5  | (a) SNDR error vs. T/H buffer time constant and (b) SNDR vs. latch time constant for a conventional binary search algorithm and the proposed soft-decision selection algorithm |
| 3.6  | Block diagram of the 8-way time-interleaved binary search ADC with soft-decision selection |
| 3.7  | Front-end T/H schematic |
| 3.8  | Block diagram of the front-end T/H sampling clocks generation, distribution and calibration |
| 3.9  | Shared-input double-tail three latch structure (a) schematic and (b) Monte Carlo offset simulation results |
| 3.10 | Prototype ADC chip micrograph and core ADC floorplan |
| 3.11 | Block diagram of foreground offset/reference calibration, clock skew calibration, and metastability measurement setup |
| 3.12 | ADC SNDR and SFDR vs. input frequency at $f_s = 25$ GHz |
| 3.13 | ADC SNDR and SFDR with sampling frequency of $f_s$ = 15GS/s and 25GS/s and supply voltage of 0.9V, 1V and 1.1V |
| 3.14 | 25GS/s ADC normalized output spectrum for $f_{in} = 12.21$ GHz (a) before and (b) after offset and skew calibration |
| 3.15 | ADC DNL/INL plots |
| 3.16 | ADC metastability error rate (MER) measurement |
| 3.17 | ADC performance summary with Walden FoM |
| 4.1  | Block diagram of an ADC-based PAM-4 receiver with CTLE/VGA, a 6-bit TI-SAR ADC, DSP and CDR |
| 4.2  | Block diagram of 32-way 6 bit 26GS/s 2b/stage TI-SAR ADC with embedded 3-tap FFE |
| 4.3  | Block and timing diagram of phase generator and 8-way front-end T/H |
| 4.4  | T/H circuit showing the gain boosted source follower and the bootstrap switch |
| 4.5  | Block diagram of unit comparator-assisted 2b/stage SAR ADC with 3-tap embedded FFE |
| 4.6  | Timing diagram of unit comparator-assisted 2b/stage SAR ADC with 3-tap embedded FFE |
| 4.7  | Coverage maps of embedded FFE coefficient (a) with a binary weighted FFE DAC (b) with a non-binary weighted FFE DAC |
| 4.8  | Schematic of a 2b reference scaling double-tail latch with shared input stage |
| 4.9  | Chip micrograph of 52Gb/s ADC-based receiver in 65nm CMOS |
| 4.10 | SNDR/SFDR vs. input frequency with (a) $f_s=13$ GS/s and (b) $f_s=26$ GS/s |
| 4.11 | INL/DNL plot |
| 4.12 | 32Gb/s receiver results (a) BER timing bathtub curves (b) Recovered clock jitter histogram |
| 4.13 | 52Gb/s receiver characterization: BER timing bathtub curves for 25dB and 31dB loss channels |
| 5.1  | Conventional N-bit single-ended asynchronous SAR ADC (a) block diagram and (b) critical timing path |
| 5.2  | RSP technique with a 5-bit single-ended capacitive DAC |
| 5.3  | Time-domain settling of VCM-based MSB DAC settling for conventional and RSP schemes |
| 5.4  | Block diagram of the RSP technique for a 6-bit SAR ADC |
| 5.5  | Normalized FoM and fs versus number of bits |

## LIST OF TABLES

| Table | Caption |
|-------|---------|
| 3.1 | ADC performance summary |
| 4.1 | Receiver performance summary |

### 1. INTRODUCTION

Serial I/O data rates are increasing in order to support the explosion in network traffic driven by big data applications such as the Internet of Things (IoT) and cloud computing. Serial link applications such as OC-768 (~40Gb/s) for wide area networks (WAN), PCIe 5.0 (32Gb/s) and 400Gb Ethernet (56Gb/s) for local area networks (LAN), and USB 3.2 and Thunderbolt 2 (20Gb/s) for storage [1] demand aggressive bandwidth per I/O pad/pin, because aggregate I/O bandwidth is growing quickly while I/O density increases relatively slowly [2]. As high-speed data symbol times shrink, inter-symbol interference (ISI) increases for transmission over both severe low-pass electrical channels and dispersive optical channels. This necessitates increased equalization complexity and consideration of advanced modulation schemes, such as four-level pulse-amplitude modulation (PAM-4).

Serial links which utilize an ADC receiver front-end (Fig. 1.1) offer a potential solution, as they enable more powerful and flexible DSP for equalization and symbol detection [3-6] and can easily support advanced modulation schemes. Moreover, the DSP back-end provides robustness to PVT variations, benefits from improved area and power with CMOS technology scaling and offers easy design transfer between different technology nodes and thus improved time-to-market.

However, ADC-based receivers generally consume higher power relative to their mixed-signal counterparts [3-6] because of the significant power consumed by conventional multi-GS/s ADC implementations. This motivates exploration of energy-efficient ADC designs with moderate resolution and very high sampling rates to support data rates at or above 50Gb/s. Previous works, such as [7-9], introduce BER-optimal positioning of ADC thresholds and reconfiguration of ADC resolution based on channel loss to enable power savings from front-end ADCs. However, the flash ADC structures employed in these designs to implement variable thresholds with low overhead suffer from inferior power efficiency compared with successive approximation register (SAR) ADC structures, as shown in Fig. 1.2. Another technique [6] incorporates a low-overhead embedded FFE in a time-interleaved SAR ADC to provide partial equalization before ADC quantization, thereby relaxing the ADC resolution requirement and reducing ADC power consumption. Nevertheless, the previous embedded FFE design induces extra routing at the T/H output and suffers from limited pre/post-tap coefficient coverage when the FFE DAC is shared with the SAR reference DAC. In addition, the embedded FFE reduces the DC gain in the signal path; it therefore requires gain compensation from the VGA before the ADC, which makes it challenging to maintain good T/H linearity.

Figure 1.1: A high-speed electrical link transceiver with an ADC-DSP receiver.

Figure 1.2: Performance of state-of-the-art ADCs with $f_s > 5$ GS/s.

Meanwhile, conventional specifications for a general-purpose ADC design include static performance metrics such as integral non-linearity (INL) and differential non-linearity (DNL), as well as dynamic performance metrics such as signal-to-noise-and-distortion ratio (SNDR) and spurious-free dynamic range (SFDR). Although these are useful for circuit designers during the design process, they do not translate to the receiver performance specification, i.e. bit-error rate (BER), in a straightforward manner. In fact, ADC performance impairments such as noise, jitter, mismatch and non-linearity can affect receiver BER in ways that differ significantly from their effect on ADC SNDR or effective number of bits (ENOB). Therefore, a system model capable of capturing the impact of ADC non-ideality on receiver BER performance is crucial to prevent over-design or under-design of the front-end ADCs.

In order to address the challenges in ADC-based receiver designs mentioned so far, this dissertation is organized as follows:

Chapter 2 introduces structures and design techniques of high-speed link receivers including modulation schemes (PAM-2 vs. PAM-4), receiver architectures (mixed-signal vs. ADC-based) and receiver equalization techniques (analog & digital equalizers). Design techniques and challenges of high-speed ADCs including time-interleaved structure, high-bandwidth T/Hs and power-efficient unit ADC structures are explained. A statistical modeling framework is briefly introduced to evaluate the impact of ADC non-ideality including non-linearity, mismatch, noise, jitter and metastability on receiver performance and define the ADC design specifications for the prototypes in the rest of the dissertation.

Chapter 3 details a 6b 25GS/s 8-way time-interleaved ADC design whose unit ADCs employ an energy-efficient multi-bit search structure. A soft-decision selection algorithm is introduced to provide redundancy in the ADC, which relaxes the T/H bandwidth requirement for time-interleaved (TI) ADCs and improves metastability tolerance. Implementation details including the TI-ADC architecture and critical circuit blocks, as well as experimental results, are covered.

Chapter 4 presents a 52Gb/s PAM-4 ADC-based receiver design, which consists of a 4-stage CTLE/VGA front-end, a 6b 26GS/s 32-way time-interleaved SAR ADC with a comparator-assisted 2b/stage loop-unrolled unit SAR structure and a 3-tap embedded FFE with improved pre/post-tap coefficient coverage, and a DSP equalizer with a 12-tap FFE and a 2-tap partially unrolled DFE. Design details of the 26GS/s ADC are covered, and measurement results of both the ADC and the receiver are shown to verify the effectiveness of the proposed ADC-based receiver architecture.

Finally, Chapter 5 shows a comprehensive comparison of the two ADC and ADC-based receiver prototypes with other state-of-the-art designs and concludes the dissertation. In addition, some recommendations on potential future work for high-speed ADCs and ADC-based receivers are discussed.

### 2. BACKGROUND\*

This chapter gives a brief background on high-speed wireline receivers, including modulation schemes, receiver architectures and receiver equalization techniques. Next, the ADC-based receiver architecture is explained in detail, focusing specifically on high-speed ADC structures and critical circuit blocks. A statistical modeling framework is briefly introduced and compared against the conventional mean square error (MSE) based methodology for ADC characterization to show the effectiveness of the statistical model in incorporating ADC non-ideality into the receiver model.

### 2.1 High-Speed Link Receivers

In contrast to wireless communication, where an additive Gaussian noise channel model is commonly employed, wireline channels (Fig. 2.1) are usually band-limited due to frequency-dependent loss mechanisms such as skin effect and dielectric absorption. In a bandwidth-constrained digital communication system, the performance of the transceivers is generally limited by the inter-symbol interference (ISI) that comes from the lossy channel. While it is possible to reduce ISI at the channel output by using pulse shaping waveforms such as a raised cosine pulse, this results in complicated filter design at the transmitter side and can be especially power hungry at high data rates. For simplicity of transmitter design, PAM is commonly used for signaling in high-speed wireline transceivers. While PAM-2, also known as non-return-to-zero (NRZ), is traditionally used in wireline transceivers, more bandwidth-efficient modulation schemes such as PAM-4 are gaining more interest as data rates increase. Regardless of the modulation scheme, various equalization techniques are necessary at both the transmitter and receiver sides to compensate for channel loss and ISI and enable reliable symbol detection at the receiver. Different equalization techniques commonly employed at the receiver side are discussed in this section. Meanwhile, as the data rate increases, the equalizers need to compensate for channel loss at a higher frequency, translating to a significant power and complexity overhead for conventional receiver equalizer design, especially for PAM-4 receivers. This motivates the exploration of ADC-based receiver architecture, which enables more powerful and flexible DSP for equalization and symbol detection and can easily support advanced modulation schemes.

Figure 2.1: Wireline channels (a) backplane channel model in data center applications (b) $S_{21}$ (frequency dependent loss) of FR4 channels with various lengths. [6]

<sup>\*</sup> Part of this chapter is reprinted with permission from "CMOS ADC-based receivers for high-speed electrical and optical links," by S. Palermo, S. Hoyos, A. Shafik, E. Z. Tabasy, S. Cai, S. Kiran, and K. Lee, IEEE Communications Magazine, vol. 54, no. 10, pp. 168-175, Copyright 2016 by IEEE.

### 2.1.1 Modulation Schemes for Wireline Application

Transmission of signals over a band-limited channel (Fig. 2.2) for different types of digital modulation schemes can be generalized as:

$$x(t) = \sum_{n=-\infty}^{\infty} I_n g(t - nT)$$
(2.1)

where x(t) is the transmitter output,  $I_n$  is the discrete sequence of data symbols and g(t) is the pulse shaping waveform. Assuming a PAM modulation scheme:

$$g(t) = \begin{cases} 1, & 0 < t \le T \\ 0, & otherwise \end{cases}$$
(2.2)

where g(t) is simply a square pulse that lasts for one symbol period T.

At the channel output, the received signal can be expressed as:

$$y(t) = \sum_{n=-\infty}^{\infty} I_n h(t - nT) + z(t)$$
(2.3)

where

$$h(t) = \int_{-\infty}^{\infty} g(\tau) c(t-\tau) d\tau$$
(2.4)

c(t) is the channel impulse response and z(t) is the additive white Gaussian noise (AWGN).

After sampling, the received signal can be expressed in discrete time:



Figure 2.2: Equivalent model of a wireline transceiver.



Figure 2.3: Example of a sampled pulse response.

$$y[k] = \sum_{n=-\infty}^{\infty} I_n h[k-n] + z[k], \ k = 0, 1, \dots$$
(2.5)

where h[n] is the sampled channel pulse response.

The sampled pulse response is especially useful for link budgeting, and an example of a sampled pulse response with symbol  $I_0$  of +1 is shown in Fig. 2.3. h[0] is the main cursor



Figure 2.4: Time & frequency domain representation of NRZ and PAM-4 modulation. [10]

defined by the optimal sampling point from clock data recovery (CDR). The remaining pre/post cursors (h[-1], h[1], h[2], etc.) are the ISI terms that degrade the noise margin available for symbol detection:

$$y[k] = I_k h[0] + \sum_{\substack{n=-\infty\\n\neq k}}^{\infty} I_n h[k-n] + z[k], \ k = 0, 1, \dots$$
(2.6)
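As a minimal illustration of Eq. 2.5 and Eq. 2.6, the sketch below convolves a symbol sequence with a sampled pulse response and separates the main-cursor term from the ISI term. The cursor values in `h` are hypothetical and are not taken from Fig. 2.3, noise is omitted, and the peak-distortion eye estimate is a common link-budgeting shortcut added here as an assumption.

```python
# Hypothetical sampled pulse response, indexed by cursor position.
# h[0] is the main cursor, h[-1] a precursor, h[1] and h[2] postcursors.
h = {-1: 0.10, 0: 0.60, 1: 0.25, 2: 0.05}


def received_sample(symbols, k, pulse=h):
    """Noise-free y[k] = sum_n I_n * h[k - n] (Eq. 2.5 without z[k])."""
    return sum(i_n * pulse.get(k - n, 0.0) for n, i_n in enumerate(symbols))


def split_main_and_isi(symbols, k, pulse=h):
    """Return (main-cursor term, ISI term) of y[k], as separated in Eq. 2.6."""
    main = symbols[k] * pulse[0]
    isi = received_sample(symbols, k, pulse) - main
    return main, isi


def worst_case_eye_height(pulse=h, max_symbol=1.0):
    """Peak-distortion estimate for NRZ: 2 * (h[0] - sum of |ISI cursors|)."""
    isi_sum = sum(abs(v) for cursor, v in pulse.items() if cursor != 0)
    return 2.0 * max_symbol * (pulse[0] - isi_sum)


nrz = [+1, -1, -1, +1, +1, -1]
main, isi = split_main_and_isi(nrz, 3)
print(f"main = {main:+.2f}, ISI = {isi:+.2f}, y[3] = {main + isi:+.2f}")
print(f"worst-case NRZ eye height = {worst_case_eye_height():.2f}")
```

With these assumed cursors, the ISI term removes a third of the main-cursor amplitude for the chosen pattern, which is the margin loss that the equalizers discussed later are meant to recover.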

NRZ (PAM-2) offers the simplest modulation by assigning the *k*-th symbol one of two binary values:

$$I_{k,NRZ} \in \{-1,+1\}$$
(2.7)

whereas PAM-4 modulation follows:

$$I_{k,PAM-4} \in \{-1, -1/3, +1/3, +1\}$$
(2.8)

It is easy to observe that the Euclidean distance between two adjacent symbols decreases from 2 for NRZ to 2/3 for PAM-4, whereas the AWGN term z[k] in Eq. 2.6 does not scale accordingly. This results in a $20\log_{10}(3) \approx 9.54$dB SNR degradation for PAM-4 modulation. On the other hand, PAM-4 modulation transmits two bits per symbol and is more bandwidth efficient than NRZ, i.e. the Nyquist frequency of PAM-4 is half that of NRZ at the same data rate (Fig. 2.4). This means that for a given data rate over a band-limited channel, a PAM-4 system sees less loss than an NRZ system. Therefore, a general rule of thumb is that PAM-4 could be superior to NRZ for a channel with a loss gradient larger than 9.54dB/octave. Considering the example channels in Fig. 2.1(b) for a 10Gb/s link design, channels longer than 25" could potentially benefit from employing PAM-4 instead of NRZ modulation.
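The arithmetic behind the 9.54dB penalty and the halved Nyquist frequency can be reproduced with a short sketch. The function names are illustrative, and the channel loss values passed to `pam4_preferred` are hypothetical rather than read from Fig. 2.1(b).

```python
import math


def pam_levels(m):
    """Equally spaced PAM-M levels normalized to a peak of +/-1."""
    return [-1 + 2 * i / (m - 1) for i in range(m)]


def snr_penalty_db(m):
    """Penalty relative to NRZ from the reduced level spacing at equal peak swing."""
    levels = pam_levels(m)
    spacing = levels[1] - levels[0]
    return 20 * math.log10(2.0 / spacing)


def nyquist_hz(bit_rate, m):
    """Nyquist frequency is half the baud rate; PAM-M carries log2(M) bits/symbol."""
    return bit_rate / math.log2(m) / 2


def pam4_preferred(loss_at_nrz_nyquist_db, loss_at_pam4_nyquist_db):
    """Rule of thumb: PAM-4 wins if the loss saved exceeds its SNR penalty."""
    return (loss_at_nrz_nyquist_db - loss_at_pam4_nyquist_db) > snr_penalty_db(4)


print(f"PAM-4 penalty: {snr_penalty_db(4):.2f} dB")
print(f"10Gb/s Nyquist: NRZ {nyquist_hz(10e9, 2) / 1e9} GHz, "
      f"PAM-4 {nyquist_hz(10e9, 4) / 1e9} GHz")
print(pam4_preferred(30.0, 15.0))  # hypothetical losses at 5 GHz and 2.5 GHz
```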

However, a PAM-4 system presents its own design challenges. Compared with an NRZ receiver, which needs a single comparator per decision, a PAM-4 receiver requires three comparators to make a 2-bit decision. CDR design can also be challenging because of the multiple zero crossings in a PAM-4 system. These design challenges will be explained in more detail in the sections to follow.

### 2.1.2 Receiver Equalization Techniques

As introduced in Section 2.1.1, ISI is the main impairment to reliable symbol detection at the receiver in a wireline communication system. For wireline transceivers, signal power is usually assumed to dominate over noise power. Minimum mean-square-error (MMSE) based linear equalizers are commonly employed to address ISI while taking noise into consideration. In addition, non-linear equalizers such as decision feedback equalizers (DFE) are powerful but expensive tools to cancel postcursor ISI without amplifying noise.

Transmit equalization is usually implemented as an FIR filter with low complexity. This FIR filter attempts to invert the channel loss by creating a high-pass filter response.



Figure 2.5: Schematic of (a) passive CTLE and (b) active CTLE.

While this FIR equalization could also be implemented at the receiver, the main advantage of implementing it at the transmitter is its ease of implementation in the digital domain.

However, due to peak-power constraints, the transmitter FIR introduces DC loss, which degrades the SNR at the receiver input. In addition, a back channel is required to adapt the transmitter FIR filter coefficients.
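The DC loss caused by the peak-power constraint can be seen in a minimal sketch that normalizes the FIR taps so that the sum of their magnitudes is one and then evaluates the response at DC and at the Nyquist frequency. The two-tap coefficient set is a hypothetical example, and the normalization is one common way to model a fixed peak output swing.

```python
import math


def normalize_peak(taps):
    """Scale taps so that sum(|c_i|) = 1, i.e. the peak output swing is fixed."""
    total = sum(abs(c) for c in taps)
    return [c / total for c in taps]


def gain_db(taps, f_norm):
    """FIR magnitude response in dB at f_norm = f / f_baud (taps spaced 1 UI)."""
    re = sum(c * math.cos(2 * math.pi * f_norm * i) for i, c in enumerate(taps))
    im = -sum(c * math.sin(2 * math.pi * f_norm * i) for i, c in enumerate(taps))
    return 20 * math.log10(math.hypot(re, im))


# Hypothetical main tap plus one post tap, before normalization.
taps = normalize_peak([1.0, -1.0 / 3.0])
print("normalized taps:", [round(c, 3) for c in taps])
print(f"DC gain      = {gain_db(taps, 0.0):.2f} dB")
print(f"Nyquist gain = {gain_db(taps, 0.5):.2f} dB")
```

In this example the high-pass shape is obtained entirely by attenuating low frequencies by about 6dB, while the Nyquist-frequency content is transmitted at full swing.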

At the receiver side, linear equalizer implementations include the continuous-time linear equalizer (CTLE) and the discrete-time finite impulse response (FIR) filter. A CTLE provides effective precursor and postcursor cancellation and can be implemented either passively or actively. A passive CTLE (Fig. 2.5(a)) is generally more linear but provides no gain at the Nyquist frequency, whereas an active CTLE (Fig. 2.5(b)) offers gain at the Nyquist frequency through RC degeneration. However, an active CTLE can be power hungry because the obtainable peaking is limited by the gain-bandwidth product of the CML amplifier. Bandwidth extension techniques such as shunt peaking or series peaking can be employed to relax the bandwidth limitation, but they require inductors that result in a large area overhead.

A discrete-time FIR filter can be implemented either in the analog domain or, for ADC-based receivers, in the digital domain. A receiver FIR filter can cancel precursor ISI, and its tap coefficients can be adaptively tuned without the back channel required by a transmitter FIR filter. While it can be hard to implement accurate tap delays in the analog domain [11], a discrete-time FIR can be easily embedded in a time-interleaved SAR ADC structure [6] by sampling the weighted subsequent samples at the time-multiplexed T/H outputs onto the SAR DACs. However, the DC loss introduced by an embedded FFE can constrain the linearity of the front-end circuits (CTLE and VGA), which must amplify the input so that the output swing matches the ADC full scale range (FSR). When implemented in the digital domain, an FIR filter consisting of shift registers, multipliers and adders can be implemented efficiently and conveniently programmed to address different channel losses.



(a)



(b)

Figure 2.6: Block diagram of an NRZ DFE with (a) direct feedback and (b) loop-unrolling.

While receiver-side linear equalizers amplify high-frequency noise and crosstalk along with the signal, the SNR at the equalizer output remains unchanged if the noise generated by the equalizer itself is not considered. The output-referred noise induced by these equalizers needs to be well controlled by design so that the SNR at the equalizer output is not degraded significantly.

The DFE is the other commonly employed equalizer at the receiver side. A DFE directly subtracts postcursor ISI from the incoming signal by feeding back the data resolved by a slicer to control the polarity of the equalization taps. Unlike linear equalizers, a DFE does not amplify the input signal noise or crosstalk since it uses the quantized bit information. Since a DFE cannot handle precursors, it is usually implemented together with linear equalizers, and the coefficients are co-optimized [12]. The challenge of designing a DFE is to close the critical 1-UI timing loop for the first postcursor. The block diagram of a conventional NRZ DFE is shown in Fig. 2.6. Fig. 2.6(a) shows a DFE design with direct feedback, where the critical 1-UI path can be expressed as:

$$t_{clk-Q} + t_{sum} + t_{setup} \le 1UI \tag{2.9}$$

where  $t_{clk-Q}$  is the clock-to-Q delay of the latch,  $t_{sum}$  is the summer settling delay at the latch input node and  $t_{setup}$  is the setup time for the latch to make decisions correctly. This timing constraint becomes hard to meet with increasing baud rate, e.g. 1UI is only 31.25ps for a 32GBd system. One way to relax this critical timing loop is to unroll the feedback loop as shown in Fig. 2.6(b). In this case, the subtractions of the postcursor taps are pre-calculated and selected based on the decision from the previous symbol. The 1-UI timing constraint of a loop-unrolled DFE can then be given as:

$$t_{clk-Q} + t_{mux} \le 1UI \tag{2.10}$$

where  $t_{mux}$  is the delay from MUX selection to MUX output. Compared with Eq. 2.9,  $t_{sum}$  is excluded from the feedback timing loop because the summation is precomputed.  $t_{sum}$  is generally larger than  $t_{mux}$ , considering that the latch input node is usually heavily loaded by the DFE summers and that the signal at the latch input needs to settle to a given resolution to meet the voltage margin requirement, whereas  $t_{mux}$  can be optimized to around 1 to 2 FO4 delays.
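The two timing constraints in Eq. 2.9 and Eq. 2.10 can be compared numerically with a small sketch. All delay values below are hypothetical placeholders chosen only to illustrate the check; they do not describe any particular process or design.

```python
def ui_ps(baud_rate_hz):
    """Unit interval in picoseconds for a given baud rate."""
    return 1e12 / baud_rate_hz


def direct_feedback_margin_ps(baud_rate_hz, t_clk_q, t_sum, t_setup):
    """Slack of Eq. 2.9: 1UI - (t_clk-Q + t_sum + t_setup). Negative means violated."""
    return ui_ps(baud_rate_hz) - (t_clk_q + t_sum + t_setup)


def unrolled_margin_ps(baud_rate_hz, t_clk_q, t_mux):
    """Slack of Eq. 2.10: 1UI - (t_clk-Q + t_mux). Negative means violated."""
    return ui_ps(baud_rate_hz) - (t_clk_q + t_mux)


# Hypothetical delays in picoseconds.
T_CLK_Q, T_SUM, T_SETUP, T_MUX = 12.0, 15.0, 8.0, 10.0

baud = 32e9
print(f"UI = {ui_ps(baud):.2f} ps")
print(f"direct feedback slack = {direct_feedback_margin_ps(baud, T_CLK_Q, T_SUM, T_SETUP):+.2f} ps")
print(f"loop-unrolled slack   = {unrolled_margin_ps(baud, T_CLK_Q, T_MUX):+.2f} ps")
```

With these assumed delays, the direct-feedback loop misses the 1-UI budget at 32GBd while the loop-unrolled version closes it with margin.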



Figure 2.7: Block diagram of a 1-tap PAM-4 DFE with loop-unrolling.

While the loop-unrolled DFE structure relaxes the critical timing, it requires two summers and two latches for a 1-tap DFE in an NRZ receiver, which is double the hardware of a conventional DFE with direct feedback. The hardware overhead increases significantly if a similar loop-unrolled DFE is implemented for a PAM-4 receiver, as shown in Fig. 2.7. The overhead grows to 12 summers and 12 comparators, which is similar to the comparator count of a 4-bit flash ADC.
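The comparator counts quoted above follow from a simple counting argument: each possible previous symbol needs its own set of decision thresholds. The sketch below reproduces the 1-tap numbers; the extension to more than one unrolled tap (the `taps` parameter) is an assumption added for illustration and is not taken from the text.

```python
def unrolled_dfe_comparators(pam_order, taps=1):
    """Comparators (and summers) in a fully loop-unrolled PAM-M DFE.

    Each of the pam_order**taps possible symbol histories needs its own
    set of (pam_order - 1) pre-shifted decision thresholds.
    """
    return (pam_order ** taps) * (pam_order - 1)


def flash_comparators(bits):
    """Comparators in an N-bit flash ADC."""
    return 2 ** bits - 1


print("NRZ, 1 tap  :", unrolled_dfe_comparators(2))
print("PAM-4, 1 tap:", unrolled_dfe_comparators(4))
print("4-bit flash :", flash_comparators(4))
```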

|                                                         | IEEE (Institute of Electrical and Electronics Engineers) 802.3bs & 802.3cd |                          |                          |                                |                                |                                                                        |                                                                     |                                                                        |                                                                     |                                       |                                              |
|---------------------------------------------------------|----------------------------------------------------------------------------|--------------------------|--------------------------|--------------------------------|--------------------------------|------------------------------------------------------------------------|---------------------------------------------------------------------|------------------------------------------------------------------------|---------------------------------------------------------------------|---------------------------------------|----------------------------------------------|
| Standard / implementation<br>agreement                  | 400GBASE-SR16                                                              |                          | 400GBASE-DR4             | 200GBASE-FR4 /<br>400GBASE-FR8 | 200GBASE-LR4 /<br>400GBASE-LR8 | 50GAUI-2 C2M /<br>100GAUI-4 C2M /<br>200GAUI-8 C2M /<br>400GAUI-16 C2M | 50GAUI C2M /<br>100GAUI-2 C2M /<br>200GAUI-4 C2M /<br>400GAUI-8 C2M | 50GAUI-2 C2C /<br>100GAUI-4 C2C /<br>200GAUI-8 C2C /<br>400GAUI-16 C2C | 50GAUI C2C /<br>100GAUI-2 C2C /<br>200GAUI-4 C2C /<br>400GAUI-8 C2C | 50G-KR /<br>100G-KR2 /<br>200G-KR4    | 50G-CR /<br>100G-CR2 /<br>200G-CR4           |
| Application                                             | Fiber optic<br>data link                                                   | Fiber optic<br>data link | Fiber optic<br>data link | Fiber optic<br>data link       | Fiber optic<br>data link       | Chip to pluggable optical module                                       | Chip to pluggable optical module                                    | Chip to chip on<br>same circuit<br>board                               | Chip to chip on<br>same circuit board                               | Backplanes with<br>daughter cards     | Passive copper<br>cable                      |
| Link media                                              | Multimode fiber                                                            | Single-mode<br>fiber     | Single-mode<br>fiber     | Single-mode<br>fiber           | Single-mode<br>fiber           | Circuit board trace<br>+ 1 connector                                   | Circuit board trace<br>+ 1 connector                                | Circuit board<br>trace                                                 | Circuit board trace                                                 | Circuit board trace +<br>3 connectors | Twinax copper cable<br>+ 2 connectors        |
| Modulation format                                       | NRZ                                                                        | PAM-4                    | PAM-4                    | PAM-4                          | PAM-4                          | NRZ                                                                    | PAM-4                                                               | NRZ                                                                    | PAM-4                                                               | PAM-4                                 | PAM-4                                        |
| Symbol rate, per lane/wire                              | 26.5625 GBd                                                                | 26.5625 GBd              | 53.125 GBd               | 26.5625 GBd                    | 26.5625 GBd                    | 26.5625 GBd                                                            | 26.5625 GBd                                                         | 26.5625 GBd                                                            | 26.5625 GBd                                                         | 26.5625 GBd                           | 26.5625 GBd                                  |
| Maximum reach<br>(channel loss at Nyquist<br>frequency) | 100 m                                                                      | 500 m                    | 500 m                    | 2 km                           | 10 km                          | 10.2 dB<br>(≈ 100 mm)                                                  | 10.2 dB<br>(≈ 100 mm)                                               | 20 dB<br>(≈ 25 cm)                                                     | 20 dB<br>(≈ 25 cm)                                                  | 30 dB<br>(≈ 1 m)                      | 16.06 dB<br>(≥ 3 m)<br>(cable assembly only) |
| Number of parallel lanes                                | 16                                                                         | 4                        | 4                        | 1                              | 1                              | 2/ 4/ 8/ 16                                                            | 1/2/4/8                                                             | 2/ 4/ 8/ 16                                                            | 1/2/4/8                                                             | 1/2/4                                 | 1/2/4                                        |
| Number of wavelengths                                   | 1                                                                          | 1                        | 1                        | 4/8                            | 4/8                            |                                                                        |                                                                     |                                                                        |                                                                     |                                       |                                              |
| Forward error correction<br>(FEC) overhead              | Required                                                                   | Required                 | Required                 | Required                       | Required                       | Required                                                               | Required                                                            | Required                                                               | Required                                                            | Required                              | Required                                     |
| Pre-FEC bit error ratio<br>(BER)                        | 2.4 E-4                                                                    | 2.4 E-4                  | 2.4 E-4                  | 2.4 E-4                        | 2.4 E-4                        | 1E-6                                                                   | 1E-5                                                                | 1E-5                                                                   | 1E-4                                                                | 2.4 E-6                               | 2.4 E-4                                      |

Figure. 2.8: IEEE 400Gb Ethernet taskforce for 802.3bs.

### 2.1.3 Receiver Architectures

As discussed in the previous sections, PAM-4 modulation halves the baud rate compared with NRZ modulation for a given bit rate, potentially resulting in a more efficient receiver design with relaxed ISI even with a 9.54dB SNR degradation. In fact, PAM-4 modulation has been selected as the standard modulation format in IEEE 400Gb Ethernet (IEEE 802.3bs), as shown in Fig. 2.8. One application of particular interest for this dissertation is the 50~200G-KR4 standard, which is used in data centers where the communication channels consist of backplanes with daughter cards, as previously shown in Fig. 2.1(a). The backplane channels consist of circuit board traces on both the mother board and the daughter boards as well as 3 connectors, which result in a loss of 30dB at the ~13GHz Nyquist frequency for a baud rate of 26.5625GBd. The BER target for the link is around  $10^{-5}$  with forward error correction (FEC) coding assumed.

The two main receiver architectures being considered for the KR4 400GbE standard are mixed-signal receivers and ADC-based receivers, as shown in Fig. 2.9.



(a)



Figure. 2.9: Block diagram of PAM-4 receivers (a) mixed-signal receiver architecture (b) ADC-based receiver architecture.

Mixed-signal receivers employ analog domain equalizers such as CTLE, DFE with FIR feedback for first post tap cancellation or infinite impulse response (IIR) feedback for long-



Figure. 2.10: Performance summary of state-of-the-art PAM-4 receivers (a) channel loss vs. baud rate (b) power efficiency vs. baud rate.

tail cancellation. Edge samplers are usually required in oversampling CDR to provide phase information.

ADC-based receivers employ partial analog-domain pre-equalization, such as a CTLE and a discrete-time FIR equalizer, to relax the dynamic range requirement of the ADC. The front-end ADC operates at baud rate with a resolution commonly ranging from 6 to 8 bits. The DSP equalizer that follows the ADC provides digital-domain FFE and DFE for further ISI cancellation and symbol detection. Baud-rate CDRs such as the Mueller-Müller CDR are usually employed to take advantage of the amplitude information provided by the ADC at the sampling point.
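To illustrate how a baud-rate CDR can extract timing information from one amplitude sample per symbol, the sketch below evaluates the textbook Mueller-Müller timing error, $e[k] = y[k]d[k-1] - y[k-1]d[k]$, on a simulated NRZ stream. The three-cursor pulse responses, the use of ideal decisions, and the NRZ (rather than PAM-4) data are simplifying assumptions for illustration; this is not a model of any specific receiver in Fig. 2.10.

```python
import random


def mm_phase_error(samples, decisions):
    """Average Mueller-Muller timing error y[k]*d[k-1] - y[k-1]*d[k]."""
    terms = [samples[k] * decisions[k - 1] - samples[k - 1] * decisions[k]
             for k in range(1, len(samples))]
    return sum(terms) / len(terms)


def simulate(pulse, n_symbols=20000, seed=0):
    """Average timing error for pulse = (h[-1], h[0], h[1]), noise-free.

    Ideal decisions (the transmitted symbols) are assumed.
    """
    rng = random.Random(seed)
    d = [rng.choice((-1, 1)) for _ in range(n_symbols)]
    pre, main, post = pulse
    # y[k] = h[-1]*d[k+1] + h[0]*d[k] + h[1]*d[k-1]
    y = [pre * d[k + 1] + main * d[k] + post * d[k - 1]
         for k in range(1, n_symbols - 1)]
    return mm_phase_error(y, d[1:-1])


print(f"symmetric pulse : {simulate((0.2, 1.0, 0.2)):+.3f}")
print(f"h[1] > h[-1]    : {simulate((0.1, 1.0, 0.3)):+.3f}")
print(f"h[1] < h[-1]    : {simulate((0.3, 1.0, 0.1)):+.3f}")
```

For uncorrelated data the average error approaches h[1] - h[-1], so the loop settles where the first precursor and first postcursor are balanced, and the sign of the error indicates in which direction the sampling phase should move.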

Fig. 2.10 shows a performance comparison of state-of-the-art PAM-4 receiver designs. It can be observed that ADC-based receivers are generally employed for backplane applications with channel loss beyond 30dB, whereas mixed-signal receivers can hardly compensate for 25dB of channel loss. However, mixed-signal receivers are more power efficient than ADC-based receivers, as shown in Fig. 2.10(b), where the power efficiency numbers for the ADC-based receivers that fall close to those of mixed-signal receivers do not include DSP power. In other words, the front-end ADC power alone is comparable to or even larger than the power of an entire mixed-signal receiver. This motivates the exploration of power-efficient high-speed ADC design, which will be introduced in detail in the following sections.

## 2.2 High-Speed ADC

In ADC-based receivers, a baud-rate high-speed ADC samples and quantizes the channel output data stream, which has been partially equalized by analog pre-equalizers, to enable digital-domain equalization. This section presents the challenges in designing and implementing power-efficient high-speed ADCs for wireline ADC-based receivers with sample rates beyond 20GS/s.



Figure. 2.11: Block diagram of an M-way TI-ADC and TI-error model.

### 2.2.1 Time-Interleaved ADC Structure

For ADC designs with sample rates beyond 20GS/s, a time-interleaved structure is commonly employed to achieve good ADC power efficiency. A time-interleaved ADC uses time-multiplexed operation, where multiple unit ADCs sample and convert the ADC input sequentially so that the unit ADCs can work at a relatively low sample rate while maintaining the target aggregate sampling rate. As shown in Fig. 2.11, this consists of M nominally identical sub-ADCs running in parallel that sample with M clock phases of frequency  $f_s/M$  and time spacing of  $1/f_s$ , which is the sample period. Parallelism allows the sub-ADCs to run at a fraction of the aggregate sample rate and enables the use of energy-efficient sub-ADC designs. While time interleaving enables extremely high sample rate converters, conversion errors occur due to mismatches between the parallel sub-ADCs. These errors appear as spurious peaks in the ADC output spectrum and can significantly degrade SNDR. The magnitude and position of these tones depend on the type of mismatch, which can be classified as offset, gain, bandwidth, or timing skew error. Calibration techniques,



Figure. 2.12: Impact of offset error on ADC output spectrum.

both in the analog and digital domains, are employed to correct for these time interleaving errors and achieve acceptable performance.
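The way channel mismatch turns into spurious tones can be reproduced with a small behavioural sketch. It interleaves a coherently sampled sine across M sub-ADCs, adds a fixed offset per sub-ADC, and lists the DFT bins that rise above the numerical floor. The 4-way interleave factor, record length and offset values are hypothetical and are not the settings behind Fig. 2.12; only offset mismatch is modelled.

```python
import cmath
import math


def ti_adc_samples(n, m, f_in_bin, offsets):
    """n samples of a unit sine at DFT bin f_in_bin; sample k uses sub-ADC k % m."""
    return [math.sin(2 * math.pi * f_in_bin * k / n) + offsets[k % m]
            for k in range(n)]


def dft_mag(x):
    """Normalized DFT magnitude for bins 0 .. n/2 (naive O(n^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n)
                    for k in range(n))) / n
            for b in range(n // 2 + 1)]


def spur_bins(x, signal_bin, floor=1e-6):
    """Bins other than the signal bin whose magnitude exceeds the floor."""
    return [b for b, mag in enumerate(dft_mag(x))
            if mag > floor and b != signal_bin]


n, m = 64, 4
offsets = [0.02, -0.01, 0.03, -0.02]  # hypothetical per-channel offsets (V)
x = ti_adc_samples(n, m, 5, offsets)
print(spur_bins(x, 5))  # [0, 16, 32], i.e. multiples of n/m
```

Because the offset pattern repeats every M samples, its energy lands at multiples of $f_s/M$ regardless of the input frequency, which is why offset spurs appear at fixed positions in the output spectrum.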

Offset errors occur due to device mismatches in the time-interleaved T/Hs, reference generation circuitry, and comparators. Fig. 2.12 shows the impact of offset error on the ADC output spectrum for an 8-way TI-ADC model with a  $1V_{ppd}$  FSR and an offset standard deviation of 40mV. The offset error is most commonly corrected in the analog domain in the comparators. Analog-domain techniques include employing correction DACs, either current-mode [4-6] or capacitive [13-14], which adjust the offset at either the comparator's input or an internal node, or utilizing comparators with parallel input differential pairs



Figure. 2.13: Impact of gain error on ADC output spectrum.

which can be digitally-reconfigured [15-16] or driven by a programmable reference voltage [17-18]. This calibration can be performed in the foreground with the aid of input calibration DACs [4-6, 15] and/or in the background [5, 13, 15, 18]. Given that the necessary offset correction range is inversely proportional to the square-root of the comparator's area, a large trade-off exists in flash ADCs. In order to address this, a foreground calibration approach which re-orders the comparator effective reference based on the distance from its inherent offset has been proposed to save significant comparator area [15]. Background offset calibration techniques for flash ADCs include utilizing either redundant time-interleaved sub-ADCs [16] or comparators [13] which rotate between the main input signal and a calibration input, while for asynchronous SAR ADCs the extra



Figure. 2.14: Impact of bandwidth error on ADC output spectrum.

time at the end of the conversion can be used to perform charge switching calibration [18]. In order to avoid the extra area and loading due to this analog offset correction circuitry, it is also possible to compensate for offset purely in the digital domain by estimating the average slicer error after digital equalization and then subtracting this value in the DSP [19]. This digital approach does trade off some ADC dynamic range for potentially higher bandwidth.

The device mismatches that cause offset errors can also result in time-interleaving gain errors. Fig. 2.13 shows the impact of gain errors on the ADC output spectrum for an 8-way TI-ADC model with a  $1V_{ppd}$  FSR and a gain standard deviation of 5%. In a flash ADC this can



Figure. 2.15: Impact of skew error on ADC output spectrum.

be corrected in the analog domain via further adjustment of the comparator thresholds [15-16] and through the use of a programmable-gain amplifier (PGA) in each sub-ADC [4-5, 16]. An effective background gain calibration technique is to digitally monitor the ADC peak output and adjust the PGA source degeneration to generate an input swing which matches the full scale range [4]. In a SAR ADC, analog gain correction can be realized by adjusting the capacitive DAC reference voltage [19] and by introducing additional programmable capacitors [12]. Further fine gain calibration can also be implemented in the DSP [19].

Bandwidth errors result from layout asymmetries and also the aforementioned device mismatches. Fig. 2.14 shows the impact of bandwidth errors on the ADC output spectrum for an 8-way TI-ADC model with a  $1V_{ppd}$  FSR and a bandwidth standard deviation of 800MHz. One
approach to mitigating bandwidth errors is to simply design the signal path with sufficiently high bandwidth such that any variations do not translate into appreciable differences in the pulse response. This has been achieved in flash ADC designs by employing shunt peaking in the PGAs [4, 15]. Bandwidth error compensation can be also performed in the digital domain by simply considering each sub-ADC as having its own unique channel. As the DSP is often designed in a parallel fashion with the digital slice number equal to or greater than the ADC time-interleave factor, it is possible to independently adapt the FFE and DFE tap coefficients in each slice to its unique pulse response to account for these bandwidth errors [19].

Finally, skew errors result from device mismatches and layout asymmetries in the multi-phase clock generation and distribution to the input track-and-holds. Fig. 2.15 shows the impact of skew errors on the ADC output spectrum for an 8-way TI-ADC model with a  $1V_{ppd}$  FSR and a skew standard deviation of 2ps. These are most often calibrated with per-phase digitally-adjustable delay cells in the clock distribution buffers [6, 15, 17, 19] or phase interpolators with independent phase offset codes [4-5]. Similar to bandwidth errors, skew errors will cause each sub-ADC to generate a pulse response with a slightly different ISI characteristic. Thus, an efficient approach to detect skew errors is to monitor the differences in the converged tap coefficients of a per-slice adaptive equalizer [4]. The delay cell or phase interpolator control can be adjusted to minimize the coefficient differences and calibrate the skew to within the resolution of the correction circuitry. Independent equalizer tap control allows for further fine compensation of residual skew and bandwidth errors.
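The sensitivity to skew can be quantified with the first-order relation $SNDR \approx -20\log_{10}(2\pi f_{in}\sigma_{skew})$ for a sinusoidal input, which follows from treating the sampling-time error as a slope-dependent voltage error. The sketch below compares this estimate with a time-domain simulation of an 8-way interleaved sampler. The skew values, sample rate and input frequency are hypothetical and are not the model behind Fig. 2.15; all other error sources are ignored.

```python
import math


def skew_limited_sndr_db(f_in_hz, sigma_skew_s):
    """First-order SNDR limit for an rms sampling-time error sigma_skew_s."""
    return -20 * math.log10(2 * math.pi * f_in_hz * sigma_skew_s)


def simulated_sndr_db(f_s_hz, cycles, n, skews_s):
    """Time-domain SNDR of a unit sine sampled by len(skews_s) interleaved channels."""
    f_in = f_s_hz * cycles / n  # coherent input frequency
    sig = err = 0.0
    for k in range(n):
        t = k / f_s_hz
        ideal = math.sin(2 * math.pi * f_in * t)
        actual = math.sin(2 * math.pi * f_in * (t + skews_s[k % len(skews_s)]))
        sig += ideal ** 2
        err += (actual - ideal) ** 2
    return 10 * math.log10(sig / err)


# Hypothetical zero-mean skews for an 8-way interleaved sampler (seconds).
skews = [1e-12, -2e-12, 0.5e-12, 3e-12, -1e-12, -2.5e-12, 2e-12, -1e-12]
sigma = math.sqrt(sum(s * s for s in skews) / len(skews))
f_s, cycles, n = 25e9, 1229, 4096

print(f"rms skew        = {sigma * 1e12:.2f} ps")
print(f"simulated SNDR  = {simulated_sndr_db(f_s, cycles, n, skews):.2f} dB")
print(f"first-order est = {skew_limited_sndr_db(f_s * cycles / n, sigma):.2f} dB")
```

Under these assumptions, picosecond-level skew already limits the SNDR to roughly 20dB for inputs of several GHz, which illustrates why skew calibration is needed at these sample rates.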

### 2.2.2 Power-Efficient Unit-ADC Structures

Fig. 1.2 shows Walden's figure of merit (FoM) for Nyquist-rate ADCs published since 2000 that operate at sample rates of 5GS/s and higher. Walden's FoM is defined as:



Figure. 2.16: Block diagram of a flash ADC.

$$FOM = \frac{P}{2^{ENOB} f_s}$$
(2.11)

where  $f_s$  is the sample rate and the effective number of bits (ENOB) is derived from the signal-to-noise-plus-distortion ratio (SNDR) with a sinusoidal input near the Nyquist frequency. This FoM captures the trade-offs in power consumption, speed, and resolution faced in high-speed ADC design; a low FoM is desired, as it represents low power for a given speed and resolution. In order to achieve sample rates higher than 5GS/s, time-interleaving of parallel identical sub-ADCs clocked with multiple phase-shifted

sample clocks is employed. Two ADC architectures dominate at these high sample rates, the flash and successive approximation register (SAR) converters.
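Eq. 2.11 can be evaluated with a few lines of code, using the standard relation ENOB = (SNDR - 1.76)/6.02 to convert a measured SNDR into effective bits. The power, SNDR and sample rate in the example are hypothetical and do not correspond to any design cited in this chapter.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sndr_db, f_s_hz):
    """Walden FoM of Eq. 2.11 in fJ per conversion step."""
    return power_w / (2 ** enob(sndr_db) * f_s_hz) * 1e15


# Hypothetical converter: 100 mW, 30 dB SNDR near Nyquist, 25 GS/s.
print(f"ENOB = {enob(30.0):.2f} bits")
print(f"FoM  = {walden_fom_fj(0.1, 30.0, 25e9):.1f} fJ/conv.-step")
```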

Flash ADCs (Fig. 2.16) consist of a reference ladder and comparators at each reference level to enable simultaneous parallel conversion of an input signal sampled by a track-and-hold (T/H). This results in a thermometer output code which is then converted to standard binary with an encoder. Flash ADCs can be very fast, as the full conversion only requires one clock cycle that is typically partitioned with half the clock cycle for comparator evaluation and the other half for the encoder logic and comparator resetting/precharging. Recent TI flash ADCs have achieved very high sample rates by employing common interleave factors of 4 or 8, with each interleaved sub-ADC running at sample rates of 2GS/s and higher [3-5, 7, 13, 15]. However, given the large number of comparators activated for each conversion, 63 for a 6-bit converter, this results in high comparator and clocking power and large loading experienced by the input T/H. Thus, high-speed flash ADCs are generally only reasonable for resolutions of 6-bits or less.
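A behavioural sketch of the flash conversion described above is given below: all $2^N - 1$ comparators evaluate the sampled input in parallel and an encoder converts the thermometer code to binary. The ideal, offset-free thresholds, the ones-counting encoder and the ±0.5V input range are simplifying assumptions for illustration.

```python
def flash_convert(vin, bits, v_min=-0.5, v_max=0.5):
    """Ideal flash conversion; returns (binary code, number of comparators fired)."""
    n_comp = 2 ** bits - 1
    lsb = (v_max - v_min) / 2 ** bits
    # Reference ladder: one threshold per comparator.
    thresholds = [v_min + (i + 1) * lsb for i in range(n_comp)]
    # Every comparator is clocked on every conversion.
    thermometer = [vin > t for t in thresholds]
    # Ideal encoder: count the ones in the thermometer code.
    return sum(thermometer), n_comp


code, active = flash_convert(0.1, 6)
print(f"code = {code}, comparators clocked per conversion = {active}")
```

The conversion completes in a single step, but the number of comparators clocked per sample doubles with every additional bit, which is the source of the power and loading penalty noted above.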

SAR ADCs (Fig. 2.17) perform a binary search conversion over multiple clock cycles, with the simplest architectures utilizing a single comparator. A comparison is made between the sampled input and the reference digital-to-analog converter (DAC) value during each clock cycle. This comparison is then used to update the reference DAC for the next conversion cycle. The process continues over *N*-cycles until the full *N*-bit conversion is completed. In order to achieve low-power operation, the reference DACs are implemented as charge redistribution capacitive DACs. Since this process takes multiple clock cycles, versus a single clock cycle for a flash ADC, SAR converters employ much larger time interleave factors to achieve high sample rates. For example, a recent 8-bit 56GS/s converter utilized 320 time-interleaved ADCs operating at 175MS/s [20]. While



Figure. 2.17: Block diagram and critical timing path of a SAR ADC.

effective, extremely high interleave factors can result in a large area design and high clocking power consumption. Thus, in order to improve SAR efficiency, techniques have been developed to improve the sub-ADC conversion speed. The critical timing path (minimum conversion period) of a SAR ADC can be expressed as:

$$t_{CLK} \ge \sum_{N} (t_{comp} + t_{logic} + t_{DAC})$$
(2.12)

It can be observed from Eq. 2.12 that one major limitation in SAR ADC speed is performing the conversion over N clock cycles, each of which must be long enough to handle a worst-case small input signal. As this small input occurs only once during the conversion process, dramatic speed-ups can be obtained with an asynchronous architecture which automatically senses when the comparator has evaluated and subsequently clocks the

comparator after a delay to accommodate DAC settling. This asynchronous approach has been utilized in recent time-interleaved SAR ADCs operating at 10GS/s and higher [6, 17, 18, 21]. Another straightforward way to reduce the conversion delay is to employ multi-bit per stage conversion [22], which speeds up the unit SAR ADC significantly. However, multi-bit per stage conversion usually requires multiple DACs to generate the corresponding references, which results in power and area overhead. Other issues which limit the SAR ADC speed are the comparator reset and DAC settling times. The comparator reset delay is eliminated in a design which employs two comparators that alternate between even and odd bit conversions, an approach that recently achieved 90GS/s [18]. DAC settling is relaxed in designs which utilize a sub-radix-2 DAC, which also provides over-range protection [17]. Finally, the binary search algorithm used in SAR ADCs is inherently prone to comparator metastability, which can cause large conversion errors at the ADC output and degrade BER performance in ADC-based receivers. Approaches to address this include SAR ADCs with metastability detectors which force a conversion after a given time and digital back-end metastability correction circuitry [21].
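The binary search conversion and the timing bound of Eq. 2.12 can be sketched as follows. The model assumes an ideal radix-2 DAC, identical cycle times for every bit (the synchronous case), and no track time; the delay values are hypothetical. Only the 56GS/s and 175MS/s figures come from the example cited in the text.

```python
import math


def sar_convert(vin, bits, v_fs=1.0):
    """Ideal SAR binary search over [-v_fs/2, v_fs/2); one comparison per cycle."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)                # tentatively set the next bit
        dac = -v_fs / 2 + trial * v_fs / 2 ** bits
        if vin >= dac:                           # comparator decision
            code = trial                         # keep the bit
    return code


def sub_adc_rate_hz(bits, t_comp, t_logic, t_dac):
    """Maximum conversion rate from Eq. 2.12 with identical cycles (track time ignored)."""
    return 1.0 / (bits * (t_comp + t_logic + t_dac))


def interleave_factor(f_s_hz, f_sub_hz):
    """Number of sub-ADCs needed to reach an aggregate rate f_s_hz."""
    return math.ceil(f_s_hz / f_sub_hz)


print("code:", sar_convert(0.1, 6))
# Hypothetical per-cycle delays: 40 ps comparator, 20 ps logic, 30 ps DAC.
rate = sub_adc_rate_hz(8, 40e-12, 20e-12, 30e-12)
print(f"8-bit sub-ADC rate = {rate / 1e9:.2f} GS/s")
print("sub-ADCs for 56GS/s at 175MS/s:", interleave_factor(56e9, 175e6))
```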

While flash and SAR ADCs dominate at sample rates of 10GS/s and higher, an interesting architecture which combines properties of these two is the binary search ADC shown in Fig. 2.18. Binary search ADCs adopt a similar energy-efficient binary search conversion algorithm as SAR ADCs, but without any DAC settling time or SAR logic delay, by letting an MSB comparator activate a second-stage comparator bank with thresholds pre-configured for the MSB-1 conversion, and so on throughout a comparator tree. While this architecture requires the same number of comparators as a flash, only the necessary comparators are activated for a given conversion, which results in good power efficiency with an improved conversion rate compared with a SAR implementation. However,



Figure 2.18: Block diagram of a binary search ADC.

binary search ADC design shares some of the same challenges as SAR and flash ADC design, where the large comparator count results in high clocking power and a large load on the input T/H, and the sequential operation leads to potential metastability errors.
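The comparator-tree behavior described above can be illustrated with a short behavioral model. The Python sketch below is an idealized illustration rather than a model of any cited circuit: it assumes a unipolar input in the range 0 to V_REF, ideal comparators, and one comparator per stage; the function names are hypothetical.

```python
def binary_search_convert(vin, vref=1.0, bits=6):
    """Ideal binary search conversion with one active comparator per stage.

    Returns (code, thresholds_used). Every threshold is pre-set, so no DAC
    settling is modeled; the decisions made so far select which comparator
    of the next bank is activated.
    """
    code = 0
    used = []
    for stage in range(bits):
        # Threshold of the single comparator selected at this stage.
        trial = (code << 1) | 1
        threshold = trial * vref / (1 << (stage + 1))
        used.append(threshold)
        code = (code << 1) | (1 if vin >= threshold else 0)
    return code, used


def comparator_counts(bits=6):
    """Comparators built versus comparators activated per conversion."""
    total = (1 << bits) - 1   # same hardware count as a flash ADC
    active = bits             # only one comparator fires per stage
    return total, active


if __name__ == "__main__":
    code, used = binary_search_convert(0.3)
    print(code, used)
    print(comparator_counts(6))
```

In this idealized 6b case, 63 comparators are built but only 6 are activated per conversion, which is the source of the power advantage over a flash ADC; the loading and metastability concerns noted above are not captured by the model.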

# 3. 6-BIT 25GS/S TIME-INTERLEAVED MULTI-BIT SEARCH ADC WITH SOFT-DECISION SEARCH ALGORITHM\*

Time-interleaving architectures with multiple unit ADCs working at a lower sampling rate are generally employed to achieve sampling rates larger than 10GS/s, with flash and SAR converters often utilized [16, 18, 21, 23]. Flash ADCs [16], [23] can operate at high sampling rates with a relatively small number of unit ADCs. However, the parallel conversion approach of a flash ADC results in high power consumption as the resolution approaches 6 bits due to the switching of all the comparators. Conversely, SAR ADCs offer excellent power efficiency with a minimal number of comparators performing a binary search conversion. Unfortunately, it is challenging to push the unit ADC sampling speed significantly beyond 1GS/s, resulting in a very high channel count to obtain a high aggregate sampling rate [18], [21].

Binary search ADCs [24] adopt a similar energy-efficient binary search conversion algorithm as SAR ADCs, but without any DAC settling time or SAR logic delay. While this allows for a potentially higher conversion speed, the multi-stage operation can still limit the achievable sampling rate. This issue has been addressed in SAR ADCs which employ a multi-bit per stage binary search conversion [22]. However, as shown in Fig. 3.1, significant area and power overhead results due to the multiple DACs and comparators required to enable multi-bit conversion in a SAR ADC. Fortunately, this multi-bit per stage conversion algorithm can be directly applied to binary search ADCs with minimal hardware overhead to enable higher sampling rates at excellent power efficiency.

<sup>\*</sup> Part of this chapter is reprinted with permission from "A 25GS/s 6b TI two-stage multi-bit search ADC with soft-decision selection algorithm in 65 nm CMOS," by S. Cai, E. Z. Tabasy, A. Shafik, S. Kiran, S. Hoyos, and S. Palermo, IEEE J. Solid-State Circuits, vol. 52, no. 8, pp. 2168-2179, Copyright 2017 by IEEE



Figure 3.1: (a) Conventional SAR ADC, (b) multi-bit/stage SAR ADC, (c) conventional binary search ADC, and (d) multi-bit/stage binary search ADC.

However, key challenges exist in the efficient implementation of a multi-bit per stage binary search ADC. One issue is that the multi-stage operation is inherently prone to metastability errors which can dramatically degrade ADC signal-to-noise and distortion ratio (SNDR) [25] and system performance in serial I/O receivers [26]. Binary search ADCs also suffer from a similar exponential hardware complexity as flash ADCs, resulting in a large load capacitance for the track-and-hold (T/H). Although reference prediction techniques [27] can reduce comparator count, achieving the maximum benefit of this approach involves the use of relatively-slow unit ADCs which employ multiple single-bit stages.

This Chapter presents an 8-channel time-interleaved 25GS/s 6b ADC with 3.125GS/s unit ADCs employing a two-stage asynchronous binary search structure that consists of a 2b first stage and a 4b second stage that addresses these issues [14]. In order to improve T/H bandwidth and ADC metastability, a novel soft-decision selection algorithm is proposed and analyzed in Section 3.1. Section 3.2 presents the ADC architecture and key circuit blocks, including a novel shared-input double-tail three latch structure utilized to reduce the comparator loading of the T/H circuit. Experimental results from a general purpose (GP) 65nm CMOS prototype are presented in Section 3.3. Finally, Section 3.4 concludes this Chapter.

## 3.1 Soft-Decision Selection Algorithm

A conventional asynchronous binary search algorithm works in a decision ripple fashion, where the search space at each stage other than the first stage depends on the decisions from the previous conversion stages. While an efficient search operation is achieved by each conversion stage generating hard decisions and having non-overlapping search spaces for the following stages, decision errors occurring at a certain stage other than the final stage result in an erroneous subsequent search space and produce conversion errors at the ADC output. A redundant binary search algorithm [28] tolerates these hard decision errors by overlapping the search space of the following stages such that the errors can be recovered from the redundancy introduced in the overlapped region. However, increasing the search space translates into more triggered comparators and results in degraded power efficiency. In addition, for a conventional asynchronous binary search algorithm, each conversion stage will not be triggered until the previous stage decisions



Figure 3.2: High-speed T/H structure and settling error.

ripple to the current stage. Therefore, if a decision stage experiences metastability and takes an excessive amount of time to generate the ripple signal, the following stages will not have enough time for conversion and will also result in conversion errors at the ADC output.

## 3.1.1 Track-and-Hold Settling Error

T/H settling errors are a major source of decision errors at the critical first conversion stage. As shown in Fig. 3.2, a conventional T/H circuit consists of a bootstrap switch followed by a buffer to drive the ADC. This buffer is implemented to isolate the ADC input loading from the sampling switch since the bandwidth of the front-end sampling switch should be larger than the ADC bandwidth to maintain a good dynamic range when the ADC input is close to the Nyquist frequency. Ideally, the buffer output  $V_{TH}$  should track the sampling switch output  $V_{SW}$  with minimal phase delay and generate a sampled input at the instant when the sampling switch is turned off. However, the large capacitive loading from the ADC and routing can result in a high-power design in order to preserve a high-bandwidth buffer output node. When the sampling switch is turned off, the buffer output will settle to the voltage held at the switch output at a rate determined by the settling time constant. Reducing the buffer's output bandwidth to save power results in the T/H output tracking the switch output with a phase and gain error, as shown by the black and blue  $V_{TH}$ 



Figure 3.3: T/H settling scenario for a conventional binary search algorithm and the proposed soft-decision selection algorithm.

curves of Fig. 3.2. In a conventional binary search ADC, during the hold phase the T/H output should settle within 0.5 LSB when the ADC starts conversion. A slower T/H settling time results in less time for ADC conversion and a lower conversion speed.

In order to demonstrate the potential for conversion errors with incomplete T/H settling, Fig. 3.3 shows this scenario for a multi-bit per stage binary search ADC, with two and four bits converted in the first and second stages, respectively. For simplification, the lines represent the comparators with thresholds at the corresponding reference levels. In this example the T/H output (ADC input) is assumed to be slightly less than  $1/2V_{REF}$  when the first stage comparators are triggered. Because of the limited bandwidth at the T/H output



Figure 3.4: Metastability scenario for a conventional binary search algorithm and the proposed soft-decision selection algorithm.

node, the T/H output has not fully settled and continues to settle to 4 LSB above the  $1/2V_{REF}$  level. For a conventional binary search algorithm shown on the left, the first stage middle comparator with reference at  $1/2V_{REF}$  has already made a hard decision '0' to select the bank of 15 comparators (red lines) with references between  $1/4V_{REF}$  and  $1/2V_{REF}$  at the second stage. The triggered comparators do not cover the final settled T/H output and therefore an unrecoverable 4 LSB conversion error appears at the ADC output.

A novel soft-decision selection binary search algorithm is proposed that creates redundancy to tolerate decision errors without the need to overlap search spaces. Relative to a redundant binary search algorithm [28], this improves the ADC critical timing path to relax metastability errors and T/H bandwidth requirements. The redundancy from the proposed soft-decision selection algorithm offers tolerance to T/H settling errors, such that the ADC can start conversion even if the T/H output has not settled to within 0.5 LSB. Fig. 3.4 shows that the soft-decision selection search algorithm introduces auxiliary decision information, represented by the dashed lines in between the first stage comparators (solid lines), to select the triggered second-stage comparators. These additional decisions are generated by SR latches comparing the rising edges of the decision outputs from adjacent comparators, which create interpolated levels in the time domain [29]. The second-stage comparator selection is partitioned with the SR latch output triggering 7 second-stage comparators whose references are centered at $1/4V_{REF}$, $1/2V_{REF}$ and $3/4V_{REF}$, and the voltage-domain comparator outputs triggering 9 second-stage comparators with references centered at the SR latch interpolated levels of $1/8V_{REF}$, $3/8V_{REF}$, $5/8V_{REF}$ and $7/8V_{REF}$. Assuming the same example where the T/H output is initially slightly lower than $1/2V_{REF}$, the first-stage comparators select the bank of 9 comparators with references centered at $3/8V_{REF}$. Since the input is close to $1/2V_{REF}$, the decision from the middle comparator with threshold at $1/2V_{REF}$ arrives later compared to the other two comparators. Thus, the SR latches select the bank of 7 comparators with references centered at $1/2V_{REF}$. With the additional information from the SR latches, the second-stage search space is shifted up to create a 4 LSB redundancy to account for any potential settling error. This results in no conversion error at the ADC output due to the second-stage search space covering the settled T/H output. In order to enable soft-decision selection at full scale levels, two additional dummy comparators with thresholds at the full scale references are included. For inputs falling close to 0 and $V_{REF}$, the SR latches select a bank of 3 comparators with references from $1/64V_{REF}$ to $3/64V_{REF}$ and from $61/64V_{REF}$ to $63/64V_{REF}$, respectively.
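The settling example above can be reproduced with a small behavioral model. The Python sketch below is not the implementation used in this work: it expresses voltages in LSB ($V_{REF}$ = 64 LSB), and it assumes that the time-domain SR latch outcome can be approximated by picking the first-stage reference nearest to the input at the decision instant. All function names are hypothetical.

```python
def conventional_bank(v_first):
    """15 second-stage thresholds (in LSB) chosen by hard first-stage decisions."""
    q = min(3, max(0, int(v_first // 16)))
    return list(range(16 * q + 1, 16 * q + 16))


def soft_decision_bank(v_first):
    """Thresholds (in LSB) triggered by the soft-decision selection.

    Assumption: the SR latches pick the first-stage reference closest to
    the input, since that comparator resolves last.
    """
    q = min(3, max(0, int(v_first // 16)))
    # 9 comparators centered at 1/8, 3/8, 5/8 or 7/8 of VREF.
    voltage_bank = range(16 * q + 4, 16 * q + 13)
    # 7 comparators centered at the nearest first-stage reference
    # (only 3 remain at the 0 and VREF edges).
    nearest = 16 * min(4, max(0, int((v_first + 8) // 16)))
    sr_bank = [t for t in range(nearest - 3, nearest + 4) if 1 <= t <= 63]
    return sorted(set(voltage_bank) | set(sr_bank))


def resolve(bank, v_final):
    """Output code from the triggered (contiguous) thresholds and the settled input."""
    return bank[0] - 1 + sum(v_final >= t for t in bank)


if __name__ == "__main__":
    v_first, v_final = 31.5, 35.5   # decided just below 1/2 VREF, settles above it
    print("ideal code        :", int(v_final))
    print("conventional code :", resolve(conventional_bank(v_first), v_final))
    print("soft-decision code:", resolve(soft_decision_bank(v_first), v_final))
```

Under these assumptions the conventional selection saturates at the top of the wrong bank, whereas the soft-decision selection triggers thresholds 20 to 35 and recovers the settled value.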

## 3.1.2 Metastability Error

The proposed soft-decision selection binary search algorithm also addresses metastability scenarios where the input is initially extremely close to a reference level. For the example shown in Fig. 3.4, the middle comparator with threshold at $1/2V_{REF}$ in the first stage experiences an excessively long regeneration time due to the small input difference and can consume almost all the conversion cycle time. In the case of a conventional binary search algorithm, the relevant comparators in the second stage with threshold levels between $1/4V_{REF}$ and $1/2V_{REF}$ (grey lines) will not be triggered because their selection depends on the output of the slow decision from the first stage $1/2V_{REF}$ comparator. Therefore, a 4 LSB conversion error results from metastability. In contrast, for the proposed soft-decision selection search algorithm under the same metastability scenario, the 7 relevant second-stage comparators with threshold levels that cover the metastable input (green lines) are triggered by the fast SR latch output instead of the slow $1/2V_{REF}$ comparator output. Even though the other 9 second-stage comparators (grey lines) are not triggered, no conversion information is lost.

In order to quantify the performance improvement from the proposed soft-decision selection algorithm, simulations are performed to examine the impact of T/H buffer and comparator time constants on ADC SNDR. Assuming a 3.125GS/s two-stage 2b-4b ADC with a 160ps 50% hold phase period and 35ps logic delay in between the two stages, Fig. 3.5(a) shows the effect of the T/H buffer time constant with a 15ps comparator time constant. The 4 LSB redundancy from the soft-decision selection search algorithm allows the T/H buffer time constant to be relaxed by 2X relative to a conventional binary search algorithm when SNDR is kept close to the ideal 37.6dB. Assuming an allocation of 40ps for T/H settling, Fig. 3.5(b) shows that the soft-decision selection search algorithm allows an increase in the comparator time constant by more than 50%.



Figure 3.5: (a) SNDR error vs. T/H buffer time constant and (b) SNDR vs. latch time constant for a conventional binary search algorithm and the proposed soft-decision selection algorithm.

The hardware overhead of implementing the soft-decision selection search algorithm includes the two dummy comparators at the full scale reference levels and the four SR latches in between the first stage comparators. Assuming a uniform input distribution, on average no extra comparators are triggered at the second stage relative to the 15 of a conventional search, since 16 comparators are activated when $V_{IN}$ falls within the $[1/8V_{REF}, 7/8V_{REF}]$ range and 12 comparators otherwise.
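The averaging argument can be checked with a few lines of arithmetic. The sketch below assumes the uniform input distribution stated above and uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def average_second_stage_comparators():
    """Expected number of triggered second-stage comparators, uniform input."""
    # 16 comparators fire (7 + 9) inside [1/8 VREF, 7/8 VREF];
    # 12 fire (3 + 9) in the two outer eighths of the input range.
    p_inner = Fraction(7, 8) - Fraction(1, 8)
    p_outer = 1 - p_inner
    return p_inner * 16 + p_outer * 12


if __name__ == "__main__":
    print(average_second_stage_comparators())
```

The result equals the 15 comparators triggered per conversion by the conventional 2b-4b search, so the redundancy comes without an average increase in activated comparators.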

## 3.2 ADC Architecture

This section presents the detailed implementation of the time-interleaved ADC, including the time-interleaved architecture, unit ADC structure, front-end T/H, clock generation and skew calibration, and the shared-input double-tail latch.

## 3.2.1 Time-Interleaved Architecture

In order to prove the concept of the search algorithm introduced in the previous section, an 8-channel 25GS/s 6b ADC is implemented with 3.125GS/s unit ADCs employing the soft-decision selection algorithm. Fig. 3.6 shows the time-interleaved binary search ADC timing and block diagrams. The ADC input consists of eight front-end T/Hs, one per unit ADC, clocked by eight phases of 3.125GHz 50% duty cycle clocks with 40ps spacing. Calibration DACs are included for both sampling clock skew correction for the eight front-end T/H sampling phases and for the comparators' offset correction/threshold generation in all eight unit ADCs.

## 3.2.2 Unit ADC Structure

Fig. 3.6 shows the unit ADC structure, where the 2b first stage consists of five comparators at reference levels 0, $1/4V_{REF}$, $1/2V_{REF}$, $3/4V_{REF}$ and $V_{REF}$ and four SR latches inserted in between the comparators to generate interpolated levels in the time domain that control the second-stage comparator selection. These second-stage comparators are segmented into nine comparator banks. Five banks of three (edge) or seven (middle)



Figure 3.6: Block diagram of the 8-way time-interleaved binary search ADC with soft-decision selection.



Figure 3.7: Front-end T/H schematic.

comparators with thresholds centered at references $1/32V_{REF}$, $1/4V_{REF}$, $1/2V_{REF}$, $3/4V_{REF}$, and $31/32V_{REF}$ are activated by the SR latch outputs, while four banks of nine comparators with thresholds centered at $1/8V_{REF}$, $3/8V_{REF}$, $5/8V_{REF}$, and $7/8V_{REF}$ are triggered by the first-stage voltage-domain comparator outputs. The second-stage selection logic is skewed intentionally to have a faster enable path delay than reset path delay to increase the



Figure 3.8: Block diagram of the front-end T/H sampling clocks generation, distribution and calibration.

available conversion time. All the comparator thresholds are set with a 3b reference ladder providing coarse input references and offset calibration DACs setting the equivalent references to the full 6b resolution. Finally, a MUX-based encoder converts the thermometer output from the second stage comparators to the final 6b binary output.

## 3.2.3 Front-End Track-and-Hold

The front-end T/H schematic is shown in Fig. 3.7. It consists of a bootstrapped switch clocked at 3.125 GHz followed by a source follower with an additional high-pass path for bandwidth extension. The front-end T/H architecture allows for a large input sampling bandwidth, as the sampling capacitor is just the input capacitance of the pseudo-differential PMOS source-follower buffer stage. This buffer drives the loading capacitance of the core ADC and provides isolation from kick-back noise. Simulation results show that with a 300 mV input common-mode voltage and a 500 mV input swing, a linearity better than 6 bits is achieved up to a 12.5 GHz input bandwidth with a 3.125 GHz sampling clock.

## 3.2.4 Clock Generation and Skew Calibration

As shown in Fig. 3.8, eight equally-spaced sampling phases for the front-end T/H are generated from a 12.5 GHz differential input clock. A pseudo-differential self-biased input stage buffers the 12.5 GHz differential clock to drive a CML latch-based divide-by-4 stage which creates eight 3.125 GHz clock phases spaced at 40 ps. Delay lines with digitally-controlled MOS capacitor banks are employed in the 8-phase distribution network to calibrate the phase mismatches between the eight critical sampling phases. Measurement results verify that the clock skew calibration has a resolution of about 150 fs and allows for a maximum tuning range of 20 ps per phase.
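The clocking numbers quoted above follow from simple arithmetic, sketched below in Python. The function names are hypothetical, and the step count of the skew calibration is only an estimate derived from the quoted range and resolution, not a figure stated in this work.

```python
import math


def sampling_phases(f_clk_hz=12.5e9, divide_by=4, n_phases=8):
    """Phase frequency (Hz) and nominal phase offsets (ps) after the divider."""
    f_phase = f_clk_hz / divide_by
    period_ps = 1e12 / f_phase
    spacing_ps = period_ps / n_phases
    return f_phase, [k * spacing_ps for k in range(n_phases)]


def skew_dac_steps(range_ps=20.0, resolution_fs=150.0):
    """Approximate number of delay steps needed to span the tuning range."""
    return math.ceil(range_ps * 1e3 / resolution_fs)


if __name__ == "__main__":
    f_phase, offsets = sampling_phases()
    print(f_phase / 1e9, "GHz per phase")       # 3.125 GHz
    print(offsets)                               # 0, 40, 80, ... ps
    print(len(offsets) * f_phase / 1e9, "GS/s")  # aggregate sampling rate
    print(skew_dac_steps(), "delay steps")
```

Eight phases at 3.125 GHz give the 25 GS/s aggregate rate, and the quoted 20 ps range at about 150 fs resolution corresponds to roughly 134 delay steps per phase under this estimate.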

## 3.2.5 Shared-Input Double-Tail Latch

In order to reduce T/H loading, Fig. 3.9(a) shows the schematic of a novel shared-input double-tail dynamic three latch structure utilized in the second stage of the unit ADCs. Each input stage is followed by three regenerative latches calibrated with 1LSB difference in threshold levels. Since the input transistors are often sized for a specific input offset variation level, the proposed structure reduces both the comparators' contribution to the T/H loading and kick-back noise by approximately 3X, and the increased load at the first-stage output node does not significantly impact comparator performance. A 2b shared capacitive DAC at the first-stage output and an independent 6b resistive DAC at each regeneration stage allow setting of the comparators' threshold with a 2mV resolution and ±50mV tuning range relative to the coarse input reference ladder signal. Fig. 3.9(b) shows







Figure 3.9: Shared-input double-tail three latch structure (a) schematic and (b) Monte Carlo offset simulation results.

the Monte Carlo offset simulation of the proposed latch with a  $3\sigma$  value around 30mV, which is covered by the comparator offset tuning range.



Figure 3.10: Prototype ADC chip micrograph and core ADC floorplan.



Figure 3.11: Block diagram of foreground offset/reference calibration, clock skew calibration, and metastability measurement setup.

## 3.3 Measurement Results

Fig. 3.10 shows the GP 65nm CMOS chip micrograph and layout floorplan, which occupies a total active area of 0.24mm<sup>2</sup>. The core time-interleaved ADC, consisting of eight unit-ADCs, occupies 0.21mm<sup>2</sup>, while the front-end T/H and clock generation blocks occupy 0.02mm<sup>2</sup> and 0.01mm<sup>2</sup>, respectively. Placing the eight front-end T/Hs close together near the differential input pads minimizes the input capacitance and routing from the 8-phase clock generator. The differential 12.5GHz clock input signal is distributed to the divider-based phase generator via an on-die differential transmission line. Local decoupling capacitors are placed with the reference ladders in each unit ADC to reduce the impact of kickback noise on the reference voltages.

Comparator offset/reference calibration and phase skew calibration are both done in the foreground as shown in Fig. 3.11. During the comparator offset/reference calibration, ideal DC reference levels are generated off-chip and the corresponding comparator output is selected by MUXs and monitored via LabVIEW from the sampling scope. A comparator's output is averaged and the calibration DAC code is adjusted automatically until this average reaches 0.5, which implies that the comparator is metastable and generating 50% 0's and 1's. The foreground skew calibration procedure is done in two steps. First, coarse phase tuning is performed by manually monitoring the muxed 8-phase clock output on the scope. Then a sinewave-input FFT-based foreground method [30] is employed for fine phase tuning.
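The offset/reference calibration loop can be sketched as follows. This Python model is illustrative only: the comparator is simulated with Gaussian input-referred noise, the calibration DAC is assumed to be a monotonic 2 mV/step trim spanning ±50 mV, and a binary search over the DAC code is assumed as the adjustment strategy, which the text above does not specify. All names and default values are hypothetical.

```python
import random


def comparator_average(v_in, threshold, noise_rms, n, rng):
    """Fraction of '1' decisions produced by a noisy comparator."""
    ones = sum(v_in + rng.gauss(0.0, noise_rms) > threshold for _ in range(n))
    return ones / n


def calibrate_threshold(v_ref, raw_threshold, dac_step=2e-3, codes=51,
                        noise_rms=1e-3, n=2000, seed=0):
    """Find the DAC code at which the averaged output crosses 0.5.

    The trimmed threshold is raw_threshold + (code - codes // 2) * dac_step.
    """
    rng = random.Random(seed)
    lo, hi = 0, codes - 1
    while lo < hi:
        mid = (lo + hi) // 2
        threshold = raw_threshold + (mid - codes // 2) * dac_step
        if comparator_average(v_ref, threshold, noise_rms, n, rng) > 0.5:
            lo = mid + 1   # mostly 1s: threshold is below the reference, raise it
        else:
            hi = mid       # mostly 0s: threshold is at or above the reference
    return lo


if __name__ == "__main__":
    v_ref, raw = 0.300, 0.317          # 17 mV of comparator offset (example value)
    code = calibrate_threshold(v_ref, raw)
    trimmed = raw + (code - 51 // 2) * 2e-3
    print(code, round((trimmed - v_ref) * 1e3, 3), "mV residual")
```

With these example values the residual threshold error after calibration stays within one DAC step.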

Fig. 3.12 shows that after calibrating the comparator references among the eight unit ADCs and the phase errors of the eight sampling clocks, the 25GS/s ADC achieves 32.5dB low frequency maximum SNDR and 29.6dB SNDR at the 12.5GHz Nyquist, which translates to 5.10 bits and 4.62 bits ENOB, respectively. The ENOB at Nyquist is primarily limited by the 350fs jitter from the frequency synthesizer used as the input clock source. Fig. 3.14 shows the ADC output spectrum with 12.21GHz -1dBFS input before and after reference and skew calibration, which provides 10.1dB SNDR improvement.
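The ENOB values and the jitter limitation quoted above can be cross-checked with the standard relations ENOB = (SNDR - 1.76)/6.02 and SNR = -20 log10(2π f_in σ_j). The Python sketch below applies them to the measured numbers; the function names are hypothetical, and the jitter bound considers the clock jitter alone.

```python
import math


def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def jitter_limited_snr_db(f_in_hz, sigma_j_s):
    """SNR bound set by rms sampling jitter for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_j_s)


if __name__ == "__main__":
    print(enob(32.5))                          # low-frequency ENOB, about 5.1 bits
    print(enob(29.6))                          # Nyquist ENOB, about 4.6 bits
    snr_j = jitter_limited_snr_db(12.5e9, 350e-15)
    print(snr_j, enob(snr_j))                  # jitter-only bound at Nyquist
```

A 350 fs rms jitter alone bounds the SNR at 12.5 GHz to roughly 31 dB, which is close to the measured 29.6 dB SNDR and is consistent with jitter being the dominant limitation at Nyquist.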



Figure 3.12: ADC SNDR and SFDR vs. input frequency at  $f_s = 25$ GHz.



Figure 3.13: ADC SNDR and SFDR with sampling frequency of  $f_s = 15$ GS/s and 25GS/s and supply voltage of 0.9V, 1V and 1.1V.



Figure 3.14: 25GS/s ADC normalized output spectrum for  $f_{in} = 12.21$ GHz: (a) before and (b) after offset and skew calibration.

A sinewave histogram technique [31] is utilized for ADC static characterization. Fig. 3.15 shows that the maximum DNL and INL after reference and phase calibration are +0.39/-0.37 and +0.30/-0.38 LSB, respectively.



Figure 3.15: ADC DNL/INL plots.



Figure 3.16: ADC metastability error rate (MER) measurement.

Fig. 3.11 also shows the metastability measurement setup where an FMC XM105 debug card is connected to a Xilinx ML623 Virtex-6 FPGA used for data acquisition. A 100kHz sinusoidal input is applied to the ADC, such that consecutive samples have a difference less than 1LSB. The ADC metastability error rate (MER) characterization results are shown in Fig. 3.16. As the measured MER follows the erfc<sup>-1</sup> curve instead of the natural log curve, this implies that noise, rather than metastability errors, is limiting the MER results. This demonstrates the effectiveness of the metastability tolerance provided by the soft-decision selection algorithm.
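The principle of this measurement can be sketched in Python. The error criterion below (a sample deviating by more than a threshold from the last accepted sample) is an assumption made for illustration; the acquisition and counting logic used on the FPGA is not described in the text, and the function names are hypothetical.

```python
import math


def max_step_lsb(f_in_hz, f_s_hz, bits=6):
    """Largest change between consecutive samples of a full-scale sine, in LSB."""
    amplitude_lsb = (1 << bits) / 2
    return amplitude_lsb * 2 * math.pi * f_in_hz / f_s_hz


def metastability_error_rate(codes, threshold_lsb=4):
    """Fraction of samples that jump more than threshold_lsb from the last good one."""
    errors = 0
    last_good = codes[0]
    for c in codes[1:]:
        if abs(c - last_good) > threshold_lsb:
            errors += 1      # glitch: keep the previous good sample as reference
        else:
            last_good = c
    return errors / (len(codes) - 1)


if __name__ == "__main__":
    print(max_step_lsb(100e3, 25e9))                       # far below 1 LSB
    print(metastability_error_rate([10, 10, 11, 11, 40, 11, 12, 12]))
```

Because a slow input moves by much less than 1 LSB between samples, any large jump between consecutive codes can be attributed to a conversion error rather than to the signal.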

| Specification                  | [6]                 | [7]                 | [8]                 | [20]                 | This Work           |
|--------------------------------|---------------------|---------------------|---------------------|----------------------|---------------------|
| Technology                     | 65nm                | 32nm SOI            | 28nm FDSOI          | 32nm SOI             | 65nm                |
| Power Supply                   | 1.5V                | 0.9V                | 1.05V/1.6V          | 1V/0.9V              | 1V                  |
| ADC Structure                  | TI-Flash            | TI-Flash            | TI-SAR              | TI-SAR               | TI-BS               |
| Sampling Rate                  | 16 GS/s             | 20 GS/s             | 46 GS/s             | 36 GS/s              | 25 GS/s             |
| Resolution                     | 6 bits              | 6 bits              | 6 bits              | 6 bits               | 6 bits              |
| ENOB @ Nyquist                 | 4.36 bits           | 4.84 bits           | 3.89 bits           | 4.96 bits            | 4.62 bits           |
| Area                           | $1.47 \text{ mm}^2$ | $0.25 \text{ mm}^2$ | $0.14 \text{ mm}^2$ | $0.048 \text{ mm}^2$ | $0.24 \text{ mm}^2$ |
| Power                          | 435 mW              | 69.5 mW             | 381 mW              | 110 mW               | 88 mW               |
| MER (Error > 4 LSB)            | N.A.                | N.A.                | $<10^{-10}$         | N.A.                 | $<10^{-10}$         |
| FOM $(P/(2^{ENOB} \cdot f_s))$ | 1.3 pJ/conv.-step   | 124 fJ/conv.-step   | 560 fJ/conv.-step   | 98 fJ/conv.-step     | 143 fJ/conv.-step   |

Table 3.1: ADC performance summary



Figure 3.17: ADC performance summary with Walden FoM.

Table 3.1 summarizes the ADC performance and compares this work against recent 6b ADCs with sample rates ranging from 16 to 46GS/s. The ADC consumes 88mW from a 1V supply, of which 63.9% is dissipated by the core ADC, 14.8% by the clock phase generation and 21.3% by the T/H, achieving a 143 fJ/conv.-step FOM. Relative to the flash converters, the proposed design achieves a significant FOM improvement over the 16GS/s 65nm design [23], and a 25% faster conversion speed with comparable performance relative to the 20GS/s 32nm SOI design [16]. Similar metastability tolerance is achieved at a lower FOM relative to the 28nm FDSOI SAR design which employs back-end hardware for metastability correction [21]. While the advanced 32nm SAR architecture of [16] achieves a better FOM, Fig. 3.17 shows that the performance of the presented 65nm prototype ADC falls near the 32nm design trend and achieves around 10X efficiency improvement compared with the 65nm design trend.
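The FOM entries of Table 3.1 follow from the Walden definition $P/(2^{ENOB} \cdot f_s)$. The Python sketch below recomputes two of them from the tabulated power, ENOB and sampling rate, together with the power breakdown quoted above; the function name is hypothetical.

```python
def walden_fom_fj(power_w, enob_bits, fs_hz):
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (2 ** enob_bits * fs_hz) * 1e15


if __name__ == "__main__":
    # This work: 88 mW, 4.62 bits ENOB at Nyquist, 25 GS/s.
    print(walden_fom_fj(88e-3, 4.62, 25e9))
    # 36 GS/s TI-SAR column of Table 3.1: 110 mW, 4.96 bits.
    print(walden_fom_fj(110e-3, 4.96, 36e9))
    # Power breakdown of this work in mW: core ADC, clock generation, T/H.
    print([round(88 * share, 1) for share in (0.639, 0.148, 0.213)])
```

The computed values are approximately 143 fJ/conv.-step and 98 fJ/conv.-step, matching the table, and the breakdown corresponds to roughly 56 mW, 13 mW and 19 mW.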

## 3.4 Conclusion

This Chapter has presented an 8-channel 25GS/s 6-bit time-interleaved ADC with the unit ADCs employing a 2b-4b two-stage binary search structure to achieve an increased sampling rate. A soft-decision selection search algorithm is implemented with very low overhead to relax T/H bandwidth requirements and improve ADC metastability performance. T/H loading and kick-back noise are reduced with a shared-input double-tail three latch structure. Measurements verify that the soft-decision selection search algorithm delivers robust ADC performance with a relaxed T/H and comparator design. Overall, the presented design achieves good power efficiency, making it a suitable architecture for a 50Gb/s PAM-4 wireline receiver.

# 4. 52GB/S PAM-4 ADC-BASED RECEIVER WITH A 6-BIT 26GS/S TIME-INTERLEAVED SAR ADC

This section introduces an ADC-based PAM-4 receiver employing a 32-way time-interleaved, 2-bit/stage 6-bit SAR ADC. Reference levels for each of the three 2-bit stages scale according to the stage, which enables the utilization of a single reference DAC to reduce the area overhead as well as the loading on the buffer in the track-and-hold stage. Input stages of the 3 comparators making up each 2-bit stage are shared to reduce the parasitic capacitance loading on the reference DAC. A 3-tap FFE embedded in the ADC using a non-binary FFE DAC, along with a programmable CTLE, provides partial analog equalization prior to the quantization operation in the ADC. The partial analog equalization allows the placement of a Mueller-Müller phase detector directly at the ADC output to avoid excessive loop delay. The DSP employs a 12-tap FFE and a 2-tap partially-unrolled DFE with low complexity.

## 4.1 Receiver Architecture

Fig. 4.1 shows the full ADC-based receiver architecture. The 4-stage CTLE-VGA front-end consists of 2 stages of CTLE, a gain stage and a source follower. Programmable capacitor banks in the CTLE provide 5-15 dB of gain peaking. The resistor in the second stage CTLE allows for variable DC gain and is used to ensure the CTLE/VGA front-end output swing spans the full-scale range (FSR) of the ADC. By reducing the strength of the ISI components and boosting the main cursor, the CTLE increases the ratio of the main cursor to the quantization noise without the need for a higher resolution ADC. The output of the CTLE drives an 8-way parallel T/H circuit.



Figure 4.1: Block diagram of an ADC-based PAM-4 receiver with CTLE/VGA, a 6-bit TI-SAR ADC, DSP and CDR.

## 4.2 6-Bit 26GS/s Time-Interleaved SAR ADC with 3-Tap Embedded FFE

While TI-SAR ADCs are popular in ADC-based RX applications for their excellent power efficiency [6, 17, 33], further power savings are desirable in order not to degrade the overall RX efficiency. This section details the design of a 32-way 6 bit 26 GS/s asynchronous low-overhead 2bit/stage TI-SAR ADC with a 3-tap embedded FFE implemented with a non-binary capacitive DAC for improved FFE coefficient coverage.



Figure 4.2: Block diagram of 32-way 6 bit 26GS/s 2b/stage TI-SAR ADC with embedded 3-tap FFE.



Figure 4.3: Block and timing diagram of phase generator and 8-way front-end T/H.

## 4.2.1 ADC Architecture

Fig. 4.2 and Fig. 4.3 show the block diagram and timing diagram of the 32-way 6 bit 26 GS/s converter with 3-tap embedded FFE. The front-end T/H consists of 8 sub channels working at  $f_s/8 = 3.25$  GS/s, and each sub T/H drives 4 unit asynchronous 2b/stage SARs operating at  $f_s/32 = 812.5$  MS/s. The 8 parallel T/Hs are clocked with eight 3.25 GHz critical sampling phases with a 50% duty cycle and 38.5 ps spacing. These critical sampling phases are generated from a differential 13 GHz clock divided by a CML-latch-based divide-by-4 block as shown in Fig. 4.3. The 8 phases spaced at 38.5 ps are then passed



Figure 4.4: T/H circuit showing the gain boosted source follower and the bootstrap switch.

through a bank of 64 current-mode phase interpolators controlled by the CDR filter. The outputs of the phase interpolators are skew calibrated by digitally controlled capacitor banks, with a range of 25 ps and a phase tuning resolution of 90 fs. The offset and gain errors of the TI-ADC are calibrated with the comparator offset/reference calibration as well as gain control in the T/H buffers. Dedicated embedded FFE DACs are employed to sample the pre- and post-cursors from adjacent T/H channels and carry out the weighted-sum operation. The 32-channel 6 bit ADC outputs are retimed to a single 812.5 MHz clock domain at the ADC-DSP interface. A decimator downsamples the TI-ADC output by a decimation factor of 33 for ADC characterization.

## 4.2.2 Gain-Boosted T/H with FVF

The front-end T/H schematic is shown in Fig. 4.4. The T/H circuit consists of a bootstrapped switch followed by a gain-boosted flipped voltage follower (FVF) source follower. In the absence of



Figure 4.5: Block diagram of unit comparator-assisted 2b/stage SAR ADC with 3-tap embedded FFE.

the gain-boosting path (red), the T/H buffer has a DC gain of -2.1 dB, while with gain boosting the buffer achieves a DC gain of 0 dB. With the gain-boosted follower design, the linearity of the entire front-end (including CTLE and T/H) is improved from -35dB to -39dB at the Nyquist frequency with a $700 \text{mV}_{\text{ppd}}$ input swing.

## 4.2.3 Unit 2b/Stage Comparator-Assisted SAR ADC

Fig. 4.5 shows the block diagram of the unit 2b/stage asynchronous SAR ADC. The 2b/stage comparator-assisted structure employs three 2b flash quantizers, one at each conversion stage, with their references scaling from  $1/4 V_{REF}$ ,  $1/16 V_{REF}$  to  $1/64 V_{REF}$ . With the assistance of reference scaling from the comparators, only one reference DAC is needed



Figure 4.6: Timing diagram of unit comparator-assisted 2b/stage SAR ADC with 3-tap embedded FFE.

whereas at least two reference DACs are necessary in other 2b/stage SAR ADC designs [22, 34]. The main cursor input is sampled on the top plate of the reference DAC of each unit ADC, whereas bottom plate sampling is necessary for the pre and post cursor inputs to be sampled on the FFE DAC at the same time. Top plate sampling of the main cursor prevents any signal attenuation caused by the comparator input capacitance and routing parasitics to maintain a good SNR at the comparator input node.

The unit SAR ADC operates with an 812.5 MHz clock generated by a local divide-by-4 circuit with a 25% duty cycle, which results in a 307.7 ps track phase for input main and pre/post cursor sampling and a 923.1 ps hold phase for unit ADC conversion. The asynchronous SAR operation employs a loop-unrolled scheme [35], which relaxes the latch reset requirement as well as the SAR logic delay by simplifying the SAR logic from the latch output to the DAC switches. As shown in Fig. 4.6, RDY signals are generated from the outputs of the 1<sup>st</sup> and 2<sup>nd</sup> stage comparators as the trigger signals for the following stage comparators and can be delay-tuned to accommodate sufficient DAC settling ($t_{DAC,1}$ and $t_{DAC,2}$) at each stage. A 3-bit thermometer code decision is made by the reference-scaled comparators at each stage and fed back to a segmented 2-bit thermometer DAC directly without any decoder to avoid additional logic delay in the critical feedback path. All the comparators are reset during the ADC track phase and their outputs are retimed at the same time.
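The 2b/stage conversion and the clock timing can be illustrated with a short behavioral model. The Python sketch below is a simplification under stated assumptions: it uses a unipolar input in the range 0 to V_REF, ideal comparators and ideal DAC feedback, and it does not model the shared reference DAC, the asynchronous RDY timing or the embedded FFE. The function names are hypothetical.

```python
def sar_2b_per_stage(vin, vref=1.0, stages=3):
    """Resolve 2 bits per stage with three comparators whose references scale by 1/4."""
    code = 0
    residue = vin
    for k in range(1, stages + 1):
        step = vref / 4 ** k                      # 1/4, 1/16, 1/64 of VREF
        thermometer = [residue >= j * step for j in (1, 2, 3)]
        digit = sum(thermometer)                  # 3-bit thermometer code -> 0..3
        residue -= digit * step                   # feedback removes the resolved part
        code = code * 4 + digit
    return code


def track_hold_ps(f_clk_hz=812.5e6, track_duty=0.25):
    """Track and hold durations in ps for a given clock and track duty cycle."""
    period_ps = 1e12 / f_clk_hz
    return track_duty * period_ps, (1 - track_duty) * period_ps


if __name__ == "__main__":
    print(sar_2b_per_stage(0.3))     # 6-bit code after three 2-bit stages
    print(track_hold_ps())           # track and hold phases in ps
```

Three stages of three comparators resolve the full 6 bits, and an 812.5 MHz clock with a 25% track duty cycle leaves roughly three quarters of the 1.23 ns period for the three asynchronous conversion steps.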

A merged capacitor switching (MCS) scheme [36] and custom 4b DAC layout with a  $C_u = 1$  fF unit finger capacitor are employed in the DAC of each 6 bit unit SAR ADC, which allows for good DAC switching efficiency while maintaining adequate matching for 6 bit resolution [6].

# 4.2.4 Embedded 3-Tap FFE With Non-Binary DAC

Embedded FFE provides pre-equalization before quantization without amplifying the ADC quantization noise and reduces the ADC dynamic range requirement. As shown in


Figure 4.7: Coverage maps of embedded FFE coefficient (a) with a binary weighted FFE DAC (b) with a non-binary weighted FFE DAC.

Fig. 4.5, the embedded 3-tap FFE is implemented with a dedicated FFE DAC sampling the pre and post samples from front-end sub T/Hs on the bottom plate during the track phase with programmable weight ( $B_{0,-1}$  to  $B_{5,-1}$  for precursor and  $B_{0,1}$  to  $B_{5,1}$  for postcursor) to realize different FFE coefficients combinations [6]. The subtraction operation is realized by connecting the FFE DAC and main reference DAC to a 4-input comparator with opposite polarity. The use of dedicated FFE DACs does not introduce significant area or power overhead compared against the conventional implementation because the main input sampling and reference switching share one reference DAC in this design. Moreover, a standalone FFE DAC allows for a non-binary DAC implementation, which prevents the limitation on the pre and post tap coefficient coverage for a binary weighted FFE DAC [6].

Fig. 4.7 shows the FFE tap coefficient map of a customized 10-8-6-4-2-1 weight FFE DAC compared with that of a binary weighted FFE DAC. The improved coefficient coverage with a non-binary FFE DAC enables better optimization of the embedded FFE coefficients for different channels and pre- and post-equalization configurations. Note that due to the bottom plate sampling scheme employed in the FFE DACs, there is a gain mismatch between the FFE DAC path and the main cursor input path, which is top plate sampled. The loading capacitance $C_L$ from the comparators and the routing parasitic $C_P$ result in charge redistribution on the FFE DAC $C_{DAC}$, and the pre and post tap coefficients can be expressed as:

$$\begin{aligned}
\beta_{-1} &= -\frac{C_{DAC}}{C_{DAC} + C_L + C_P} \cdot \frac{10B_{5,-1} + 8B_{4,-1} + 6B_{3,-1} + 4B_{2,-1} + 2B_{1,-1} + 1B_{0,-1}}{32}, \\
\beta_{1} &= -\frac{C_{DAC}}{C_{DAC} + C_L + C_P} \cdot \frac{10B_{5,1} + 8B_{4,1} + 6B_{3,1} + 4B_{2,1} + 2B_{1,1} + 1B_{0,1}}{32}.
\end{aligned} \tag{4.1}$$

where B<sub>0,-1</sub> to B<sub>5,-1</sub> and B<sub>0,1</sub> to B<sub>5,1</sub> are binary control bits for precursor and postcursor tap



Figure 4.8: Schematic of a 2b reference scaling double-tail latch with shared input stage.

weight control respectively. Routing capacitances are carefully minimized in the layout, and $C_{DAC}/(C_{DAC}+C_L+C_P)$ gives a gain attenuation factor of 0.48 in both post-layout simulation and measurement with $C_{DAC}$ = 32fF. This implies a maximum achievable sum of pre and post tap coefficients of -0.48, which is sufficient for all the channels (30dB loss) under investigation.
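As a minimal sketch, Eq. 4.1 can be evaluated for a single tap as shown below. The weights and the 0.48 attenuation factor come from the text; the function name and the example bit patterns are illustrative assumptions, and the sketch does not model how capacitors are shared between the pre and post taps.

```python
FFE_WEIGHTS = (1, 2, 4, 6, 8, 10)   # weights of B0..B5 in Eq. 4.1
FFE_TOTAL = 32                       # denominator in Eq. 4.1
GAIN = 0.48                          # C_DAC / (C_DAC + C_L + C_P) reported in the text

def tap_coefficient(bits, gain=GAIN):
    """Evaluate Eq. 4.1 for one tap; bits = (B0, ..., B5), each 0 or 1."""
    if len(bits) != len(FFE_WEIGHTS) or any(b not in (0, 1) for b in bits):
        raise ValueError("expected six binary control bits B0..B5")
    weighted = sum(w * b for w, b in zip(FFE_WEIGHTS, bits))
    return -gain * weighted / FFE_TOTAL

# Hypothetical settings chosen only to illustrate the formula.
beta_pre = tap_coefficient((1, 0, 1, 0, 0, 0))    # B0 and B2 set, weight 1 + 4 = 5
beta_post = tap_coefficient((0, 0, 0, 0, 0, 1))   # B5 set, weight 10
print(beta_pre, beta_post)
```

Because the six weights are not powers of two, several bit patterns map to the same coefficient, which is the price paid for the wider coverage of the non-binary DAC.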

# 4.2.5 Reference Scaling Double-Tail Latch with Shared Input Stage

A conventional loop-unrolled SAR ADC structure requires 9 comparators for a 2b/stage scheme, which leads to large capacitive loading $C_L$ and kickback to the main reference and FFE DACs. To reduce the kickback and the impact of FFE gain attenuation from the comparator input loading $C_L$, a reference scaling 2b flash quantizer is employed in the unit ADC. The 2b flash quantizer shown in Fig. 4.8 employs a shared input double-tail latch structure [37], where the input dynamic preamp stage is shared among 3 regeneration stages, which reduces the kickback and loading to the DACs by 3x. The NMOS pseudo-differential pairs at the regeneration stages of latches 1 and 3 are intentionally and symmetrically skewed in size to generate the required nominal reference levels for the different stages with a given input FSR ($\pm 1/2 V_{REF}$), i.e. $\pm 1/4 V_{REF}$, $\pm 1/16 V_{REF}$ and $\pm 1/64 V_{REF}$ for the 1<sup>st</sup>, 2<sup>nd</sup> and final stage respectively, while the same NMOS pair in latch 2 is balanced to generate a nominal zero-level threshold. All the NMOS pseudo-differential pairs in the 3 latches are source degenerated by a thermometer resistive DAC to provide offset calibration. The 5b calibration DAC provides a $\pm 30$mV range and 1.9mV resolution, which is less than 1/4 LSB.
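The reference levels above can be checked with an idealized behavioral model of the 2b/stage search. This is only a sketch under simplifying assumptions (no offset, noise or settling error, and an ideal DAC that re-centres the residue after each stage); `convert_2b_per_stage` is a hypothetical name, not the author's implementation.

```python
from fractions import Fraction

def convert_2b_per_stage(v_in, v_ref=Fraction(1), stages=3):
    """Idealized 2b/stage search over an input range of +/- v_ref/2.

    Returns (code, thermometer_counts).
    """
    residue = Fraction(v_in)
    code = 0
    counts = []
    for k in range(1, stages + 1):
        step = v_ref / 4**k                      # 1/4, 1/16, 1/64 of v_ref
        thresholds = (-step, Fraction(0), step)  # three comparators per stage
        # Thermometer decision: number of thresholds the residue reaches.
        c = sum(residue >= t for t in thresholds)
        counts.append(c)
        code = 4 * code + c
        # Ideal DAC feedback: subtract the centre of the selected sub-range.
        residue -= Fraction(2 * c - 3, 2) * step
    return code, counts

# Mid-scale-plus input: just above zero resolves to code 32 of 0..63.
print(convert_2b_per_stage(Fraction(1, 128)))
```

Three stages with thresholds at $\pm 1/4$, $\pm 1/16$ and $\pm 1/64$ of $V_{REF}$ (plus zero) resolve 64 levels, matching the 6-bit resolution of the unit ADC.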

# 4.3 Measurement Results

Fig. 4.9 shows the chip micrograph of the PAM-4 ADC based prototype fabricated in



Figure 4.9: Chip micrograph of 52Gb/s ADC-based receiver in 65nm CMOS.

GP 65nm process. The total chip area is 2.61mm<sup>2</sup> with the core ADC and the DSP occupying 0.41mm<sup>2</sup> and 1.17mm<sup>2</sup> respectively. A set of two high-speed output buffers, with a multiplexer at the input that selects either the ADC output or the DSP output, helps characterize the ADC and the DSP separately.

The receiver prototype is characterized at two different data rates, 32Gb/s and 52Gb/s respectively.

# 4.3.1 ADC Characterization

Before ADC characterization, termination resistor tuning and CTLE-VGA analog front-end offset calibration are performed. Each of the 284 comparator offsets in the TI-ADC is calibrated by applying reference DC inputs equal to the desired thresholds for each comparator at the different unit ADC stages and tuning the threshold of the comparator until an equal distribution of ones and zeroes is obtained at the comparator






Figure 4.10: SNDR/SFDR vs input frequency with (a) $f_s=16$GS/s and (b) $f_s=26$GS/s.






Figure 4.11: INL/DNL plot.

output. Skew between the different sampling phases is calibrated using a foreground technique where a sinusoid at ADC Nyquist frequency is applied and the spurs in the ADC output spectrum are minimized by digitally tuning a bank of variable delay lines. Gain




Figure 4.12: 32Gb/s receiver results (a) BER timing bathtub curves (b) Recovered clock jitter histogram.

Figure 4.13: 52Gb/s receiver characterization: BER timing bathtub curves for 25dB and 31dB loss channels.

calibration is performed by tuning the bias current of the T/H source follower buffer. Fig. 4.10 shows the SNDR and SFDR as a function of the input frequency for a sampling rate of 16GS/s and 26GS/s respectively.

At a sample rate of 16GS/s, the achieved ENOB is 4.74 and 4.29 bits at low frequency and at the 8 GHz Nyquist frequency, respectively. At 26GS/s, a low frequency SNDR of 30.29 dB, giving an ENOB of 4.74 bits, and a high frequency SNDR of 26.4 dB, giving an ENOB of 4.05 bits, are achieved. The high frequency SNDR is limited by residual timing skew and clock jitter. Fig. 4.11 shows that the maximum DNL and INL values for the ADC are -0.22LSB and -0.53LSB.
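The ENOB values are related to SNDR through the standard sine-wave relation ENOB = (SNDR - 1.76 dB) / 6.02 dB. The helper below is a small sketch of that conversion, applied to the low-frequency SNDR quoted above; the function name is illustrative.

```python
def enob_from_sndr(sndr_db):
    """Standard sine-wave relation: ENOB = (SNDR - 1.76 dB) / 6.02 dB."""
    return (sndr_db - 1.76) / 6.02

low_freq_enob = enob_from_sndr(30.29)   # low-frequency SNDR from the text
print(f"{low_freq_enob:.2f} bits")
```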

# 4.3.2 Receiver Characterization

32 Gb/s PAM-4 data without any transmit equalization and with a swing of $800\text{mV}_{ppd}$ is utilized for the BER measurement results shown in Fig. 4.12(a). The timing bathtub curves are obtained by stepping the phase interpolator codes with the CDR in open-loop. A BER of less than $10^{-11}$ is achieved for a 27dB loss channel and a BER of less than $10^{-9}$ is achieved for a 30dB loss channel. Results with the CDR activated are also shown for the 27dB loss channel, verifying that the CDR locks near the optimal BER point. Fig. 4.12(b) shows a jitter of $939\text{fs}_{rms}$ for the recovered clock in this testing condition.

52 Gb/s PAM-4 data without any transmit equalization and with a swing of $700\text{mV}_{ppd}$ is utilized for the BER measurement results in Fig. 4.13. The timing bathtub curves are obtained using the same procedure as for the 32Gb/s characterization. A BER of less than $10^{-8}$ is achieved for a 25dB loss channel, while a BER of less than $10^{-5}$ is achieved for a 31dB loss channel. Results with the CDR activated are also shown for the 25dB loss channel, verifying that the CDR locks near the optimal BER point.

Table 4.1 shows the receiver performance summary, where the 65nm ADC-based receiver prototype is compared against other state-of-the-art PAM-4 ADC-based receivers. The 65nm design operates at a data rate comparable to that of the 16nm FinFET design over the same 31dB loss channel, while achieving a 30% improvement in analog front-end power efficiency without any transmitter equalization and with a 3dB lower transmitter swing.

# 4.4 Conclusion

A 52 Gb/s PAM-4 ADC-based receiver prototype making use of a comparator-assisted 26 GS/s 2b/stage TI-SAR ADC and a DSP equalizer with a 12-tap FFE and a 2-tap DFE is presented. Lab measurements show that without any transmitter equalization and with a TX swing of $700\text{mV}_{ppd}$, the ADC-based receiver achieves a BER of less than $10^{-9}$ over a 30dB loss channel at 32Gb/s and $10^{-5}$ over a 31dB loss channel at 52Gb/s, with a receiver power

| Specification | [38] | [5] | [33] | This Work |
|---|---|---|---|---|
| Technology | 65nm CMOS | 28nm CMOS | 16nm FinFET | 65nm CMOS |
| Power Supply | N/A | N/A | 0.9, 1.2 and 1.8V | 1.1 and 0.9V |
| Data Rate | 28 Gb/s | 32 Gb/s | 56 Gb/s | 52 Gb/s |
| Modulation Format | PAM-4 | PAM-4 | PAM-4 | PAM-4 |
| ADC Sample Rate | 14 GS/s | 16 GS/s | 28 GS/s | 26 GS/s |
| ADC Structure | TI-Flash | TI-SAR | TI-SAR | TI-SAR |
| Pre-Equalization | Passive CTLE | CTLE | CTLE | CTLE + 3-tap FFE |
| Post-Equalization | 3 to 8-tap FFE | N/A | 24-tap FFE + 1-tap DFE | 12-tap FFE + 2-tap DFE |
| Resolution | 2 to 5.5 bits | 8 bits | 8 bits | 6 bits |
| ENOB @ Nyquist | 4.1 bits | 5.85 bits | 4.9 bits | 4.05 bits |
| Area | 0.89 mm<sup>2</sup> | 0.89 mm<sup>2</sup> | N/A | 2.62 mm<sup>2</sup> |
| Max. Compensated Channel Loss | 30 dB @ 7GHz | 32 dB @ 8GHz | 31 dB @ 14GHz | 31 dB @ 13GHz |
| Analog Front-End + ADC Power | 130 mW | 320 mW | 370 mW | 236 mW |
| DSP Power | N/A | N/A | N/A | 183 mW |
| Power Efficiency (pJ/bit), Analog Front-End + ADC / DSP | N/A | N/A | 6.61 / N/A | 4.53 / 3.52 |

Table 4.1: Receiver performance summary

efficiency of 8.25pJ/bit and 8.06pJ/bit respectively. The analog front-end including the CTLE, clocking and ADC achieves a power efficiency of 5.19pJ/bit at 32Gb/s and 4.53pJ/bit at 52Gb/s respectively.
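The 52Gb/s efficiency figures can be cross-checked against the power numbers in the "This Work" column of Table 4.1, since power in mW divided by data rate in Gb/s gives energy in pJ/bit. The sketch below does that arithmetic; the helper name is illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Power in mW divided by data rate in Gb/s gives energy in pJ/bit."""
    return power_mw / data_rate_gbps

# "This Work" column of Table 4.1 at 52 Gb/s.
AFE_ADC_MW, DSP_MW, RATE_GBPS = 236.0, 183.0, 52.0

afe_pj = energy_per_bit_pj(AFE_ADC_MW, RATE_GBPS)
dsp_pj = energy_per_bit_pj(DSP_MW, RATE_GBPS)
total_pj = energy_per_bit_pj(AFE_ADC_MW + DSP_MW, RATE_GBPS)
print(f"AFE+ADC {afe_pj:.3f}, DSP {dsp_pj:.3f}, total {total_pj:.3f} pJ/bit")
```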

### 5. CONCLUSION AND FUTURE WORK\*

#### 5.1 Conclusion

Serial links which utilize an ADC receiver front-end offer a potential solution, as they enable more powerful and flexible DSP for equalization and symbol detection and can easily support advanced modulation schemes. Moreover, the DSP back-end provides robustness to PVT variations, benefits from improved area and power with CMOS technology scaling and offers easy design transfer between different technology nodes and thus improved time-to-market.

However, ADC-based receivers generally consume higher power relative to their mixed-signal counterparts because of the significant power consumed by conventional multi-GS/s ADC implementations. This dissertation presents two power-efficient >25GS/s ADC designs suitable for PAM-4 ADC-based receivers with data rate beyond 50Gb/s.

In the first prototype, an 8-channel 25GS/s 6-bit time-interleaving ADC with the unit ADCs employing a 2b-4b two-stage binary search structure is presented to achieve an increased sampling rate. A soft-decision selection search algorithm is implemented with very low overhead to relax T/H bandwidth requirements and improve ADC metastability performance. T/H loading and kick-back noise is reduced with a shared-input double-tail three latch structure. Measurements verify that the soft-decision selection search algorithm delivers robust ADC performance with a relaxed T/H and comparator design. Overall, the presented design achieves good power efficiency, making it a suitable architecture for a

<sup>\*</sup> Part of this chapter is reprinted with permission from "Reference switching pre-emphasis-based successive approximation register ADC with enhanced DAC settling," by S. Cai, Y. Zhu, S. Kiran, S. Hoyos, and S. Palermo, IET Electronics Letters, vol. 53, no. 20, pp. 1352-1354, Copyright 2017 by IET.

50Gb/s PAM4 wireline receiver.

In the second prototype, a 52 Gb/s PAM-4 ADC-based receiver making use of a comparator-assisted 26 GS/s 2b/stage TI-SAR ADC and a DSP equalizer with a 12-tap FFE and a 2-tap low overhead DFE is presented. Lab measurements show that without any transmitter equalization and with a TX swing of $700 \text{mV}_{ppd}$, the ADC-based receiver achieves a BER of less than $10^{-9}$ over a 30dB loss channel at 32Gb/s and $10^{-5}$ over a 31dB loss channel at 52Gb/s, with receiver power efficiency of 8.25pJ/bit and 8.06pJ/bit respectively. The analog front-end including the CTLE, clocking and ADC achieves a power efficiency of 5.19pJ/bit at 32Gb/s and 4.53pJ/bit at 52Gb/s respectively.

# 5.2 Future Work

Conventional N-bit asynchronous SAR ADCs consist of a comparator, asynchronous SAR logic and a capacitive DAC, as shown in Fig. 5.1. The maximum conversion speed of an N-bit SAR ADC is limited by the comparator regeneration time, asynchronous SAR logic delay, and the DAC settling time as expressed below:

$$t_{conv} = \sum_{i=1}^{N-1} (t_{DAC,i} + t_{logic}) + \sum_{i=1}^{N} t_{comp,i} \tag{5.1}$$

where  $t_{conv}$  is the total SAR conversion time during the ADC hold phase,  $t_{DAC,i}$  and  $t_{comp,i}$  are the capacitive DAC settling time and the comparator regeneration time at the *i*-th conversion stage, and  $t_{logic}$  is the asynchronous SAR logic delay.
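Eq. 5.1 can be turned into a small budget calculator, sketched below. The delay values in the example are hypothetical placeholders chosen only to exercise the formula; they are not measured or simulated numbers from this work.

```python
def conversion_time(t_dac, t_comp, t_logic):
    """Eq. 5.1: t_dac has N-1 entries, t_comp has N entries (same time unit)."""
    if len(t_dac) != len(t_comp) - 1:
        raise ValueError("need N-1 DAC settling times for N comparator decisions")
    return sum(t + t_logic for t in t_dac) + sum(t_comp)

# Hypothetical 6-bit example, all values in picoseconds.
t_conv_ps = conversion_time(t_dac=[30] * 5, t_comp=[40] * 6, t_logic=15)
print(t_conv_ps)   # 5 * (30 + 15) + 6 * 40
```

The last comparator decision has no DAC settling or logic delay after it, which is why the first sum stops at N-1.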

The timing diagram of the 1<sup>st</sup> stage SAR operation is shown in Fig. 5.1(b), where the 1<sup>st</sup> stage experiences the delay of $t_{comp,1}$, $t_{logic}$, and $t_{DAC,1}$ until the 2<sup>nd</sup> stage is triggered. A large portion of this delay is from the DAC settling and is dependent on both the reference search





Figure 5.1: Conventional N-bit single-ended asynchronous SAR ADC (a) block diagram and (b) critical timing path.

step and the ADC resolution. The DAC settling requirement at the *i*-th conversion stage is given as:

$$V_{e,i} = \frac{1}{2^{i+1}} V_{REF} e^{-\frac{t_{DAC,i}}{\tau_{DAC,i}}} < \frac{1}{2^{N+1}} V_{REF}$$

$$t_{DAC,i} > \tau_{DAC,i} \left(N - i\right) \ln(2) \tag{5.2}$$

where $V_{e,i}$ and $\tau_{DAC,i}$ are the DAC error voltage and the RC time constant of the DAC reference switching at the *i*-th stage, $V_{REF}$ is the ADC full scale range, and N is the ADC resolution.

Eq. 5.2 shows that the worst case DAC settling delay happens at the 1st stage, assuming the time constant $\tau_{DAC}$ at each stage is the same. The DAC settling time window at each stage should be set for this worst case settling delay, which increases with ADC resolution. An alternative SAR ADC design [40] scales the switches to match the settling delay from the MSB to the LSB path and assigns the same delay margin to all N conversion stages.
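The per-stage bound of Eq. 5.2 is easy to tabulate. The sketch below assumes a common time constant for all stages and expresses the result in units of $\tau_{DAC}$; the function name and the choice of N = 6 are illustrative.

```python
import math

def min_dac_settling(tau, n_bits, stage):
    """Lower bound on t_DAC,i from Eq. 5.2: tau * (N - i) * ln(2)."""
    if not 1 <= stage < n_bits:
        raise ValueError("stage must satisfy 1 <= i < N")
    return tau * (n_bits - stage) * math.log(2)

# Bounds for a 6-bit SAR ADC, in units of tau_DAC (tau = 1.0).
bounds = [min_dac_settling(1.0, 6, i) for i in range(1, 6)]
for i, b in enumerate(bounds, start=1):
    print(f"stage {i}: t_DAC > {b:.2f} tau")
```

The bound shrinks by $\ln(2)\,\tau_{DAC}$ per stage, so a fixed window sized for the first stage leaves unused margin in every later stage.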

SAR algorithms with redundancy are popular for ADCs with resolution beyond 10 bits [41]. An N-bit redundant SAR ADC requires M conversion steps (M>N) with overlapping search spaces, providing for tolerance to DAC settling errors during the SAR operation and error correction in the digital backend. However, the relaxation of the DAC settling delay at the cost of increasing the conversion steps can be detrimental for low-medium resolution SAR ADC designs. Moreover, redundant SAR ADCs either require a sub-radix-2 DAC with a backend decoder or a unitary DAC with ROM to generate the non-binary code, which results in large loop delay and hardware/power overhead.

Pre-emphasis signaling techniques are commonly employed in wireline transmitters to increase data rates over low bandwidth communication channels. The DAC reference switching operation in SAR ADCs can be modelled as a low pass RC network, similar to a bandwidth limited channel, and pre-emphasis can be applied to DAC reference switching to equalize the switch-cap channel and decrease the RC time constant.

A single-ended 5-bit capacitive DAC with top plate sampling and a  $V_{CM}$ -based switching scheme [42] is shown in Fig. 5.2. The three-state switches connect to GND when



Figure 5.2: RSP technique with a 5-bit single-ended capacitive DAC.

the control bit $B_i=0$, $0.5V_{REF}$ when $B_i=-1$ and $-0.5V_{REF}$ when $B_i=1$. Assuming ideal switches, the MSB cap switching scenario is presented in Fig. 5.2 with $B_1$ switching from GND to $-0.5V_{REF}$ and $V_{DAC}$ settling to $V_{in}-0.25V_{REF}$. Only $B_1$ is switched in the conventional DAC switching scheme, whereas the RSP switching scheme pulses the MSB-1 cap switch $B_2$ during MSB switching to generate an FIR filter response at the DAC output ($V_{DAC}$) with a post cursor tap coefficient $\alpha$ of -0.5 ($-0.125V_{REF}$). Note that the post cursor tap coefficient can be easily reconfigured to $-1/2^k$ by pulsing the MSB-k switch. Another design parameter in the proposed RSP technique is the pulse width $\tau$, which should be co-optimized with $\alpha$ to ensure a flat DAC settling response:



Figure 5.3: Time-domain settling of VCM-based MSB DAC settling for conventional and RSP schemes.

$$\left(1 - e^{-\frac{\tau}{\tau_{DAC,1}}}\right) - \alpha \left(1 - e^{-\frac{\tau}{\tau_{DAC,2}}}\right) = 1$$

$$t_{DAC} = \tau = \tau_{DAC} \ln\left(\frac{\alpha - 1}{\alpha}\right) \quad \text{if } \tau_{DAC,1} = \tau_{DAC,2} = \tau_{DAC} \tag{5.3}$$

where $\tau_{DAC,1}$ and $\tau_{DAC,2}$ are the time constants of the MSB and RSP paths, respectively. These two time constants need to be matched to achieve optimal RSP settling, in which case the optimized DAC settling delay ($t_{DAC}$) is no longer a function of the ADC resolution N.
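The pulse-width condition can be checked numerically with a minimal sketch, assuming a first-order RC response for each path and linear superposition of the MSB step and the pre-emphasis pulse. The function names `rsp_pulse_width` and `rsp_settling` are hypothetical, and the output is normalized to the final MSB step.

```python
import math


def rsp_pulse_width(alpha: float, tau_dac: float) -> float:
    """Pulse width from Eq. 5.3 for matched time constants (alpha < 0)."""
    if alpha >= 0:
        raise ValueError("alpha must be negative")
    return tau_dac * math.log((alpha - 1.0) / alpha)


def rsp_settling(t, alpha, pulse, tau1, tau2):
    """Normalized DAC output: MSB step plus a pulse of weight -alpha."""
    msb = 1.0 - math.exp(-t / tau1)
    rsp = 1.0 - math.exp(-t / tau2)
    if t > pulse:
        # The pulse is released at t = pulse, so its step response is removed.
        rsp -= 1.0 - math.exp(-(t - pulse) / tau2)
    return msb - alpha * rsp


if __name__ == "__main__":
    tau_dac = 20e-12                       # matched time constant from the text
    for k in (1, 2, 3):
        alpha = -1.0 / 2 ** k              # tap weight set by pulsing MSB-k
        pulse = rsp_pulse_width(alpha, tau_dac)
        out = rsp_settling(2 * pulse, alpha, pulse, tau_dac, tau_dac)
        print(f"k={k}: pulse = {pulse * 1e12:.1f} ps, output at 2*pulse = {out:.6f}")
```

With $\alpha=-0.25$ and a 20ps time constant the pulse width evaluates to about 32.2ps, and in this model the output stays at its final value for any time after the pulse ends, independent of N.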

Fig. 5.3 shows the simulated $V_{CM}$-based MSB path DAC settling with the RSP scheme, where $\alpha=-0.25$ (k=2) and $\tau=1.6\tau_{DAC}$ from Eq. 5.3. The time constants of the MSB and RSP (MSB-2) paths are matched to 20ps and the in-line resistance from the reference buffer is assumed to be negligible [18]. For the ideal RSP case (dashed curve), the DAC output settles to the final value ($0.25V_{REF}$) with 9b resolution and a delay of $1.6\tau_{DAC}$ (~32ps) after the DAC is switched, which is a 71% improvement (110ps to 32ps) in DAC settling delay. In the presence of process variation, the switching time constant mismatch leads to an overdamped or underdamped settling response, which degrades the time savings from the RSP method. Monte-Carlo simulations show the $3\sigma$ worst-case variation of the settling curve (solid curves). Note that the settling time constant mismatch can be calibrated by tuning the pre-emphasis pulse width $\tau$ to improve this settling time.

Figure 5.4: Block diagram of the RSP technique for a 6-bit SAR ADC.
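The sensitivity to time constant mismatch, and the effect of retuning the pulse width, can be explored with a small sketch under the same first-order RC assumption. The 10% mismatch, the error tolerance of $2^{-8}$ of the step, the sweep range, and all names (`residual_error`, `settle_time`, `best_pulse`) are illustrative assumptions and are not taken from the Monte-Carlo setup used for Fig. 5.3.

```python
import math


def residual_error(t, alpha, pulse, tau1, tau2):
    """Normalized error (output minus final value) at a time t >= pulse."""
    msb_tail = -math.exp(-t / tau1)
    rsp_tail = -alpha * math.exp(-t / tau2) * (math.exp(pulse / tau2) - 1.0)
    return msb_tail + rsp_tail


def settle_time(alpha, pulse, tau1, tau2, tol, step=1e-13, t_max=400e-12):
    """Earliest grid time >= pulse after which |error| stays below tol."""
    n = int(round((t_max - pulse) / step))
    last_bad = -1
    for i in range(n + 1):
        if abs(residual_error(pulse + i * step, alpha, pulse, tau1, tau2)) >= tol:
            last_bad = i
    return pulse + (last_bad + 1) * step


ALPHA = -0.25                  # k = 2, as in Fig. 5.3
TAU_MSB = 20e-12               # MSB path time constant from the text
TOL = 2.0 ** -8                # assumed tolerance, relative to the MSB step
NOMINAL = TAU_MSB * math.log((ALPHA - 1.0) / ALPHA)   # Eq. 5.3


def best_pulse(tau_rsp):
    """Sweep the pulse width around nominal and keep the fastest settling."""
    candidates = [NOMINAL * (0.8 + 0.01 * i) for i in range(41)] + [NOMINAL]
    return min(candidates,
               key=lambda p: settle_time(ALPHA, p, TAU_MSB, tau_rsp, TOL))


matched = settle_time(ALPHA, NOMINAL, TAU_MSB, TAU_MSB, TOL)
slow = settle_time(ALPHA, NOMINAL, TAU_MSB, 1.1 * TAU_MSB, TOL)
tuned = settle_time(ALPHA, best_pulse(1.1 * TAU_MSB), TAU_MSB, 1.1 * TAU_MSB, TOL)
print(f"matched {matched * 1e12:.1f} ps, "
      f"mismatched {slow * 1e12:.1f} ps, tuned {tuned * 1e12:.1f} ps")
```

In this model the matched case settles as soon as the pulse ends, a slower RSP path leaves a residual tail that lengthens the settling time, and the pulse-width sweep can only match or improve on the nominal setting because the nominal width is one of the candidates.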

The block diagram of the RSP technique for a single-ended 6-bit SAR ADC with a $V_{CM}$-based switching scheme is shown in Fig. 5.4. This implementation only requires extra OR gates and edge detectors, which consist of AND gates and tunable delay lines, and can be readily applied to any monotonic DAC switching scheme. Moreover, the RSP logic can be implemented in the switch domain to minimize its delay overhead. Notice that the RSP scheme does not have to be applied to all stages. Since the required settling delay relaxes in the later SAR conversion stages, assuming a matched time constant for all stages, the RSP scheme can be employed until the point where the DAC settling delay with the conventional switching scheme is less than the worst-case RSP settling delay.

Figure 5.5: Normalized FoM and fs versus number of bits.
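The stage-selection rule can be sketched as follows, assuming a first-order RC DAC, a step of $2^{-(i+1)}$ of full scale in stage i, and a half-LSB settling target. These modelling choices and the names `conventional_settling` and `rsp_stages` are assumptions made for illustration only.

```python
import math


def conventional_settling(n_bits, stage, tau_dac):
    """Settling time of stage `stage` (1 = MSB) to half an LSB, RC model."""
    # Stage i moves the DAC by 2**-(i+1) of full scale and the target error
    # is 2**-(n_bits+1), so the required ratio is 2**(n_bits - i).
    return tau_dac * (n_bits - stage) * math.log(2.0)


def rsp_stages(n_bits, tau_dac, t_rsp_worst):
    """Stages whose conventional settling exceeds the worst-case RSP delay."""
    return [i for i in range(1, n_bits)
            if conventional_settling(n_bits, i, tau_dac) > t_rsp_worst]


if __name__ == "__main__":
    tau = 20e-12
    # Ideal RSP delay of 1.6 tau; a real design would use the worst-case value.
    print(rsp_stages(6, tau, 1.6 * tau))
```

For a 6-bit converter with a 20ps time constant and the ideal 32ps RSP delay, this model selects the first three stages. With N = 9 the MSB stage needs about 111ps, in line with the 110ps conventional settling quoted for Fig. 5.3.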

A conventional SAR, redundant SAR and RSP-based SAR are modelled with a few assumptions to benchmark the performance improvement with the proposed RSP scheme:

1. Time constants of DAC settling and comparator regeneration are assumed to be the same.

2. Total comparator regeneration time is calculated based on worst case asynchronous operation [2].

3. The worst case MSB settling delay is used for settling in all stages.

4. SAR logic delay is assumed to be  $4\tau$  and the same for all stages.

5. The radix of the redundant SAR is assumed to be 1.85 for all resolutions.

6. The SAR ADC power is assumed to be proportional to the conversion stage number.

Fig. 5.5 shows the normalized FoM and fs of the three SAR structures versus the ADC resolution. The proposed RSP-based SAR ADC achieves up to 25% better power efficiency with faster speed than the other two structures from 4 to 7 bits without calibration and 8 to 9 bits after calibration for RSP settling mismatch.

In conclusion, an RSP technique is proposed to enhance DAC settling speed and break the dependence of DAC settling delay on ADC resolution. The RSP-based SAR ADC can be implemented with very low overhead, operate at faster speed, and achieve better power efficiency relative to conventional and redundant SAR ADCs, making it an ideal structure for unit ADC implementation in a TI-ADC.

## REFERENCES

- [1] https://en.wikipedia.org/wiki/List\_of\_device\_bit\_rates
- [2] https://www.semiconductors.org/clientuploads/Research\_Technology/ITRS/2007/Assembly%20&%20Packaging.pdf
- [3] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth, "A 12.5Gb/s SerDes in 65nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery," in *ISSCC Dig. Tech. Papers*, pp. 436-437, Feb. 2007.
- [4] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman,
  D. Pi, B. Raghavan, H. Pan, I. Fujimori, and A. Momtaz, "A 500 mW ADC-based
  CMOS AFE with digital calibration for 10 Gb/s serial links over KR-backplane and
  multimode fiber," *IEEE J. Solid-State Circuits*, vol. 45, no. 6, pp. 1172–1185, Jun.
  2010.
- [5] B. Zhang, A. Nazemi, A. Garg, N. Kocaman, M. R. Ahmadi, M. Khanpour, H. Zhang,
   J. Cao, and A. Momtaz, "A 40nm CMOS 195mW/55mW dual-path receiver AFE for multi-standard 8.5-11.5 Gb/s serial links," *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 426–439, Feb. 2015.
- [6] A. Shafik, E. Zhian Tabasy, S. Cai, K. Lee, S. Hoyos, and S. Palermo, "A 10Gb/s hybrid ADC-based receiver with embedded analog and per-symbol dynamically-enabled digital equalization," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 671–685, Mar. 2016.
- [7] E.-H. Chen, R. Yousry, and C.-K. Yang, "Power optimized ADC-based serial link receiver," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 938–951, Apr. 2012.
- [8] Y. Lin, M. Keel, A. Faust, A. Xu, N. Shanbhag, E. Rosenbaum, and A. Singer, "A Study of BER-Optimal ADC-Based Receiver for Serial Links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 5, pp. 693–704, May 2016.
- [9] L. Wang, Y. Fu, M. LaCroix, E. Chong, and A. Chan Carusone, "A 64Gb/s PAM-4 Transceiver Utilizing an Adaptive Threshold ADC in 16nm FinFET," in ISSCC Dig. Tech. Papers, pp., Feb. 2018.
- [10] B. Min, K. Lee, and S. Palermo, "A 20 Gb/s triple-mode (PAM-2, PAM-4, and duobinary) transmitter," *Microelectronics Journal*, vol. 43, no. 10, pp. 687–696, Oct. 2012.
- [11] R. Boesch, K. Zheng, and B. Murmann, "A 0.003 mm<sup>2</sup> 5.2 mW/tap 20 GBd Inductor-less 5-Tap Analog RX-FFE," in Symp. VLSI circuits Dig., pp. 170–171, June 2016.
- [12] A. Shafik, K. Lee, E. Zhian Tabasy, and S. Palermo, "Embedded equalization for ADC-based serial I/O receivers," in IEEE EPEPS, pp. 139–142, Oct. 2011.
- [13] S. Rylov, T. Beukema, Z. Toprak-Deniz, T. Toifl, Y. Liu, A. Agrawal, P. Buchmann, A. Rylyakov, M. Beakes, B. Parker, and M. Meghelli, "A 25Gb/s ADC-based serial line receiver in 32nm CMOS SOI," in ISSCC Dig. Tech. Papers, pp. 56-57, Feb. 2016.
- [14] S. Cai, E. Zhian Tabasy, A. Shafik, S. Kiran, S. Hoyos, and S. Palermo, "A 25GS/s
  6b TI binary search ADC with soft-decision selection in 65nm CMOS," in Proc.
  IEEE Symp. VLSI Circuits, pp. C158-C159, June 2015.
- [15] A. Varzaghani, A. Kasapi, D. Loizos, S.-H. Paik, S. Verma, S. Zogopoulos, and S. Sidiropoulos, "A 10.3-GS/s, 6-Bit flash ADC for 10G ethernet applications," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3038-3048, Dec. 2013.
- [16] V. H.-C. Chen and L. Pileggi, "A 69.5mW 20GS/s 6b time-interleaved ADC with embedded time-to-digital calibration in 32nm CMOS," IEEE J. Solid-State Circuits, vol. 49, no. 12, pp. 2891–2901, Nov. 2014.
- [17] D. Cui, H. Zhang, N. Huang, A. Nazemi, B. Catli, H. Rhew, B. Zhang, A. Momtaz, and J. Cao, "A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with programmable gain control and analog peaking in 28nm CMOS," in ISSCC Dig. Tech. Papers, pp. 58-59, Feb. 2016.
- [18] L. Kull, J. Pliva, T. Toifl, M. Schmatz, P. Francese, C. Menolfi, M. Brandli, M. Kossel, T. Morf, T. Andersen, and Y. Leblebici, "Implementation of low-power 6–8b 30–90GS/s time-interleaved ADCs with optimized input bandwidth in 32 nm CMOS," IEEE J. Solid-State Circuits, vol. 51, no. 3, pp. 636–648, Mar. 2016.
- [19] K. Gopalakrishnan, A. Ren, A. Tan, A. Farhood, A. Tiruvur, B. Helal, C.-F. Loi, C. Jiang, H. Cirit, I. Quek, J. Riani, J. Gorecki, J. Wu, J. Pernillo, L. Tse, M. Le, M. Ranjbar, P.-S. Wong, P. Khandelwal, R. Narayanan, R. Mohanavelu, S. Herlekar, S. Bhoja, and V. Shvydun, "A 40/50/100Gb/s PAM-4 ethernet transceiver in 28nm CMOS," in ISSCC Dig. Tech. Papers, pp. 62-63, Feb. 2016.
- [20] I. Dedic, "56Gs/s ADC: Enabling 100GbE," in Proc. IEEE-OSA Optical Fiber Commun. Conf., Mar. 2010, OThT6.
- [21] Y. Duan and E. Alon, "A 6b 46GS/s ADC with >23GHz BW and sparkle-code error correction," in Proc. IEEE Symp. VLSI Circuits, pp. C162-C163, June 2015.
- [22] Z. Cao, S. Yan, and Y. Li, "A 32mW 1.25 GS/s 6b 2b/step SAR ADC in 0.13um
   CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 3, pp. 862-873, Mar. 2009.
- [23] C.-C. Huang, C.-Y. Wang, and J.-T. Wu, "A CMOS 6-bit 16-GS/s time-interleaved ADC using digital background calibration techniques," IEEE J. Solid-State Circuits, vol. 46, no. 4, pp. 848–858, Mar. 2011.
- [24] G. Van der Plas and B. Verbruggen, "A 150 MS/s 133uW 7b ADC in 90nm digital CMOS using a comparator-based asynchronous binary-search sub-ADC", in ISSCC Dig. Tech. Papers, pp. 242-610, Feb. 2008.
- [25] J. E. Eklund and C. Svensson, "Influence of metastability errors on SNR in successive approximation A/D converters", Analog Integr. Circuits Signal Process, vol. 26, no. 3, pp. 183–190, Mar. 2001.
- [26] S. Cai, A. Shafik, S. Kiran, E. Zhian Tabasy, S. Hoyos, and S. Palermo, "Statistical modeling of metastability in ADC-based serial I/O receivers," in IEEE EPEPS Conf., pp. 39-42, Oct. 2014.
- [27] Y.-Z. Lin, S.-J. Chang, Y.-T. Liu, C.-C. Liu, and G.-Y. Huang, "An asynchronous binary-search ADC architecture with reduced comparator count," IEEE Trans. Circuits Syst., pp. 1829-1837, vol. 44, no. 3, pp. 901-915, Mar. 2009.

- [28] W. Liu, P. Huang and Y. Chiu, "A 12-bit 45-MS/s 3-mW redundant successive-approximation-register analog-to-digital converter with digital calibration," IEEE
   J. Solid-State Circuits, vol. 44, no. 11, pp. 2661-2672, Aug. 2011.
- [29] Y.-S. Shu, "A 6b 3GS/s 11mW Fully Dynamic ADC in 40nm CMOS with Reduced Number of comparators," in Proc. IEEE Symp. VLSI Circuits, pp. 26-27, June 2012.
- [30] E. Zhian Tabasy, A. Shafik, S. Huang, N. H.-W. Yang, S. Hoyos, and S. Palermo,
  "A 6-b 1.6-GS/s ADC with redundant cycle one-tap embedded DFE in 90-nm
  CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1885–1897, Aug. 2013.
- [31] W. Kester, The Data Conversion Handbook. Burlington, MA, USA: Newnes, 2005.
- [32] L. Kull, J. Pliva, T. Toifl, M. Schmatz, P.A. Francese, C. Menolfi, M. Braendli, M. Kossel, T. Morf, T.M. Andersen, and Y. Leblebici, "A 110 mW 6 bit 36 GS/s interleaved SAR ADC for 100 GBE occupying 0.048 mm2 in 32 nm SOI CMOS," in Proc. IEEE A-SSCC, pp. 89-92, Nov. 2014.
- [33] Y. Frans et al., "A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 1101–1110, Apr. 2017.
- [34] Y. Zhu et al., "A 3.8mW 8b 1GS/s 2b/cycle interleaving SAR ADC with compact DAC structure", in Proc. IEEE Symp. VLSI Circuits, pp. 86-87, June 2012.
- [35] T. Jiang et al., "A Single-Channel, 1.25-GS/s, 6-bit, 6.08-mW Asynchronous Successive-Approximation ADC With Improved Feedback Delay in 40-nm CMOS", IEEE J. Solid-State Circuits, vol. 47, pp. 2444–2453, July 2012.

- [36] V. Hariprasath, J. Guerber, S.-H. Lee, and U.-K. Moon, "Merged capacitor switching based SAR ADC with highest switching energy-efficiency," IET Electronic Letters, vol. 46, no. 9, April 2010.
- [37] S. Cai, E. Zhian Tabasy, A. Shafik, S. Kiran, S. Hoyos, and S. Palermo, "A 25 GS/s
  6b TI two-stage multi-bit search ADC with soft-decision selection in 65nm
  CMOS," IEEE J. Solid-State Circuits, vol. 52, pp. 2168–2179, April 2017.
- [38] Aurangozeb et al., "Channel adaptive ADC and TDC for 28 Gb/s PAM-4 digital receiver," in IEEE CICC, April 2017, pp. 1–4.
- [39] T. Ogawa, H. Kobayashi, Y. Takahashi, N. Takai, M. Hotta, H. San, T. Matsuura,
   A. Abe, K. Yagi, and T. Mori, "SAR ADC algorithm with redundancy and digital error correction," IEICE Trans. Fundamentals, vol. E93-A, no. 2, pp. 415–423, Feb. 2010.
- [40] J. Yang, T. Naing, and R. Brodersen, "A 1GS/s 6 bit 6.7mW successive approximation ADC using asynchronous processing," IEEE J. Solid-State Circuits, vol. 45, no. 8, pp. 1469–1478, Aug. 2010.
- [41] F. Kuttner, "A 1.2V 10b 20MSamples/s non-binary successive approximation ADC in 0.13um CMOS," in ISSCC Dig. Tech. Papers, pp. 136–137, Feb. 2002.
- [42] F. Maloberti, Data Converters. Dordrecht, The Netherlands: Springer Academic Publishing, 2007.
- [43] R. M. Gray, "Toeplitz and circulant matrices: A review," Foundations and Trends in communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006.
- [44] J. Bulzacchelli, C. Menolfi, T. Beukema, D. Storaska, J. Hertle, D. Hanson, P.-H. Hsieh, S. Rylov, D. Furrer, D. Gardellini, A. Prati, T. Morf, V. Sharma, R. Kelkar, H. Ainspan, W. Kelly, L. Chieco, G. Ritter, J. Sorice, J. Garlett, R. Callan, M. Brandli, P. Buchmann, M. Kossel, T. Toifl, and D. Friedman, "A 28-Gb/s 4-Tap FFE/15-Tap DFE serial link transceiver in 32-nm SOI CMOS technology," IEEE J. Solid-State Circuits, vol. 47, no. 12, pp. 3232–3248, Dec. 2012.