### AN ABSTRACT OF THE DISSERTATION OF

Yusang Chun for the degree of Doctor of Philosophy in Electrical and Computer Engineering presented on June 17, 2020.

Title: Design of Energy-Efficient Equalization and Data Encoding/Decoding Techniques for Wireline Communication Systems

Abstract approved:

Tejasvi Anand

Ever-increasing global internet data traffic has driven up the demand for cutting-edge high-speed wireline communication systems, including SerDes PHYs for various interfaces, interconnects, data center servers, and switches in optical systems. Operating wireline links at higher data rates subjects signals to greater channel loss and leads to an exponential increase in power consumption, mainly because more equalization is required.

In this dissertation, two distinct methodologies for designing SerDes transceivers are presented: 1) a pulse-width-modulated (PWM) time-domain feed forward equalizer (FFE) and linearity improvement technique for higher-order pulse amplitude modulation (PAM), including PAM-8, and 2) an inter-symbol interference (ISI)-resilient data encoding and decoding technique with Dicode encoding and error correction logic for low-bandwidth wireline channels, as an alternative strategy for communicating in an energy-efficient way on bandwidth-limited wireline channels without using conventional equalizers or filters.

The first topic is a PAM-8 wireline transceiver with a receiver-side pulse-width-modulated (PWM), or time-domain, feed forward equalization (FFE) technique. The receiver converts voltage-modulated (PAM) signals to PWM signals and processes them using inverter-based delay elements with rail-to-rail voltage swing. The time-to-voltage and voltage-to-time converters are designed to have non-linearities of opposite sign, with the aim of achieving higher front-end linearity in the receiver. The proposed PAM-8 transceiver operates from 12.0 Gb/s to 39.6 Gb/s and compensates 14 dB of loss at 6.6 GHz with an efficiency of 8.66 pJ/bit in 65 nm CMOS.

The second topic is an alternative strategy for communicating on bandwidth-limited wireline channels without using conventional equalizers or filters (FFE, DFE, and CTLE): inter-symbol interference (ISI)-resilient Dicode encoding and error correction for low-bandwidth wireline channels. The key observation is that Dicode-encoded data have no consecutive 1s or -1s. With this known information, the error correction logic at the receiver can correct multi-bit errors due to ISI. Implemented in 65 nm CMOS, the proposed digital encoding and decoding approach achieves a BER of less than $10^{-12}$ while communicating on channels with insertion losses of 24.2 dB and 21.4 dB, with efficiencies of 2.56 pJ/bit and 2.66 pJ/bit, while operating at 13.6 Gb/s and 16 Gb/s, respectively.

©Copyright by Yusang Chun

June 17, 2020

All Rights Reserved

### Design of Energy-Efficient Equalization and Data Encoding/Decoding Techniques for Wireline Communication Systems

by

Yusang Chun

### A DISSERTATION

submitted to

Oregon State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Presented June 17, 2020

Commencement June 2021

Doctor of Philosophy dissertation of Yusang Chun presented on June 17, 2020.

APPROVED:

Major Professor, representing Electrical and Computer Engineering

Head of the School of Electrical Engineering and Computer Science

Dean of the Graduate School

I understand that my dissertation will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my dissertation to any reader upon request.

Yusang Chun, Author

### ACKNOWLEDGEMENTS

There were times during my Ph.D. program when I became fragile and vulnerable, although I don't regret having chosen this path, as it helped me grow and mature both academically and personally. Thanks to the support from professors, mentors, colleagues, friends, and family, I managed to hold on from the beginning until this very moment. I am truly honored to express my gratitude toward all the people who helped me finish my degree.

Above all, I am most grateful to my advisor, Professor Tejasvi Anand. His genuine enthusiasm for research inspired me the most and helped me stay focused. Not only did he teach me academic knowledge from the ground up, but he also shared considerate lessons with me as a person who truly understands and sympathizes. I feel really honored to be his first Ph.D. graduate. I also thank Professor Un-Ku Moon, Professor Kartikeya Mayaram, and Professor Arun Natarajan. I feel sincerely honored to have had the greatest professors as my committee members. My gratitude also goes to Professor Hyun Seok Lee for being kind enough to be my respectable Graduate Council Representative.

Even though I might not be his proudest student, I would like to show my deepest respect to my master's advisor, Professor Deog-Kyoon Jeong from Seoul National University, as well. When I first met him in 2011, I was just one undergraduate student wanting to pursue an analog circuit design career, knowing nothing about it. Even long after I graduated, he still leads me to all the opportunities and learnings without asking for anything in return.

None of my Ph.D. work would have been possible without the help of my colleagues, especially Ashwin Ramachandran and Mohamed Megahed, who have been my second advisors. Also, Hyunkyu Ouh and Jinyong Kim were kind enough to be my brothers and helped me overcome homesickness. I also thank all the colleagues from the OSU analog and mixed signal (AMS) group, from the past to the present, including Calvin Lee, Subramanian T R, Abhishekh Devaraj, Zhiping Wang, and Xiaohui Lin. I will never forget all the memories we shared here in our office and lab.

I would like to thank Yohan Frans, Jay Im, Kevin Zheng, Haritha Eachempatti, and the Xilinx SerDes team, who generously provided me with the most valuable internship opportunity. I feel lucky to have been with the nicest mentors and atmosphere, and I truly enjoyed every moment working and interacting with the team. My special thanks go to Kuan-Chang Xavier Chen, who made the internship period much more enjoyable. I deeply appreciate Jaeduk Han and Dongwook Kim, who shared their most valuable advice and information whenever necessary. Lastly, I have nothing but sincere gratitude to Tamer Ali and the MediaTek team members for accepting me in this unprecedented era. I am thrilled to join this great company and contribute whatever I am capable of.

To all my friends, classmates, and colleagues in Korea, I thank you all for welcoming me whenever I visit as if I had been there all along. As much as I miss the old times spent with you, those memories also give me the energy to move forward. To my brother Daein, I am glad that I have someone I can trust as much as I trust you. I wish you all the best with the residency program, as I am certain that you will be a great internist. To Minjae, Donghyung, and Seungmin, and all the other friends I have in Corvallis, thank you for making my time in Corvallis much less depressing.

Finally, to my family, there is simply no word that could possibly describe how thankful I am for the greatest love you gave to me. I am always too foolish to realize how much my family means to me before leaving them. I hope we can see each other more often, as I will live in a bigger city. I will try my best to serve and care for you from now on. In loving memory of my grandfather and grandmother, who passed away during my Ph.D.: I was so lucky that I managed to stay healthy under your care. As I am working on my final defense and dissertation, humanity is facing the unprecedented Coronavirus outbreak. I hope we all get through this soon and go back to the normal life that we were so used to before. Thank you very much.

## TABLE OF CONTENTS

| | Page |
|---|---|
| 1 Introduction | 1 |
| 1.1 Introduction to PAM-8 Wireline Transceiver with Time-Domain Equalization and Linearity Improvement Technique | 1 |
| 1.2 Introduction to Transceiver Design with ISI-Resilient Data Encoding for Equalizer-Free Wireline Communication using Dicode Encoding and Error Correction | 5 |
| 2 PAM-8 Transceiver with Time-Domain (PWM) Feed Forward Equalization and Non-linearity Cancellation Technique | 8 |
| 2.1 Linearity Requirement on Pulse Amplitude Modulated Signal | 8 |
| 2.2 Proposed Transceiver Architecture | 11 |
| 2.2.1 Track-and-Hold and Voltage-to-Time Converter | 12 |
| 2.2.2 FFE Delay Lines Using Inverters and a Replica DLL | 13 |
| 2.2.3 Time-to-Voltage Converter | 15 |
| 2.2.4 Effect of Timing Jitter on Time-to-Voltage Converter | 16 |
| 2.2.5 Receiver Amplifier | 17 |
| 2.3 Proposed Linearity Improvement Technique | 19 |
| 2.3.1 Transfer Function of Voltage-to-Time Converter | 19 |
| 2.3.2 Transfer Function of Time-to-Voltage Conversion | 21 |
| 2.3.3 Overall Receiver Front-End Transfer Function with Linearity Improvement Technique | 23 |
| 2.3.4 Limits of Linearity Improvement | 24 |
| 2.4 Measurement Results | 29 |
| 3 Equalizer-Free Wireline Transceiver with ISI-Resilient Data Encoding for Wireline Communication – Dicode Encoding and Error Correction | 34 |
| 3.1 Dicode as ISI Resilient Encoding and Error Correction Concept | 34 |
| 3.2 Proposed Architecture | 35 |
| 3.2.1 Pre-coder and Evolution of Transceiver Architecture | 38 |
| 3.2.2 | 39 |
| 3.2.3 Noise Budget of Transceiver | 39 |
| 3.2.4 Error Correction Logic-1 | 43 |
| 3.2.5 Error Correction Logic-2 | 45 |

| | Page |
|---|---|
| 3.3 Comparison to Existing Techniques | 48 |
| 3.3.1 Differences and Advantages over Feed Forward Equalizers (FFE) and Tomlinson-Harashima Pre-coding | 48 |
| 3.3.2 Differences and Advantages over Decision Feedback Equalization (DFE) | 49 |
| 3.3.3 Differences and Advantages over Duobinary | 52 |
| 3.3.4 Differences with Line Coding Schemes | 52 |
| 3.3.5 Differences and Advantages over Prior Dicode Works | 53 |
| 3.3.6 Differences with Conventional Forward Error Correction (FEC) Coding | 54 |
| 3.4 Mathematical Limit of the Proposed Error Correction Scheme | 55 |
| 3.5 Measurement Result | 63 |
| 4 Conclusions | 69 |
| Bibliography | 70 |

## LIST OF FIGURES

| Figure |                                                                                                                                                                                                                                                                                                                                                                                                               | Page       |
|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| 1.1    | Conceptual diagram of the proposed transceiver with time-domain feed forward equalization and linearity improvement technique. | 4 |
| 2.1    | Relationship between ratio-of-level mismatch (RLM) and effective-number-of-bits (ENOB) in PAM-4, 8, and 16 | 10 |
| 2.2    | Proposed PAM-8 transceiver architecture with a time-domain 4-tap feed forward equalizer (FFE) on the receiver | 11 |
| 2.3    | (a) Schematic of the single-ended track-and-hold and voltage-to-time converter and (b) timing diagram illustrating its operation | 12 |
| 2.4    | Block diagram of the proposed (a) replica delay-locked loop, (b) 4-tap quarter-rate FFE summers (only I and Q lanes are presented in figure), and (c) programmable unit interval delay element | 14 |
| 2.5    | Schematics of I-lane of the proposed time-to-voltage converter (TVC) in a quarter-rate receiver. | 15 |
| 2.6    | (a) Simplified representation of one tap segment of the proposed time-to-voltage converter (TVC). (b) Timing diagram of the TVC operation and eye diagram of the output of the charge pump $V_{TVCP}$ or $V_{TVCN}$ without any input data jitter. (c) Timing diagram of the TVC operation and eye diagram of the output of the charge pump $V_{TVCP}$ or $V_{TVCN}$ with input data jitter | 16 |
| 2.7    | Proposed linearity improvement technique in the receiver front-end by using voltage-to-time and time-to-voltage converters with opposite non-linearity. | 18 |
| 2.8    | (a) Simplified schematic of a single-ended time-to-voltage converter. (b) Transfer function curves of time-to-voltage converter up and down current with the presence of channel length modulation. (c) Timing diagram of time-to-voltage converter input PWM signals and output voltage. | 21 |
| 2.9    | Block diagram of two non-linear blocks with opposite non-linearity polarity in series. | 24 |
| 2.10   | ENOB difference as a function of non-linear coefficients of each non-linear stage. | 27 |
| 2.11   | Graph of normalized simulated transfer functions of voltage-to-time converter, time-to-voltage converter, and the overall receiver front-end. | 28 |
| 2.12   | (a) Simulated end-to-end SNDR and SFDR of the proposed receiver with 500mVpp Input, and (b) Simulated receiver output spectrum with Nyquist frequency input. | 28 |
| 2.13   | (a) Prototype chip micrograph. (b) Area breakdown of the transceiver chip active area. | 29 |
| 2.14   | Measurement setup of the transceiver with a zoomed photo of the receiver showing the chip-on-board | 30 |
| 2.15   | Insertion loss profile of the channel used for the measurement | 31 |
| 2.16   | Measured transceiver BER bathtub curves for 3 bits of PAM-8 at the receiver in PRBS-7 data (a) at 39.6 Gb/s and (b) at 36.0 Gb/s | 31 |
| 2.17   | Power breakdown of the proposed transceiver | 33 |
| 3.1    | (a) Effect of a bandwidth-limited channel on the transmitted pulse and generation of CIDs. (b) Proposed approach on correcting errors due to ISI by encoding data using Dicode to avoid CIDs | 35 |
| 3.2    | Proposed transceiver block diagram. | 36 |
| 3.3    | (a) Block diagrams to explain the evolution of the proposed transceiver architecture with pre-coder. (b) Timing diagram of the precoded Dicode architecture | 37 |
| 3.4    | (a) Schematic of voltage amplifier stage in the receiver front-end. (b) Simulated frequency response of the overall amplifier stage. (c) Simulated DC linearity of single stage and aggregated voltage amplifier. (d) Simulated AC linearity of the receiver analog front-end. | 40 |
| 3.5    | (a) Block diagram of the full-rate Dicode receiver. (b) Truth table of ECL-1 and Dicode decoder. (c) Timing diagram demonstrating the operation of ECL-1 | 41 |
| 3.6    | (a) An example where ECL-1 fails with higher channel loss in the presence of both pre and post cursors. (b) An example where three consecutive data bits are processed by ECL-2 to correct the ISI errors. | 44 |
| 3.7    | (a) Block diagram of the proposed error correction logic-2 (ECL-2). (b) Truth table of the programmable correction logic in ECL-2. (c) Different channel responses generating similar received (sampled) data with ISI errors. | 46 |
| 3.8    | (a) A feed forward equalizer based transceiver architecture. (b) A conventional Duobinary transceiver architecture with an FFE. (c) A transceiver architecture employing Manchester, iPWM and CDC line-coding techniques. (d) Proposed Dicode encoding and error correction based transceiver architecture | 51 |
| 3.9    | (a) Conventional Dicode transceiver used on capacitor coupled high-pass channel. (b) Proposed Dicode transceiver used on a typical bandwidth limited wireline channel | 54 |
| 3.10   | (a) Voltage range of received signal for low channel loss where 2, 3-bit input error correction logic works. (b) Voltage range of received signal for high channel loss where 2, 3-bit error correction logic fails | 56 |
| 3.11   | (a) Simulated pulse response of channel in (3.7) at 10Gb/s, 15Gb/s and 20Gb/s. (b) Simulated response ['-1', '1', '-1'] pattern to show the position of the smallest main tap $(V_{SMT-P1})$ of signal '1' at 10Gb/s, 15Gb/s and 20Gb/s | 57 |
| 3.12   | (a) Calculated magnitudes of cursors and $V_{SMT-P1}$ as function of normalized data rate. (b) Calculated normalized data rate as a function of number of error correction logic inputs | 60 |
| 3.13   | (a) Simulated BER vs. $V_{TH,H}$, $|V_{TH,L}|$ bathtub curves of ECL-1 and ECL-2 on channel 4. (b) Simulated BER vs. $V_{TH,H}$, $|V_{TH,L}|$ bathtub curves of ECL-1 and ECL-2 on channel 5. | 61 |
| 3.14   | (a) Die micrograph of the proposed transceiver. (b) Pie chart of the power breakdown at 14.4Gb/s measurement. (c) Additional blocks to implement the proposed scheme in addition to simple NRZ transceiver architecture and their power overhead. (d) Pie chart of the area breakdown. | 64 |
| 3.15   | (a) Simulated clock distribution network power breakdown at 14.4Gb/s. (b) Area breakdown of the clock distribution network | |
| 3.16   | Transceiver measurement setup with the chip-on-board photograph | 65 |
| 3.17   | (a) Measured Dicode near-end eye diagram, far-end eye diagram, channel loss profile, and in-situ eye diagram. (b) BER sensitivity graphs with respect to upper threshold voltage $(V_{TH,H})$ and lower threshold voltage $(V_{TH,L})$. | 67 |
| 3.18   | (a) Measured channel loss profiles. (b) Measured transceiver BER bathtub curves using PRBS-7. (c) Measured transceiver BER bathtub curves using PRBS-31. (d) Measured channel loss profiles without receiver-side loss. Measured single bit response of (e) channel 4 (=channel 1 without receiver-side loss), (f) channel 5 (=channel 2 without receiver-side loss), and (g) channel 6 (=channel 3 without receiver-side loss). | 68 |

## LIST OF TABLES

| Table |                                               | Page |
|-------|-----------------------------------------------|------|
| 2.1   | Comparison with state-of-the-art transceivers | 32   |
| 3.1   | Comparison with state-of-the-art transceivers | 65   |

#### Chapter 1: Introduction

This dissertation consists of two research topics in wireline communication circuits and systems design. The aim of both these research topics is to achieve energy-efficient error-free communication over bandwidth-limited wireline channels. The first one is a PAM-8 transceiver with a time-domain FFE and linearity improvement technique. The second one is an equalizer-less wireline transceiver with Dicode encoding and forward error correction logic.
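For readers unfamiliar with Dicode, the following is a minimal sketch of the textbook form of the code, in which each ternary symbol is the difference between the current bit and the previous one. It is not the transceiver's actual pre-coder or error correction logic; the function names and the test pattern are illustrative assumptions. It shows the property the second topic relies on: a +1 is never immediately followed by another +1, and a -1 is never immediately followed by another -1.

```python
def dicode_encode(bits, prev=0):
    """Textbook Dicode: y[n] = x[n] - x[n-1], giving symbols in {-1, 0, +1}."""
    symbols = []
    for b in bits:
        symbols.append(b - prev)
        prev = b
    return symbols


def dicode_decode(symbols, prev=0):
    """Invert the encoder with a running sum: x[n] = y[n] + x[n-1]."""
    bits = []
    for s in symbols:
        prev = prev + s
        bits.append(prev)
    return bits


def has_repeated_nonzero(symbols):
    """True if two adjacent symbols are both +1 or both -1."""
    return any(a == b != 0 for a, b in zip(symbols, symbols[1:]))


# Hypothetical bit pattern, chosen only for illustration.
bits = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
symbols = dicode_encode(bits)
print(symbols)
print(has_repeated_nonzero(symbols))
```

A received sequence that contains two adjacent +1s or two adjacent -1s therefore cannot be a valid codeword, which is the kind of known structure a receiver can use to flag and correct ISI-induced errors.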

# 1.1 Introduction to PAM-8 Wireline Transceiver with Time-Domain Equalization and Linearity Improvement Technique

Ever-increasing global internet data traffic has driven up the demand for cutting-edge high-speed wireline communication systems in data centers. As a result, data rates of wireline links have been steadily increasing [1]. PAM-4 signaling has been widely adopted throughout the SerDes industry and research community, mostly because it doubles the data rate compared to non-return-to-zero (NRZ) signaling at the same symbol rate [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. As PAM-4 signaling has matured, there has been a need to explore higher-order modulation techniques and architectures beyond PAM-4 with the aim of achieving higher data rates and better energy efficiency. The next modulation technique after PAM-4 is PAM-8, which increases the data rate to 3 times that of NRZ [14] at the same symbol rate. However, because it uses 8 voltage levels, PAM-8 is more sensitive than NRZ to non-idealities such as (a) residual inter-symbol interference (ISI) and (b) non-linearity in the front-end. Both of these non-idealities must be minimized because they reduce the vertical eye opening, which reduces the signal-to-noise ratio (SNR) and consequently increases the bit error rate (BER). In view of this, this work investigates a time-domain FFE-based PAM-8 wireline architecture with the aim of achieving more than 7-bit ENOB linearity in the receiver front-end [15].
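The data-rate and sensitivity comparison above can be sketched with a few lines of arithmetic. The helper name `pam_link_numbers` is hypothetical, and the eye-height penalty assumes equally spaced levels and the same peak-to-peak swing as NRZ, which is a simplification rather than a statement about the fabricated design. The 39.6 Gb/s figure is the top data rate quoted in the abstract.

```python
import math


def pam_link_numbers(data_rate_gbps, levels):
    """Basic PAM-M quantities for a given aggregate data rate."""
    bits_per_symbol = math.log2(levels)
    symbol_rate_gbaud = data_rate_gbps / bits_per_symbol
    nyquist_ghz = symbol_rate_gbaud / 2
    # Assumption: equal peak-to-peak swing and equally spaced levels, so each
    # of the (levels - 1) stacked eyes is 1/(levels - 1) of the NRZ eye height.
    eye_penalty_db = 20 * math.log10(levels - 1)
    return {
        "bits_per_symbol": bits_per_symbol,
        "symbol_rate_gbaud": symbol_rate_gbaud,
        "nyquist_ghz": nyquist_ghz,
        "eye_penalty_db": eye_penalty_db,
    }


for name, levels in (("NRZ", 2), ("PAM-4", 4), ("PAM-8", 8)):
    r = pam_link_numbers(39.6, levels)
    print(f"{name}: {r['symbol_rate_gbaud']:.1f} GBaud, "
          f"Nyquist {r['nyquist_ghz']:.1f} GHz, "
          f"eye penalty {r['eye_penalty_db']:.1f} dB")
```

Under these assumptions, PAM-8 at 39.6 Gb/s runs at 13.2 GBaud with a Nyquist frequency of 6.6 GHz, the frequency at which the abstract quotes the channel loss. Each eye, however, is one seventh of the NRZ eye height (about 16.9 dB smaller), which is why residual ISI and front-end non-linearity matter so much more.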

There are three techniques to mitigate ISI and achieve error-free communication on a bandwidth-limited wireline channel: data encoding with forward error correction [16], line coding [17, 18, 19, 20, 21, 22], and equalization/filtering. Among these, equalization or filtering has been the most prominent technique in wireline systems. There are three equalization architectures: feed forward equalizers (FFE) [23, 24, 25], decision feedback equalizers (DFE) [26, 27, 28, 25], and continuous time linear equalizers (CTLE) [29, 30]. While DFE and CTLE are placed on the receiver side, an FFE can be placed on either the transmitter side or the receiver side. Compared to transmitter-side FFEs, receiver-side FFEs avoid not only jitter amplification but also the need for a backchannel to tune the FFE coefficients [31]. Therefore, this work investigates a receiver-side FFE architecture for PAM-8 modulation.

Conventional state-of-the-art receiver-side FFEs generally take the form of either (a) a voltage-domain analog FFE architecture [32, 24, 33, 34] or (b) a digital-domain or analog-to-digital-converter (ADC)-based FFE architecture [35, 36, 12, 11, 6, 7, 8, 2, 3, 4]. Analog FFEs with amplifier-based delay elements suffer from a trade-off between signal swing and linearity. While a large signal swing is desirable for a higher SNR and a lower BER, it also reduces the linearity of the analog delay elements. Moreover, with the reduced supply voltage in fine technology nodes, it has become challenging to achieve a large swing without compromising linearity. Furthermore, variation in delay due to process, voltage, and temperature (PVT) variations in an amplifier-based delay line can result in under- or over-equalization, which increases the BER. Another type of analog FFE architecture is the sample-and-hold FFE [37]. While a sample-and-hold-based FFE avoids delay elements, the requirement to generate and distribute a 25% duty cycle clock for a 4-tap FFE and a 12.5% duty cycle clock for an 8-tap FFE makes scaling of this architecture challenging [37]. An ADC-based FFE provides the flexibility to implement multiple FFE taps. However, the high power consumption of high-resolution high-speed ADCs and digital signal processors (DSP) degrades the transceiver energy efficiency.

Researchers have explored different domains to find alternative equalization strategies while maintaining the same basic principle. Equalization using a pulse-width-modulated (PWM) signal in the time domain [5, 38, 39, 40] can be a promising alternative to overcome the limitations of voltage-domain and digital-domain equalization techniques. Time-domain signaling in advanced technology nodes can process the signal in the time domain while operating with a lower supply voltage. However, the prior FFE-based time-domain architecture was demonstrated in NRZ [38]. Prior DFE-based time-domain architectures, on the other hand, were demonstrated in NRZ [41, 39] and PAM-4 [5]. In all the prior time-domain architectures, the number of taps was limited to 2, mainly due to the limited front-end linearity and fixed data rate range. Higher-order modulation techniques such as PAM-8 require a highly linear analog front-end, which translates to a highly linear time-domain FFE or DFE architecture. In view of this, we propose a PAM-8 transceiver with a time-domain 4-tap FFE on the receiver, achieving a wide data rate range from 12.0 to 39.6 Gb/s [15].

Figure 1.1: Conceptual diagram of the proposed transceiver with time-domain feed forward equalization and linearity improvement technique.

The concept of the proposed time-domain equalization and non-linearity cancellation technique is shown in Fig. 1.1. The received voltage signal is sampled at the receiver and converted to a pulse-width-modulated signal. Delay in the FFE is generated by delaying the PWM signals using digital delay lines. The PWM signals are added together with appropriate weights to equalize the channel. The summed PWM signal is converted back to a voltage signal at the FFE output, and the slicers sample this voltage to recover the PAM-8 data. It was observed that when two non-linear functions whose non-linearities have opposite signs are cascaded, one non-linearity partially cancels the other, making the overall transfer function of the cascaded system more linear. Based on this observation, the proposed receiver is designed so that the voltage-to-time and time-to-voltage converters have non-linearities of opposite polarity.

# 1.2 Introduction to Transceiver Design with ISI-Resilient Data Encoding for Equalizer-Free Wireline Communication using Dicode Encoding and Error Correction

Increasing the data rate through bandwidth-limited wireline channels requires equalization to cancel inter-symbol interference (ISI). Since wireline channels have low-pass characteristics, equalization is typically achieved by placing high-pass filters or equalizers (FFE, DFE, or CTLE) in the data path to compensate for the low-pass characteristics of the channel. It is estimated that the energy efficiency of wireline links degrades by 10 times for every 30 dB of channel loss compensated [1, 42]. Conventional equalizers require multiple taps (e.g., 4-FFE, 15-DFE, 2-CTLE) to compensate for heavy channel loss, which results in high power consumption and reduced energy efficiency [43, 44]. Furthermore, increasing the number of taps gives a diminishing return on the maximum achievable data rate after a certain point [45].

Researchers have also introduced alternative equalization strategies using phase pre-emphasis [18] or different line codes such as Manchester code [46, 17, 19, 20], integrated pulse width modulation (iPWM) [47, 48, 49], and consecutive digit chopping (CDC) [49]. To support ever-increasing data rates and compensate for heavier channel loss, data encoding and error correction techniques can be leveraged. In upper layers of the wireline communication stack, techniques such as convolutional or 8b10b coding and sequence detection-based Viterbi decoding can achieve low BER in the presence of ISI [50]. However, traditional data encoding and error correction come at the cost of (a) low energy efficiency, (b) low achievable data rate due to high complexity and the feedback requirement in the decoder, (c) high latency in the decoding process (100s of ns), and (d) coding overhead, which requires additional bits to be transmitted in the message.

In view of these limitations, we propose an ISI-resilient (ISIR) encoding and error correction concept, namely Dicode encoding with sequence detection-based error correction and decoding, to achieve low BER on bandwidth-limited wireline channels without using any conventional equalizers or filters. This work presents two types of all-digital error correction logic which can help to achieve BER $<10^{-12}$. The proposed encoding/decoding approach has (a) a high energy efficiency, (b) no feedback loops in the decoder (it is a feed-forward architecture), (c) small logic depth with error correction and decoding latency of only 5 UIs (3 UIs from re-timers, 2 UIs from error correction + decoding logic), and (d) no additional bits for coding. Operating at 13.6 Gb/s, 14.4 Gb/s, and 16 Gb/s, the 65 nm CMOS transceiver with the proposed ISIR Dicode encoding can compensate for 24.2 dB, 21.6 dB and 21.4 dB channel loss with an energy efficiency of 2.56 pJ/bit, 2.38 pJ/bit, and 2.66 pJ/bit, respectively [16]. Compared to the state-of-the-art Viterbi-based transceiver [50], the proposed transceiver achieves 4 times better Tx + Rx energy efficiency, 3 times higher data rate, and 30 times lower latency while communicating on a channel which has 3 dB higher channel loss.

# Chapter 2: PAM-8 Transceiver with Time-Domain (PWM) Feed Forward Equalization and Non-linearity Cancellation Technique

### 2.1 Linearity Requirement on Pulse Amplitude Modulated Signal

Non-linearity in the transceiver front-end reduces the vertical eye opening, and consequently reduces the SNR and increases the BER. Therefore, it is important to estimate the linearity requirement for a given modulation scheme. In this work, the effect of non-linearity on various higher-order modulation schemes is estimated in MATLAB by applying a non-linear function to PAM-4/8/16 modulated data and observing the resulting eye openings.

Mathematically, a non-linear function (f) can be modeled as:

$$f = (1+\beta)x - \beta x^3 \tag{2.1}$$

where $\beta \ge 0$ represents the non-linear coefficient or the third-order harmonic coefficient. Assuming the input is a cosine function or $x = \cos(t)$, (2.1) can be expanded as:

$$\begin{aligned} f &= (1+\beta)\cos(t) - \beta\cos^{3}(t) \\ &= \left(1+\beta+\frac{3}{4}(-\beta)\right)\cos(t) + \frac{1}{4}(-\beta)\cos(3t) \\ &= \left(1+\frac{\beta}{4}\right)\cos(t) - \frac{\beta}{4}\cos(3t). \end{aligned} \tag{2.2}$$

To quantify the linearity of (2.2), the total harmonic distortion (THD) and the corresponding effective-number-of-bits (ENOB) can be represented as:

$$THD = 10\log_{10}\left(\frac{(1+\beta/4)^2}{(|-\beta/4|)^2}\right) \tag{2.3}$$

$$ENOB = \frac{THD - 1.76}{6.02}. \tag{2.4}$$
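As a quick numerical illustration of (2.3) and (2.4), the following Python sketch evaluates the THD and ENOB of the cubic model for a few values of $\beta$. The function names and the sample $\beta$ values are chosen here for illustration only; the original analysis was carried out in MATLAB.

```python
import math


def thd_db(beta: float) -> float:
    """THD in dB of f = (1+beta)*x - beta*x^3 for x = cos(t), following (2.3).

    The ratio is fundamental amplitude over third-harmonic amplitude,
    so a larger value means a more linear function.
    """
    fundamental = 1.0 + beta / 4.0
    third_harmonic = abs(-beta / 4.0)
    return 10.0 * math.log10(fundamental ** 2 / third_harmonic ** 2)


def enob_bits(beta: float) -> float:
    """Effective number of bits from THD, following (2.4)."""
    return (thd_db(beta) - 1.76) / 6.02


if __name__ == "__main__":
    # Illustrative beta values only; beta must be non-zero.
    for beta in (0.2, 0.09, 0.028, 0.01):
        print(f"beta={beta:<6} THD={thd_db(beta):6.2f} dB  ENOB={enob_bits(beta):5.2f} bits")
```

Under this model a smaller $\beta$ always gives a larger ENOB, since the third-harmonic amplitude $\beta/4$ shrinks while the fundamental stays close to 1.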

To quantify the irregularity of the multiple eye openings, ratio-of-level mismatch (RLM) of a PAM-N signal has been used [51, 52, 53]. RLM can be defined as

$$RLM = \frac{\min(V_N - V_{N-1}, V_{N-1} - V_{N-2}, \dots, V_2 - V_1)}{(V_N - V_1)/(N-1)} \tag{2.5}$$

where $V_i$ represents the *i*-th voltage level of a PAM-*N* symbol. The numerator is the minimum eye opening, represented as the minimum voltage difference between adjacent voltage levels. The denominator is the average eye opening, represented as the difference between the highest voltage level ($V_N$) and the lowest ($V_1$) divided by the number of eyes, *N* − 1. When RLM is equal to 1, the system is perfectly linear without any distortion in the eye diagram, and all the eye openings have equal heights. On the other hand, if the eye diagram is distorted due to non-linearity, RLM will be smaller than 1. For example, PAM-4-based wireline papers typically report RLM $\geq 0.9$ [51, 52, 53], meaning that the most severely distorted eye is at most 10% smaller than in the ideal scenario.

Figure 2.1: Relationship between ratio-of-level mismatch (RLM) and effective-number-of-bits (ENOB) in PAM-4, 8, and 16.
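The RLM metric can be exercised on the cubic non-linearity of (2.1) with a short Python sketch. It assumes ideal PAM-*N* levels equally spaced in [−1, 1] and normalizes the minimum level spacing by the average spacing over the *N* − 1 eyes, so that a perfectly linear system yields RLM = 1. The helper names are hypothetical, and this is not the original MATLAB script.

```python
def pam_levels(n: int) -> list[float]:
    """Ideal PAM-N levels, equally spaced between -1 and +1 (assumed normalization)."""
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def distort(x: float, beta: float) -> float:
    """Cubic non-linearity of (2.1): f = (1 + beta) * x - beta * x^3."""
    return (1.0 + beta) * x - beta * x ** 3


def rlm(levels: list[float]) -> float:
    """Minimum adjacent-level spacing divided by the average spacing."""
    v = sorted(levels)
    gaps = [hi - lo for lo, hi in zip(v, v[1:])]
    average_gap = (v[-1] - v[0]) / (len(v) - 1)
    return min(gaps) / average_gap


if __name__ == "__main__":
    beta = 0.05  # illustrative value only
    for n in (4, 8, 16):
        distorted = [distort(x, beta) for x in pam_levels(n)]
        print(f"PAM-{n}: RLM = {rlm(distorted):.4f}")
```

For the same $\beta$, the outermost eyes of a higher-order PAM sit closer to the compressed ends of the transfer curve, so the computed RLM drops as *N* grows.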

A plot of ratio-of-level mismatch versus effective-number-of-bits (ENOB) is shown in Fig. 2.1. This graph illustrates how much the eye openings of different pulse-amplitude-modulated signals are distorted as the linearity of the system changes. Two important observations can be made from Fig. 2.1 regarding the relationship between linearity, modulation order (N), and the ratio-of-level mismatch of any PAM-N system:

- 1. To maintain the same value of RLM, a higher-order PAM requires a greater ENOB, or a more linear system.
- 2. For PAM-8 signaling, a wireline system is recommended to maintain an ENOB greater than 7 bits.

Figure 2.2: Proposed PAM-8 transceiver architecture with a time-domain 4-tap feed forward equalizer (FFE) on the receiver.

In a conventional amplifier-based analog front-end, its non-linearity is reduced when the signal swing is reduced. However, reduction of signal swing lowers SNR, which increases the BER. Furthermore, reduction of supply voltage puts a limit on maximum signal swing that can be achieved. Therefore, there is a need to investigate alternative signal processing circuits and schemes, which can avoid or minimize the fundamental trade-off between non-linearity and SNR.

#### 2.2 Proposed Transceiver Architecture

A block diagram of the proposed PAM-8 transceiver is shown in Fig. 2.2. The transmitter consists of a pseudo-random binary sequence (PRBS) generator, three 32:1 multiplexers, and a source-series-terminated (SST) output driver. The transmitter has an optional 4-bit tunable 2-tap FFE for debugging purposes only (not used in transceiver measurements). The SST output driver has a parallel segmented structure to implement the FFE functionality. The receiver has a quarter-rate architecture.



Figure 2.3: (a) Schematic of the single-ended track-and-hold and voltage-to-time converter and (b) timing diagram illustrating its operation.

Each lane of the quarter-rate receiver consists of a track-and-hold circuit, a voltage-to-time converter (VTC), a 4-tap (2 post-cursors, 1 main-cursor, and 1 pre-cursor) quarter-rate time-domain FFE, a voltage amplifier, 7 samplers, and a PAM-8 data decoder for decoding PAM-8 symbols to a 3-bit binary output.
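The last two blocks of each lane can be illustrated with a small behavioral sketch in Python: seven samplers compare the equalized voltage against seven thresholds, and the resulting thermometer code is converted to 3 bits. The normalized threshold values and the plain binary bit mapping are assumptions made here for illustration; the text does not state the threshold values or whether the decoder uses binary or Gray mapping.

```python
# Assumed thresholds: midpoints between 8 ideal levels equally spaced in [-1, 1].
THRESHOLDS = [(2 * i - 6) / 7 for i in range(7)]  # -6/7, -4/7, ..., +6/7


def sample_pam8(voltage: float, thresholds=THRESHOLDS) -> list[int]:
    """Model the 7 samplers: each outputs 1 when the input exceeds its threshold."""
    return [1 if voltage > th else 0 for th in thresholds]


def thermometer_to_bits(decisions: list[int]) -> tuple[int, int, int]:
    """Convert a 7-bit thermometer code to 3 bits (MSB first).

    Counting ones assumes a well-formed thermometer code (no bubbles).
    Plain binary mapping is an assumption of this sketch.
    """
    level = sum(decisions)  # 0..7
    return ((level >> 2) & 1, (level >> 1) & 1, level & 1)


if __name__ == "__main__":
    for k in range(8):
        v = -1.0 + 2.0 * k / 7.0  # ideal PAM-8 level k
        print(k, sample_pam8(v), thermometer_to_bits(sample_pam8(v)))
```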

#### 2.2.1 Track-and-Hold and Voltage-to-Time Converter

A schematic and a timing diagram of the voltage-to-time converter in the receiver are shown in Fig. 2.3 (a) and (b), respectively. The track-and-hold circuit samples the incoming data $(V_{DATA})$ with a quarter-rate clock and holds the sampled voltage $(V_{IN})$ on a capacitor $(C_{TNH})$. A bootstrap architecture is leveraged for the track-and-hold to achieve high linearity. During the hold and charge phase, a current source $(I_{SRC} = 25 \ \mu\text{A} \text{ at } 12 \text{ Gbps and } 80 \ \mu\text{A} \text{ at } 39.6 \text{ Gbps in this work})$ charges up the capacitor $C_{TNH}$ (40 fF) and increases $V_{IN}$ linearly, as shown in Fig. 2.3 (b). As $V_{IN}$ goes above the threshold voltage $(V_{TH-INV})$ of the following CMOS inverter (INV<sub>1</sub>), the rising edge of the digital rail-to-rail PWM signal $(T_{VTC})$ is asserted. $T_{VTC}$ is de-asserted at the end of the hold period by the falling edge of the clock. If the sampled voltage has a smaller magnitude, it takes more time for $V_{IN}$ to charge up, reach $V_{TH-INV}$, and create a rising edge of $T_{VTC}$. As a result, $T_{VTC}$ will have a narrower pulse width. In this work, one VTC is present on each of the positive and negative ports of the received fully differential data input signal. The output PWM data signal from each of the four lanes is pseudo-differential in the time domain. That is, the time difference between the rising edges of the differential pulses contains information on the magnitude of the sampled voltage.

#### 2.2.2 FFE Delay Lines Using Inverters and a Replica DLL

A schematic of the FFE delay lines using a replica delay-locked loop (DLL) in the receiver is shown in Fig. 2.4. The pulse-width-modulated signal, $T_{VTC}$, is precisely delayed to perform feed forward equalization. The propagation delay of the inverters is sensitive to process, voltage, and temperature (PVT) variations. To minimize this impact, the replica-based DLL shown in Fig. 2.4 (a) is employed to ensure that the unit interval (UI) delay is less affected by PVT variations. The DLL consists of a bang-bang phase detector, replica delay elements, and a charge pump. Since the proposed receiver has a quarter-rate architecture, the DLL is locked to a delay of 1 receiver clock cycle, which is 4 UIs. For example, at 39 Gb/s or 13 GBaud, the receiver clock frequency is 3.25 GHz, 1 UI corresponds to approximately 76.9 ps, and 4 UIs (one receiver clock cycle) correspond to approximately 307.7 ps. The control voltage of the DLL ($V_{CTRL}$) is shared with all the delay elements in all 4 lanes. Each UI delay element has 2 digital programmable bits, which enable the proposed FFE to achieve a wide data rate range (approximately $3.5 \times$).

Figure 2.4: Block diagram of the proposed (a) replica delay-locked loop, (b) 4-tap quarter-rate FFE summers (only I and Q lanes are presented in figure), and (c) programmable unit interval delay element.

Figure 2.5: Schematics of I-lane of the proposed time-to-voltage converter (TVC) in a quarter-rate receiver.
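The timing relationships the DLL has to satisfy follow directly from the data rate. The Python sketch below assumes 3 bits per PAM-8 symbol and a quarter-rate receiver clock; the function names are illustrative.

```python
def symbol_rate_baud(data_rate_bps: float, bits_per_symbol: int = 3) -> float:
    """Symbol rate of a PAM link; PAM-8 carries 3 bits per symbol."""
    return data_rate_bps / bits_per_symbol


def ui_seconds(data_rate_bps: float, bits_per_symbol: int = 3) -> float:
    """One unit interval (UI) is one symbol period."""
    return 1.0 / symbol_rate_baud(data_rate_bps, bits_per_symbol)


def quarter_rate_clock_hz(data_rate_bps: float, bits_per_symbol: int = 3) -> float:
    """A quarter-rate receiver clock runs at one fourth of the symbol rate."""
    return symbol_rate_baud(data_rate_bps, bits_per_symbol) / 4.0


if __name__ == "__main__":
    for rate in (12.0e9, 39.0e9, 39.6e9):
        ui = ui_seconds(rate)
        print(f"{rate / 1e9:5.1f} Gb/s: UI = {ui * 1e12:6.1f} ps, "
              f"DLL lock delay (4 UI) = {4 * ui * 1e12:6.1f} ps, "
              f"RX clock = {quarter_rate_clock_hz(rate) / 1e9:.3f} GHz")
```

The DLL lock delay of 4 UIs is by construction equal to one period of the quarter-rate clock.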

### 2.2.3 Time-to-Voltage Converter

A schematic of the time-to-voltage converter (TVC) in the receiver is shown in Fig. 2.5. Both time-to-voltage conversion and FFE coefficient/weight multiplication are performed using a differential charge pump. Differential PWM signals ( $T_{VTCP}$ and  $T_{VTCN}$ ) turn on/off the charge pump switches to charge/discharge the output capacitor ( $C_{CP}$ ). As a result, the differential voltage across  $C_{CP}$ , or  $V_{TVCP} - V_{TVCN}$ , is proportional to the difference between the pulse width of input PWM signals, or  $T_{VTCP} - T_{VTCN}$ , as shown in Fig. 2.6 (b). This charge pump also acts as an FFE summer and combines PWM output signals from four charge pumps with tunable tap coefficients into one output capacitor  $C_{CP}$ . In this work, the FFE tap coefficients were adjusted by changing the individual bias current of 4 charge pumps.
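A minimal behavioral sketch of this summation, written in Python, is given below. It idealizes each charge pump as a constant current source, so each tap contributes a charge equal to its bias current multiplied by the differential pulse width, and the four charges add on $C_{CP}$. The function name, the signed-current convention for tap polarity, and all numeric values are assumptions for illustration, not values from the design.

```python
def tvc_ffe_output(delta_t, tap_currents, c_cp):
    """Ideal 4-tap time-domain FFE summer.

    delta_t      : differential pulse widths T_VTCP - T_VTCN in seconds, one per UI
    tap_currents : (pre, main, post1, post2) signed charge-pump bias currents in A
    c_cp         : output capacitance in F
    Returns the differential output voltage for every UI where all taps have data.
    """
    pre, main, post1, post2 = tap_currents
    out = []
    for n in range(2, len(delta_t) - 1):
        charge = (pre * delta_t[n + 1]      # pre-cursor tap sees the next symbol
                  + main * delta_t[n]       # main-cursor tap
                  + post1 * delta_t[n - 1]  # first post-cursor tap
                  + post2 * delta_t[n - 2]) # second post-cursor tap
        out.append(charge / c_cp)
    return out


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the model.
    pulses = [5e-12, -3e-12, 7e-12, 1e-12, -7e-12, 3e-12]
    taps = (-10e-6, 100e-6, -20e-6, -5e-6)
    print(tvc_ffe_output(pulses, taps, c_cp=50e-15))
```

Changing a tap's bias current in this model scales only that tap's contribution, which mirrors how the tap coefficients are tuned through the individual charge-pump bias currents.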



Figure 2.6: (a) Simplified representation of one tap segment of the proposed time-to-voltage converter (TVC). (b) Timing diagram of the TVC operation and eye diagram of the output of the charge pump $V_{TVCP}$ or $V_{TVCN}$ without any input data jitter. (c) Timing diagram of the TVC operation and eye diagram of the output of the charge pump $V_{TVCP}$ or $V_{TVCN}$ with input data jitter.

### 2.2.4 Effect of Timing Jitter on Time-to-Voltage Converter

Jitter in the PWM signals at the input of the charge pump results in voltage noise and timing jitter at the output of the charge pump, as illustrated in Fig. 2.6 (b) and (c). Power supply noise in the delay elements and charge pumps is the major source of jitter in the input PWM signals. This jitter reduces both the vertical and horizontal eye opening. Eye diagrams of the charge pump output $V_{TVCP}$ or $V_{TVCN}$ without and with jitter on the input PWM signals ($T_{VTCP}$, $T_{VTCN}$) are shown on the right side of Fig. 2.6 (b) and (c), respectively. When the rising edges of the input PWM signals have jitter, the instant at which the capacitor starts to charge or discharge changes. As a result, the final voltage value (the flat region of the eye diagram) is affected by the jitter. Jitter at the falling edge has less effect on the magnitude of the sampled voltage. However, it still reduces the horizontal eye opening, or the timing margin of the sampling clock. The differential signaling architecture and sufficient decoupling capacitors, both on-chip and off-chip, help to suppress power supply noise.

### 2.2.5 Receiver Amplifier

The summing nodes, or the outputs of the FFE summer ($V_{TVCP}$ and $V_{TVCN}$), have a large node capacitance because the charge pumps from all 4 lanes are connected together (see Fig. 2.5). The charge pump would also have to drive 7 comparators for PAM-8, which adds further capacitive load to the charge pump output. Driving such a large capacitive load while maintaining a large voltage swing for a high signal-to-noise ratio (SNR) at a charge pump output that switches at a very high frequency would make the charge pump design challenging and very energy-inefficient. To improve the efficiency, a voltage amplifier is introduced between the charge pump and the comparators. The amplifier achieves a gain of 6.5 dB and a bandwidth of 10.6 GHz. In more advanced technology nodes, which have a larger transistor current gain and smaller parasitic capacitance, this voltage amplification stage may not be necessary.

Figure 2.7: Proposed linearity improvement technique in the receiver front-end by using voltage-to-time and time-to-voltage converters with opposite non-linearity.

#### 2.3 Proposed Linearity Improvement Technique

In the proposed approach, higher linearity in the receiver front-end is achieved by using non-linear circuit blocks having non-linearity in opposite directions. The proposed linearity improvement concept is illustrated in Fig. 2.7. The VTC architecture is designed to have non-linearity in the opposite direction to that of the TVC non-linearity. The aim is to achieve a more linear transfer function from $V_{IN}$ to $V_{OUT}$ than the transfer function of the TVC alone by canceling the non-linearity of the TVC with the non-linearity of the VTC.

#### 2.3.1 Transfer Function of Voltage-to-Time Converter

Non-linearity of the VTC block is introduced mainly by channel length modulation of its current source $(I_{SRC})$ (see Fig. 2.7). To illustrate the effect of channel length modulation on the VTC non-linearity, Fig. 2.7 provides an example of two current sources whose W/L ratios are the same but whose absolute widths (W) and lengths (L) are different. In this example, $L_1$ (blue curve in Fig. 2.7) is shorter than $L_2$ (red curve in Fig. 2.7). The blue curve $(L_1)$, which has the shorter device length, shows a more non-linear relation between $T_{VTC}$ and $V_{IN}$ than the red curve $(L_2)$ with the longer device length. This non-linearity can be explained mathematically as follows. Assuming a first-order and square-law relationship, the current vs. voltage relationship of a PMOS current source in the presence of channel length modulation is expressed as:

$$I_{SRC} = I_0 - K_I V_{IN} \tag{2.6}$$

$$K_I = \frac{1}{2} \lambda \mu_P C_{OX} \frac{W}{L} (|V_{OV,ISRC}|)^2 \tag{2.7}$$

where $I_0$ is the maximum current magnitude when $V_{IN} = 0V$ or $|V_{DS}| = V_{DD}$, $K_I$ is the slope of the $I_{SRC}$ vs. $V_{IN}$ curve which is proportional to $\lambda = \Delta L/(L \cdot |V_{DS}|)$ [54], and $|V_{OV,ISRC}|$ is the overdrive voltage of $I_{SRC}$. The gain of the VTC output pulse width $(T_{VTC})$ can be mathematically expressed as:

$$\frac{\partial T_{VTC}}{\partial V_{IN}} = \frac{C_{TNH}}{I_{SRC}}. \tag{2.8}$$

Using (2.6), (2.7) and (2.8) to express $T_{VTC}$ as a function of $V_{IN}$ results in a convex-shaped transfer function, which is mathematically expressed as:

$$T_{VTC} = T_{const} - \frac{C_{TNH}}{K_I} \ln(I_0 - K_I V_{IN}), \tag{2.9}$$

where $T_{const}$ is an integration constant. Note that (2.9) has a similar shape to the blue and red curves in the $T_{VTC}$ vs. $V_{IN}$ plot shown in Fig. 2.7. From (2.9), one can observe that the non-linear relation between $T_{VTC}$ and $V_{IN}$ can be tuned either by (a) varying the dimensions (W/L) of the current source $I_{SRC}$ or (b) changing the overdrive voltage of the current source ($|V_{OV,ISRC}|$). In this work, the dimensions of the current source were determined at design time with the help of simulations, and the non-linearity is finely tuned by varying $|V_{OV,ISRC}|$ during the measurements.
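The consistency between (2.8) and (2.9) can be checked numerically with the Python sketch below, which compares a finite-difference slope of (2.9) against $C_{TNH}/I_{SRC}$. The 40 fF capacitance is taken from the text; the values of $I_0$ and $K_I$ are hypothetical, chosen only so that the current stays positive over the input range.

```python
import math

C_TNH = 40e-15   # track-and-hold capacitance from the text (F)
I_0 = 80e-6      # assumed maximum source current (A), illustrative
K_I = 20e-6      # assumed current slope (A/V), illustrative


def t_vtc(v_in: float, t_const: float = 0.0) -> float:
    """Closed-form VTC transfer function of (2.9)."""
    return t_const - (C_TNH / K_I) * math.log(I_0 - K_I * v_in)


def vtc_gain(v_in: float) -> float:
    """VTC gain of (2.8) with I_SRC = I_0 - K_I * V_IN from (2.6)."""
    return C_TNH / (I_0 - K_I * v_in)


def numeric_gain(v_in: float, h: float = 1e-4) -> float:
    """Central-difference derivative of (2.9) with respect to V_IN."""
    return (t_vtc(v_in + h) - t_vtc(v_in - h)) / (2.0 * h)


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"V_IN={v:4.2f} V  gain (2.8)={vtc_gain(v) * 1e12:7.2f} ps/V  "
              f"numeric={numeric_gain(v) * 1e12:7.2f} ps/V")
```

The gain grows with $V_{IN}$ because the source current falls, which is the convex shape described above; a larger $K_I$ makes the growth steeper.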

The non-linear relationship between $V_{IN}$ and $T_{VTC}$ of the pseudo-differential VTC architecture can be expressed in a polynomial form as:

$$T_{VTC} = (1 - \alpha)V_{IN} + \alpha V_{IN}^3 \tag{2.10}$$

Figure 2.8: (a) Simplified schematic of a single-ended time-to-voltage converter. (b) Transfer function curves of time-to-voltage converter up and down current with the presence of channel length modulation. (c) Timing diagram of time-to-voltage converter input PWM signals and output voltage.

where  $0 \le \alpha < 1$  is the non-linearity coefficient of VTC. For simplicity of analysis, the normalized differential signaling function of VTC is modeled with only a linear and a cubic term.

### 2.3.2 Transfer Function of Time-to-Voltage Conversion

Non-linearity of TVC is also due to channel length modulation of its charge pump current sources. The magnitude of the up/down current in the charge pump is affected by the voltage of the integration node,  $V_{TVC}$ .

As shown in Fig. 2.8 (b), $I_{UP}$ and $I_{DN}$ in the presence of channel length modulation can be expressed as:

$$I_{UP} = I_{AVG} - k_{CP}(V_{TVC} - V_{CM}) \tag{2.11}$$

$$I_{DN} = I_{AVG} + k_{CP}(V_{TVC} - V_{CM}) \tag{2.12}$$

where  $k_{CP}$  represents the slope or channel length modulation parameter, and  $I_{AVG}$  is the magnitude of UP and DN current when  $V_{TVC}$  is at the common-mode voltage  $V_{CM}$ . The relationship between voltage and current at the output node of the charge pump can be expressed as:

$$\frac{\partial V_{TVC}}{\partial T_{VTC}} = \frac{I_{UP} - I_{DN}}{C_{CP}}, \tag{2.13}$$

where $T_{VTC}$ is the pulse width difference between $T_{INP}$ and $T_{INN}$. When $T_{INP}$ arrives at the TVC, the UP current source is turned on, charges up the capacitor $C_{CP}$, and increases the voltage $V_{TVC}$ while the DN current source is off, or $I_{DN} = 0$. Combining (2.13) with (2.11) results in a first-order differential equation:

$$\frac{\partial V_{TVC}}{\partial T_{VTC}} = \frac{I_{AVG} - k_{CP}(V_{TVC} - V_{CM})}{C_{CP}}. \tag{2.14}$$

Consequently, using the condition that  $V_{TVC} = V_{CM}$  when  $T_{VTC} = 0$ , the TVC transfer function while the UP current is charging up can be expressed as:

$$V_{TVC} = V_{CM} + \frac{I_{AVG}}{k_{CP}} \left( 1 - e^{-k_{CP}T_{VTC}/C_{CP}} \right) \tag{2.15}$$

which is a concave-shaped function of  $T_{VTC}$ .
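For completeness, the step from (2.14) to (2.15) can be written out by separation of variables. The shorthand $u$ is introduced here only for this worked step:

$$\begin{aligned} u &\equiv V_{TVC} - V_{CM}, \qquad \frac{\partial u}{\partial T_{VTC}} = \frac{I_{AVG} - k_{CP}\,u}{C_{CP}} \\ \int_0^{u} \frac{du'}{I_{AVG} - k_{CP}\,u'} &= \int_0^{T_{VTC}} \frac{dT'}{C_{CP}} \\ -\frac{1}{k_{CP}} \ln\!\left(\frac{I_{AVG} - k_{CP}\,u}{I_{AVG}}\right) &= \frac{T_{VTC}}{C_{CP}} \\ u &= \frac{I_{AVG}}{k_{CP}} \left(1 - e^{-k_{CP}T_{VTC}/C_{CP}}\right). \end{aligned}$$

Expanding the exponential to second order makes the concavity explicit, since the quadratic term is negative for $k_{CP} > 0$:

$$V_{TVC} \approx V_{CM} + \frac{I_{AVG}}{C_{CP}}\,T_{VTC} - \frac{I_{AVG}\,k_{CP}}{2\,C_{CP}^{2}}\,T_{VTC}^{2}.$$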

For the simplicity of the further analysis, instead of using the exponential function, the differential TVC transfer function is approximated as a polynomial function which can be expressed as:

$$V_{TVC} = (1+\beta)T_{VTC} - \beta T_{VTC}^{3} \tag{2.16}$$

where  $0 \leq \beta < 1$  is the non-linearity coefficient of TVC. A reset signal RST that resets the capacitor  $C_{CP}$  is generated by logically ANDing  $T_{VTCP}$  and  $T_{VTCN}$ . RST is delayed by approximately 10 ps to provide sufficient timing margin for the PAM-8 samplers to sample the data without metastability.

### 2.3.3 Overall Receiver Front-End Transfer Function with Linearity Improvement Technique

In the proposed non-linearity cancellation approach, VTC is intentionally designed to have a certain amount of non-linearity such that the combination of VTC and TVC and amplifiers could achieve a more linear transfer function. In other words, the end-to-end function of the receiver front-end, or  $V_{OUT}$  vs.  $V_{IN}$  relationship, can be described as:

$$V_{OUT} = (1 \pm \gamma) V_{IN} \mp \gamma V_{IN}^3 \tag{2.17}$$

where $0 \le \gamma < 1$ is the non-linearity coefficient of the overall receiver front-end. The aim of the non-linearity cancellation approach is to make the magnitude of $\gamma$ smaller than those of $\alpha$ in (2.10) and $\beta$ in (2.16).



Figure 2.9: Block diagram of two non-linear blocks with opposite non-linearity polarity in series.

### 2.3.4 Limits of Linearity Improvement

Transfer functions of two non-linear stages with non-linearity in opposite directions are shown in Fig. 2.9. Mathematically, they can be expressed as:

$$y = (1 - \alpha)x + \alpha x^3 \ (\alpha \ge 0) \tag{2.18}$$

$$z = (1+\beta)y - \beta y^3 \ (\beta \ge 0) \tag{2.19}$$

where  $\alpha, \beta$  represent the non-linear or third-order harmonic coefficient of the first (VTC) and the second stage (TVC), respectively. The first stage is a convex third-order function and the second stage is a concave third-order function. The final output, z, can be expressed in terms of the input of the system, x, as follows:

$$\begin{aligned} z &= (1+\beta)\left[(1-\alpha)x + \alpha x^3\right] - \beta\left[(1-\alpha)x + \alpha x^3\right]^3 \\ &= C_1 x + C_3 x^3 + C_5 x^5 + C_7 x^7 + C_9 x^9, \end{aligned} \tag{2.20}$$

where the coefficients are

$$\begin{aligned} C_{1} &= (1 - \alpha)(1 + \beta) \\ C_{3} &= \alpha(1 + \beta) - (1 - \alpha)^{3}\beta \\ C_{5} &= -3\alpha(1 - \alpha)^{2}\beta \\ C_{7} &= -3\alpha^{2}(1 - \alpha)\beta \\ C_{9} &= -\alpha^{3}\beta \end{aligned} \tag{2.21}$$

respectively.
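The closed-form coefficients in (2.21) can be cross-checked by composing the two polynomials of (2.18) and (2.19) numerically. The Python sketch below does this with plain coefficient lists; the helper names and the test values of $\alpha$ and $\beta$ are illustrative.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cascade_numeric(alpha, beta):
    """Coefficients of z(x) = (1+beta)*y - beta*y^3 with y = (1-alpha)*x + alpha*x^3."""
    y = [0.0, 1.0 - alpha, 0.0, alpha]
    y_cubed = poly_mul(poly_mul(y, y), y)   # degree 9, ten coefficients
    z = [-beta * c for c in y_cubed]
    for power, c in enumerate(y):
        z[power] += (1.0 + beta) * c
    return z


def cascade_closed_form(alpha, beta):
    """Odd-order coefficients C1..C9 as listed in (2.21), keyed by power of x."""
    a = 1.0 - alpha
    return {
        1: a * (1.0 + beta),
        3: alpha * (1.0 + beta) - a ** 3 * beta,
        5: -3.0 * alpha * a ** 2 * beta,
        7: -3.0 * alpha ** 2 * a * beta,
        9: -alpha ** 3 * beta,
    }


if __name__ == "__main__":
    numeric = cascade_numeric(0.07, 0.09)
    for power, value in cascade_closed_form(0.07, 0.09).items():
        print(f"C{power}: closed form {value:+.6e}, numeric {numeric[power]:+.6e}")
```

Because both stages are odd functions, the even-order coefficients of the composition are zero, which is why only $C_1$ through $C_9$ appear.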

To estimate the non-linearity at the output z, a sinusoidal signal is applied at the input. By substituting  $x = \cos(t)$  in (2.20), the output can be rewritten as:

$$\begin{aligned} z &= C_1 \cos(t) + C_3 \cos^3(t) + C_5 \cos^5(t) + C_7 \cos^7(t) + C_9 \cos^9(t) \\ &= Q_1 \cos(t) + Q_3 \cos(3t) + Q_5 \cos(5t) + Q_7 \cos(7t) + Q_9 \cos(9t), \end{aligned} \tag{2.22}$$

where the coefficients are

$$\begin{aligned} Q_{1} &= C_{1} + \frac{3}{4}C_{3} + \frac{10}{16}C_{5} + \frac{35}{64}C_{7} + \frac{126}{256}C_{9} \\ Q_{3} &= \frac{1}{4}C_{3} + \frac{5}{16}C_{5} + \frac{21}{64}C_{7} + \frac{84}{256}C_{9} \\ Q_{5} &= \frac{1}{16}C_{5} + \frac{7}{64}C_{7} + \frac{36}{256}C_{9} \\ Q_{7} &= \frac{1}{64}C_{7} + \frac{9}{256}C_{9} \\ Q_{9} &= \frac{1}{256}C_{9} \end{aligned} \tag{2.23}$$

respectively. Note that each  $Q_N$  is a function of  $\alpha$  and  $\beta$ , since each  $C_N$  is also a function of  $\alpha$  and  $\beta$  and  $Q_N$  is a linear combination of  $C_N$ .

Linearity of the output z in terms of effective-number-of-bits (ENOB) can be expressed as:

$$THD = 10 \log_{10} \left( \frac{Q_1^2}{Q_3^2 + Q_5^2 + Q_7^2 + Q_9^2} \right), \tag{2.24}$$

$$ENOB(\alpha, \beta) = \frac{THD - 1.76}{6.02}. \tag{2.25}$$

To estimate the limits of linearity improvement in the proposed approach, ENOB difference is introduced, which can be mathematically defined as:

$$\text{ENOB Difference}(\alpha, \beta) = ENOB(\alpha, \beta) - ENOB(\alpha = 0, \beta). \tag{2.26}$$

The ENOB difference is the difference between the ENOB when $\alpha$ is non-zero, indicating that the VTC is not perfectly linear, and the ENOB when $\alpha$ is zero, indicating that the VTC is perfectly linear. Note that the overall non-linearity is equal to the non-linearity of the TVC ($\beta$) when $\alpha$ is equal to 0. An ENOB difference greater than 0 means an improvement in linearity.

Figure 2.10: ENOB difference as a function of non-linear coefficients of each non-linear stage.
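The chain from (2.21) through (2.26) can be evaluated with a compact Python sketch. It is an illustrative re-implementation of the equations rather than the original simulation script, and the function names are chosen here. At least one of $\alpha$, $\beta$ must be non-zero, otherwise the harmonic power is zero and the THD is undefined.

```python
import math


def cascade_c(alpha, beta):
    """Polynomial coefficients C1, C3, C5, C7, C9 of (2.21)."""
    a = 1.0 - alpha
    return (a * (1.0 + beta),
            alpha * (1.0 + beta) - a ** 3 * beta,
            -3.0 * alpha * a ** 2 * beta,
            -3.0 * alpha ** 2 * a * beta,
            -alpha ** 3 * beta)


def harmonic_q(alpha, beta):
    """Harmonic amplitudes Q1, Q3, Q5, Q7, Q9 of (2.23)."""
    c1, c3, c5, c7, c9 = cascade_c(alpha, beta)
    q1 = c1 + 3 / 4 * c3 + 10 / 16 * c5 + 35 / 64 * c7 + 126 / 256 * c9
    q3 = 1 / 4 * c3 + 5 / 16 * c5 + 21 / 64 * c7 + 84 / 256 * c9
    q5 = 1 / 16 * c5 + 7 / 64 * c7 + 36 / 256 * c9
    q7 = 1 / 64 * c7 + 9 / 256 * c9
    q9 = 1 / 256 * c9
    return q1, q3, q5, q7, q9


def enob(alpha, beta):
    """ENOB of the cascaded stages, following (2.24) and (2.25)."""
    q1, *harmonics = harmonic_q(alpha, beta)
    thd_db = 10.0 * math.log10(q1 ** 2 / sum(q * q for q in harmonics))
    return (thd_db - 1.76) / 6.02


def enob_difference(alpha, beta):
    """Linearity gain relative to a perfectly linear first stage, following (2.26)."""
    return enob(alpha, beta) - enob(0.0, beta)


if __name__ == "__main__":
    beta = 0.09
    for alpha in (0.0, 0.03, 0.07, 0.09, 0.15):
        print(f"alpha={alpha:4.2f} beta={beta:4.2f}  "
              f"ENOB={enob(alpha, beta):5.2f}  difference={enob_difference(alpha, beta):+5.2f}")
```

With $\alpha = 0$ the model collapses to the single-stage result of (2.3), which gives a convenient sanity check before sweeping $\alpha$ and $\beta$.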

To see the trend of the linearity improvement, the ENOB difference is calculated for different combinations of the non-linearity coefficients $\alpha$ and $\beta$ using (2.26), and the results are shown in Fig. 2.10. Two observations can be deduced from this simulation result:

- 1. ENOB difference is always greater than 0 when  $\alpha \leq \beta$  for  $0 < \beta < 0.2$ .
- 2. ENOB is maximized when  $\alpha$  and  $\beta$  have a similar value.



Figure 2.11: Graph of normalized simulated transfer functions of voltage-to-time converter, time-to-voltage converter, and the overall receiver front-end.



Figure 2.12: (a) Simulated end-to-end SNDR and SFDR of the proposed receiver with 500mVpp Input, and (b) Simulated receiver output spectrum with Nyquist frequency input.

Curves of the simulated input-output transfer functions of the VTC, TVC, delay line, amplifier, and the overall receiver designed in this work are shown in Fig. 2.11. While the ENOB of the TVC is approximately 5.1 bits ($\beta = 0.09$), introducing the VTC with $\alpha = 0.07$ improves the overall linearity of the receiver to $\gamma = 0.028$ (see (2.17)), or approximately 6.9 bits. The non-linearity coefficients of both the delay line ($\delta$) and the voltage amplifier ($\epsilon$) are similar to or smaller than that of the receiver front-end ($\gamma$). The simulated SNDR, SFDR, and output spectrum of the receiver are shown in Fig. 2.12. For a 7-GHz sinusoidal input, the proposed receiver achieves an SNDR of 46.5 dB (7.4 bits). The 0.5-bit difference between the linearity obtained from the SNDR and the linearity from the modeled transfer function of the receiver (using (2.17)) is due to the inaccuracy of modeling the normalized transfer functions as a polynomial with first-order and third-order terms only.

Figure 2.13: (a) Prototype chip micrograph. (b) Area breakdown of the transceiver chip active area.

#### 2.4 Measurement Results

The prototype transceiver was fabricated in 65 nm CMOS and the die micrograph is shown in Fig. 2.13 (a). The proposed transceiver operates from 0.9/1.0/1.1 V supplies and occupies an active area of 0.39 mm<sup>2</sup>. Fig. 2.13 (b) shows the area breakdown of the proposed transceiver, which consists of a PAM-8 transmitter (TX), a track-and-hold and a voltage-to-time converter (TNH+VTC), delay lines and a delay-locked loop (DLY+DLL), a charge pump (CP), a stage of voltage amplifiers and AC coupling capacitors (AMP+ACC), and comparators and decoder logic blocks for PAM-8 (SA+DEC).

Figure 2.14: Measurement setup of the transceiver with a zoomed photo of the receiver showing the chip-on-board.

The measurement setup for the transceiver is shown in Fig. 2.14. The transmitter board is connected to the receiver board through a 15-inch FR4 channel. The clock is generated by an external clock generator (HP 83640B), internally divided to generate four phases, and distributed to the transmitter and the receiver. The threshold voltages are generated using off-chip voltage regulators and potentiometers (variable resistors). An oscilloscope (Tektronix DSA8200) and a BER tester (Tektronix BSA286CL) were used to measure the waveforms and the bit error rate at the receiver. The chip was wire-bonded to the PCB as a chip-on-board (COB) assembly.



Figure 2.15: Insertion loss profile of the channel used for the measurement.



Figure 2.16: Measured transceiver BER bathtub curves for 3 bits of PAM-8 at the receiver in PRBS-7 data (a) at 39.6 Gb/s and (b) at 36.0 Gb/s.

The insertion loss profile of the channel used during the measurement of the transceiver is presented in Fig. 2.15. The FR4 channel has 12.7 dB and 14.0 dB insertion loss at 6.0 GHz and 6.6 GHz, respectively, which is comparable to the loss of middle-range (MR) wireline channels targeted for chip-to-chip interface applications and mid-range backplanes [55].

The measured BER bathtub curves at 39.6 Gb/s and 36.0 Gb/s are shown in Fig. 2.16 (a) and (b), respectively. Each bathtub graph has three curves, since a

|                     | This Work                    | ISSCC'17 [56]                  | ISSCC'18 [11]                                                                    | ISSCC'18 [12]                                |
|---------------------|------------------------------|--------------------------------|----------------------------------------------------------------------------------|----------------------------------------------|
| Modulation          | PAM-8                        | PAM-4                          | PAM-4                                                                            | PAM-4                                        |
| Process             | 65-nm                        | 40-nm                          | 16-nm                                                                            | 16-nm                                        |
| Data Rate [Gb/s]    | 12-39.6                      | 56                             | 19-56                                                                            | 64.375                                       |
| Equalization        | 4-tap RxFFE<br>(Time-domain) | 3-tap FFE<br>CTLE<br>3-tap DFE | 4-tap TxFFE<br>15-tap RxFFE<br>2-tap DFE                                         | 3-tap TxFFE<br>CTLE<br>16-tap DFE (Off-chip) |
| Efficiency [pJ/bit] | $8.65^{\rm a}$, $8.66^{\rm b}$ | 10.39                        | $6.4^{\rm c}$, $9.7^{\rm d}$                                                     | $2.9^{\rm f*}$, $4.3^{\rm g*}$, $6.2^{\rm h*}$ |
| Loss [dB]           | $12.7^{\rm a}$, $14^{\rm b}$ | 24                             | $7.4^{\rm c}$, $32^{\rm d}$                                                      | $8.6^{\rm f}$, $13.5^{\rm g}$, $29.5^{\rm h}$ |
| Area [mm$^2$]       | 0.39                         | 2.4                            | 8.81                                                                             | $0.25^{*}$                                   |
| Supply [V]          | 0.9/1.0/1.1                  | 1.0/1.5                        | 0.85/0.9/1.2/1.8                                                                 | 0.9/1.2                                      |
| BER                 | $10^{-6\,\rm a}$, $10^{-4\,\rm b}$ | $10^{-12}$               | $10^{-12\,\rm c}$, $10^{-6\,\rm e}$, $10^{-12\,\rm d}$                           | $10^{-6\,\rm f}$, $10^{-5\,\rm g}$, $10^{-4\,\rm h}$ |

Table 2.1: Comparison with state-of-the-art transceivers

<sup>a</sup> 36.0 Gb/s.

<sup>b</sup> 39.6 Gb/s.

<sup>c</sup> 7.4 dB.

<sup>d</sup> 32 dB without XT.

<sup>e</sup> 32 dB with 3 mVrms XT.

<sup>f</sup> 8.6 dB.

<sup>g</sup> 13.6 dB.

<sup>h</sup> 29.5 dB.

\* ADC and DSP not included.

PAM-8 symbol consists of three bits. The least significant bit (LSB), having the smallest voltage swing, has the lowest signal-to-noise ratio and consequently a higher bit error rate than the other bits. The proposed transceiver achieves a  $BER < 10^{-4}$ .

Table 2.1 shows the performance summary and comparison with the state-of-the-art publications. At 39.6 Gb/s, the transceiver can compensate 14-dB loss using a 4-tap time-domain FFE at the receiver with an energy efficiency of 8.66 pJ/bit. At 36 Gb/s, the transceiver achieves  $BER < 10^{-6}$  with an energy efficiency of 8.65 pJ/bit. Note that the transmitter-side FFE was turned off during all the measurements reported in this work. Fig. 2.17 shows the power breakdown of the transceiver at 39.6 Gb/s.

Figure 2.17: Power breakdown of the proposed transceiver.

# Chapter 3: Equalizer-Free Wireline Transceiver with ISI-Resilient Data Encoding for Wireline Communication – Dicode Encoding and Error Correction

#### 3.1 Dicode as ISI Resilient Encoding and Error Correction Concept

When a data pulse is transmitted on a bandwidth-limited channel, it loses its sharpness and spreads out to neighboring bits. This is known as inter-symbol interference (ISI). As a result, the receiver incorrectly samples the data, as shown in Fig. 3.1 (a). The key observation is that, because each bit spreads into its neighbors, a wrongly sampled bit takes the same value as the previous bit. In other words, the incorrectly sampled data contain consecutive identical digits (CIDs), i.e.,  $[1, 1, 1, \cdots]$  or  $[-1, -1, -1, \cdots]$ .

In the proposed work, the transmitter encodes the data with the ISI-resilient (ISIR) Dicode such that the Dicode-encoded data do not contain CIDs, as shown in Fig. 3.1 (b). When the encoded data are transmitted on a bandwidth-limited channel, each bit spreads into the neighboring bits due to ISI and, as a result, the receiver incorrectly samples the data with CIDs. However, because Dicode-encoded data do not contain CIDs, the receiver can "identify" the incorrectly sampled bits and "correct" them. Using the proposed approach, multiple bit errors caused by ISI can be corrected. As a result, error-free communication can be achieved without using conventional equalizers or filters.



Figure 3.1: (a) Effect of a bandwidth-limited channel on the transmitted pulse and generation of CIDs. (b) Proposed approach on correcting errors due to ISI by encoding data using Dicode to avoid CIDs.

#### 3.2 Proposed Architecture

The proposed transceiver architecture is shown in Fig. 3.2. It consists of three major blocks: a transmitter, a receiver, and a clock distribution network. The transmitter consists of a PRBS generator, a digital-domain parallel pre-coder, and an SST output driver with the Dicode encoding functionality. The receiver front-end consists of two voltage amplifiers to increase the input swing,  $4 \times 2$  comparators or slicers operating at a quarter rate (2 comparators per lane), re-timers, two types of error correction logic (ECL), and a Dicode decoder. The clock distribution network has a duty-cycle correction block, a 4-phase generator, and a voltage-controlled delay line.

Figure 3.2: Proposed transceiver block diagram.



Figure 3.3: (a) Block diagrams to explain the evolution of the proposed transceiver architecture with pre-coder. (b) Timing diagram of the precoded Dicode architecture.

#### 3.2.1 Pre-coder and Evolution of Transceiver Architecture

The transfer function of Dicode is  $(1 - z^{-1})$ . In this work, a pre-coding operation is employed on the transmitter to avoid error propagation at the receiver. The transfer function of the pre-coder is  $1/(1-z^{-1})$ . The evolution of the proposed transceiver architecture is shown step by step in Fig. 3.3(a), along with the associated timing diagram in Fig. 3.3(b). The evolution starts with a Dicode encoder in the Tx and a Dicode decoder in the Rx. In step-1, the adders in the encoder and decoder are replaced with XOR gates (modulo-2 addition). In step-2, the XOR gate in the encoder is split into an adder and a digital rectifier. In step-3, the blocks are re-ordered such that the  $1/(1-z^{-1})$  function implemented with an XOR gate is placed in front of the encoder  $(1 - z^{-1})$  and the digital rectifier is moved from the Tx to the Rx. Finally, in step-4, the digital rectifier is implemented using two comparators and an OR gate; since it sits on the receiver side and converts Dicode data back to NRZ, it is called the decoder in this work. By placing the pre-coder before the Dicode encoder, logic 1 is encoded alternately as '1' or '-1' and logic 0 is encoded as '0'. Therefore, the correlation between the data bits is broken before they are transmitted. Consequently, the current transmitted bit is independent of the previously transmitted bits. As a result, the receiver does not require a feedback architecture and there is no error propagation.
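The pre-coded Dicode chain described above can be sketched behaviorally in a few lines. This is an illustrative model under stated assumptions, not the synthesized sub-rate logic: the function names, the zero initial state, and the ideal ISI-free channel are all hypothetical.

```python
from typing import List


def precode(bits: List[int]) -> List[int]:
    """Pre-coder 1/(1 - z^-1) in modulo-2 arithmetic: p[n] = b[n] XOR p[n-1]."""
    out, prev = [], 0  # assumed initial state of 0
    for b in bits:
        prev ^= b
        out.append(prev)
    return out


def dicode_encode(pre: List[int]) -> List[int]:
    """Dicode (1 - z^-1): t[n] = p[n] - p[n-1], giving levels -1, 0, +1."""
    out, prev = [], 0  # assumed initial state of 0
    for p in pre:
        out.append(p - prev)
        prev = p
    return out


def dicode_decode(symbols: List[int]) -> List[int]:
    """Digital rectifier: upper comparator OR lower comparator, no feedback."""
    decoded = []
    for s in symbols:
        s_h = int(s > 0)  # upper comparator sees '+1'
        s_l = int(s < 0)  # lower comparator sees '-1'
        decoded.append(s_h | s_l)
    return decoded


bits = [1, 1, 0, 1, 0, 0, 1, 1]
tx = dicode_encode(precode(bits))
rx = dicode_decode(tx)
print(tx)  # logic 1 alternates between +1 and -1, logic 0 stays 0
print(rx)  # equals the original bits on an ideal channel
```

Each output of `dicode_decode` depends only on the current symbol, which is the property that removes the feedback loop from the receiver.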

#### 3.2.2 Receiver Front-End

The receiver front-end consists of two voltage amplifiers, as shown in Fig. 3.4(a). High front-end gain helps to overcome the sampler offset and noise. The simulated frequency response in Fig. 3.4(b) shows that the two voltage-amplifier stages achieve a gain of 12.5 dB and a bandwidth of 21.5 GHz. Note that there are no equalizers such as a CTLE in the receiver front-end. Fig. 3.4(c) shows the simulated DC linearity of a single voltage-amplifier stage and of the two stages combined, in terms of I1dB and P1dB. For clarity, P1dB is defined as the output voltage swing at which the gain is 1 dB less than the small-signal voltage gain, and I1dB is the input voltage swing at which the output swing equals P1dB. At the DC gain of approximately 12.5 dB, the I1dB and P1dB for the two voltage amplifiers (combined) are 35.3 mV and 145.7 mV, respectively. Fig. 3.4(d) shows the simulated power spectral density with a 10 GHz input sinusoidal signal. The signal-to-noise-and-distortion ratio (SINAD) is approximately 25 dB. The front-end comparators are equipped with binary-weighted capacitor banks for offset cancellation. However, due to the coarse resolution of the capacitor banks, the offset cancellation was not used in the measurements.

#### 3.2.3 Noise Budget of Transceiver

Figure 3.4: (a) Schematic of voltage amplifier stage in the receiver front-end. (b) Simulated frequency response of the overall amplifier stage. (c) Simulated DC linearity of single stage and aggregated voltage amplifier. (d) Simulated AC linearity of the receiver analog front-end.

Figure 3.5: (a) Block diagram of the full-rate Dicode receiver. (b) Truth table of ECL-1 and Dicode decoder. (c) Timing diagram demonstrating the operation of ECL-1.

Assuming thermal noise generated by the SST output driver on the transmitter is attenuated by the channel, the total noise power at the input of the comparator/slicer can be mathematically expressed as:

$$P_{n,RX} = P_{n,Term} + P_{n,Amp} + P_{n,Comp,in} \tag{3.1}$$

where  $P_{n,Term}$  is the noise power generated by the two 50 $\Omega$  termination resistors,  $P_{n,Amp}$  is the noise power generated by the two amplifiers at the front-end, and  $P_{n,Comp,in}$  is the input-referred noise power of the comparator. Total simulated integrated noise at the input of the comparator is 2.66mVrms. To get a certain bit-error rate (BER), one needs to achieve a minimum signal-to-noise ratio (SNR) at the input of the slicer. Assuming Gaussian noise, the relation between SNR and BER can be mathematically derived as:

$$\frac{1}{2}\left(1 + erf\left(\frac{SNR}{2\sqrt{2}}\right)\right) = 1 - BER \tag{3.2}$$

where erf() represents the error function. For achieving BER <  $10^{-12}$ , the minimum required SNR obtained from (3.2) is 23 dB. The minimum required SNR is expressed as  $P_{SNR}$ . In the proposed scheme, the Dicode signal has three voltage levels and therefore half the swing of an NRZ signal. The loss in signal swing due to encoding is expressed as  $P_{Dicode}$ = 6 dB. The signal swing reduction due to the 50 $\Omega$  termination between the Tx and the Rx is expressed as  $P_{50\Omega}$ = 6 dB. The channel insertion loss is expressed as  $P_{Channel}$ . Finally, the receiver amplifier provides 12.5 dB of amplification, and this signal gain is expressed as  $P_{Amp}$ . The minimum transmitted signal power required to meet a certain BER in Dicode can be mathematically expressed as:

$$\begin{aligned} P_{sig,TX,min} &= P_{n,RX} \times (\text{Path Loss}) \\ &= P_{n,RX} + P_{SNR} + P_{Dicode} + P_{50\Omega} + P_{Channel} - P_{Amp}. \end{aligned} \tag{3.3}$$

Note that  $P_{Amp}$  term is subtracted because it represents an amplification. Assuming 25dB insertion loss from the channel (based on the measurement result), the total path loss is 23dB + 6dB + 6dB + 25dB - 12.5dB = 47.5dB. Therefore, the minimum transmitted differential signal swing is

$$2.66\,\mathrm{mV} \times 47.5\,\mathrm{dB} = 2.66\,\mathrm{mV} \times 237.1 = 0.631\,V_{pp}. \tag{3.4}$$

The transmitted signal swing before termination is  $2V_{pp,diff}$  since the supply voltage for the transmitter is 1V. Therefore, the proposed scheme has a noise margin of

$$20\log_{10}\left(\frac{2}{0.631}\right) = 10.02dB. \tag{3.5}$$
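The noise budget in (3.2) to (3.5) can be reproduced numerically. The sketch below is a sanity check rather than part of the design flow; it assumes that the SNR in (3.2) is an amplitude ratio (so dB values use $20\log_{10}$, consistent with the factor of 237.1 in (3.4)), and the function names and the bisection solver are our own choices.

```python
import math


def ber_from_snr(snr: float) -> float:
    """BER implied by (3.2): BER = 1 - 0.5*(1 + erf(x)) = 0.5*erfc(x), x = SNR/(2*sqrt(2))."""
    return 0.5 * math.erfc(snr / (2.0 * math.sqrt(2.0)))


def min_snr_for_ber(target_ber: float) -> float:
    """Smallest linear SNR meeting the target BER, found by bisection (BER falls with SNR)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ber_from_snr(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return hi


snr_db = 20.0 * math.log10(min_snr_for_ber(1e-12))  # amplitude ratio in dB (assumed)

# dB terms quoted in the text; the SNR term uses the rounded 23 dB, as in the text.
p_snr, p_dicode, p_term, p_channel, p_amp = 23.0, 6.0, 6.0, 25.0, 12.5
path_loss_db = p_snr + p_dicode + p_term + p_channel - p_amp  # (3.3)

noise_rms = 2.66e-3                                   # simulated integrated noise, V
min_swing = noise_rms * 10 ** (path_loss_db / 20.0)   # (3.4)
margin_db = 20.0 * math.log10(2.0 / min_swing)        # (3.5), 2 Vpp,diff transmit swing

print(f"required SNR = {snr_db:.2f} dB")
print(f"path loss    = {path_loss_db:.1f} dB")
print(f"min swing    = {min_swing:.3f} V")
print(f"noise margin = {margin_db:.2f} dB")
```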

#### 3.2.4 Error Correction Logic-1

The proposed error correction logic-1 (ECL-1) architecture, its truth table, and timing diagram are shown in Fig. 3.5. When the Dicode-encoded signal T[n] is transmitted through a wireline channel, the '1' and '-1' pulses spread out due to ISI. As a result, the received signal R(t) is sampled with errors. To detect and correct bit errors, the positive and negative sides of ECL-1 independently compare the previously sampled bit with the current bit.

Example-1: If both the previous and current bits are '1'  $(S_H[n-1]=1, S_H[n]=1)$ , ECL-1 detects this as an error since two consecutive '1's or two consecutive '-1's do not exist in Dicode-encoded data. This example assumes that the channel is post-cursor dominant, implying that the current bit '1' is, in fact, the post-cursor of the previous bit '1'. Therefore, ECL-1 corrects the current bit to '0'  $(C_H[n]=0)$ . The Dicode decoder takes the corrected  $C_H[n]$  and  $C_L[n]$  to generate the NRZ data Q[n]. There are no feedback loops in the proposed decoder, which helps to avoid timing constraints. The latency of this decoding logic is 1 UI.

In the case of post-cursor dominant channels, the proposed ECL-1 can ideally correct an infinite number of post-cursors, due to its repetitive pattern.



Figure 3.6: (a) An example where ECL-1 fails with higher channel loss in the presence of both pre and post cursors. (b) An example where three consecutive data bits are processed by ECL-2 to correct the ISI errors.

*Example-2:* Assume a bit '1' is transmitted and, due to severe post-cursor ISI, an unbounded run of '1's is sampled at the receiver. ECL-1 will detect them as errors. Based on the truth table of ECL-1 (see Fig. 3.5(b)), all bits except the first one will be corrected to '0'.
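The behavior in Example-1 and Example-2 can be captured by a small model. The truth table of Fig. 3.5(b) is not reproduced in the text, so the Boolean form used here, $C[n] = S[n] \cdot \overline{S[n-1]}$ per side followed by an OR, is inferred from the two examples and should be read as an assumption; the function names and the zero initial sample are hypothetical.

```python
from typing import List


def ecl1_side(samples: List[int]) -> List[int]:
    """One side of ECL-1 (assumed form): keep a '1' only if the previous sample was '0'."""
    corrected, prev = [], 0  # assumed initial previous sample of 0
    for s in samples:
        corrected.append(int(s == 1 and prev == 0))
        prev = s  # compare against the raw previous sample, so there is no feedback
    return corrected


def ecl1_decode(s_h: List[int], s_l: List[int]) -> List[int]:
    """Correct each side independently, then OR them to recover NRZ data Q[n]."""
    c_h, c_l = ecl1_side(s_h), ecl1_side(s_l)
    return [h | l for h, l in zip(c_h, c_l)]


# Transmitted Dicode symbols [1, 0, 0, -1, 0, 1] after post-cursor smearing:
s_h = [1, 1, 1, 0, 0, 1]  # positive-side slicer: '1' followed by two post-cursors
s_l = [0, 0, 0, 1, 1, 0]  # negative-side slicer: '-1' followed by one post-cursor
q = ecl1_decode(s_h, s_l)
print(q)
```

Because each corrected bit depends only on two raw samples, a run of post-cursors of any length collapses to its first bit, matching Example-2.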

At a higher loss, both pre-cursors and post-cursors could be dominant, and one example of a data waveform through such a channel is shown in Fig. 3.6(a). The 1<sup>st</sup> pre-cursor and the 1<sup>st</sup> post-cursor are greater than the sampling threshold  $(V_{TH,H})$ , which means that the receiver will sample three consecutive 1s. When such data are processed by ECL-1, the data bits are falsely corrected since ECL-1 assumes that the first '1' of the consecutive '1's is the main-cursor. The errors due to ISI can be corrected if the ECL processes three consecutive bits at a time, instead of the two processed by ECL-1, as shown in Fig. 3.6(b). By processing three bits, the ECL recognizes that the second '1' is the main-cursor and the other two bits are the  $1^{st}$  pre-cursor and the  $1^{st}$  post-cursor. Therefore, a different ECL (ECL-2) is required to handle ISI errors due to both pre-cursors and post-cursors.

#### 3.2.5 Error Correction Logic-2

Figure 3.7: (a) Block diagram of the proposed error correction logic-2 (ECL-2). (b) Truth table of the programmable correction logic in ECL-2. (c) Different channel responses generating similar received (sampled) data with ISI errors.

The proposed error correction logic-2 (ECL-2) architecture is shown in Fig. 3.7(a). It detects the patterns with consecutive 1s by observing three consecutive samples at the same time ( $S_H$ [n-1],  $S_H$ [n], and  $S_H$ [n+1]) instead of two in the ECL-1 case. This allows ECL-2 to handle larger and multiple pre-cursors and post-cursors. The outputs of ECL-2 ( $L_H$ [n] for the positive side, and  $L_L$ [n] for the negative side) are programmable based on three types of channel profiles: (a) pre-cursor and post-cursor dominant, (b) post-cursor dominant and (c) pre-cursor dominant, as shown in the truth table in Fig. 3.7(b). ISI errors from a pre-cursor dominant, a pre+post cursor dominant, and a post-cursor dominant channel are shown in Fig. 3.7(c). The bit '1' in the circle is the main-cursor and other '1's are due to ISI. It can be observed that different data patterns can result in similar received (sampled) data with ISI errors when transmitting through different channel characteristics. Therefore, programmability is necessary for ECL-2 to identify the main-cursor from the pre-/post-cursors. A detailed operation of ECL-2 for a pre-cursor and post-cursor dominant channel ($4^{th}$ column of the truth table) can be understood with the following examples:

*Example-1:* The three consecutive slicer outputs for the positive side of the decoder are  $S_H[n-1]=1$ ,  $S_H[n]=1$ ,  $S_H[n+1]=1$ . The positive-side decoder detects that the only valid '1' corresponds to the main tap  $S_H[n]=1$ .  $S_H[n-1]$  and  $S_H[n+1]$  are a pre-cursor and a post-cursor, respectively, and should be '0'. Therefore, the output of ECL-2,  $L_H[n]$ , is 1.

*Example-2:* The three consecutive slicer outputs for the positive side of the decoder are  $S_H[n-1]=1$ ,  $S_H[n]=1$ ,  $S_H[n+1]=0$ . The positive-side decoder detects  $S_H[n-1]$  as the main-cursor and  $S_H[n]$  as the first post-cursor. Therefore, the output of ECL-2,  $L_H[n]$ , is 0.

*Example-3:* The three consecutive slicer outputs for the positive side of the decoder are  $S_H[n-1]=1$ ,  $S_H[n]=0$ ,  $S_H[n+1]=1$ . Since both  $S_H[n-1]$  and  $S_H[n+1]$  are 1, the only valid Dicode pattern is [1,-1,1]. That is, the middle '-1' should be sampled by the negative side sampler or  $S_L[n]=1$ . However, due to thermal noise or ISI, the negative side sampler can make an error. Therefore, in this logic, regardless of the negative side sampler sampling it as '-1' or '0' (due to noise or ISI), the output of ECL-2,  $L_H[n]$ , is 1.

The proposed ECL-2 will fail when the channel loss is so severe that too many pre-cursors and post-cursors are sampled as 1 by the receiver. In that situation, a different ECL with more than three consecutive samples as input must be used. In Section 3.4, the achievable data rate on a bandwidth-limited channel is mathematically derived as a function of the number of inputs to the error correction logic.

#### 3.3 Comparison to Existing Techniques

This section compares the proposed encoding and error correction with conventional equalization and error correction schemes.

#### 3.3.1 Differences and Advantages over Feed Forward Equalizers (FFE) and Tomlinson-Harashima Pre-coding

The aim of a feed-forward equalizer (FFE) is to achieve an "open eye" at the receiver so that the comparator/slicer can sample the correct data, as shown in Fig. 3.8(a). To achieve this aim, the FFE employs a digital high-pass finite impulse response (FIR) filter, typically on the transmitter, that spectrally shapes the data to compensate for the low-pass communication channel. Transmitter-side FFEs require a back-channel to tune the FFE coefficients, which is expensive in most wireline systems. As the insertion loss of the channel increases, the FFE requires a larger number of taps to achieve a low bit error rate. Implementing a multi-tap, high-resolution FFE requires the source series terminated (SST) driver to have multiple segments, which results in high pre-driver parasitic losses [57] and, therefore, higher power consumption. Tomlinson-Harashima (TH) pre-coding requires a feedback structure with tight timing constraints, segmentation of the output driver to generate multiple voltage levels, and TH filter coefficient tuning with back-channels [58, 59, 60, 61].

In the proposed approach, the transmitter transmits Dicode-encoded data in 3 voltage levels. There is no spectral shaping of data based on the channel loss. There are no coding coefficients, and therefore, the proposed transmitter scheme does not require any kind of tuning or back-channels. In the proposed approach, the receiver samples incorrect data from a "closed eye" and achieves low BER by performing error correction, as shown in Fig. 3.8(d). The ISI-resilient encoding approach aims to avoid consecutive '1's and consecutive '-1's in the data stream. Subtracting consecutive bits with the help of Dicode  $(1 - z^{-1})$  encoding is one way of avoiding them.

The Dicode encoding and error correction concept is not limited to any particular communication channel. The limitation on achieving error-free communication for a given communication channel comes only from the type and complexity of the error correction logic employed at the receiver. The proposed encoding and error correction scheme is orthogonal to FFE-based equalization, and as a result, one can in principle use both FFE and the proposed encoding and error correction scheme to communicate over a channel with heavy loss. When an FFE is employed along with the proposed encoding and error correction concept, the power overhead of the FFE implementation is the segmentation power of the output driver and the power consumed in the segmented pre-driver.

#### 3.3.2 Differences and Advantages over Decision Feedback Equalization (DFE)

Decision feedback equalizers are implemented on the receivers [62, 63]. DFE requires a tight timing constraint to close the first tap, which limits its use in higher data rate applications. Techniques such as loop unrolling can help to relax the timing, but they require more hardware which increases the power consumption [64, 26, 65]. DFE also suffers from error propagation due to the feedback structure [66]. A good feature of DFE is that it can cancel reflections in the communication channel with the help of deep feedback taps.

The Dicode encoder has a  $1-z^{-1}$  transfer function and therefore, the Dicode decoder should have a transfer function of  $1/(1-z^{-1})$ . While the Dicode encoder is a feedforward architecture, the  $1/(1-z^{-1})$  function requires feedback with tight timing constraints. If the  $1/(1-z^{-1})$  decoder is implemented on the receiver, it will also result in error propagation. To avoid the timing constraint and error propagation, the  $1/(1-z^{-1})$  transfer function is implemented on the transmitter before the Dicode encoding. Moving the  $1/(1-z^{-1})$  transfer function to the transmitter is known as pre-coding (discussed in Section 3.2.1). In this work, the Dicode pre-coder is implemented in the sub-rate digital domain (synthesized), which relieves the feedback timing constraints. Moreover, each symbol of the pre-coded Dicode data can be decoded at the receiver without any dependency on previous symbols, which avoids the feedback structure and error propagation at the receiver. Even though it has not been fully investigated, we believe that improvements in the error correction logic would help to cancel reflections in the communication channel.

The proposed scheme is orthogonal to DFE-based equalization, and as a result, one can in principle use both DFE and the proposed encoding and error correction scheme to communicate over a channel with heavy loss. The challenges of adding DFE on the Dicode receiver will be similar to the challenges of implementing DFE for a PAM-3 receiver.

Figure 3.8: (a) A feed forward equalizer based transceiver architecture. (b) A conventional Duobinary transceiver architecture with an FFE. (c) A transceiver architecture employing Manchester, iPWM and CDC line-coding techniques. (d) Proposed Dicode encoding and error correction based transceiver architecture.

#### 3.3.3 Differences and Advantages over Duobinary

The conventional Duobinary architecture is shown in Fig. 3.8(b). Duobinary signaling aims to reduce the spectral bandwidth of data by reducing the large fast transitions [67, 68, 10]. The objective of the Duobinary transmitter is to create  $1 + z^{-1}$  at the "receiver input", and the data must have an "open eye" at the receiver to achieve low BER. However, to open an eye at the receiver and achieve  $1 + z^{-1}$  by "absorbing" the channel characteristics, the Duobinary system requires multiple, high-resolution FFE taps on the transmitter [68]. Therefore, all the limitations of the FFE are present in Duobinary.

The proposed scheme does not require any form of equalization or spectral shaping based on the channel. Thanks to the error correction logic, the proposed scheme achieves low BER with a "closed eye" at the receiver, as shown in Fig. 3.8(d).

#### 3.3.4 Differences with Line Coding Schemes

Line coding schemes such as Manchester encoding [46, 17, 19, 20] achieve equalization by shaping the power spectral density of data based on the channel loss profile such that the high-frequency components are amplified and low-frequency components are attenuated. Line coding schemes such as iPWM and CDC [47, 48, 49] achieve equalization by eliminating the ISI caused by consecutive identical digits through edge modulation and slicing the consecutive identical digits, respectively. The edge modulation coefficients in Manchester and iPWM and the slicing pulse width in CDC are a function of the channel loss profile. Similar to FFE, all the line coding schemes require back-channels to tune their coefficients. Moreover, similar to FFE and DFE, the objective of all the line coding schemes is to "eliminate" the ISI and achieve an "open eye" at the slicer/comparator input, as shown in Fig. 3.8(c), so that the correct data is sampled by the receiver.

The objective of the proposed scheme is to "embrace" the ISI and perform error correction after the slicer/comparator. The proposed scheme can work with a "closed eye" at the slicer/comparator input, as shown in Fig. 3.8(d). In other words, incorrect data are sampled by the receiver. There are no coefficients to be tuned on the transmitter, and therefore, there is also no need for back-channels in the proposed approach.

#### 3.3.5 Differences and Advantages over Prior Dicode Works

The differences between the prior Dicode works and the proposed approach are shown in Fig. 3.9. Capacitive- or inductive-coupled channels have a high-pass transfer function. Prior works have leveraged Dicode signaling for high-pass communication channels because of the absence of a low-frequency component in the Dicode spectrum [69, 70, 71]. The prior works require an "open eye" at the receiver to achieve low BER. Communication distances of only a few hundred $\mu$m have been achieved. There is no ISI error correction in the prior Dicode works.

In contrast to prior works, the proposed work is targeted towards low-bandwidth wireline channels with communication distances of tens of inches. The proposed scheme achieves low BER even with the "closed eye" at the receiver input. Error correction operation is performed to achieve low BER at the receiver.

Figure 3.9: (a) Conventional Dicode transceiver used on capacitor coupled high-pass channel. (b) Proposed Dicode transceiver used on a typical bandwidth limited wireline channel.

#### 3.3.6 Differences with Conventional Forward Error Correction (FEC) Coding

Forward error correction (FEC) techniques such as LDPC, Reed-Solomon codes [72], polar codes [73], and Hamming codes [74] can help to improve BER and achieve SNR gain. However, the conventional FEC codes were designed for SNR-limited communication channels with a requirement on the maximum pre-FEC BER (e.g.,  $10^{-3}$ ,  $10^{-4}$ ). If the pre-FEC BER is higher than the required limit, then the application of FEC can result in even worse BER. Successful application of FECs in communication systems with higher pre-FEC BER requires more error correction bits in the FEC, which results in a bigger overhead. Since bandwidth-limited channels have very high BER (e.g., 0.2) in the presence of ISI, the conventional FECs are not a good fit to correct errors caused by ISI. Moreover, these FEC techniques cost several hundred ns of latency [75, 76] and high power consumption for the error correction logic, and therefore, conventional FECs are not effective in overcoming the bandwidth limitation of wireline channels.

In contrast, the proposed scheme corrects ISI errors generated by bandwidth-limited channels with a latency of a few UIs (5 UIs in this work). Simulation suggests that the proposed scheme can tolerate a BER as high as 0.38 before error correction is applied. The proposed scheme is not a replacement for the conventional FECs. The conventional FEC can be used in parallel with the proposed scheme to overcome the SNR limitation and improve the BER even further.

#### 3.4 Mathematical Limit of the Proposed Error Correction Scheme

The proposed ISI error correction concept is based on a truth table. When the receiver employs an error correction logic that processes two data bits from each of two comparators, there are  $3^{16}$  possible truth tables or error correction logic solutions. Similarly, when the error correction logic processes three data bits from each of two comparators, there are  $3^{64}$  possible truth tables. Since it is impractical to estimate the limits of the individual error correction logic due to the large solution space, the following mathematical analysis attempts to find the limits of a general multi-bit input error correction logic.
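One reading that reproduces the counts above: $N$ samples from each of the two comparators give $2^{2N}$ binary input patterns, and each pattern is mapped to one of the three Dicode levels. This interpretation and the helper name `num_truth_tables` are assumptions, since the text states only the totals.

```python
def num_truth_tables(bits_per_comparator: int, comparators: int = 2,
                     output_levels: int = 3) -> int:
    """Count truth tables, assuming one ternary output per binary input pattern."""
    input_patterns = 2 ** (bits_per_comparator * comparators)
    return output_levels ** input_patterns


two_bit = num_truth_tables(2)    # 2 bits x 2 comparators -> 16 input patterns
three_bit = num_truth_tables(3)  # 3 bits x 2 comparators -> 64 input patterns
print(two_bit, three_bit)
```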



Figure 3.10: (a) Voltage range of received signal for low channel loss where 2, 3-bit input error correction logic works. (b) Voltage range of received signal for high channel loss where 2, 3-bit error correction logic fails.

A graphical representation of voltage levels of '1', '0' and '-1' signals through a low loss channel is shown in Fig. 3.10(a). Due to ISI, the voltage level for signal '0' can be in the voltage range of signal '1' or '-1' and the 2, 3-bit error correction logic can correct this error. It must be noted that the slicer threshold voltage  $V_{TH,H}$  is set below the smallest main tap  $(V_{SMT-P1})$  of signal '1' and  $V_{TH,L}$  is set above the smallest main tap  $(V_{SMT-M1})$  of signal '-1'.

Figure 3.11: (a) Simulated pulse response of channel in (3.7) at 10Gb/s, 15Gb/s and 20Gb/s. (b) Simulated response to the ['-1', '1', '-1'] pattern to show the position of the smallest main tap  $(V_{SMT-P1})$  of signal '1' at 10Gb/s, 15Gb/s and 20Gb/s.

A graphical representation of voltage levels of '1', '0' and '-1' signals through a high loss channel is shown in Fig. 3.10(b). Due to high loss, the voltage range of signal '1' can be below differential 0V, and the voltage range of signal '-1' can be above differential 0V. Under this condition, the signal '0' can be sampled as '1' or '-1' and the 2, 3-bit error correction logic fails to correct this error. Therefore, the limit up to which the 2, 3-bit error correction logic corrects for the ISI errors can be mathematically expressed as:

$$V_{SMT-P1} \ge 0; \quad V_{SMT-M1} \le 0. \tag{3.6}$$

Assuming the channel response is similar for '1' and '-1' signal,  $V_{SMT-P1} = -V_{SMT-M1}$ . The analysis in this section will derive  $V_{SMT-P1}$  and estimate the limits on ISI correction using (3.6). The communication channel used in this analysis is mathematically expressed in the *s*-domain as:

$$H(s) = \frac{1 + s/z}{(1 + s/p)^4} \tag{3.7}$$

where p and z represent the pole and zero, respectively. This particular channel is chosen because its pulse response resembles the pulse response of the FR-4 channels used in the measurements. The simulated pulse response of the above channel assuming  $p = 8\pi$  Grad/s and  $z = 12\pi$  Grad/s is shown in Fig. 3.11(a). In Dicode-encoded data, the [-1, 1, -1] data pattern results in the smallest value of  $V_{SMT-P1}$  and creates a limiting situation for the 2, 3-bit error correction logic. Simulated responses to the [-1, 1, -1] data pattern at various data rates are shown in Fig. 3.11(b). The magnitude of  $V_{SMT-P1}$  can be expressed as:

$$V_{SMT-P1} = -C_{-1} + C_0 - C_1 \tag{3.8}$$

where  $C_{-1}$ ,  $C_0$ , and  $C_1$  are the 1<sup>st</sup> pre-cursor, main-cursor, and 1<sup>st</sup> post-cursor of the pulse response, respectively. The step response of the channel described by (3.7) can be expressed in the time domain as:

$$y_0(t) = \mathcal{L}^{-1} \left\{ \frac{1}{s} H(s) \right\} \tag{3.9}$$

The input that produces the single-bit response is a pulse one unit interval wide, or

$$x(t) = U(t) - U(t - T_0) \tag{3.10}$$

where U(t) is the unit step (Heaviside) function and  $T_0$  is the unit interval (UI), i.e., the reciprocal of the data rate ( $T_0 = 1/f_d$ ). Using (3.9) and (3.10), the single-bit response of the channel can be expressed as:

$$\begin{aligned} y(t) &= x(t) * \mathcal{L}^{-1} \{ H(s) \} \\ &= y_0(t) - y_0(t - T_0) \end{aligned} \tag{3.11}$$

The position of the main tap is the peak of the single-bit response, or where the time-domain  $1^{st}$  derivative is equal to 0, which can be expressed as:

$$y'(t_m) = 0 \tag{3.12}$$

where  $t_m$  is the position of main tap  $C_0$  in time. Using (3.11) and (3.12), the magnitude of  $C_{-1}$ ,  $C_0$  and  $C_1$  can be expressed as:

$$\begin{aligned} C_{-1} &= y_0(t_m - T_0) - y_0(t_m - 2T_0) \\ C_0 &= y_0(t_m) - y_0(t_m - T_0) \\ C_1 &= y_0(t_m + T_0) - y_0(t_m) \end{aligned} \tag{3.13}$$

Using (3.13) and (3.8), the smallest main tap value  $V_{SMT-P1}$  can be expressed as:

$$V_{SMT-P1} = y_0(t_m - 2T_0) - 2y_0(t_m - T_0) + 2y_0(t_m) - y_0(t_m + T_0) \tag{3.14}$$

Using (3.14), the magnitude of  $V_{SMT-P1}$  as a function of normalized data rate is shown in Fig. 3.12(a). Depending on the magnitudes of the pre-cursors and post-cursors relative to  $V_{SMT-P1}$ , a different error correction logic is required.
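The derivation in (3.7) to (3.14) can be checked numerically. The following is a minimal sketch, not the script used to produce Fig. 3.12: it assumes the pole and zero values quoted for Fig. 3.11 and uses the closed-form step response of a four-fold repeated pole with one zero. The function names and the grid-search resolution are illustrative choices.

```python
import math

P = 8 * math.pi * 1e9    # pole in rad/s (value quoted for Fig. 3.11)
Z = 12 * math.pi * 1e9   # zero in rad/s (value quoted for Fig. 3.11)


def step_response(t, p=P, z=Z):
    """y0(t) of (3.9) for H(s) = (1 + s/z) / (1 + s/p)^4, in closed form."""
    if t <= 0.0:
        return 0.0
    a = p * t
    # Step response of the four repeated poles alone.
    poles_only = 1.0 - math.exp(-a) * (1.0 + a + a**2 / 2.0 + a**3 / 6.0)
    # The zero adds (1/z) times the impulse response of the poles-only part.
    zero_term = (p / z) * a**3 * math.exp(-a) / 6.0
    return poles_only + zero_term


def single_bit_response(t, t0):
    """y(t) of (3.11): response to a pulse that is one UI wide."""
    return step_response(t) - step_response(t - t0)


def main_tap_time(t0, steps_per_ui=2000, span_ui=10):
    """Locate t_m of (3.12) by a dense grid search for the peak of y(t)."""
    best_t, best_y = 0.0, 0.0
    for k in range(steps_per_ui * span_ui + 1):
        t = k * t0 / steps_per_ui
        y = single_bit_response(t, t0)
        if y > best_y:
            best_t, best_y = t, y
    return best_t


def cursors(data_rate):
    """Return (C_-1, C_0, C_1, V_SMT-P1) following (3.13) and (3.8)."""
    t0 = 1.0 / data_rate
    tm = main_tap_time(t0)
    c_pre = single_bit_response(tm - t0, t0)
    c_main = single_bit_response(tm, t0)
    c_post = single_bit_response(tm + t0, t0)
    return c_pre, c_main, c_post, -c_pre + c_main - c_post


for rate in (10e9, 15e9, 20e9):
    c_pre, c_main, c_post, v_smt = cursors(rate)
    print(f"{rate / 1e9:.0f} Gb/s: C-1={c_pre:.3f} C0={c_main:.3f} "
          f"C1={c_post:.3f} V_SMT-P1={v_smt:.3f}")
```

With this idealized model, the sign of the computed  $V_{SMT-P1}$  indicates whether a given data rate satisfies (3.6). The measured channels will differ from the model, so the numbers are indicative only.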



Figure 3.12: (a) Calculated magnitudes of cursors and  $V_{SMT-P1}$  as function of normalized data rate. (b) Calculated normalized data rate as a function of number of error correction logic inputs.

Condition-1: At low data rates,  $C_1 \leq V_{SMT-P1}$ , which implies that the slicer threshold voltage  $V_{TH,H}$  will be placed above the magnitude of  $C_1$ . As a result, all pre-/post-cursors will be sampled as '0' and the main-cursor will be sampled as '1'. Therefore, there is no need for the error correction logic.

Condition-2: At higher data rates,  $C_1 \geq V_{SMT-P1} \geq C_{-1}$ , which implies that the slicer threshold voltage  $V_{TH,H}$  will be placed below the magnitude of  $C_1$ . As a result, post-cursor  $C_1$  will be sampled as '1'. Therefore, a 2-bit error correction logic is required for the ISI error correction.

Figure 3.13: (a) Simulated BER vs.  $V_{TH,H}$ ,  $|V_{TH,L}|$  bathtub curves of ECL-1 and ECL-2 on channel 4. (b) Simulated BER vs.  $V_{TH,H}$ ,  $|V_{TH,L}|$  bathtub curves of ECL-1 and ECL-2 on channel 5.

Condition-3: When  $C_{-1} \ge V_{SMT-P1} \ge C_2$ , the slicer threshold voltage  $V_{TH,H}$  will be placed below the magnitudes of both  $C_{-1}$  and  $C_1$ . As a result, both the pre-cursor and the post-cursor will be sampled as '1'. Therefore, a 3-bit error correction logic is required for the ISI error correction. At even higher data rates, the channel loss will be higher and an error correction logic with more than 3 inputs is required to correct the errors due to heavy ISI. The ISI error correction limit with the proposed error correction logic concept will be reached when  $V_{SMT-P1} = 0$ . A plot of achievable normalized data rate versus the number of inputs to the error correction logic is shown in Fig. 3.12(b). With the assumed channel described in (3.7), the proposed error correction logic approach will reach its limit at a normalized data rate of approximately 1.54.
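The three conditions amount to comparing  $V_{SMT-P1}$  against successively smaller cursors. A minimal sketch of that decision is given below; the function name, the returned labels, and the example cursor values are hypothetical, and the cursor ordering  $C_1 \ge C_{-1} \ge C_2$  implied by the conditions is assumed.

```python
def required_ecl(c_pre, c_post, c_post2, v_smt):
    """Pick the error correction logic size from Condition-1 to Condition-3.

    Assumes C_1 >= C_-1 >= C_2, as implied by the conditions in the text.
    c_pre = C_-1, c_post = C_1, c_post2 = C_2, v_smt = V_SMT-P1.
    """
    if v_smt <= 0:
        # (3.6) is violated: the proposed concept has reached its limit.
        return "limit reached"
    if c_post <= v_smt:
        # Condition-1: threshold sits above every pre-/post-cursor.
        return "no ECL"
    if c_pre <= v_smt:
        # Condition-2: only the 1st post-cursor can be sampled as '1'.
        return "2-bit ECL"
    if c_post2 <= v_smt:
        # Condition-3: 1st pre- and post-cursor can be sampled as '1'.
        return "3-bit ECL"
    return "more than 3 inputs"


# Hypothetical cursor magnitudes, for illustration only.
print(required_ecl(c_pre=0.10, c_post=0.27, c_post2=0.05, v_smt=0.19))
```

When a cursor is exactly equal to  $V_{SMT-P1}$ , the conditions in the text overlap; this sketch resolves the tie in favor of the smaller logic.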

*Example-1:* Assume that ECL-1 works on channel-1 (used in measurements) up to 14.4 Gb/s (21.6 dB). By increasing the number of input bits to the ECL from 2 to 7 for channel-1, one can communicate on the same channel at 17.94 Gb/s (25 dB).

*Example-2:* Assume that ECL-2 works on channel-2 (used in measurements) up to 13.6 Gb/s (24.2 dB). By increasing the number of input bits to the ECL from 3 to 4 for channel-2, one can communicate on the same channel at 14.9 Gb/s (25.7 dB). While the achievable data rate and channel loss in these examples are estimated from the mathematical analysis, we believe that innovation in the ECL, with improvements in the truth table, can help to extend this limit.

A simulation to compare the performance of ECL-1 and ECL-2 under different pre and post-cursor conditions is shown in Fig. 3.13. The comparison is made by plotting the BER bathtub versus the sampling threshold voltage for ECL-1 and ECL-2 (all 3 programmable cases). The simulation is performed using 10<sup>3</sup> bits of PRBS-7 data through (a) channel 4 (see Fig. 3.18(e)) and (b) channel 5 (see Fig. 3.18(f)) that are used in the measurements. Channel 4 is a post-cursor dominant channel, and therefore, the performance of ECL-1 and ECL-2 on channel 4 at 14.4Gb/s is similar.

For channel 5 at 13.6Gb/s, which has a considerable 1st pre-cursor, ECL-1 fails to correct all the bit errors whereas ECL-2 can provide error-free output. Note that ECL-2 programmed for post-cursor dominant channels performs the best for channel 5 and has the widest voltage margin for the samplers.

Mathematically, the range of sampler threshold voltage  $(V_{TH,H} = -V_{TH,L})$  of ECL-1 can be expressed as:

$$C_{-1} < V_{TH,H} < C_0 - C_1 - C_{-1} \; (= V_{SMT}). \tag{3.15}$$

The range of sampler threshold voltage  $(V_{TH,H} = -V_{TH,L})$  of ECL-2 for post-cursor dominant channels can be expressed as:

$$C_{-1} < V_{TH,H} < \min(C_0 - C_1 - C_{-2}, \; C_0 - C_1 - C_{-1} + C_2). \tag{3.16}$$

Note that the upper limit is set by the magnitude of the second smallest main cursor, since the last row of the truth table in Fig. 3.7(b) can help to correct errors generated by the smallest main cursor, and therefore extend the threshold voltage margin further.
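The threshold ranges in (3.15) and (3.16) can be evaluated directly from a set of cursor magnitudes. The sketch below does so for a hypothetical single-bit response; the cursor values are made up for illustration and are not taken from the measured channels.

```python
def ecl1_threshold_range(c):
    """(3.15): C_-1 < V_TH,H < C_0 - C_1 - C_-1."""
    return c[-1], c[0] - c[1] - c[-1]


def ecl2_post_threshold_range(c):
    """(3.16): ECL-2 programmed for post-cursor dominant channels."""
    upper = min(c[0] - c[1] - c[-2], c[0] - c[1] - c[-1] + c[2])
    return c[-1], upper


# Hypothetical cursor magnitudes keyed by cursor index (illustration only).
cursor = {-2: 0.02, -1: 0.10, 0: 0.55, 1: 0.27, 2: 0.05}

for name, (low, high) in (
    ("ECL-1", ecl1_threshold_range(cursor)),
    ("ECL-2 (post-cursor)", ecl2_post_threshold_range(cursor)),
):
    print(f"{name}: {low:.3f} V < V_TH,H < {high:.3f} V "
          f"(margin {high - low:.3f} V)")
```

For these example values the ECL-2 upper limit exceeds the ECL-1 upper limit, which matches the wider threshold margin described for ECL-2.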

## 3.5 Measurement Result

The prototype transceiver was fabricated in 65-nm CMOS and the die micrograph is shown in Fig. 3.14(a). The proposed transceiver operates at 0.7V/0.9V/1V supply and occupies an active area of 0.167mm<sup>2</sup>. Fig. 3.14(b) shows the power breakdown of the proposed scheme at 14.4Gb/s and Fig. 3.14(d) shows the area breakdown. Additional blocks required to implement the proposed encoding and error correction are shown in Fig. 3.14(c). The additional components are digital blocks (Dicode pre-coder, Dicode encoder, Dicode decoder, and error correction logic) and 4 samplers to sample the Dicode. The power and area overheads of these blocks are 12.2% and 12.9% of the NRZ transceiver, respectively. The simulated power breakdown and area breakdown of the clock distribution network are shown in Fig. 3.15. The clock distribution network is power-inefficient because the power consumption of clock distribution outside the transmitter and receiver was not optimized in this work.

Figure 3.14: (a) Die micrograph of the proposed transceiver. (b) Pie chart of the power breakdown at 14.4Gb/s measurement. (c) Additional blocks to implement the proposed scheme in addition to simple NRZ transceiver architecture and their power overhead. (d) Pie chart of the area breakdown.

Figure 3.15: (a) Simulated clock distribution network power breakdown at 14.4Gb/s. (b) Area breakdown of the clock distribution network.

| | This Work | This Work | This Work | JSSC'18 [50] | JSSC'08 [68] | ISSCC'18 [77] | ISSCC'18 [47] | ISSCC'14 [78] | JSSC'14 [79] | JSSC'11 [80] | JSSC'10 [81] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Technology [nm] | 65 | 65 | 65 | 65 | 90 | 40 | 65 | 28 | 22 | 40 | 65 |
| Encoding / Modulation | Dicode | Dicode | Dicode | 8b10b | Duobinary | Framed-Pulsewidth Modulation | iPWM | NRZ | NRZ | NRZ | NRZ |
| Decoding | Sequence Detection | Sequence Detection | Sequence Detection | Viterbi | - | - | - | - | - | - | - |
| ECL+Decoding Latency | 5 UI | 5 UI | 5 UI | | - | - | - | - | - | - | - |
| Equalization | X | X | X | | 3-tap FFE | 3-tap FFE, CTLE | Passive EQ | 2-tap FFE, CTLE, 2-tap DFE | 3-tap FFE, CTLE, 6-tap DFE | 4-tap FFE, CTLE, 10-tap DFE | 3-tap FFE, CTLE, 2-tap DFE |
| Data Rate [Gb/s] | 14.4 | 13.6 | 16.0 | 5 | 20 | 20 | 16 | 20 | 16 | 14.0245 | 20 |
| Tx+Rx Power [mW] | 34.3<sup>†</sup> | 34.8<sup>†</sup> | 42.5<sup>†</sup> | 49.7 | 195 | 90.6 | 30.38 | 120 | 73.6 | 102.5<sup>*</sup> | 87 |
| Clock Power [mW] | 36<sup>‡</sup> | 35.3<sup>‡</sup> | 39.7<sup>‡</sup> | 21 | - | - | 19.84 | - | - | - | - |
| Tx+Rx Efficiency [pJ/bit] | 2.38<sup>†</sup> | 2.56<sup>†</sup> | 2.66<sup>†</sup> | 9.94 | 9.5 | 4.53 | 3.14<sup>*</sup> | 6.0<sup>*</sup> | 4.6<sup>*</sup> | 7.31<sup>*</sup> | 4.35 |
| Loss@Nyquist [dB] | 21.6 | 24.2 | 21.4 | 21 | 10, 13 | 12 | 22 | 20 | 24 | 26 | 26 |
| Bit Error Rate | 10<sup>-12</sup> | 10<sup>-12</sup> | 10<sup>-12</sup> | 10<sup>-11</sup> | 10<sup>-12</sup> | 4 × 10<sup>-9</sup> | 10<sup>-12</sup> | 10<sup>-10</sup> | 10<sup>-12</sup> | 10<sup>-12</sup> | 10<sup>-12</sup> |
| Area [mm<sup>2</sup>] | 0.167 | 0.167 | 0.167 | 0.35 | 0.32 | 1.056 | 0.13 | 0.166 | 0.079 | 0.97 | 0.07 |
| Supply [V] | 0.7/0.9/1 | 0.7/0.9/1 | 0.7/0.9/1 | 0.75/0.95 | 1.5 | 0.9 | 0.9 | 0.9/1.35 | 0.9 | 1.1 | 1.2 |

Table 3.1: Comparison with state-of-the-art transceivers

<sup>†</sup> Clock distribution within Tx and Rx included.

<sup>‡</sup> Simulated clock power includes VCDL, DCC, IQ gen, and buffers.

<sup>*</sup> Clock distribution network power included.

Figure 3.16: Transceiver measurement setup with the chip-on-board photograph.

The measurement setup of the proposed transceiver is shown in Fig. 3.16. The chip was bonded to the PCB using a chip-on-board assembly. The clock is generated by an external clock generator and divided internally to generate four phases, which are connected to the transmitter and the receiver. The threshold voltages are generated using off-chip voltage regulators and a potentiometer (variable resistor). The precision of the voltage levels generated using the off-chip regulators is of the order of 1mV. The algorithm for finding the optimal threshold voltages was implemented off-chip using MATLAB. The objective of the algorithm is to minimize the bit error rate. The 2 reference voltages are individually controlled to find the lowest BER. The external voltage supply is controlled by MATLAB through GPIB, and the MATLAB code also reads the BER from the BER tester. The MATLAB adaptation loop sweeps each reference voltage and converges to the voltage that gives the lowest BER.
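The adaptation loop can be pictured as a coordinate-wise sweep over the two reference voltages. The following Python sketch illustrates the idea only; it is not the MATLAB code used in the measurements. The `measure_ber` callback is hypothetical and stands in for setting the supplies over GPIB and reading the BER tester, and `toy_ber` is a made-up BER surface whose minimum is placed at the threshold values reported later in this section.

```python
def adapt_thresholds(measure_ber, high_candidates, low_candidates, passes=2):
    """Sweep one reference voltage at a time and keep the lowest-BER setting."""
    v_high, v_low = high_candidates[0], low_candidates[0]
    for _ in range(passes):
        # Sweep V_TH,H while V_TH,L is held at its current value.
        v_high = min(high_candidates, key=lambda v: measure_ber(v, v_low))
        # Sweep V_TH,L while V_TH,H is held at its new value.
        v_low = min(low_candidates, key=lambda v: measure_ber(v_high, v))
    return v_high, v_low


def toy_ber(v_high, v_low):
    """Stand-in for the real measurement (assumption, illustration only)."""
    return 1e-12 + (v_high - 0.8) ** 2 + (v_low - 0.6) ** 2


# 1 mV steps, matching the stated precision of the off-chip regulators.
high_grid = [0.700 + 0.001 * k for k in range(201)]
low_grid = [0.500 + 0.001 * k for k in range(201)]

best_high, best_low = adapt_thresholds(toy_ber, high_grid, low_grid)
print(f"V_TH,H = {best_high:.3f} V, V_TH,L = {best_low:.3f} V")
```

Sweeping the voltages one at a time keeps the number of BER measurements linear in the grid size, at the cost of possibly needing more than one pass when the two thresholds interact.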

Measured eye diagrams at various test points on the channel at 13.4 Gb/s with channel 2 and 4 are shown in Fig. 3.17(a). The near-end output eye diagram has a 180mV vertical opening for each open eye. Channel 4 has a 16.4dB loss at 6.7GHz and the far-end eye at the output of channel 4 is closed even before it reaches the receiver front-end. The proposed receiver samples a closed eye and performs error correction. The in-situ eye diagram indicates that the achievable BER of the Dicode receiver is less than  $10^{-12}$  after channel 2, which has a 23.8dB loss at 6.7GHz. Due to the non-linearity of the strongARM-based comparator, the in-situ eye diagram shows asymmetry across the common-mode voltage of 0.7V. The BER sensitivity graphs from the in-situ eye diagram in Fig. 3.17(a) are shown in Fig. 3.17(b). The values of the threshold voltages used in the measurements are 0.8V for  $V_{TH,H}$  and 0.6V for  $V_{TH,L}$ .

Figure 3.17: (a) Measured Dicode near-end eye diagram, far-end eye diagram, channel loss profile, and in-situ eye diagram. (b) BER sensitivity graphs with respect to upper threshold voltage  $(V_{TH,H})$  and lower threshold voltage  $(V_{TH,L})$ .

Three different communication channels are used in the measurements, whose loss profiles are shown in Fig. 3.18(a). At the Nyquist frequency, the channels have a loss from 21.4dB to 24.2dB. Measured BER bathtub curves using PRBS-7 data on these three channels are shown in Fig. 3.18(b). Operating at 14.4 Gb/s, the proposed transceiver achieves a horizontal eye-opening of 0.06UI or 4.2ps with BER  $< 10^{-12}$ . Fig. 3.18(c) shows measured BER bathtub curves using PRBS-31 data with the BER  $< 10^{-6}$ . The higher BER with PRBS-31 arises because longer PRBS data can produce data patterns whose ISI errors are not corrected by the error correction logic implemented in this work. Improvements in the error correction logic can help to reduce the BER for PRBS-31 data. Fig. 3.18(d) shows measured channel loss profiles of channel 4, 5, and 6, whose single-bit responses are shown in Fig. 3.18(e), (f), and (g), respectively. Note that channel 4, 5, and 6 are channel 1, 2, and 3 without receiver-side loss, which includes the Rx PCB traces, bond wires, and SMA connectors.

Figure 3.18: (a) Measured channel loss profiles. (b) Measured transceiver BER bathtub curves using PRBS-7. (c) Measured transceiver BER bathtub curves using PRBS-31. (d) Measured channel loss profiles without receiver-side loss. Measured single bit response of (e) channel 4 (=channel 1 without receiver-side loss), (f) channel 5 (=channel 2 without receiver-side loss), and (g) channel 6 (=channel 3 without receiver-side loss).

Table 3.1 shows the performance summary and comparison with state-of-the-art publications. Compared to the transceiver using Viterbi decoding [50], this work has 30 times shorter decoding latency while communicating on a channel with similar or more loss, with approximately 4 times better Tx+Rx efficiency at approximately 3 times higher data rate. Compared with other transceivers compensating similar loss [47, 78, 79, 80, 81], this work achieves competitive or better Tx+Rx efficiency.
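The efficiency figures in Table 3.1 follow from dividing Tx+Rx power by data rate, since 1 mW per Gb/s equals 1 pJ/bit. The short sketch below reproduces that arithmetic from the power and data-rate entries of the table; the function and variable names are illustrative.

```python
def efficiency_pj_per_bit(power_mw, rate_gbps):
    """Energy per bit: mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps


# (Tx+Rx power in mW, data rate in Gb/s) taken from Table 3.1.
this_work = [(34.3, 14.4), (34.8, 13.6), (42.5, 16.0)]
viterbi_ref = (49.7, 5.0)  # entry for [50]

for power, rate in this_work:
    print(f"{rate} Gb/s: {efficiency_pj_per_bit(power, rate):.2f} pJ/bit")

ref_eff = efficiency_pj_per_bit(*viterbi_ref)
best_eff = efficiency_pj_per_bit(*this_work[0])
print(f"Efficiency ratio vs. [50]: {ref_eff / best_eff:.1f}x")
print(f"Data-rate ratio vs. [50]: {this_work[0][1] / viterbi_ref[1]:.1f}x")
```

The ratios printed at the end correspond to the "4 times" and "3 times" comparisons with [50] when the 14.4 Gb/s operating point is used.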

## Chapter 4: Conclusions

This dissertation presented a time-domain 4-tap FFE-based PAM-8 wireline transceiver. A linearity improvement technique in the receiver is introduced by employing a non-linearity cancellation approach. The proposed transceiver can operate at 39.6 Gb/s, compensate 14 dB loss and achieve an energy efficiency of 8.66 pJ/bit in 65 nm CMOS.

This dissertation also proposed an ISI-resilient (ISIR) data encoding and error correction concept to achieve error-free communication on bandwidth-limited communication channels. Dicode encoding was proposed as an ISIR encoding and two error correction logic architectures were proposed. The complete transceiver is implemented in 65nm CMOS. Using the error correction logic, error-free communication (BER  $< 10^{-12}$ ) on a channel with 24.2dB loss is achieved with an energy efficiency of 2.56pJ/bit. An error correction and decoding latency of 5 UI is achieved. A mathematical analysis to derive the limits on the error correction is presented. The proposed ISIR encoding and error correction approach will open up a new dimension of data encoding and error correction for bandwidth-limited channels.

## Bibliography

- [1] T. Anand, Wireline Link Performance Survey. Accessed: May 10, 2020. [Online]. Available: https://web.engr.oregonstate.edu/~anandt/linksurvey, 2020.
- [2] J. Im, K. Zheng, A. Chou, L. Zhou, J. W. Kim, S. Chen, Y. Wang, H. Hung, K. Tan, W. Lin, A. Roldan, D. Carey, I. Chlis, R. Casey, A. Bekele, Y. Cao, D. Mahashin, H. Ahn, H. Zhang, Y. Frans, and K. Chang, "A 112Gb/s PAM-4 long-reach wireline transceiver using a 36-way time-interleaved SAR-ADC and inverter-based RX analog front-end in 7nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 116–118.
- [3] T. Ali, E. Chen, H. Park, R. Yousry, Y. Ying, M. Abdullatif, M. Gandara, C. Liu, P. Weng, H. Chen, M. Elbadry, Q. Nehal, K. Tsai, K. Tan, Y. Huang, C. Tsai, Y. Chang, and Y. Tung, "A 460mW 112Gb/s DSP-based transceiver with 38dB loss compensation for next-generation data centers in 7nm FinFET technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 118–120.
- [4] B. Yoo, D. Lim, H. Pang, J. Lee, S. Baek, N. Kim, D. Choi, Y. Choi, H. Yang, T. Yoon, S. Chu, K. Kim, W. Jung, B. Kim, J. Lee, G. Kang, S. Park, M. Choi, and J. Shin, "A 56Gb/s 7.7mW/Gb/s PAM-4 wireline transceiver in 10nm FinFET using MM-CDR-based ADC timing skew control and low-power DSP with approximate multiplier," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 122–124.
- [5] P.-W. Chiu and C. Kim, "A 32Gb/s digital-intensive single-ended PAM-4 transceiver for high-speed memory interfaces featuring a 2-tap time-based decision feedback equalizer and an in-situ channel-loss monitor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 336–338.
- [6] M. LaCroix, H. Wong, Y. H. Liu, H. Ho, S. Lebedev, P. Krotnev, D. A. Nicolescu, D. Petrov, C. Carvalho, S. Alie, E. Chong, F. A. Musa, and D. Tonietto, "A 60Gb/s PAM-4 ADC-DSP transceiver in 7nm CMOS with SNR-based adaptive power scaling achieving 6.9pJ/b at 32dB loss," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 114–116.

- [7] M. Pisati, F. D. Bernardinis, P. Pascale, C. Nani, M. Sosio, E. Pozzati, N. Ghittori, F. Magni, M. Garampazzi, G. Bollati, A. Milani, A. Minuti, F. Giunco, P. Uggetti, I. Fabiano, N. Codega, A. Bosi, N. Carta, D. Pellicone, G. Spelgatti, M. Cutrupi, A. Rossini, R. Massolini, G. Cesura, and I. Bietti, "A sub-250mW 1-to-56Gb/s continuous-range PAM-4 42.5dB IL ADC/DAC-based transceiver in 7nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 116–118.
- [8] T. Ali, R. Yousry, H. Park, E. Chen, P. Weng, Y. Huang, C. Liu, C. Wu, S. Huang, C. Lin, K. Wu, K. Tsai, K. Tan, A. ElShater, K. Chen, W. Tsai, H. Chen, W. Leng, and M. Soliman, "A 180mW 56Gb/s DSP-based transceiver for high density IOs in data center switches in 7nm FinFET technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 118–120.
- [9] C. Loi, A. Mellati, A. Tan, A. Farhoodfar, A. Tiruvur, B. Helal, B. Killips, F. Rad, J. Riani, J. Pernillo, J. Sun, J. Wong, K. Abdelhalim, K. Gopalakrishnan, K. Kim, L. Tse, M. Davoodi, M. Le, M. Zhang, M. Talegaonkar, P. Prabha, R. Mohanavelu, S. Chong, S. Forey, S. Netto, S. Bhoja, W. Liew, Y. Duan, and Y. Liao, "A 400Gb/s transceiver for PAM-4 optical direct-detect application in 16nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 120–122.
- [10] Aurangozeb, C. Dick, M. Mohammad, and M. Hossain, "A 32Gb/s 2.9pJ/b transceiver for sequence-coded PAM-4 signalling with 4-to-6dB SNR gain in 28nm FDSOI CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 480–482.
- [11] P. Upadhyaya et al., "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a configurable ADC in 16nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 108–110.
- [12] L. Wang, Y. Fu, M. LaCroix, E. Chong, and A. C. Carusone, "A 64Gb/s PAM-4 transceiver utilizing an adaptive threshold ADC in 16nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 110–112.
- [13] L. Tang, W. Gai, L. Shi, X. Xiang, K. Sheng, and A. He, "A 32Gb/s 133mW PAM-4 transceiver with DFE based on adaptive clock phase and threshold voltage in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 114–116.

- [14] M. Wu, K. Qiu, and G. Zhang, 112Gbps serial transmission over copper - PAM4 vs PAM8 signaling. Accessed: May 10, 2020. [Online]. Available: https://www.xilinx.com/publications/events/designcon/2017/112gbpsserial-transmission-over-copperpam4-vs-pam8-slides.pdf, 2017.
- [15] Y. Chun, A. Ramachandran, and T. Anand, "A PAM-8 wireline transceiver with receiver side PWM (time-domain) feed forward equalization operating from 12-to-39.6Gb/s in 65nm CMOS," in *ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC)*, Sep. 2019, pp. 269–272.
- [16] Y. Chun and T. Anand, "A 13.6-16Gb/s wireline transceiver with dicode encoding and sequence detection decoding for equalizing 24.2dB with 2.56pJ/bit in 65nm CMOS," in *Proc. IEEE Custom Int. Circuits Conf. (CICC)*, Apr. 2019, pp. 635–638.
- [17] J. H. R. Schrader, E. A. M. Klumperink, J. L. Visschers, and B. Nauta, "Pulsewidth modulation pre-emphasis applied in a wireline transmitter, achieving 33 dB loss compensation at 5-Gb/s in 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 990–998, Apr. 2006.
- [18] J. F. Buckwalter, M. Meghelli, D. J. Friedman, and A. Hajimiri, "Phase and amplitude pre-emphasis techniques for low-power serial links," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1391–1398, Jun. 2006.
- [19] H. Cheng and A. C. Carusone, "A 32/16 Gb/s 4/2-PAM transmitter with PWM pre-emphasis and 1.2 Vpp per side output swing in 0.13-μm CMOS," in *Proc. IEEE Custom Int. Circuits Conf. (CICC)*, Sep. 2008, pp. 635–638.
- [20] S. Saxena, R. K. Nandwana, and P. K. Hanumolu, "A 5 Gb/s energy-efficient voltage-mode transmitter using time-based de-emphasis," *IEEE J. Solid-State Circuits*, vol. 49, no. 8, pp. 1827–1836, Aug. 2014.
- [21] A. Ramachandran, A. Natarajan, and T. Anand, "Line coding techniques for channel equalization: integrated pulse-width modulation and consecutive digit chopping," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 66, no. 3, pp. 1192–1204, 2019.
- [22] A. Ramachandran, Y. Chun, M. Megahed, and T. Anand, "An iPWM line-coding-based wireline transceiver with clock-domain encoding for compensating up to 27-dB loss while operating at 0.5-to-0.9 V and 3-to-16 Gb/s in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, pp. 1–14, 2020.

- [23] A. Momtaz and M. M. Green, "An 80 mW 40 Gb/s 7-tap T/2-spaced feedforward equalizer in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 3, pp. 629–639, 2010.
- [24] E. Mammei, F. Loi, F. Radice, A. Dati, M. Bruccoleri, M. Bassi, and A. Mazzanti, "Analysis and design of a power-scalable continuous-time FIR equalizer for 10 Gb/s to 25 Gb/s multi-mode fiber EDC in 28 nm LP CMOS," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, pp. 3130–3140, Dec. 2014.
- [25] J. Han, Y. Lu, N. Sutardja, and E. Alon, "A 60Gb/s 288mW NRZ transceiver with adaptive equalization and baud-rate clock and data recovery in 65nm CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 112–113.
- [26] S. Kasturia and J. H. Winters, "Techniques for high-speed implementation of nonlinear cancellation," *IEEE Journal on Selected Areas in Communications*, vol. 9, no. 5, pp. 711–717, 1991.
- [27] R. Payne, P. Landman, B. Bhakta, S. Ramaswamy, S. Wu, J. D. Powers, M. U. Erdogan, A. Yee, R. Gu, L. Wu, Y. Xie, B. Parthasarathy, K. Brouse, W. Mohammed, V. G. K. Heragu, L. Dyson, and W. Lee, "A 6.25-Gb/s binary transceiver in 0.13-μm CMOS for serial data transmission across high loss legacy backplane channels," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2646–2655, Dec. 2005.
- [28] Y. Lu and E. Alon, "A 66Gb/s 46mW 3-tap decision-feedback equalizer in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 30–31.
- [29] S. Gondi and B. Razavi, "Equalization and clock and data recovery techniques for 10-Gb/s CMOS serial-link receivers," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 9, pp. 1999–2011, Aug. 2007.
- [30] K. Zheng, Y. Frans, K. Chang, and B. Murmann, "A 56 Gb/s 6 mW 300 um<sup>2</sup> inverter-based CTLE for short-reach PAM2 applications in 16 nm CMOS," in *Proc. IEEE Custom Int. Circuits Conf. (CICC)*, Apr. 2018, pp. 1–4.
- [31] J. F. Bulzacchelli, "Equalization for electrical links: current design techniques and future directions," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 4, pp. 23–31, 2015.

- [32] C. Thakkar, N. Narevsky, C. D. Hull, and E. Alon, "Design techniques for a mixed-signal I/Q 32-coefficient rx-feedforward equalizer, 100-coefficient decision feedback equalizer in an 8 Gb/s 60 GHz 65 nm LP CMOS receiver," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 11, pp. 2588–2607, 2014.
- [33] R. Boesch, K. Zheng, and B. Murmann, "A 0.003 mm<sup>2</sup> 5.2 mW/tap 20 GBd inductor-less 5-tap analog RX-FFE," in 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), 2016, pp. 1–2.
- [34] J. Cao, D. Cui, A. Nazemi, T. He, G. Li, B. Catli, M. Khanpour, K. Hu, T. Ali, H. Zhang, H. Yu, B. Rhew, S. Sheng, Y. Shim, B. Zhang, and A. Momtaz, "A transmitter and receiver for 100Gb/s coherent networks with integrated 4x64GS/s 8b ADCs and DACs in 20nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 484–485.
- [35] J. Kim, E. H. Chen, J. Ren, B. S. Leibowitz, P. Satarzadeh, J. L. Zerbe, and C. K. Yang, "Equalizer design and performance trade-offs in ADC-based serial links," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 58, no. 9, pp. 2096–2107, 2011.
- [36] E.-H. Chen, R. Yousry, and C. K. Yang, "Power optimized ADC-based serial link receiver," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 4, pp. 938–951, 2012.
- [37] A. Agrawal, J. F. Bulzacchelli, T. O. Dickson, Y. Liu, J. A. Tierno, and D. J. Friedman, "A 19-Gb/s serial link receiver with both 4-tap FFE and 5-tap DFE functions in 45-nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3220–3231, Dec. 2012.
- [38] S. Song and V. Stojanović, "A 6.25 Gb/s voltage-time conversion based fractionally spaced linear receive equalizer for mesochronous high-speed links," *IEEE J. Solid-State Circuits*, vol. 46, no. 5, pp. 1183–1197, May 2011.
- [39] P. Chiu, S. Kundu, Q. Tang, and C. H. Kim, "A 65-nm 10-Gb/s 10-mm on-chip serial link featuring a digital-intensive time-based decision feedback equalizer," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1203–1213, Apr. 2018.
- [40] I. Yi, M. Chae, S. Hyun, S. Bae, J. Choi, S. Jang, B. Kim, J. Sim, and H. Park, "A time-based receiver with 2-tap decision feedback equalizer for single-ended mobile DRAM interface," *IEEE J. Solid-State Circuits*, vol. 53, no. 1, pp. 144–154, Jun. 2018.

- [41] P. Chiu, S. Kundu, Q. Tang, and C. H. Kim, "A 10Gb/s 10mm on-chip serial link in 65nm CMOS featuring a half-rate time-based decision feedback equalizer," in 2017 Symposium on VLSI Circuits, 2017, pp. C56–C57.
- [42] F. O'Mahony. ISSCC 2017 trends. [Online]. Available: http://isscc.org/wpcontent/uploads/2018/06/2017\_Trends.pdf
- [43] G. Gangasani, J. F. Bulzacchelli, M. Wielgos, W. Kelly, V. Sharma, A. Prati, G. Cervelli, D. Gardellini, M. Baecher, M. Shannon, T. Beukema, J. Garlett, H. H. Xu, T. Toifl, M. Meghelli, J. Ewen, and D. Storaska, "A 28.05Gb/s transceiver using quarter-rate triple-speculation hybrid-DFE receiver with calibrated sampling phases in 32nm CMOS," in *Proc. IEEE Symp. VLSI Circuits*, 2017, pp. C326–C327.
- [44] T. Norimatsu, T. Kawamoto, K. Kogo, N. Kohmu, F. Yuki, N. Nakajima, T. Muto, J. Nasu, T. Komori, H. Koba, T. Usugi, T. Hokari, T. Kawamata, Y. Ito, S. Umai, M. Tsuge, T. Yamashita, M. Hasegawa, and K. Higeta, "A 25Gb/s multistandard serial link transceiver for 50dB-loss copper cable in 28nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Jan. 2016, pp. 60–61.
- [45] G. Balamurugan, B. Casper, J. Jaussi, M. Mansuri, F. O'Mahony, and J. Kennedy, "Modeling and analysis of high-speed I/O links," *IEEE Trans. Adv. Packag.*, vol. 32, no. 2, pp. 237–247, May 2009.
- [46] G. E. Thomas, "Magnetic storage," in 1st Cambridge Computer Conf., 1949.
- [47] A. Ramachandran and T. Anand, "A 0.5-to-0.9V, 3-to-16Gb/s, 1.6-to-3.1pJ/bit wireline transceiver equalizing 27dB loss at 10Gb/s with clock domain encoding: integrated pulse width modulation (iPWM) in 65nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2018, pp. 376–377.
- [48] A. Ramachandran, A. Natarajan, and T. Anand, "A 16Gb/s, 3.6pJ/bit wireline transceiver with phase domain equalization scheme: integrated pulse width modulation (iPWM) in 65nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2017, pp. 488–489.
- [49] A. Ramachandran, A. Natarajan, and T. Anand, "Line coding techniques for channel equalization: integrated pulse-width modulation and consecutive digit chopping," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 66, no. 3, pp. 1192–1204, Mar. 2019.

- [50] S. Song, K. D. Choo, T. Chen, S. Jang, M. P. Flynn, and Z. Zhang, "A maximum-likelihood sequence detection powered ADC-based serial link," *IEEE Transactions on Circuits and Systems-I: Regular Papers*, vol. 65, no. 7, pp. 2269–2278, Jul. 2018.
- [51] Y. Frans, J. Shin, L. Zhou, P. Upadhyaya, J. Im, V. Kireev, M. Elzeftawi, H. Hedayati, T. Pham, S. Asuncion, C. Borrelli, G. Zhang, H. Zhang, and K. Chang, "A 56-Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 4, pp. 1101–1110, 2017.
- [52] L. Wang, Y. Fu, M. LaCroix, E. Chong, and A. Chan Carusone, "A 64-Gb/s 4-PAM transceiver utilizing an adaptive threshold ADC in 16-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 2, pp. 452–462, 2019.
- [53] P. Upadhyaya, C. F. Poon, S. W. Lim, J. Cho, A. Roldan, W. Zhang, J. Namkoong, T. Pham, B. Xu, W. Lin, H. Zhang, N. Narang, K. H. Tan, G. Zhang, Y. Frans, and K. Chang, "A fully adaptive 19–58-Gb/s PAM-4 and 9.5–29-Gb/s NRZ wireline transceiver with configurable ADC in 16-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 1, pp. 18–28, 2019.
- [54] B. Razavi, *Design of Analog CMOS Integrated Circuits*, 1st ed. McGraw-Hill, 2001.
- [55] E. Frlan, OIF's CEI 56G Interfaces Key Building Blocks for Optics in the 400G Data Center. Accessed: May 10, 2020. [Online]. Available: https://www.oiforum.com/wp-content/uploads/2019/01/150928\_Mkt-Focus-ECOC-Panel-OIF.pdf, Sep. 28, 2015.
- [56] P. Peng, J. Li, L. Chen, and J. Lee, "A 56Gb/s PAM-4/NRZ transceiver in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 110–111.
- [57] H. Hatamkhani, K.-L. J. Wong, R. Drost, and C.-K. K. Yang, "A 10-mW 3.6-Gbps I/O transmitter," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2003, pp. 97–98.
- [58] M. Tomlinson, "New automatic equaliser employing modulo arithmetic," *Electron. Lett.*, vol. 7, no. 5/6, pp. 138–139, Mar. 1971.

- [59] H. Harashima and H. Miyakawa, "Matched-transmission technique for channels with intersymbol interference," *IEEE Trans. Commun.*, vol. COM-20, no. 4, pp. 774–780, Aug. 1972.
- [60] Y. Gu and K. K. Parhi, "High-speed architecture design of Tomlinson-Harashima precoders," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 54, no. 9, pp. 1929–1937, Sep. 2007.
- [61] M. Kossel, T. Toifl, P. A. Francese, M. Brändli, C. Menolfi, P. Buchmann, L. Kull, T. M. Andersen, and T. Morf, "A 10 Gb/s 8-tap 6b 2-PAM/4-PAM Tomlinson-Harashima precoding transmitter for future memory-link applications in 22nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3268–3284, Dec. 2013.
- [62] J. Han, Y. Lu, N. Sutardja, K. Jung, and E. Alon, "A 60Gb/s 173mW receiver frontend in 65nm CMOS technology," in *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2015, pp. C230–C231.
- [63] J. Im, D. Freitas, A. Roldan, R. Casey, S. Chen, A. Chou, T. Cronin, K. Geary, S. McLeod, L. Zhou, I. Zhuang, J. Han, S. Lin, P. Upadhyaya, G. Zhang, Y. Frans, and K. Chang, "A 40-to-56Gb/s PAM-4 receiver with 10-tap direct decision-feedback equalization in 16nm FinFET," in *Proc. IEEE ISSCC Dig. Tech. Papers*, Feb. 2017, pp. 114–115.
- [64] K. K. Parhi, "High-speed architectures for algorithms with quantizer loops," in *IEEE Int. Symp. Circuits and Systems (ISCAS)*, May 1990, pp. 2357–2360.
- [65] V. Stojanović, A. Ho, B. Garlepp, F. Chen, and J. Wei, "Adaptive equalization and data recovery in a dual-mode (PAM2/4) serial link transceiver," in *Proc. IEEE Symp. VLSI Circuits*, 2004, pp. 348–351.
- [66] R. Narasimha, N. Warke, and N. Shanbhag, "Impact of DFE error propagation on FEC-based high-speed I/O links," in *IEEE Global Telecommunications Conference (GLOBECOM)*, Nov. 2009, pp. 1–6.
- [67] A. Lender, "The duobinary technique for high-speed data transmission," *IEEE Trans. Commun. Electron.*, vol. 82, no. 2, pp. 214–218, May 1963.
- [68] J. Lee, M.-S. Chen, and H.-D. Wang, "Design and comparison of three 20-Gb/s backplane transceivers for duobinary, PAM4, and NRZ data," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2120–2133, Sep. 2008.

- [69] M. Hossain and A. C. Carusone, "Multi-Gb/s bit-by-bit receiver architectures for 1-D partial-response channels," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 57, no. 1, pp. 270–279, Jan. 2010.
- [70] W.-J. Yun, S. Nakano, W. Mizuhara, A. Kosuge, N. Miura, H. Ishikuro, and T. Kuroda, "A 7 Gb/s/link noncontact memory module for multi-drop bus system using energy-equipartitioned coupled transmission line," in *Proc. IEEE ISSCC Dig. Tech. Papers*, Feb. 2012, pp. 52–53.
- [71] A. Kosuge, W. Mizuhara, N. Miura, M. Taguchi, H. Ishikuro, and T. Kuroda, "A 12.5Gb/s/link non-contact multi drop bus system with impedance-matched transmission line couplers and Dicode partial-response channel transceivers," in *Proc. IEEE Custom Int. Circuits Conf. (CICC)*, Sep. 2012, pp. 7.9.1–7.94.
- [72] I. S. Reed and G. Solomon, "Polynomial codes over certain finite fields," *J. Soc. Ind. Appl. Math.*, vol. 8, no. 2, pp. 300–304, Jun. 1960.
- [73] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," *IEEE Trans. Inform. Theory*, vol. 55, no. 7, pp. 3051–3072, Jul. 2009.
- [74] R. W. Hamming, "Error detecting and error correcting codes," *Bell System Technical Journal*, vol. 29, no. 2, pp. 147–160, 1950.
- [75] D. Wu, Y. Chen, Q. Zhang, Y.-L. Ueng, and X. Zeng, "Strategies for reducing decoding cycles in stochastic LDPC decoders," *IEEE Trans. on Circuits and Systems II*, vol. 63, no. 9, pp. 873–877, Sep. 2016.
- [76] Y. L. U. et al., "An efficient combined bit-flipping and stochastic LDPC decoder using improved probability tracers," *IEEE Trans. on Signal Processing*, vol. 65, no. 20, pp. 5368–5380, Oct. 2017.
- [77] S. Jeon, W. Kwon, T. Yoon, J.-H. Yoon, K. Kwon, J. Yang, and H.-M. Bae, "A 20Gb/s transceiver with framed-pulsewidth modulation in 40nm CMOS," in *Proc. IEEE ISSCC Dig. Tech. Papers*, Feb. 2018, pp. 270–272.
- [78] V. Balan, O. Oluwole, G. Kodani, C. Zhong, S. Maheswari, R. Dadi, A. Amin, G. Bhatia, P. Mills, A. Ragab, and E. Lee, "A 130mW 20Gb/s half-duplex serial link in 28nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2014, pp. 438–439.

- [79] T. Musah, J. E. Jaussi, G. Balamurugan, S. Hyvonen, T. Hsueh, G. Keskin, S. Shekhar, J. Kennedy, S. Sen, R. Inti, M. Mansuri, M. Leddige, B. Horine, C. Roberts, R. Mooney, and B. Casper, "A 4–32 Gb/s bidirectional link with 3-tap FFE/6-tap DFE and collaborative CDR in 22 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, pp. 3079–3090, Dec. 2014.
- [80] F. Zhong, S. Quan, W. Liu, P. Aziz, T. Jing, J. Dong, C. Desai, H. Gao, M. Garcia, G. Hom, T. Huynh, H. Kimura, R. Kothari, L. Li, C. Liu, S. Lowrie, K. Ling, A. Malipatil, R. Narayan, T. Prokop, C. Palusa, A. Rajashekara, A. Sinha, C. Zhong, and E. Zhang, "A 1.0625–14.025 Gb/s multi-media transceiver with full-rate source-series-terminated transmit driver and floating-tap decision-feedback equalizer in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 3126–3139, Dec. 2011.
- [81] H. Wang and J. Lee, "A 21-Gb/s 87-mW transceiver with FFE/DFE/analog equalizer in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 909–920, Apr. 2010.