Design of Low-Power NRZ/PAM-4 Wireline Transmitters by Yang, Hae Woong




Submitted to the Office of Graduate and Professional Studies of
Texas A&M University
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Chair of Committee, Samuel Palermo
Committee Members, Aydin I. Karsilayan
Laszlo Kish
Rabi Mahapatra
Head of Department, Miroslav M. Begovic
December 2018
Major Subject: Electrical Engineering
Copyright 2018 Hae-Woong Yang
ABSTRACT
Rapidly growing demand for instant multimedia access across a myriad of digital devices has pushed
the need for higher bandwidth in modern communication hardware, ranging from short-reach (SR)
memory/storage interfaces to long-reach (LR) data center Ethernet links. At the same time, comprehensive
design optimization of the link system to meet energy-efficiency targets is required for mobile
computing and for low operational cost at data centers. This doctoral study presents the design of two
low-swing wireline transmitters featuring low-power clock distribution and energy-efficient 2-tap
equalization at up to 20-Gb/s operation. In spite of the reduced signaling power in the
voltage-mode (VM) transmit driver, the presence of the segment selection logic still diminishes the
power saving benefit.
The first work presents a scalable VM transmitter which offers low static power dissipation
and adopts an impedance-modulated 2-tap equalizer with analog tap control, thereby obviating
driver segmentation and reducing pre-driver complexity and dynamic power. Per-channel quadrature
clock generation with injection-locked oscillators (ILO) allows the generation of rail-to-rail
quadrature clocks. Energy efficiency is further improved with capacitively driven low-swing global
clock distribution and supply scaling at lower data rates, while output eye quality is maintained at
low voltages with automatic phase calibration of the local ILO-generated quarter-rate clocks. A
prototype fabricated in a general purpose 65 nm CMOS process includes a 2 mm global clock
distribution network and two transmitters that support an output swing range of 100-300 mV with
up to 12 dB of equalization. The transmitters achieve 8-16 Gb/s operation at 0.65-1.05 pJ/b energy
efficiency.
The second work involves a dual-mode NRZ/PAM-4 differential low-swing voltage-mode (VM)
transmitter. Pulse-selected output multiplexing allows reduction of the power supply and of the
deterministic jitter caused by the large on-chip parasitics inherent in the transmission-gate-based
multiplexers of the earlier work. Analog impedance control replica circuits running in the background
produce gate-biasing voltages that control the peaking ratio for 2-tap feed-forward equalization and
the PAM-4 symbol levels for high linearity. This analog control also allows for efficient generation of
the middle levels in PAM-4 operation with good linearity, quantified by a level separation mismatch
ratio of 95%. In NRZ mode, 2-tap feed-forward equalization is configurable in high-performance
controlled-impedance or energy-efficient impedance-modulated settings to provide performance
scalability. Analytic design considerations on dynamic power, data rate, mismatch, and output
swing lead to optimal performance metrics for the given technology node. The proof-of-concept
prototype is verified in silicon in a 65 nm CMOS process, with improved speed
and energy efficiency owing to double-stack NMOS transistors in the output stage. The transmitter
consumes as low as 29.6 mW in 20-Gb/s NRZ and 25.5 mW in 28-Gb/s PAM-4 operation.
DEDICATION
To my mother, my father, my wife, and my son.
ACKNOWLEDGMENTS
"Looking unto Jesus, the author and finisher of our faith,
who for the joy that was set before Him endured the cross, despising the shame,
and has sat down at the right hand of the throne of God."– Hebrews 12:2 (NKJV)
During my years at A&M, it has been a great journey in my life. College Station has been my
home. And, God willing, wherever I go, this town shall always remain my home in my heart. As
I reflect upon my graduate studies, the words dedication, perseverance, hard work, sacrifice, faith,
collaboration, camaraderie and love come to my mind. As we all know, for any man to accomplish
something, he must have a team – a very incredible, special team; people that he can learn from,
count on, and rely upon through everything – the highs and lows, the success and failure, and even
the joy and sorrow that happen both in and out of the Wisenbaker Building.
First and foremost, I would like to give thanks to the best Aggie that I know, my research advisor
Professor Samuel Palermo, for all of the guidance, teaching, support, and faith he provided to
me throughout my doctoral study. He has also provided me with the best opportunities to participate
in various fascinating research projects whenever and wherever he thought I needed to learn.
With his encouragement, I learned the "Never give up" spirit even when things did not look optimistic.
With his guidance, he has shaped me into a competent engineer, researcher, and lecturer. Words
cannot express how thankful I am.
I would like to express my thanks to some of the other excellent A&M professors who have made
my experience even more special. I want to thank Dr. Takis Zourntos, who invited me to the
realm of microelectronics in my junior year at college. I am grateful to Dr. Aydin Karsilayan, who
convinced me that electronics was even more fun with many interesting classes and discussions
since college and didn’t mind serving on my thesis committee. I also thank Dr. Laszlo Kish for
his excellent lectures on low-noise electronic design and for serving on my thesis committee. I would like to
thank Dr. Rabi Mahapatra of Texas A&M’s Computer Science for serving as a committee member
of my Ph.D. dissertation and for his advice on my oral exam.
I would like to extend my gratitude to many other professors at A&M and even at other schools. I
thank Dr. Edgar Sanchez-Sinencio, Dr. Jose Silva-Martinez, Dr. Kamran Entesari, and Dr. Deepa
Kundur for their teaching of many disciplines in circuits and systems. I also thank Professor
Peter Howard of Mathematics for his excellent teaching of the math modeling class. I want to thank
Professor Patrick Chiang of Oregon State University and his former graduate student Hao Li for
their great collaboration on my first successful tape-out. I also thank another great but "false"
Aggie, Professor Ben Yoo of UC Davis, for an opportunity to expand my horizons with recent studies
on photonic interconnects and an advanced FinFET CMOS technology node. I want to thank Ella for
always being at the front to help students when we need it.
This journey would not have been completed without my former and current colleagues in the
research group. I owe Younghoon a lot for introducing me to low-power serial I/O research
as well as for plentiful learning resources and encouragement. I thank Byungho for sharing much
experience and support. I thank and miss former colleagues, Ehsan, Ayman, Shaun, Ahmed, Osama,
Cheng, Ashkan, Shengchang, Takayuki, Keytaek, Kunzhi, Ali, and my late friend Alex Edward. I
thank my current colleagues Po-Hsuan, Yuanming, Yanghang, Ankur, Peng, Gaurav, and Hyun-
gryul for their assistance, hard work and camaraderie. I will treasure moments that I share with
coffee buddies from Korea, Kyoohyun, Sanghoon, Eric, and Sungjoon. I am very blessed to have
these bright minds around me for greater motivations and challenges.
When there seems to be no hope and faith after disappointments and failures, the last thing I
ever want to worry about would be whether or not I’ll still be loved or accepted by those dearest
to me. I never had that kind of burden because of you. I confess my thanks and love to my parents
In-Cheol and Mi-Young for their unfailing love and for giving me every opportunity in the quest for
learning. I want to give my thanks and indefinite love to my lovely wife Grace for her love and
sacrifice and for being the best wife and greatest mother of my child I could ever ask for. I also pour out
unconditional love to my miracle son Heesoo. He would never know how thankful and proud I am
for being my kind, smart, healthy, faithful, and brave warrior. I give thanks to my sister Hyo-Jin
for her prayer, love and dedication for being a sister of the worst brother since our early childhood.
And, I give thanks to my new parents, parents-in-law for entrusting their precious second-born of
beautiful twin daughters and their prayer for my family every morning even before sunrise.
I express my gratitude to former pastors of the A&M Korean Student Church, Sungcheol Youn,
Youngchang Jin, Dr. Sungsoo Kim, Ted Foote, Marie Mickey, and Scott Nelson for their love and
sincere prayer for me and my family. I thank Rev. Kwanyong Chae for delivering encouraging
messages on Sunday. The same goes, without saying, for my precious family in the church, who have
encouraged and lifted each other up through tearful prayer and everlasting love.
Lastly and most importantly, I want to give thanks and praise to Christ Jesus, my Lord. I know
and realize without Him, I am nothing, so I give Him all the glory and honor as my doctoral study
finally comes to an end. Thank you for the love, and the mercy, and the goodness you poured out
upon me. Thank you for picking up when I stumbled. Thank you for getting me back on the right
track when I strayed away. Thank you for teaching me how to love His sheep and to live life for
His Kingdom and righteousness. Thank you, dear Lord.
As I close the final chapter of the dissertation, I confess and proclaim.
Soli Deo Gloria!
CONTRIBUTORS AND FUNDING SOURCES
Contributors
This work was supported by a dissertation committee consisting of Professor Samuel Palermo
and Miroslav M. Begovic of the Department of Electrical and Computer Engineering and Professor
Rabi Mahapatra of the Department of Computer Science. This work was carried out in collaboration with
Professor Patrick Chiang and Hao Li of Oregon State University.
The data analyzed for Chapter 3 was partially provided by Young-Hoon Song of Samsung
Display. The analyses and derivations depicted in Chapter 2 were conducted in part by Timothy
Dickson of IBM and were presented at the IEEE Compound Semiconductor Integrated Circuit
Symposium (CSICS) in 2010.
All other work conducted for the dissertation was completed by the student independently.
Funding Sources
Graduate study was supported by a fellowship from Texas A&M University and a dissertation













TCO Total Cost of Ownership
CPU Central Processing Unit
DVFS Dynamic Voltage and Frequency Scaling
PAM-4 Pulse Amplitude Modulation-4



















DFE Decision Feedback Equalizer
IIR Infinite-Impulse Response
MSB Most-Significant Bit







QEC Quadrature Error Correction
PRBS Pseudo-Random Bit Sequence
LSB Least-Significant Bit
PVT Process, Voltage, and Temperature







ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
CONTRIBUTORS AND FUNDING SOURCES
NOMENCLATURE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 Low-power high-speed I/O system
1.2 Pulse Amplitude Modulation-4 (PAM-4)
1.3 Multi-Protocol I/O
1.4 Dissertation Organization
2. REVIEW ON LOW-POWER WIRELINE TRANSCEIVERS
2.1 Low-power-aware design
2.2 Equalization
2.3 Transmitter design
2.3.1 Current-Mode Transmit Equalizer
2.3.2 Voltage-Mode Transmit Equalizer
2.4 RX
2.4.1 Continuous-time linear equalization
2.4.2 Decision feedback equalization
2.5 PAM-4 transceiver design
2.5.1 Transmitter
2.5.2 Receiver
3. AN 8-16Gb/s, 0.65-1.05pJ/b, VM TX WITH ANALOG IMPEDANCE MODULATION FFE
3.1 Low-Power Transmitter Design Techniques
3.1.1 Global Clock Distribution
3.1.2 Voltage-Mode Transmit Equalization
3.2 Multi-Channel Transmitter Architecture
3.3 Transmitter Channel Design
3.3.1 Automatic Quadrature-Phase Calibration
3.3.2 Impedance-Modulated Output Driver
3.3.3 Global Impedance Control and Modulation Loop
3.4 Experimental Results
3.5 Chapter Summary
4. A RECONFIGURABLE NRZ/PAM-4 TRANSMIT DRIVER
4.1 System Architecture
4.2 Static Timing Analysis on the Critical Timing Path
4.3 Differential 4:1 Output-Multiplexing Transmit Driver
4.3.1 High-Performance NRZ mode
4.3.2 Energy-Efficient NRZ mode
4.4 Impedance Control Loops
4.5 Design Consideration
4.5.1 Low Dynamic Power
4.5.2 Maximum Data-rate
4.5.3 Maximum Output Swing
4.5.4 Minimization of Mismatch
4.6 Measurement Results
4.7 Chapter Summary
5. CONCLUSIONS AND FUTURE WORK
5.1 Conclusion
5.2 Recommendations For Future Work




1.1 (a) An example of data center cooling management system, (b) A breakdown of energy consumption of a data center [1].
1.2 (a) NRZ signaling (1-Main, 1-Post), (b) PAM-4 signaling (Gray code, No Equalization).
1.3 Recent trend of digital high-speed data network. (Reprinted with permission from [2]).
1.4 Multi-protocol for wireline digital communication in different applications.
2.1 Simplified high-speed digital wireline I/O (Dynamic power consuming building blocks highlighted in color).
2.2 The effect of ISI in a wireline communication on a lossy backplane channel without FFE.
2.3 System level block diagram of feed-forward equalization with weighted coefficient.
2.4 Data-recovery enabled by FFE in a form of pre-distortion at the transmit driver.
2.5 Transmit drivers in (a) Current-mode, (b) Voltage-mode.
2.6 Low-swing voltage-mode transmit drivers with (a) segmented voltage divider [3], (b) with shunting resistor network [4], (c) with impedance modulation [5].
2.7 (a) CTLE circuitry, (b) Simulation results of an RC-degenerated CTLE circuit [6].
2.8 Decision feedback equalizer utilizing a FIR filter.
2.9 (a) Pulse response at 25-GSym/s, (b) Simulation result of eye-height with varying number of FIR taps [7].
2.10 Mismatch contribution in PAM-4 [8].
2.11 PAM-4 Current-mode transmit driver.
2.12 1/4-rate FIR/IIR DFE PAM-4 Receiver.
3.1 Multi-channel serial-link transmitter architecture with dynamic power management.
3.2 Low swing global clock distribution techniques: (a) CML buffer driving resistively terminated on-die transmission line; (b) CMOS buffer driving distribution wire through a series coupling capacitor.
3.3 Simulated comparison of CML and capacitively driven clock distribution over a 2 mm distance: (a) output swing versus frequency; (b) power versus frequency.
3.4 2-tap FIR equalization in low-swing voltage-mode drivers.
3.5 10 Gb/s voltage-mode 2-tap FIR transmit equalization performance comparison. (a) Channel frequency responses. The three backplane channels have 5.2" total linecard traces and 12" (B12), and 20" bottom- (B20) and middle-layer (M20) backplane traces. The CPW channel is a single-board 5.8" FR4 trace and 0.6 m SMA cable. (b) Simulated 10 Gb/s pulse response with M20 BP trace. (c) Simulated 10 Gb/s pulse response with CPW channel. (d) Residual ISI, normalized to the main-cursor amplitude, with ideal 50 Ω and impedance-modulated output drivers. Error bars account for 15% RX termination mismatch.
3.6 2-tap FIR equalization in low-swing voltage-mode drivers.
3.7 Capacitively driven global distribution and local quadrature-phase generation injection-locked oscillator.
3.8 Transmitter block diagram with quadrature-clock phase calibration details.
3.9 Output driver with impedance-modulated 2-tap equalizer: (a) transition-bit state; (b) de-emphasis state.
3.10 Global replica-bias loops for output driver impedance and de-emphasis control.
3.11 Microphotograph of the 2-channel transmitter with on-chip 2 mm clock distribution.
3.12 FR4 channel eye diagrams without and with automatic phase calibration at (a) 8 Gb/s and (b) 16 Gb/s.
3.13 (a) Measured equalization impedance versus de-emphasis amount with a 300 mVppd output swing. (b) Low-frequency transmitter output waveforms with 3-12 dB de-emphasis.
3.14 16 Gb/s eye diagrams: (a) without equalization and (b) with equalization.
3.15 Transmitter eye diagrams and jitter decomposition at (a) 8 Gb/s and (b) 12 Gb/s.
3.16 Measured transmitter performance versus data rate: (a) energy efficiency; (b) power breakdown.
4.1 Block diagram for a Two-channel NRZ/PAM-4 Transmitter.
4.2 Static timing analysis in the serializer circuit over scalable supply voltage: (a) Timing diagram for data and clock path; (b) Propagation delays of digital logic circuits over scalable supply; (c) Maximum data rate vs. phase selection.
4.3 Simplified circuit diagrams of VM transmit driver: (a) Input multiplexing; (b) Output multiplexing.
4.4 Transmitter eye diagrams and jitter decomposition at (a) 8 Gb/s and (b) 12 Gb/s.
4.5 Transmit driver for transitioning and de-emphasized bit state with associated impedance paths in (a) High-performance (HP) mode (for both NRZ and PAM-4) and (b) Energy-efficient (EE) mode (for NRZ only).
4.6 Global replica-bias loops for output driver impedance and level control: (a) de-emphasis and PAM-4 opposite polarity path control in HP setting (disabled in EE setting); (b) de-emphasis primary polarity path control; (c) full-swing impedance control; (d) Mismatch-free IDAC circuitry used in (a).
4.7 Impedance mapping of the signal paths in the output driver.
4.8 Non-segmented output driver.
4.9 Parasitic elements in a differential termination scheme on transceiver [9].
4.10 Illustrations of the maximum data rate achievable with the lower-bound settling time at 95% of the steady-state response generated by a transmit equalizer driven by a pre-driver with (a) an ideal transient response and (b) a realistic transient response.
4.11 Simplified output driver segment with input signal profile.
4.12 Gate tuning voltage needed for varying peaking ratio: (a) Impedance sensitivity with varying control voltage and (b) Gate-control voltages on all impedance paths.
4.13 Simulated impedance mismatch between output driver and its replicas with (a) Small size analog impedance controlled NMOS (0.8 × W2) and (b) Large size (1.2 × W2).
4.14 Microphotograph of the 2-channel transmitter with a detailed layout of the output stage.
4.15 Measurement setup.
4.16 S-parameters of test channel.
4.17 (a) Measured equalization impedance mapping in the EE mode. (b) Transmitter output overlay of de-emphasis levels between 2∼12 dB with fixed pattern running at 8-Gb/s.
4.18 Measured NRZ TX output eyes and jitter performance with FR-4 channel, 2^15 − 1 PRBS, (a) at 16 Gb/s in HP mode, (b) at 16 Gb/s in EE mode, (c) at 20 Gb/s in HP mode, (d) at 20 Gb/s in EE mode.
4.19 Measured PAM-4 TX output eyes with 1" FR-4 channel, 2^15 − 1 PRBS, (a) with phase calibration at 16 Gb/s (b) with level density histogram at 16 Gb/s (c) with phase calibration at 28 Gb/s (d) with level density histogram at 28 Gb/s.




3.1 Transmitter Power Breakdown at 16-Gb/s
3.2 Transmitter Performance Comparisons
4.1 A boolean function logic of switching signals in the dual-mode NRZ/PAM-4 transmit driver
4.2 Transmitter Power Breakdown
4.3 Transmitter NRZ and PAM-4 Performance Comparison
1. INTRODUCTION
1.1 Low-power high-speed I/O system
Rapidly growing demand for instant multimedia access across a myriad of digital devices has pushed
the need for higher bandwidth in modern digital communication hardware, ranging from short-reach
(SR) memory/storage interfaces to long-reach (LR) data center Ethernet links. The average per-capita
rate of data-driven interactions with cloud services, ultra-high-definition (UHD) video streaming,
mobile computing, and machine-generated data (e.g., autonomous vehicles, security images, etc.)
is expected to increase 20-fold in the next 10 years [10], creating the need to double the bandwidth
in the data center every 2-3 years.
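As a quick sanity check on the figures quoted above, the following minimal sketch converts a 20-fold increase over 10 years into an equivalent doubling period. It assumes smooth compound growth and that bandwidth demand tracks data growth one-to-one; both are simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
import math


def doubling_time_years(growth_factor: float, span_years: float) -> float:
    """Years per doubling, assuming compound growth by growth_factor over span_years."""
    return span_years * math.log(2.0) / math.log(growth_factor)


# Figures quoted in the text: a 20-fold increase over the next 10 years.
t_double = doubling_time_years(20.0, 10.0)
print(f"Equivalent doubling period: {t_double:.2f} years")
```

Under these assumptions the doubling period falls inside the 2-3 year range stated in the text.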
Energy efficiency is the topmost performance goal in the design of modern wireless communication
devices. In mobile computing, it is essential to extend battery life and miniaturize devices for
portability, as the improvement in battery capacity has not kept up with silicon technology. Energy
efficiency is even more of an issue for large-scale high-performance computing (HPC), as it is
essential to reduce the operational costs of powering data centers as well as to maintain HPC system
reliability [11]. Operating a large-scale data center produces considerable heat that requires continual
cooling in a large server room, thus resulting in a substantial total cost of ownership (TCO) for an HPC
server. As shown in Figure 1.1, the largest share of the energy bill comes from cooling [1].
Optimizing the power management or minimizing the heat generation of the devices will therefore
substantially reduce energy consumption.
Input and output (I/O) interface power in HPC systems, whether linking servers and storage or
cross-connecting data centers, is known to be one of the biggest bottlenecks in meeting the
bandwidth demand. While the performance of cutting-edge electronic circuits such as the central
processing unit (CPU) and memory system has improved exponentially according to Moore's law,
packaging components, the so-called "off-chip" devices (i.e., chip bonding, silicon interposers,
copper channels, connectors, etc.), have not kept pace with the "on-chip" domain [12].
Figure 1.1: (a) An example of data center cooling management system, (b) A breakdown of energy
consumption of a data center [1].
Supporting the dramatic growth in high-performance and mobile processors' I/O bandwidth [13, 14]
requires per-channel data rates to increase well beyond 10 Gb/s, because packaging technology
allows only modest increases in I/O channel count. At these relatively high data rates, complying
with thermal design power limits in HPC systems and battery lifetime requirements in mobile
platforms necessitates improvements in I/O system energy efficiency [15, 16] and dynamic power
management such as dynamic voltage and frequency scaling (DVFS). Scaling the power supply voltage
with data rate is an effective technique to achieve nonlinear dynamic power scaling at reduced
speeds [17, 18]. In order to improve I/O energy efficiency at high data rates, improvements in
static and dynamic power consumption are required in a manner that allows for robust operation
both at low supply voltages and with the growing mismatch found in nanometer CMOS technologies.
1.2 Pulse Amplitude Modulation-4 (PAM-4)
It has been a much debated issue among standard study groups on how to increase the through-































Figure 1.2: (a) NRZ signaling (1-Main, 1-Post), (b) PAM-4 signaling (Gray code, No Equaliza-
tion).
bits encoded into a symbol with the same time-frame. PAM-4, which could become the last resort
for achieving 400-Gb/s data-center Ethernet interconnects, is likely to become mainstream in
high-speed serial interfaces such as Fibre Channel. There is a strong trend toward adopting the
PAM-4 modulation scheme in many high-speed data network systems, as shown in Figure 1.3. PAM-4
requires half the bandwidth of NRZ to transmit the same amount of data, as seen in Figure 1.2.
Another advantage is that high-data-rate operation on legacy backplane systems can be more
successful with PAM-4, which results in lower return loss. For example, high-throughput operation
using PAM-4 can be achieved with only half the Nyquist frequency of NRZ, without encountering the
deep notch band. It is therefore worthwhile to consider PAM-4 operation on inexpensive legacy
backplane channels for the same throughput.
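As a rough illustration of the bandwidth argument above, the following Python sketch maps a bit stream to NRZ and to Gray-coded PAM-4 symbols and compares the symbol counts and Nyquist frequencies. The normalized levels (±1, ±1/3), the particular Gray mapping table, and the 20-Gb/s example rate are illustrative assumptions, not values taken from this work.

```python
# Illustrative sketch: NRZ vs. Gray-coded PAM-4 mapping (assumed normalized levels).
# Adjacent levels differ in exactly one bit, which is the Gray-code property.
GRAY_PAM4 = {(0, 0): -1.0, (0, 1): -1.0 / 3, (1, 1): 1.0 / 3, (1, 0): 1.0}


def nrz_symbols(bits):
    """One bit per symbol: 0 -> -1, 1 -> +1."""
    return [1.0 if b else -1.0 for b in bits]


def pam4_symbols(bits):
    """Two bits (MSB, LSB) per symbol using the assumed Gray-coded table."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def nyquist_ghz(data_rate_gbps, bits_per_symbol):
    """Nyquist frequency is half the symbol (baud) rate."""
    return data_rate_gbps / bits_per_symbol / 2.0


bits = [0, 0, 0, 1, 1, 1, 1, 0]
print(len(nrz_symbols(bits)), len(pam4_symbols(bits)))   # symbols needed for 8 bits
print(nyquist_ghz(20.0, 1), nyquist_ghz(20.0, 2))        # NRZ vs. PAM-4 Nyquist (GHz)
```

For the same throughput, the PAM-4 stream uses half as many symbols, so its Nyquist frequency is half that of NRZ.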
Meanwhile, the I/O count continues to grow to keep up with faster processor and memory speeds.
As PAM-4 doubles the number of bits per symbol at the same baud rate as NRZ coding, the crosstalk
that occurs in high-density I/O designs with closely spaced signal routing and vias could be
relaxed with wider spacing in PAM-4 operation. However, the average transition density (TD) of
PAM-4 modulation is 75%, whereas the average TD of NRZ is 50%, so for a given spacing the
crosstalk effect becomes more severe when PAM-4 modulation is utilized. To make matters worse, a
full-swing aggressor produces relatively more noise on victim channels due to the tighter spacing
between voltage levels.
Figure 1.3: Recent trend of digital high-speed data network. (Reprinted with permission from [2]).
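The transition densities quoted above can be checked with a short enumeration. The sketch below assumes independent, equiprobable symbols, which is an idealization of real (possibly coded) traffic.

```python
from itertools import product


def transition_density(levels):
    """Probability that two consecutive symbols differ, assuming
    independent and equiprobable symbols drawn from `levels` values."""
    pairs = list(product(range(levels), repeat=2))
    changes = sum(1 for a, b in pairs if a != b)
    return changes / len(pairs)


td_nrz = transition_density(2)    # 2 of 4 symbol pairs change level
td_pam4 = transition_density(4)   # 12 of 16 symbol pairs change level
print(td_nrz, td_pam4)
```

Under this assumption the enumeration gives 0.5 for NRZ and 0.75 for PAM-4. The 12 level-changing pairs in the PAM-4 case are the twelve distinct level transitions of a four-level signal.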
1.3 Multi-Protocol I/O
As illustrated in Figure 1.4, various wireline I/O standards exist with similar, but different,
specifications. Each wireline I/O standard is a market driving force that offers the most
competitive solution for a predefined application and ensures interoperability. If ASIC vendors
build custom designs to spec for every application, however, development time-to-market and the
associated cost will rise. Compared to single-protocol links, on the other hand, multi-protocol
serial links incur high development cost and complexity, and usually suffer from performance
and efficiency drawbacks. If performance and power efficiency are kept above the acceptable levels
imposed by the different standards, multi-standard links will gain market opportunities thanks to
IP reuse. The trend toward multi-protocol I/O will foster the need for adaptation of TX and RX
equalizer tap weights to achieve the target bit error rate (BER) while operating through the
various channels specified in each standard.
As a large number of building blocks are common to NRZ and PAM-4 coding and can be reused between
them, supporting dual-mode schemes in a single IP will gain market opportunity by lowering
development time to market and nonrecurring engineering (NRE) cost [20, 21]. Additionally,
featuring scalable performance/power will augment interoperability between various applications.
Multi-standard links can also benefit from multi-level signaling such as PAM-4, which halves the
baud rate, and from the trend of adopting ADC-based wireline receivers, which support advanced DSP
on the line data and provide compatibility with multi-level coding. ADC-based receivers in turn
pair naturally with a transmitter that acts as a fine-granularity DAC.
Figure 1.4: Multi-protocol for wireline digital communication in different applications
1.4 Dissertation Organization
Having emphasized in this chapter the increasing demand for power-efficient high-speed I/O link
systems and the growing interest in multi-level signaling such as PAM-4, Chapter 2 moves on to the
fundamentals of electrical wireline I/O transceiver systems and circuits, particularly
equalization and low-power design.
Chapter 3 presents a low-power, scalable-data-rate voltage-mode transmitter. It introduces two
main innovations. First, an impedance-modulated 2-tap equalizer is adopted that employs analog
control of the equalizer taps, thereby obviating output driver segmentation. Second, capacitively
driven low-swing global clock distribution and automatic phase calibration of the local
ILO-generated quarter-rate clocks enable improved energy efficiency with aggressive supply scaling.
The research project discussed in Chapter 4 improves upon the low-swing VM transmit equalizer with
impedance control, further expanding its capability to generate PAM-4 signals. Particular emphasis
is given to the advantages of pulse-selected output multiplexing, and several power-saving
techniques are explained along with the design considerations to keep in mind. The final chapter
concludes this dissertation and outlines future work in progress.
2. REVIEW ON LOW-POWER WIRELINE TRANSCEIVERS
2.1 Low-power-aware design
Energy efficiency is one of the most important performance goals in modern I/O link systems.
The dynamic power consumption in digital CMOS circuits originates from the switching activity
of the logic gates inside the I/O link system shown in Figure 2.1, and is given by
$$P_{\text{switching}} = a\,C_L\,V_{DD}^{2}\,f \tag{2.1}$$
where C_L is the load capacitance at the output node, V_DD is the digital supply voltage, f is
the frequency, and a is the activity factor, i.e., the probability that a power-dissipating
transition occurs [22]. In the transmitter, the pre-driver stage is the most power-dissipating
block, as it is the buffer stage for the output driver responsible for generating the voltage
swing on a low-impedance termination. At the receiver, the incoming signal is sampled and
regenerated to CMOS logic levels before being deserialized. Eq. (2.1) still holds for the common
latches (e.g., the StrongARM comparator) used in the receiver's front-end.
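To make the scaling behaviour of (2.1) concrete, the following Python sketch evaluates the switching power at a nominal operating point and at a reduced supply and frequency. All numerical values (activity factor, load capacitance, supplies, clock rates) are hypothetical and chosen only for illustration.

```python
def switching_power(activity, c_load, v_dd, freq):
    """Dynamic switching power per (2.1): P = a * C_L * V_DD^2 * f (watts)."""
    return activity * c_load * v_dd ** 2 * freq


# Hypothetical operating points, not measurements from this work.
p_nominal = switching_power(activity=0.5, c_load=1e-12, v_dd=1.0, freq=8e9)
p_scaled = switching_power(activity=0.5, c_load=1e-12, v_dd=0.7, freq=4e9)

print(f"nominal: {p_nominal * 1e3:.2f} mW")
print(f"scaled:  {p_scaled * 1e3:.2f} mW")
print(f"ratio:   {p_scaled / p_nominal:.3f}")
```

Halving the frequency alone would halve the power; lowering the supply at the same time adds a quadratic factor, which is why supply scaling with data rate yields a nonlinear power reduction.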




























Figure 2.1: Simplified high-speed digital wireline I/O (Dynamic power consuming building blocks
highlighted in color).
Figure 2.2: The effect of ISI in a wireline communication on a lossy backplane channel without
FFE.
could usually be reduced by technology scale-down. Parallelism is often a solution in many
applications to cut the clock frequency down. However, it adds interconnect complexity, which
offsets part of the dynamic power saving. Supply reduction allows a quadratic saving in dynamic
power, but it also slows digital circuits. There have been a number of efforts to reduce the
activity factor with techniques such as clock gating and request-driven globally asynchronous
locally synchronous (GALS) design [23].
2.2 Equalization
Many practical channels are bandwidth-limited and linearly distort the transmit signal in
high-speed digital communication. The dispersive nature of the channel causes significant
spreading of the data pulse. This phenomenon, commonly called inter-symbol interference (ISI), is
a dominant impairment to signal integrity in high-speed wireline communication systems, as shown
in Figure 2.2. In order to mitigate ISI on the bandwidth-limited physical medium, various
equalization methods are used in both the transmitter and the receiver. The research goal in this
work is to design energy-efficient transmit and receive equalizers.
Figure 2.3: System level block diagram of feed-forward equalization with weighted coefficient.
Figure 2.4: Data-recovery enabled by FFE in a form of pre-distortion at the transmit driver.
2.3 Transmitter design
To mitigate the effect of ISI, equalization methods are often used in digital communication.
Feed-forward equalization, implemented as an FIR filter, is popular on the transmitter side. As
shown in Figure 2.4, 2-tap equalization with one main-cursor and one post-cursor tap can mitigate
frequency-dependent loss by pre-distorting the subsequent bit before transmitting into the
channel. As shown in the pulse response, where a single logic 1 surrounded by logic 0s is sent,
the ISI is cancelled by pre-emphasizing the output by an amount α whenever the data transitions.
Hence, the post-cursor due to the ISI is suppressed.
The single most commonly used equalization method at the transmitter is the feed-forward
equalizer (FFE). Figure 2.3 presents the block diagram of an (N+M+1)-tap FFE. The input signal is
fed through a multi-tap delay line, making the current bit available along with previous and
future bits. These bits are linearly summed








where the summation of the magnitudes of all coefficients is one, such that
$$\sum_{k=-N}^{M} |c_k| = 1 \tag{2.3}$$
The largest coefficient, c_0, is called the main tap, while the smaller coefficients that multiply
the future and previous bits are the pre-cursor and post-cursor taps, respectively. The number
of coefficients and their values are chosen to compensate the pre-cursor and post-cursor ISI, and
depend on the target application, throughput, and electrical channel.
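The tap normalization of (2.3) and the weighted summation performed by the FFE can be sketched in a few lines of Python. The sketch assumes the standard FIR form y[n] = Σ c_k x[n−k] with k running from −N (pre-cursor) to M (post-cursor). The two-tap setting and the bit pattern are illustrative choices, and symbols outside the sequence are treated as zero.

```python
def normalize_taps(taps):
    """Scale taps so that the sum of their magnitudes is one, as in (2.3)."""
    total = sum(abs(c) for c in taps)
    return [c / total for c in taps]


def ffe(symbols, taps, n_pre):
    """Assumed FIR form: y[n] = sum_k c_k * x[n - k], k = -n_pre .. M.
    taps[0] is the earliest pre-cursor tap; out-of-range symbols count as 0."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for j, c in enumerate(taps):
            k = j - n_pre            # tap index relative to the main cursor
            idx = n - k              # symbol this tap multiplies
            if 0 <= idx < len(symbols):
                acc += c * symbols[idx]
        out.append(acc)
    return out


alpha = 0.25                                   # illustrative post-cursor weight
taps = normalize_taps([1 - alpha, -alpha])     # main tap and one post-cursor tap
x = [-1, -1, 1, 1, 1, -1, 1]                   # NRZ symbols
y = ffe(x, taps, n_pre=0)
print(taps)
print(y)
```

After the first symbol (which has no predecessor), the output takes only four values: ±1 on a transition and ±(1 − 2α) when the bit repeats, which is the pre-emphasis behaviour of a two-tap equalizer.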
Ideally, a high-speed link transmitter should support pre-emphasis equalization with fine
granularity and output swing control without altering the termination impedance, which is usually
set to a 50-Ω single-ended equivalent. The impedance of the transmit driver needs to match the
channel impedance to prevent signal reflection and thus achieve good return-loss performance.
Assuming a standard two-tap high-pass FIR filter with a negative postcursor tap [1-α, -α], the









and the amount of equalization peaking is















Figure 2.5: Transmit drivers in (a) Current-mode, (b) Voltage-mode.
channel loss. To transmit data on channels with a high-loss profile (such as CEI-25G-LR, IEEE
802.3bj, and InfiniBand EDR), transmit drivers that support high peak swing are valid candidates
with moderate power consumption at the output stage [24]. On the other hand, the power cost can
be reduced with low-output-swing (< 1/2 VDD) transmitters [3, 4, 5, 25, 26], provided the receiver
has sufficient data recovery capability. In addition to the low-power advantage, low-voltage
differential signaling (LVDS) presents another benefit over high-swing output voltage:
device-generated electromagnetic interference (EMI), which depends on frequency, output voltage,
and slew rate, can be reduced in LVDS I/O [21, 27].
2.3.1 Current-Mode Transmit Equalizer
Many high-performance low-output-swing transmitters use a current-mode (CM) output driver.
This structure forms a Norton-equivalent parallel termination. As seen in Figure 2.5a, the CM
structure implements the FIR filter that performs FFE by simply splitting the driver into
differential branches that are independently controlled according to the equalizer weights, while
impedance matching is provided by passive linear resistors regardless of the output swing. The
output swing and de-emphasis can be programmed through the tail current sources and the number of
differential input pairs enabled. When bandwidth extension techniques such as inductive peaking
are utilized in the current-mode driver, operation as fast as 56 Gb/s can be achieved [28].
However, due to the resonance of the high-Q inductance, the target applications are limited to a
narrow band.
While CM drivers are still the preferred structure for transmitters operating above 30 Gb/s
per lane, designing CM drivers for sufficient output swing has become more challenging as
the nominal supply voltage has decreased in modern CMOS technology nodes. The CM transmit
equalizer requires enough compliance voltage to keep the differential pair and current source in
saturation for linearity. Therefore, the transistors used in CM drivers are usually sized large
enough to fully steer the current between the two sides of the differential pair. This requires a
strong pre-driver stage, which dominates the power consumption at high speed. It also induces
large parasitic capacitance, together with the heavy routing needed at the output nodes to
sustain the current, which has an adverse effect on high-speed operation.
2.3.2 Voltage-Mode Transmit Equalizer
The quest for improved linearity and power efficiency has raised interest in the use of VM
transmitters for NRZ, primarily due to their potential to consume 4x less power than their CM
counterparts [29, 30, 4, 31, 32]. Significant static power savings are possible by utilizing
low-swing voltage-mode drivers targeting short-reach (SR) on-board chip-to-chip interconnects with
channel loss limited to 15 dB at half the symbol rate [33, 16, 3, 25]. To match the low impedance
of the channel (typically a 50-Ω single-ended equivalent), the VM transmit driver utilizes
metal-oxide-semiconductor field-effect transistors (MOSFETs) as matching elements, which usually
must be large. In addition, it is challenging to utilize MOSFETs as matching devices in the triode
region, where they behave non-linearly and their on-resistance varies substantially across
process, voltage, and temperature (PVT) corners. This led to the idea of a replica-bias circuit
that runs in the background and tracks environmental changes such as temperature [26]. Gate
voltages generated by the replica, which emulates the transmit driver's signal path, force the
transmitter to have the same impedance as the replica circuit. In order to calibrate both the
pull-up and pull-down resistance, however, the replica needs to be able to compare the voltages at
both polarities.
While it is relatively easy to realize FIR filter structures at the transmitter with a CM driver,
by summing the outputs of parallel current-mode stages weighted by the filter tap coefficients
onto the channel and a parallel termination resistor [17], voltage-mode implementations are more
difficult because of the Thevenin-equivalent series termination control. As shown in Figure 2.6,
typical VM transmit drivers are made of driver segments connected in parallel that are switched on
or off by selection logic to adjust the TX driver impedance. The output impedance cannot be
calibrated independently of the equalization setting, because a change in the number of
parallel-connected segments also affects the number of segments assigned to each tap. These
topologies often set the equalizer taps' weighting via output stage segmentation [4, 5, 3, 25],
which adds complexity to the high-speed predriver circuitry and degrades the transmitter's dynamic
power efficiency.
As seen in Figure 2.6, each segmented output driver element is driven by a pre-driver stage
running at the full data rate. The equalization setting of VM transmit drivers is in general
achieved by assigning slices to each tap weight. Even larger MOSFETs may be needed to realize the
small offsets needed for fine FFE tuning, resulting in power and area overhead. The increased
number of selection logic segments followed by pre-drivers results in larger switching load
capacitance, and thus increased dynamic power consumption.
One of the earliest VM transmit equalizer designs incorporating 2-tap equalization was
implemented by creating a voltage-divider path between VREF and GND [30]. When the data
transitions to the opposite polarity, the full swing of the output voltage is generated by the
parallel combination of RP and RN in the equivalent circuit diagram shown in Figure 2.6a. The
segment selection logic takes the full-rate input data and the 1-UI delayed data to encode the
associated impedances with the parallel combination. The differential output voltage level is
de-emphasized by creating a voltage divider from VREF to GND. The parallel combination of RP and
RN is set to match the impedance of the channel. The major issue with this topology, however,
stems from its inefficient




{1 + 4α (1− α)} (2.6)
where Isig, RT , and α are static signaling current drawn from a voltage regulator, termination
13
resistance, and equalization peaking ratio, respectively. We can see from (2.6), the signaling power
increases as the signal is de-emphsized.
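The α-dependence in (2.6) can be cross-checked against a simple resistive model of the voltage-divider driver. The Python sketch below is an idealized model under stated assumptions: each output has a total conductance of 1/RT split between pull-up (to VREF) and pull-down (to GND) segments in the ratio set by α, the far end is a 2·RT differential termination, and the prefactor VREF/(4·RT) in `closed_form` follows from this model rather than from the text. The function names and the 0.5-V / 50-Ω values are hypothetical.

```python
def supply_current_voltage_divider(v_ref, r_t, alpha):
    """Static current drawn from V_REF in the de-emphasized state of an
    idealized voltage-divider driver (assumed model, see lead-in)."""
    g = 1.0 / r_t
    g_up_p, g_dn_p = (1 - alpha) * g, alpha * g      # positive output leg
    g_up_n, g_dn_n = alpha * g, (1 - alpha) * g      # negative output leg
    # Thevenin equivalent of each leg: source resistance is r_t on both sides.
    vth_p = v_ref * g_up_p / (g_up_p + g_dn_p)
    vth_n = v_ref * g_up_n / (g_up_n + g_dn_n)
    i_load = (vth_p - vth_n) / (r_t + 2 * r_t + r_t)  # through the 2*r_t load
    v_p = vth_p - i_load * r_t
    v_n = vth_n + i_load * r_t
    # Current leaving the regulator flows through the two pull-up paths.
    return (v_ref - v_p) * g_up_p + (v_ref - v_n) * g_up_n


def closed_form(v_ref, r_t, alpha):
    """Assumed prefactor V_REF / (4 R_T) times the factor appearing in (2.6)."""
    return v_ref / (4 * r_t) * (1 + 4 * alpha * (1 - alpha))


for alpha in (0.0, 0.1, 0.25, 0.4):
    i_model = supply_current_voltage_divider(0.5, 50.0, alpha)
    i_formula = closed_form(0.5, 50.0, alpha)
    print(f"alpha={alpha:.2f}  model={i_model * 1e3:.3f} mA  formula={i_formula * 1e3:.3f} mA")
```

In this model the current grows with α because the post-cursor segments conduct directly between VREF and GND without contributing to the output swing.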
To tackle the signaling power inefficiency of the conventional VM transmit equalizer, an
alternative type of driver has been reported that maintains constant current while de-emphasizing
the output voltage swing (Figure 2.6b). An extra path in parallel with the differential channel is
used in order not to





The total supply path impedance can be held constant to match the channel impedance. The
resulting current stays constant for all output voltage swings, as seen in (2.7). However, the
extra path creates another set of segments for impedance control. Because of the highly non-linear
impedance mapping, the decoding and predriver complexity becomes higher than in the conventional
shunt-path VM transmit equalizer. At the full data rate, the digital power becomes higher in this
transmit driver [4, 25].
To reduce the signaling power, the most effective method is to draw the least current when the
data run length is greater than one and the de-emphasized output is produced. In the
impedance-modulated transmit equalizer shown in Figure 2.6c, the output swing is determined by the
voltage divide






The de-emphasized output is generated by modulating the driver's output impedance. When the
matching constraint is removed, the least current is drawn for the lowest swing. In practice, a
channel termination will never be exactly a 50-Ω single-ended equivalent. Increasing the total
supply path impedance, however, sacrifices the impedance matching to the channel. With
environmental change in the channel and imperfect matching due to parasitic elements (e.g., ESD
































Figure 2.6: Low-swing voltage-mode transmit drivers with (a) segmented voltage divider [3], (b)
shunting resistor network [4], (c) impedance modulation [5].
degrade return loss and increase ISI if the output impedance of the transmit driver is not matched
to the channel and does not absorb them [5]. In spite of the reduced signaling power, the presence
of the segment selection logic still diminishes the power-saving benefit.
2.4 RX
In contrast to transmitters, which deliver known data through the channel, receivers do not know
the data a priori and must first recover the digital message. While the challenge for the
transmitter includes driving a relatively low-impedance channel with sufficient swing, the
receiver generally takes its signal from a terminated input, so a more efficient hardware
implementation can be made at the analog front-end (AFE) thanks to the higher input impedance
level.
As I/O density has increased with increasing HPC computing power, a back-channel through which the
optimized TX equalization setting is updated has become unaffordable. While the transmitter can
support straightforward FFE, real-time adaptation of the equalization in the RX is attractive, as
it avoids having to receive a test pattern from the TX and to send back updated TX equalization
settings after evaluating the raw waveform. Therefore, the equalization burden can be shared with
the receiver.
2.4.1 Continuous-time linear equalization
Since dispersive electrical channels often introduce ISI through their low-pass filtering
response, an inverse high-pass transfer function can effectively flatten the channel's insertion
loss so that the receiver sees a well-defined signal. An analog circuit known as the
continuous-time linear equalizer (CTLE) can compensate channel loss by creating a zero with a
digitally programmable source-degeneration resistor and capacitor, as shown in Figure 2.7a [6].
The capacitor CS reduces the impact of source degeneration at high frequencies and effectively
boosts the magnitude response there.
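The frequency shaping described above can be illustrated with a textbook half-circuit model of an RC-degenerated stage. The Python sketch below assumes a single-ended transfer function with one zero at 1/(RS·CS), a pole at (1 + gm·RS)/(RS·CS), and an output pole at 1/(RD·CP); the component values and the names `r_d` and `c_p` (load resistance and output parasitic capacitance) are hypothetical and are not taken from the design in [6].

```python
import math


def ctle_gain(freq_hz, gm, r_d, r_s, c_s, c_p):
    """Magnitude of an assumed half-circuit CTLE transfer function:
    H(s) = gm*RD*(1 + s*RS*CS) / ((1 + gm*RS + s*RS*CS) * (1 + s*RD*CP))."""
    s = 2j * math.pi * freq_hz
    num = gm * r_d * (1 + s * r_s * c_s)
    den = (1 + gm * r_s + s * r_s * c_s) * (1 + s * r_d * c_p)
    return abs(num / den)


# Hypothetical component values for illustration only.
params = dict(gm=10e-3, r_d=200.0, r_s=300.0, c_s=0.5e-12, c_p=20e-15)

dc_gain = ctle_gain(0.0, **params)
ideal_peaking = 1 + params["gm"] * params["r_s"]   # ratio of first pole to zero

for f_ghz in (0.1, 1.0, 4.0, 10.0, 20.0):
    gain = ctle_gain(f_ghz * 1e9, **params)
    print(f"{f_ghz:5.1f} GHz  gain = {20 * math.log10(gain):6.2f} dB")
print(f"DC gain = {dc_gain:.3f}, ideal peaking = {ideal_peaking:.1f}x")
```

In this model the degeneration lowers the DC gain to gm·RD/(1 + gm·RS), while at high frequency CS bypasses RS and the gain rises toward gm·RD until the output pole takes over, so the achievable boost is bounded by 1 + gm·RS.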














Figure 2.7: (a) CTLE circuitry, (b) Simulation results RC degenerated CTLE circuit [6].



















= 1 + gmRS, (2.12)
respectively.
As resistive source degeneration inherently trades reduced gain for enhanced linearity, the
CTLE's high-pass filtering is mimicked by suppressing the low-frequency gain rather





















Figure 2.8: Decision feedback equalizer utilizing an FIR filter.
an amount of noise from crosstalk or reflections from imperfect termination, so the
high-frequency noise will appear even larger and be more detrimental in multi-level signaling
(MLS) such as PAM-4. Nonetheless, the CTLE has been an effective equalization method in the
receiver due to its capability of mitigating the precursors.
2.4.2 Decision feedback equalization
Given the fundamental limitation of relative SNR degradation when the low-frequency gain is
reduced by the CTLE, the CTLE can be combined with a discrete-time (DT) equalizer in which
quantized bit decisions are fed back through an FIR filter. The noise amplification that plagues
the linear equalizer no longer contaminates the fed-back signal, because it is built from decided
bits. The decision feedback equalizer (DFE) uses information about previously received data to
subtract their ISI from the incoming bit. The channel loss is thus compensated without the
relative amplification of noise from crosstalk and reflections.
The TX transmits data x(n) over a channel with impulse response h(n). At the receiver input,
y(n) = x(n) ∗ h(n) contains ISI on the transmitted bit x(n) from previously transmitted data. The
receiver employs an N-tap DFE to cancel ISI from N post-cursors. The output of the DFE becomes
$$y_d(n) = y(n) + \sum_{i=1}^{N} h_i\,\mathrm{sgn}\{y_d(n-i)\}. \tag{2.13}$$
A decision as to the polarity of the transmitted bit is made by feeding y_d(n) to the input of a
slicer. Noting that sgn{y_d(n)} = x(n) for error-free operation, we can express y_d(n) as a
function of the current bit x(n) as well as the N previous bits:
$$y_d(n) = y(n) + \sum_{i=1}^{N} h_i\,x(n-i) = \{x(n) \ast h(n)\} + \sum_{i=1}^{N} h_i\,x(n-i) \tag{2.14}$$










Considering a low-pass channel with transfer function














$$h_i = -h(i). \tag{2.18}$$
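The decision feedback recursion of (2.13) with the tap choice of (2.18) can be exercised on a noise-free toy channel. The Python sketch below uses a hypothetical three-tap impulse response whose post-cursor ISI exceeds the main cursor, so that a plain slicer makes errors, and applies feedback weights h_i = −h(i). The channel values, the sequence length, and the random seed are arbitrary illustrative choices.

```python
import random


def channel(x, h):
    """y(n) = sum_i h(i) * x(n - i): causal convolution, h[0] is the main cursor."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def sgn(v):
    """Slicer decision: +1 for non-negative input, -1 otherwise."""
    return 1 if v >= 0 else -1


def dfe(y, h):
    """Apply (2.13) with feedback weights h_i = -h(i), i = 1..N, as in (2.18)."""
    weights = [-hi for hi in h[1:]]
    decisions = []
    for n, yn in enumerate(y):
        yd = yn + sum(w * decisions[n - i]
                      for i, w in enumerate(weights, start=1) if n - i >= 0)
        decisions.append(sgn(yd))
    return decisions


random.seed(0)
h = [1.0, 0.6, 0.5]                          # hypothetical pulse response samples
x = [random.choice((-1, 1)) for _ in range(200)]
y = channel(x, h)

errors_without = sum(sgn(v) != b for v, b in zip(y, x))
errors_with = sum(d != b for d, b in zip(dfe(y, h), x))
print(errors_without, errors_with)
```

Because each correct decision removes its own post-cursor contribution exactly, the summer output reduces to h(0)·x(n) in this noise-free case. In a real link, a wrong decision feeds back the wrong correction, which this sketch does not model.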
The key constraint lies in the tight timing path. As seen in Figure 2.8, the slicer must
regenerate a small analog input signal to a rail-to-rail output, which is then fed back through
the tap weighting and summed, and the sum must settle to sufficient precision within 1 UI, such
that
$$t_{CLK-Q} + t_{h1} + t_{setup} + t_{summer} < 1\,\mathrm{UI}. \tag{2.19}$$
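A quick budget check shows how fast the constraint in (2.19) tightens with data rate. The Python sketch below treats t_h1 as the first-tap weighting delay, which is an interpretation rather than a definition given in the text, and every delay value is hypothetical.

```python
def dfe_timing_slack_ps(data_rate_gbps, bits_per_symbol,
                        t_clk_q_ps, t_h1_ps, t_setup_ps, t_summer_ps):
    """Slack of the first-tap feedback loop against one unit interval, per (2.19).
    A negative result means the loop does not close within 1 UI."""
    ui_ps = 1e3 * bits_per_symbol / data_rate_gbps
    loop_delay_ps = t_clk_q_ps + t_h1_ps + t_setup_ps + t_summer_ps
    return ui_ps - loop_delay_ps


# Hypothetical delays (ps), not values extracted from any design in this work.
delays = dict(t_clk_q_ps=20.0, t_h1_ps=10.0, t_setup_ps=10.0, t_summer_ps=15.0)

slack_16g = dfe_timing_slack_ps(16.0, 1, **delays)   # 16-Gb/s NRZ, UI = 62.5 ps
slack_25g = dfe_timing_slack_ps(25.0, 1, **delays)   # 25-Gb/s NRZ, UI = 40 ps
print(slack_16g, slack_25g)
```

With these assumed delays the loop closes at 16 Gb/s but misses timing at 25 Gb/s, which is the situation that motivates removing delay from the first-tap path.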



































Figure 2.9: (a) Pulse response at 25-GSym/s, (b) Simulation result of eye-height with varying
number of FIR taps [7].
Increasing I/O speeds over highly lossy channels necessitate an efficient approach to cancel
long-tail post-cursor ISI. The pulse response shown in Figure 2.9a indicates an exponentially
decaying tail, typical of an RC-limited channel. Sufficient cancellation of post-cursor ISI
requires a large number of FIR taps, as shown in Figure 2.9b, resulting in large power
consumption. In addition, a compact layout floorplan is critical in the DFE design in order to
meet the 1-UI timing constraint shown in (2.19). To address this, a continuous-time tap in the
form of an infinite-impulse-response (IIR) filter can be incorporated beginning at the second
post-cursor, with its time constant scaled to match the tail. This approach allows several
post-cursor ISI terms to be subtracted simultaneously with low power. A key challenge with this

















Figure 2.10: Mismatch contribution in PAM-4 [8].
2.5 PAM-4 transceiver design
The ability to deliver a wider range of termination voltage gives the VM output driver an
additional benefit, improved linearity, which is particularly important in PAM-4. As the impact of
impedance mismatch on a PAM-4 signal can be as much as 3x that on an NRZ signal, a high-swing
output is preferred in PAM-4 to preserve high SNR. In the design in [34], a hybrid combination of
VM and CM accommodates an output swing greater than its supply voltage without degrading
linearity. Nonetheless, a relatively low output swing in wireline transmit drivers proves
efficient for PAM-4, as it does for NRZ modulation, in moderate-loss short-reach (SR) applications
from the energy and linearity standpoints [35].
As the DFE in a PAM-4 receiver generally assigns identical weights across the slicers of the 2-bit
flash ADC in order to cancel post-cursor ISI from twelve different level transitions,
non-linearity plays a more significant role in PAM-4 than in NRZ [7]. As shown in Figure 2.10,
timing skew between the most significant bit (MSB) and the least significant bit (LSB) causes
a static delay, making clock and data recovery (CDR) at the receiver side more challenging.
Gain mismatch will degrade linearity, with the eye height quantified by the ratio of level mismatch
















Figure 2.11: PAM-4 Current-mode transmit driver.
rates in twelve different transitions.
2.5.1 Transmitter
A common approach to designing extremely high-speed PAM-4 transmitters is to employ a
current-mode (CM) output driver for its superior immunity to power supply noise [36, 35].
As depicted in Figure 2.11, the weighted branch currents for the MSB and LSB sum onto the
termination resistors, and feed-forward equalization (FFE) can be combined with ease. However, the
maximum swing level in CM transmit drivers is limited by the linearity constraint and power.




(AVDD − VOV ) (2.20)
where VOV is the overdrive voltage of the MOSFET.
2.5.2 Receiver∗
Figure 2.12 shows a dual-mode NRZ/PAM4 DFE receiver block diagram. In PAM4 mode, symbol
detection is achieved with a 2-bit flash ADC consisting of three comparators with threshold
voltages of 0 and ±2/3 relative to the post-equalized differential amplitude, while in NRZ mode
all thresholds are set to zero. A quarter-rate architecture is employed to reduce clock buffer
power and allow for a longer comparator reset time, which minimizes hysteresis and allows smaller
pre-charge transistor loading for improved evaluation delay. In order to minimize the critical
first-tap feedback delay and maximize the equalization cancellation range, an FIR tap is utilized
to cancel the first post-cursor ISI. This multi-level FIR tap is efficiently realized by feeding
back the flash ADC's 3-bit thermometer-coded output directly to three equally weighted summer
inputs embedded in the comparators' first stage, removing any SR-latch and external summer delay
from this critical path. Long-tail ISI is efficiently cancelled with 2 IIR taps, with one tap
starting from the second post-cursor to cancel fast-time-constant ISI and the other beginning at
the third post-cursor to mitigate the slow-time-constant ISI. In order to minimize the
comparators' internal loading, these IIR taps are subtracted from the sampled input with a
current-integration summer that precedes the comparators.
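The slicing scheme described above can be sketched behaviourally in Python. The sketch models the three comparators with thresholds at 0 and ±2/3 of the amplitude (all zero in NRZ mode) and shows that summing three equally weighted thermometer bits yields a feedback term proportional to the detected level. The function names, the normalized amplitude, and the tap weight are illustrative assumptions, and circuit-level details such as polarity conventions and timing are not modelled.

```python
def thresholds(mode, amplitude=1.0):
    """Comparator thresholds relative to the post-equalized amplitude."""
    if mode == "PAM4":
        return [-2.0 / 3 * amplitude, 0.0, 2.0 / 3 * amplitude]
    if mode == "NRZ":
        return [0.0, 0.0, 0.0]
    raise ValueError(f"unknown mode: {mode}")


def slice_thermometer(sample, mode, amplitude=1.0):
    """3-bit thermometer code: one bit per comparator, 1 if above its threshold."""
    return [1 if sample > t else 0 for t in thresholds(mode, amplitude)]


def feedback_level(thermo, tap_weight):
    """Three equally weighted inputs: each bit contributes +/- tap_weight / 3,
    so the sum scales linearly with the detected symbol level."""
    return sum(tap_weight / 3.0 * (1 if b else -1) for b in thermo)


for level in (-1.0, -1.0 / 3, 1.0 / 3, 1.0):
    code = slice_thermometer(level, "PAM4")
    print(f"{level:+.3f} -> {code} -> feedback {feedback_level(code, 0.3):+.3f}")

print(slice_thermometer(0.4, "NRZ"), slice_thermometer(-0.4, "NRZ"))
```

Because the thermometer bits are used directly, no binary decoding sits in the feedback path, which is consistent with the delay argument made in the text.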
∗ ©[2017] IEEE. Parts of this chapter are reprinted from "A Reconfigurable 16/32 Gb/s Dual-Mode NRZ/PAM4
SerDes in 65-nm CMOS", by Ashkan Roshan-Zamir, Osama Elhadidy, Hae-Woong Yang, and Samuel Palermo, IEEE 

























































Figure 2.12: 1/4-rate FIR/IIR DFE PAM-4 Receiver.
3. AN 8-16Gb/s, 0.65-1.05pJ/b, VM TX WITH ANALOG IMPEDANCE MODULATION
FFE∗
3.1 Low-Power Transmitter Design Techniques
A typical low-power multi-channel serial-link transmitter architecture is shown in Fig. 3.1.
In order to amortize clocking power, the output of a global clock generation circuit, such as a
phase-locked loop (PLL), is distributed to all of the transmit channels. Here efficient global clock
distribution techniques, such as low-swing CML signaling [17, 26], are often employed in high
channel count systems which span several mm. Each transmit channel performs parallel data seri-
alization, and implements equalization to compensate for frequency-dependent channel loss. This
section reviews key low-power design techniques employed in this design, including capacitively
























Figure 3.1: Multi-channel serial-link transmitter architecture with dynamic power management.
∗ ©[2014] IEEE. Parts of this chapter are reprinted from "An 8-16 Gb/s, 0.65-1.05 pJ/b, Voltage-Mode Transmitter
With Analog Impedance Modulation Equalization and Sub-3 ns Power-State Transitioning", by Young-Hoon Song,
Hae-Woong Yang, Hao Li, Patrick Chiang, and Samuel Palermo, IEEE J. Solid-State Circuits., Nov 2014.
3.1.1 Global Clock Distribution
Distributing high-frequency clock signals over on-chip wires with multi-mm lengths is chal-
lenging due to wire RC parasitics that limit bandwidth, resulting in amplified input jitter and exces-
sive power dissipation with repeated full-swing CMOS signaling [38]. As shown in Figure 3.2a, in
order to reduce clocking power and avoid excessive jitter accumulation, low-swing non-repeated
global clock distribution with an open-drain CML buffer driving on-die resistively terminated
transmission lines has been previously implemented [17]. However, maintaining a minimum clock
swing at high frequencies can still result in significant static power dissipation due to the transmis-
sion lines’ loss and relatively low-impedance. While reduction of this static power is possible with
inductive termination of the distribution wire [26], this creates a narrow-band resonant structure
that prohibits scaling the per-channel data rates over a wide range. Another non-repeated tech-
nique to drive long wires involves AC-coupling a full-swing CMOS driver to the distribution wire
through a series capacitor, as shown in Figure 3.2b. Relative to simple DC-coupling, this technique
allows for smaller drivers due to the reduced effective load capacitance, savings in signaling power
due to the reduced voltage swing on the long-wire, and bandwidth extension due to the inherent

























Figure 3.2: Low swing global clock distribution techniques: (a) CML buffer driving resistively
terminated on-die transmission line; (b) CMOS buffer driving distribution wire through a series
coupling capacitor.
In order to compare the CML-based and capacitively coupled low-swing clock distribution
techniques, the global distribution circuits of Fig. 3.2 are both designed for a 0.25 V low-frequency
amplitude. The 65 nm CMOS simulation results of Figure 3.3 show that, relative to CML clock
distribution with 50 Ω termination, the capacitively driven approach offers 1.7X bandwidth
extension and 73.1% power savings when distributing a differential 4 GHz clock over a 2 mm distance.
Also, the power of the capacitively driven approach reduces significantly at lower clock
frequencies. This provides the potential for further power savings at a given data rate, provided that there
is efficient multi-phase clock generation and low-to-high-swing conversion at the local transmit
channels. Finally, no major phase noise penalty is observed with the 0.25 V capacitively driven
distribution: simulations with a 4-GHz LC oscillator driving the input buffer show only a 0.1-dB
degradation at a 1-MHz offset at the end of the distribution wire.








































Figure 3.3: Simulated comparison of CML and capacitively driven clock distribution over a 2 mm
distance: (a) output swing versus frequency; (b) power versus frequency.
3.1.2 Voltage-Mode Transmit Equalization
While it is relatively easy to implement FIR equalizer structures at the transmitter by summing
the outputs of parallel current-mode stages weighted by the filter tap coefficients onto the channel
and a parallel termination resistor [17], voltage-mode implementations are more difficult due to
the series termination control. As shown in Figure 3.4, these voltage-mode topologies often set the
equalizer taps’ weighting via output stage segmentation [5, 3, 25, 4]. One approach is to distribute
the output segments among the main and post-cursor taps to form a voltage divider that produces
the four signal levels necessary for a 2-tap FIR filter [3]. Here, all segments operate in parallel
during a transition (X[n] ≠ X[n−1]) to yield the maximum signal level, while the post-cursor
segments shunt to the supplies to produce the de-emphasis level for run lengths greater than one
(X[n] = X[n− 1]). As ideally all the segments have equal conductance, a constant channel match
is achieved independent of the equalizer setting. However, shunting the post-cursor segments to the
supplies results in dynamic current being drawn from the regulator powering the output stage and a
significant increase in current consumption with higher levels of de-emphasis [25]. To address this,
adding a shunt path in parallel with the channel can either eliminate dynamic current variations [4]
or allow for a decrease in current consumption with higher levels of de-emphasis [25]. Further
power reduction is possible if a constant channel match is sacrificed by implementing the different
output levels via impedance modulation, allowing for minimum output stage current [5]. Here all
segments are on during a transition to yield the maximum signal level, while for run lengths greater

























Figure 3.4: 2-tap FIR equalization in low-swing voltage-mode drivers.
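To make the four signal levels of a 2-tap FIR filter concrete, the following Python sketch applies normalized tap weights (1 − α) and −α to a ±1 data stream. The normalization, the function names, the idle-line assumption for the first bit, and the example value α = 0.25 are illustrative assumptions and are not taken from the figure.

```python
import math


def two_tap_fir(bits, alpha):
    """Apply y[n] = (1 - alpha) * x[n] - alpha * x[n-1] to a list of 0/1 bits.

    alpha is the post-cursor tap weight, assumed to satisfy 0 <= alpha < 0.5.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]  # assumption: the line idled at the first symbol's value
    for x in symbols:
        out.append((1.0 - alpha) * x - alpha * prev)
        prev = x
    return out


def deemphasis_db(alpha):
    """Ratio of the transition level (1) to the de-emphasized level (1 - 2*alpha)."""
    return 20.0 * math.log10(1.0 / (1.0 - 2.0 * alpha))


levels = two_tap_fir([0, 1, 1, 1, 0, 0, 1, 0], alpha=0.25)
distinct_levels = sorted({round(v, 6) for v in levels})
print(distinct_levels, round(deemphasis_db(0.25), 2))
```

Transitions produce the full ±1 level, while repeated bits settle to ±(1 − 2α), which is the de-emphasis level discussed in the text.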













































































































































Figure 3.5: 10 Gb/s voltage-mode 2-tap FIR transmit equalization performance comparison. (a)
Channel frequency responses. The three backplane channels have 5.2" total linecard traces and
12" (B12), and 20" bottom- (B20) and middle-layer (M20) backplane traces. The CPW channel is a
single-board 5.8" FR4 trace and 0.6 m SMA cable. (b) Simulated 10 Gb/s pulse response with M20
BP trace. (c) Simulated 10 Gb/s pulse response with CPW channel. (d) Residual ISI, normalized to
the main-cursor amplitude, with ideal 50 Ω and impedance-modulated output drivers. Error bars
account for 15% RX termination mismatch.
As shown in the 10 Gb/s pulse response simulation results of Figure 3.5, the amount of residual
ISI with a 2-tap equalizer depends on the equalization technique and channel type. In order to
compare an impedance-modulated driver with an ideal 50-Ω driver, equal de-emphasis settings are
utilized and the residual ISI is quantified by summing the absolute values of five pre-cursors and
fifty post-cursors and normalizing by the main cursor value. For 20" backplane channels, an ideal
50 Ω driver displays similar residual ISI for middle- (M20) and bottom-trace (B20) channels with
13.1 dB and 11.7 dB de-emphasis, respectively. When an impedance-modulated driver is used,
Figure 3.5b shows that reflections with the middle-trace channel (M20) degrade the residual ISI
performance by 26.9% relative to the 50-Ω driver. However, this performance difference shrinks
to only 12.8% for the bottom-trace channel (B20). With a shorter 12" bottom-trace channel (B12)
that offers less overall ISI, but also less reflection attenuation, with 9.4 dB de-emphasis the residual
ISI improves for both drivers and the relative ISI increase is less than 18.4% with the impedance-
modulated driver. For the well-designed single-board co-planar waveguide (CPW) channel used
in the Section 3.4 experimental results, which has performance comparable to channels proposed
for high-density I/O systems [3], with 6.0 dB de-emphasis the ISI performance of the two drivers
is almost identical. The impact of receive-side termination mismatch is also considered, with the
error bars of Figure 3.5d showing that a ±15% mismatch of the ideal 100-Ω differential termination
results in less than a 2% difference in the relative performance of the transmitters.
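The residual ISI metric described above (sum of the absolute values of five pre-cursors and fifty post-cursors, normalized by the main cursor) can be sketched in Python as follows. The function name, the choice of the largest-magnitude sample as the main cursor, and the example pulse samples are assumptions made for illustration; the pulse values are not simulation data from this work.

```python
def residual_isi(pulse, n_pre=5, n_post=50):
    """Sum of |pre-cursors| and |post-cursors| divided by the main cursor.

    pulse: UI-spaced samples of the equalized pulse response.
    The main cursor is assumed to be the largest-magnitude sample.
    """
    main_idx = max(range(len(pulse)), key=lambda i: abs(pulse[i]))
    main = abs(pulse[main_idx])
    pre = pulse[max(0, main_idx - n_pre):main_idx]
    post = pulse[main_idx + 1:main_idx + 1 + n_post]
    return (sum(abs(v) for v in pre) + sum(abs(v) for v in post)) / main


# Hypothetical UI-spaced pulse response, main cursor at index 3.
example_pulse = [0.0, 0.02, -0.05, 1.0, 0.10, -0.04, 0.01]
print(residual_isi(example_pulse))
```

Comparing two drivers then amounts to evaluating this metric on each driver's equalized pulse response over the same channel and de-emphasis setting.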
While impedance-modulated equalization may yield the best signaling current consumption,
the output stage segmentation associated with this and other approaches can result in significant
complexity and power consumption in the predriver logic. Overall, this predriver dynamic power,
which increases with data rate and equalizer resolution, should be addressed in order to not dimin-
ish the benefits offered by a voltage-mode driver.



























Figure 3.6: 2-tap FIR equalization in low-swing voltage-mode drivers.
Fig. 3.6 shows a conceptual diagram of the proposed multi-channel transmitter architecture,
with 10 transmitter channels spanning across a 2 mm distance. All transmitters share both a global
regulator to set the nominal output swing, and two analog loops to set the driver output impedance
during the maximum and de-emphasized levels of the implemented 2-tap FIR equalizer. Utilizing a
single global regulator to provide a stable bias signal that is distributed to all the channels provides
for independent fast power-state transitioning of each output driver, as explained in more detail in
Section 3.3. The sharing of these global analog blocks allows for their power to be amortized by
the channel number and improves the overall I/O energy efficiency.
In order to reduce dynamic power, low-swing clocks are maintained throughout the global
distribution and local generation of the quarter-rate clocks used by the transmitters. Rather than
distributing four quarter-rate clocks globally, which offers challenges in maintaining low static
phase errors and power consumption, a differential quarter-rate clock is distributed globally in a





is present on the long global distribution wires from the voltage divider formed by the series
coupling capacitor, Cs, and the clock wire capacitance, Cw. The value is set for a swing of Vdd/4,
which is 250 mV for the 4-GHz clocks used in 16-Gb/s operation with a 1V supply. These low-
swing distributed clocks are then buffered on a local basis by AC-coupled inverters with resistive
feedback for injection into a two-stage injection-locked oscillator (ILO) which produces four full-
swing quadrature clocks that are shared by a two-channel bundle. Utilizing a Vdd/4 distribution
swing allows the ILO to achieve a locking range greater than 250-MHz, which ensures locking
over 5% power supply variations. Simulation results show that the clock swing degrades by only
1% at the end of the 2 mm distribution wire. As transmit architectures which utilize quarter-rate
clocks for serialization are sensitive to timing offsets amongst the four clock phases, particularly
with the aggressive supply scaling employed in this low-power design, digitally calibrated buffers




















Figure 3.7: Capacitively driven global distribution and local quadrature-phase generation injection-
locked oscillator.
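The capacitive voltage divider formed by the series coupling capacitor Cs and the wire capacitance Cw can be illustrated with a short Python sketch. It assumes an ideal lossless divider (wire resistance and the local buffer input capacitance are ignored), and the 300 fF wire capacitance is a hypothetical value chosen only for the example.

```python
def divider_swing(vdd, cs, cw):
    """Low-swing amplitude on the wire for a full-swing driver: Vdd * Cs / (Cs + Cw)."""
    return vdd * cs / (cs + cw)


def series_cap_for_ratio(cw, ratio):
    """Series capacitor that yields swing = ratio * Vdd for wire capacitance cw."""
    return cw * ratio / (1.0 - ratio)


def driver_load(cs, cw):
    """Capacitance seen by the CMOS driver: Cs in series with Cw."""
    return cs * cw / (cs + cw)


cw = 300e-15                           # hypothetical wire capacitance (F)
cs = series_cap_for_ratio(cw, 0.25)    # Vdd/4 swing target from the text
print(cs, divider_swing(1.0, cs, cw), driver_load(cs, cw))
```

Under these assumptions a Vdd/4 swing needs Cs = Cw/3, and the driver sees only a quarter of the wire capacitance, which is the source of the smaller driver and reduced signaling power noted for the capacitively driven approach.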
Fig. 3.7 shows the two-stage ILO schematic, where quadrature output phase spacing is im-
proved by AC-coupling the injection clocks, adding dummy injection buffers, and optimizing the
locking range via digital control of the injection buffers’ drive strength. The ILO employs cross-
coupled inverter delay cells which, relative to current-starved delay cells [16], generate a rail-
to-rail output swing with better phase spacing over a wide frequency range. Coarse frequency
control is achieved via a dedicated power supply equal to DVDD, but separated on-chip for noise
isolation. The gated analog voltage, EN_VCTL, finely controls the ILO frequency by setting the
delay cell pull-down strength. While not implemented in this prototype, a periodically activated
control loop could set EN_VCTL such that the ILO free-running frequency is equal to the injection
clock [39] to reduce quadrature phase errors and provide increased robustness to PVT variations.
This analog control voltage can also be rapidly switched between GND and its nominal value,
enabling fast power-up/shut-down of the clock signals at a two-channel resolution.
















































[Figure 3.8 block labels: duty cycle correction (5 bits); sample and count '1' for pattern A; sample and count '1' for pattern B; compare.]





























Figure 3.8: Transmitter block diagram with quadrature-clock phase calibration details.
A block diagram of a transmitter channel is shown in Fig. 3.8. The transmitter exhibits two
operating modes to provide transmitter equalization at higher data rates, while dramatically scaling
energy efficiency at lower data rates by reducing the digital serialization and pre-driver supply
(DVDD) and disabling equalization when it is not required. While an external supply is used to
set the scalable DVDD in this prototype, an adaptive switching regulator [18] could efficiently
generate this supply. Eight bits of parallel input data are serialized with an initial 8:4 multiplexer
followed by two parallel 4:1 stages that produce the main and post-cursor tap signals for the 2-tap
equalizer implemented in the differential low-swing impedance-modulated voltage-mode driver.
The serialized data passes through level-shifting pre-drivers [16] that boost the voltage swing by a
full scalable supply value, DVDD, above the nominal nMOS threshold voltage, enabling reduced
output stage transistor sizing for a given impedance value. Power is saved by disabling the post-
cursor tap pre-driver at lower data rates where equalization is not required. The clocks which
synchronize the serialization are produced by passing the ILO quadrature outputs through buffers
with duty-cycle and quadrature spacing correction via 5 bits of p-n strength and 5 bits of delay
capacitance adjustment, respectively. Two of these phases are divided by two to perform the initial
8:4 multiplexing, while all four phases pass through conventional CMOS logic to generate the
pulse-clock signals that switch the secondary 4:1 CMOS muxes.
3.3.1 Automatic Quadrature-Phase Calibration
While a transmitter architecture which utilizes quarter-rate clocks for serialization allows for
reduced supply voltages in the data path, this low-voltage operation results in increased phase-
spacing variations amongst the critical serialization clock signals [16]. The resultant output de-
terministic jitter from static phase errors and duty cycle distortion of the quadrature clocks can
severely degrade eye height and timing margins for data rates well in excess of 10 Gb/s. This
design addresses this important issue and enables high-speed operation at low supply voltages by
implementing the closed-loop calibration scheme detailed in Fig. 3.8. In calibration mode, the
transmitter output for two complementary fixed patterns is sampled with a comparator clocked
by an asynchronous 100 MHz signal. The uniformly spaced output samples obtained by em-
ploying this asynchronous clock provide information about the duty cycle and quadrature phase
spacing errors [40, 41]. First, the duty cycle is corrected by comparing the count value obtained
for a “1100” output pattern and its complement, followed by an FSM that adjusts the p-n
strength of the local clock buffers. Second, quadrature phase correction is realized by utilizing a
“1010” pattern and its complement, with the FSM then adjusting the relative delay of the
buffers through capacitive tuning.
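The idea behind comparing '1' counts for a pattern and its complement can be illustrated with an idealized Python model. It assumes that uniformly spaced asynchronous samples make each count proportional to the fraction of time the output is high, and it models each quarter-rate phase by a single launch time in UI. The function names, the sample count, and the example phase errors are hypothetical; the FSM update rule of the actual design is not modeled.

```python
def ones_fraction(edges_ui, pattern):
    """Fraction of one 4-UI period during which the serialized output is high.

    edges_ui: launch times (in UI) of the four quarter-rate phases, ascending.
    pattern:  four bits, one per phase, repeated every period.
    """
    period = 4.0
    high = 0.0
    for k, bit in enumerate(pattern):
        start = edges_ui[k]
        # The last bit lasts until the first phase of the next period.
        stop = edges_ui[(k + 1) % 4] + (period if k == 3 else 0.0)
        if bit:
            high += stop - start
    return high / period


def count_difference(edges_ui, pattern, n_samples=10000):
    """Expected difference in '1' counts between a pattern and its complement."""
    complement = [1 - b for b in pattern]
    diff = ones_fraction(edges_ui, pattern) - ones_fraction(edges_ui, complement)
    return round(n_samples * diff)


ideal = [0.0, 1.0, 2.0, 3.0]
quad_error = [0.0, 1.1, 2.0, 3.1]   # hypothetical: Q phases late by 0.1 UI
print(count_difference(ideal, [1, 1, 0, 0]),
      count_difference(quad_error, [1, 1, 0, 0]),
      count_difference(quad_error, [1, 0, 1, 0]))
```

In this model a pure quadrature spacing error leaves the “1100” counts balanced but unbalances the “1010” counts, which is consistent with using the two patterns for duty-cycle and quadrature correction in turn.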
3.3.2 Impedance-Modulated Output Driver
Figure 3.9 shows the low-swing all-nMOS output stage, where a new impedance modulation
technique [5] is introduced. In addition to the M1 switch transistors controlled by the main-cursor
data, extra transistors M3-5 are stacked to achieve 2-tap impedance-modulated equalization.
Analog control of the stacked transistor impedance values provides the potential for high-resolution
equalization tap control with a non-segmented output stage, dramatically reducing pre-driver
complexity and resulting in significant power savings. During a transition bit in equalization mode
(Figure 3.9a), the maximum output swing is achieved with nearly a 50-Ω output impedance, when
both the higher-impedance single-transistor M3 and lower-impedance two-transistor (M4
and M5) paths controlled by the post-cursor data are activated in parallel,
\[ R_{tran\_bit} = \left[ (R_{M4} + R_{M5}) \parallel R_{M3} \right] + R_{M1} = Z_o \tag{3.2} \]
where Z_o is the characteristic channel impedance (50 Ω). The sizing overhead of this effective three-
transistor stack is minimized because the switch transistors controlled by the main and post-cursor
data see a large level-shifted overdrive voltage, VLS = DVDD + VTHN, when turned on. Only the
higher-impedance single-transistor M3 pull-up/pull-down path is activated for run-lengths greater
than one (Figure 3.9b), with the de-emphasis level set by the analog control voltages, VzmeqUP
and VzmeqDN, provided by the global de-emphasis impedance modulation loop.




where α is the equalization coefficient (Figure 3.4) and the peaking ratio between the maximum

















































































Figure 3.9: Output driver with impedance-modulated 2-tap equalizer: (a) transition-bit state; (b)
de-emphasis state.
series impedance-control transistor M2 in the pull-up/pull-down paths, where the control voltages,
VzcUP and VzcDN, are provided by the global impedance control loop. Furthermore, the post-
cursor pre-drivers are disabled to save power.
3.3.3 Global Impedance Control and Modulation Loop
The global replica-bias loops that produce the impedance control bias voltages for the 2-tap
transmitter output stages are shown in Figure 3.10. A 50-Ω channel match is obtained with the left
circuit that contains two feedback loops which force a value of (3/4)VREF and (1/4)VREF on the
positive and negative outputs, respectively, of a replica transmitter loaded by a precision off-chip
100-Ω resistor. Configuring the circuit in non-equalization mode places the stacked M2 impedance
control transistors in the feedback loops to produce the VzcUP and VzcDN control voltages that
bias the M2 transistors of the output stages. In equalization mode, the stacked parallel M3 and M4-
5 paths are placed in the feedback loops to produce the VzceqUP and VzceqDN control voltages
that bias the output stages’ M4 transistors to achieve a 50-Ω match during a transition bit.
De-emphasis-level reference voltages (3/4)VREF-(1/2)αVREF and (1/4)VREF+(1/2)αVREF
are used in the right circuit to produce the M3 bias voltages, VzmeqUP and VzmeqDN, for the
high resistance values used when the data run-length is greater than one. For all settings, the bias
of the M1 and M5 replica switch transistors is generated by a diode-connected nMOS whose source is
connected to the scalable DVDD, producing a voltage level, VLS = DVDD + VTHN, consistent with
the level-shifting pre-driver output.
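As a worked illustration of the reference levels quoted above, the following Python sketch computes the de-emphasis references for a given α and the per-side series resistance that would produce them. The resistance expression is derived here under an idealized model (a resistance r from VREF to the positive output, an ideal 2·Zo differential load, and the same r from the negative output to ground); it is an assumption of this sketch and not an equation reproduced from the text. The function names and the example values are also illustrative.

```python
def deemphasis_references(vref, alpha):
    """Replica-loop references (3/4)VREF - (1/2)*alpha*VREF and (1/4)VREF + (1/2)*alpha*VREF."""
    v_pos = 0.75 * vref - 0.5 * alpha * vref
    v_neg = 0.25 * vref + 0.5 * alpha * vref
    return v_pos, v_neg


def deemphasis_resistance(alpha, z0=50.0):
    """Per-side series resistance giving those references in the idealized model.

    From v_pos - v_neg = vref * 2*z0 / (2*z0 + 2*r) = (1/2 - alpha) * vref,
    solving for r gives z0 * (1 + 2*alpha) / (1 - 2*alpha).
    """
    return z0 * (1.0 + 2.0 * alpha) / (1.0 - 2.0 * alpha)


def differential_swing(vref, r, z0=50.0):
    """Differential output of the idealized series-terminated driver."""
    return vref * 2.0 * z0 / (2.0 * z0 + 2.0 * r)


v_pos, v_neg = deemphasis_references(1.0, 0.25)   # hypothetical VREF = 1 V, alpha = 0.25
r_deemph = deemphasis_resistance(0.25)
print(v_pos, v_neg, r_deemph, differential_swing(1.0, r_deemph))
```

With α = 0 the resistance reduces to Zo and the references to (3/4)VREF and (1/4)VREF, matching the channel-match loop; larger α raises the required impedance, which is the quantity the global modulation loop sets through the M3 bias.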
High-resolution equalization settings are possible with low power overhead via a low-frequency
global DAC to set the de-emphasis voltage levels used in the replica bias loop. This compares
favorably with achieving tap value control via a highly segmented output stage, which requires
complex pre-driver circuitry switching at the full data rate [5, 3, 25, 4]. While there is some power
overhead associated with the global analog feedback loops, power amortization in a multi-channel





















































































Figure 3.11: Microphotograph of the 2-channel transmitter with on-chip 2 mm clock distribution.
Figure 3.11 shows a die microphotograph of the proposed transmitter, fabricated in a general
purpose 65 nm CMOS process. While chip area constraints prevented a full 10-channel prototype,
the concept is accurately emulated by placing a two-transmitter bundle at the end of a snaked
on-chip 2 mm clock distribution. Each transmitter channel occupies 0.006 mm², and the combined area
of the injection-locked oscillator, global impedance control and modulation loop, bias circuitry,
and global regulator is 0.014 mm². ESD diodes with 40 fF parasitic capacitance are present at the
high-speed transmitter outputs.
[Figure 3.12 panel annotations: without and with phase calibration at DVDD = 0.75 V, 8 Gb/s, with eye widths 144 ps, 103 ps, 133 ps, 120 ps (without) and 121 ps, 126 ps, 126 ps, 127 ps (with); without and with phase calibration at DVDD = 1 V, 16 Gb/s, with eye widths 58.7 ps, 64.1 ps, 60.3 ps, 66.9 ps and 61.2 ps, 61.2 ps, 63 ps, 64.6 ps.]
Figure 3.12: FR4 channel eye diagrams without and with automatic phase calibration at (a) 8 Gb/s
and (b) 16 Gb/s
The functionality of the automatic phase calibration is demonstrated with a chip-on-board test
setup, with the die directly wirebonded to the FR4 board and the transmitters driving short 2" traces.
Lower data rates display worse inherent phase spacing performance due to the reduced voltage
operation, with Figure 3.12 showing a 28.5% uncorrected eye width variation at 8 Gb/s and a 0.75 V
supply. These phase errors are reduced to 4.7% when the closed-loop phase calibration is enabled.
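The text does not state how the eye width variation is normalized. As a hedged illustration, the Python sketch below assumes the variation is the spread (max − min) divided by the widest eye, which reproduces the quoted 8 Gb/s percentages from the eye widths annotated in Figure 3.12. The function name and the choice of normalization are assumptions.

```python
def eye_width_variation(widths_ps):
    """Eye width spread as a percentage of the widest eye: (max - min) / max."""
    widest = max(widths_ps)
    return 100.0 * (widest - min(widths_ps)) / widest


uncalibrated_8g = [144, 103, 133, 120]   # ps, 8 Gb/s, DVDD = 0.75 V, without calibration
calibrated_8g = [121, 126, 126, 127]     # ps, 8 Gb/s, DVDD = 0.75 V, with calibration

print(round(eye_width_variation(uncalibrated_8g), 1),
      round(eye_width_variation(calibrated_8g), 1))
```

Other normalizations, such as dividing by the mean eye width, give somewhat different percentages, so the metric should be stated explicitly when results are compared.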






























Figure 3.13: (a) Measured equalization impedance versus de-emphasis amount with a 300 mVppd
output swing. (b) Low-frequency transmitter output waveforms with 3-12 dB de-emphasis.
At 16 Gb/s and 1 V operation, the phase calibration loop improves the eye width variation from
an uncorrected 13.1% to 5.4%, limited by nonlinearities in the duty-cycle tuning range. Note that
while a 1 V DVDD is required for 16-Gb/s operation, transient simulations indicate that the level-
     




Figure 3.14: 16 Gb/s eye diagrams: (a) without equalization and (b) with equalization.
shifted pre-drive signals generate a maximum and which does not exceed 1.1 V in the switched
output stage transistors due to the stacked design. These voltage levels are below the 10-year
lifetime requirements.
A channel consisting of a 5.8" FR4 trace and a 0.6 m SMA cable (Figure 3.5a), with 15.5-dB
loss at 8 GHz, is used to characterize the transmitter’s equalization capabilities. Figure 3.13 shows
that the global impedance modulation loop controls the required impedance for a given
equalization coefficient to within 7% of the ideal value, while low-frequency output patterns with
a peak 300 mV output swing verify the equalizer’s functionality up to the maximum 12 dB setting.
The transmitter transient performance at the maximum 16-Gb/s data rate is verified with the 2^7 − 1
PRBS eye diagrams shown in Figure 3.14, where a previously near-closed eye is opened to a 55
mV height and 33.4 ps width when the impedance-modulation equalization is enabled.
In order to demonstrate the transmitter’s scalable energy efficiency over data rate, the eye
diagrams of Figure 3.15 are produced with the same channel as Figure 3.14 and with a minimum 50 mV
eye height and ∼0.5 UI eye width. From these eye diagrams, the total jitter can be decomposed
into deterministic and random components of 31.6 ps and 2.27 ps rms, respectively, at 8 Gb/s, and
25.2 ps and 1.19 ps rms at 12 Gb/s. For this performance level the transmitter achieves 8-16 Gb/s
operation at 0.65-1.05 pJ/b (Figure 3.16a) by optimizing the transmitter’s scalable supply and out-








Figure 3.15: Transmitter eye diagrams and jitter decomposition at (a) 8 Gb/s and (b) 12 Gb/s.
put swing and disabling equalization at the lowest 8-Gb/s data rate. While the dynamic clocking
and serialization power dominates, as shown in the power versus data rate of Figure 3.16b and the
detailed 16-Gb/s power breakdown of Table 3.1, scaling the digital supply reduces this contribution
and allows for overall improved energy efficiency at lower data rates.
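The quoted deterministic and random jitter components can be related to the ∼0.5 UI eye width with the common dual-Dirac estimate, TJ = DJ + 2·Q(BER)·RJrms. The Python sketch below is illustrative only: the 10^-12 BER target and the use of the dual-Dirac model are assumptions, since the text does not state how the eye width was extracted.

```python
from statistics import NormalDist


def total_jitter_ps(dj_ps, rj_rms_ps, ber=1e-12):
    """Dual-Dirac estimate of total jitter: DJ + 2 * Q(BER) * RJ_rms."""
    q = -NormalDist().inv_cdf(ber)   # about 7.03 for BER = 1e-12
    return dj_ps + 2.0 * q * rj_rms_ps


def eye_width_ui(dj_ps, rj_rms_ps, data_rate_gbps, ber=1e-12):
    """Horizontal eye opening in UI at the given BER."""
    ui_ps = 1000.0 / data_rate_gbps
    return 1.0 - total_jitter_ps(dj_ps, rj_rms_ps, ber) / ui_ps


print(eye_width_ui(31.6, 2.27, 8.0), eye_width_ui(25.2, 1.19, 12.0))
```

Under these assumptions both operating points come out close to half a UI, in line with the eye width target stated above.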
Table 3.2 compares this work with recent voltage-mode transmitters with 2-tap equalization.
The low-voltage architecture allows for a dramatic increase in data rate at near 1 pJ/b energy
efficiency, which is only achieved at 10 Gb/s in [25], while the impedance-modulated equalization
is capable of obtaining open eyes over the highest loss channel. Moreover, as indicated by the
16 Gb/s power breakdown in Table 3.1, further improvements in energy efficiency are possible































































Figure 3.16: Measured transmitter performance versus data rate: (a) energy efficiency; (b) power
breakdown.
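Energy efficiency and power are related by a simple unit identity: pJ/b multiplied by Gb/s gives mW. The Python helper below sketches the conversion. Pairing 0.65 pJ/b with 8 Gb/s and 1.05 pJ/b with 16 Gb/s is an assumed reading of the quoted ranges, used only as an example.

```python
def power_mw(energy_pj_per_bit, data_rate_gbps):
    """Total power in mW from energy efficiency (pJ/b) and data rate (Gb/s)."""
    return energy_pj_per_bit * data_rate_gbps


def energy_pj_per_bit(power_mw_value, data_rate_gbps):
    """Energy efficiency in pJ/b from power (mW) and data rate (Gb/s)."""
    return power_mw_value / data_rate_gbps


# Assumed pairing of the endpoints of the quoted ranges.
print(power_mw(0.65, 8.0), power_mw(1.05, 16.0))
```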
Table 3.1: Transmitter Power Breakdown at 16-Gb/s
advanced CMOS process that allows for reduced dynamic power.
3.5 Chapter Summary
A low-power, scalable, high-speed transmitter architecture is presented. In order to reduce
clocking power, low-swing clocks are maintained throughout the capacitively driven low-swing
global distribution and local ILO quadrature phase generation. Improved dynamic power con-
sumption is achieved with aggressive supply scaling at lower data rates, while automatic quadra-
ture phase calibration allows for uniform output eyes at low voltages. By realizing a 2-tap equalizer
with analog-controlled impedance modulation, output stage current is reduced and driver segmen-
tation is obviated, allowing for reduced pre-driver complexity and further dynamic power sav-
ings. Employing a global regulator that provides a replica-bias voltage to the transmit channels,
Table 3.2: Transmitter Performance Comparisons
along with staggered switching of the output stage decoupling capacitance, allows for rapid en-
abling/disabling of the output drivers on a per-channel basis. Leveraging the proposed transmitter
design can allow for low-power operation with the capabilities to efficiently support equalization
for high-data-rate operation and scalable supply voltage management.
4. A RECONFIGURABLE NRZ/PAM-4 TRANSMIT DRIVER∗
Nonlinear driver impedance mapping of the highly segmented output stage in the aforementioned VM
drivers requires complex pre-driver circuitry switching at the full data rate, resulting in
significant parasitic capacitance at the output nodes and dynamic power consumption. In [29], analog
impedance control loops provide high-granularity 2-tap FFE weight control, which eliminates digital
segmentation and amortizes the static power of the impedance control loops across high-density
channel I/O, dramatically reducing pre-driver complexity and dynamic power. This work improves
upon the low-swing VM transmit equalizer with impedance control that obviated output stage
segmentation. It differs from [29] as follows: 1) a pulse-selected multiplexing technique in the
output stage allows higher bandwidth with reduced parasitic capacitance and resulting deterministic
jitter (DJ); 2) the 2-tap FFE, which necessitated 3-stack NMOS transistors in the past work, is
realized with 2-stack NMOS transistors, reducing dynamic power; 3) PAM-4 modulation is readily
realized by exploiting the 4-level 1-tap FFE used in NRZ coding.
4.1 System Architecture
Fig. 4.1 shows the overall architecture of the proposed dual-mode NRZ/PAM-4 two-channel
transmitter. The NRZ performance is configurable: high-performance (HP) mode provides an impedance
match to the characteristic impedance of the channel, while energy-efficient (EE) mode favors
signaling efficiency by disabling some of the building blocks, as indicated by the grey lines. Two, or
potentially a higher number of, TX channels share a global voltage regulator and impedance control
loops to save per-lane TX power. For the 1/4-rate operation of the transmitter, an external 1/2-rate
differential clock is divided by two to produce the 1/4-rate quadrature
clocks followed by CMOS buffers with duty-cycle correction (DCC) and quadrature error correc-
∗ ©[2018] IEEE. Parts of this chapter are reprinted from "A low-power dual-mode 20-Gb/s NRZ and 28-Gb/s PAM-4
voltage-mode transmitter", by Hae-Woong Yang, Ashkan Roshan-Zamir, Young-Hoon Song, and Samuel Palermo,



















































































































Figure 4.1: Block diagram for a Two-channel NRZ/PAM-4 Transmitter.
tion (QEC) via 5b of p-n strength and 5b MOS-cap adjustment, respectively [29].
Sixteen bits of low-speed parallel 2^15 − 1 pseudo-random bit sequence (PRBS) input data are
serialized by a 16:8 multiplexer followed by a path selector which directs the initially serialized data
to either the NRZ or the PAM-4 data path. An optimal choice of sampling phase at each serializer stage is
essential to operate over a wide range of data rates. For NRZ operation, the initially multiplexed 8b data
pass through a latch-based 8:4 multiplexer stage followed by four parallel intermediate serialization
stages, where the current and 1-UI delayed bits are returned in a recursive sequence with relaxed
timing margin and power. On the other hand, PAM-4 data are split into 4b LSB and MSB portions
after Gray coding and applied simultaneously to the intermediate serializer stage. In order to work
with the output driver slices in the output multiplexing (OM) structure, the intermediate serialization
stage is segmented into four slices for parallelism. Each slice in the intermediate serializer shown
in Fig. 4.1 includes retiming flip-flops, data multiplexers, encoder logic, predrivers, and a pulser,
with propagation delays tclk−q, tmux, tencode, and tpulse, respectively. The proposed low-swing
differential VM transmit driver featuring 4:1 output multiplexing obviates the nonlinear driver
impedance mapping of a highly segmented output stage.
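The Gray-coding and MSB/LSB split described above can be illustrated with a small Python sketch. It assumes a conventional mapping in which the Gray-precoded MSB is weighted twice as heavily as the LSB and the four levels are normalized to −3, −1, +1, +3. These weights, the bit ordering (MSB first in each pair), and the function names are illustrative assumptions, not details confirmed by the text.

```python
def gray_precode(msb, lsb):
    """Gray-to-binary precoding so adjacent output levels differ in one input bit."""
    return msb, msb ^ lsb


def pam4_level(msb, lsb):
    """Map one Gray-labelled bit pair to a normalized level in {-3, -1, +1, +3}."""
    m, l = gray_precode(msb, lsb)
    return 2 * (2 * m + l) - 3


def split_streams(bits8):
    """Split 8 parallel bits (MSB first in each pair) into 4b MSB and 4b LSB streams."""
    pairs = [(bits8[i], bits8[i + 1]) for i in range(0, 8, 2)]
    coded = [gray_precode(m, l) for m, l in pairs]
    return [m for m, _ in coded], [l for _, l in coded]


levels = [pam4_level(m, l) for m, l in [(0, 0), (0, 1), (1, 1), (1, 0)]]
msb_stream, lsb_stream = split_streams([0, 0, 0, 1, 1, 1, 1, 0])
print(levels, msb_stream, lsb_stream)
```

With this labelling, a decision error between adjacent levels corrupts only one of the two bits, which is the usual motivation for Gray coding in PAM-4 links.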






























































































Figure 4.2: Static timing analysis in the serializer circuit over scalable supply voltage: (a) Timing
diagram for data and clock path; (b) Propagation delays of digital logic circuits over scalable
supply; (c) Maximum data rate vs. phase selection.
driver that sets either the de-emphasis level in NRZ mode or the middle symbol levels in PAM-4
mode. The background analog control allows the driver to track environmental changes and PVT
variations [29]. The transmitter provides configurable performance in NRZ modulation. In the
high-performance (HP) controlled-impedance setting, the output impedance of the transmit driver is
matched to the nominal characteristic impedance of the channel for any data pattern and thus absorbs
signal echoes reflected at the receiver, at the cost of extra current drawn by the output driver [30].
Significant power can be saved in the energy-efficient (EE) impedance-modulated setting through
improved signaling efficiency in the transmit driver and by disabling some of the building blocks, at the
expense of output termination [31, 29].
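\[ t_{clk-q} + t_{mux} + t_{encoder} + 1 \cdot UI < n \cdot UI + t_{d,pulse} + t_{skew} \tag{4.1} \]
\[ t_{d,pulse} + t_{skew} + (n+1) \cdot UI < t_{clk-q} + t_{mux} + t_{encoder} + 4 \cdot UI \tag{4.2} \]
\[ \frac{n-3}{\left[ t_{clk-q} + t_{mux} + t_{encoder} - t_{skew} \right]_{\text{Supply@750mV}}} < f_o < \frac{n-1}{\left[ t_{clk-q} + t_{mux} + t_{encoder} - t_{skew} \right]_{\text{Supply@1.05V}}} \tag{4.3} \]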
4.2 Static Timing Analysis on the Critical Timing Path
The retiming flip-flops generate the necessary 1-UI delay for the 2-tap FFE in NRZ mode by
returning data in a recursive sequence with relaxed timing margin and power consumption, while
in PAM-4 mode the parallel 8b input is retimed into 4b MSB and 4b LSB paths with the same
clock. The data is propagated through combinational logic, which translates it into switching signals
according to the configuration (HP or EE setting), before being captured by the time-interleaved
pulse clocks in the level-shifted pre-drivers.
The objective of the STA is to make an optimal choice of sampling phase at the serializer stage so
that it operates over a wide range of data rates, by defining the minimum and maximum possible data
rates [42]. The allowable data rates are restricted by contamination delays introduced by the combination of
main and post 1-UI data in NRZ as shown in Fig. 4.2b. The lowest and the highest data rates are
determined by the TA and TB as shown by (4.1) and (4.2), respectively.
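The data-rate window implied by inequalities (4.1) and (4.2) can be sketched in Python by solving each for the unit interval. The sketch uses a single set of delays for both bounds, whereas (4.3) evaluates the two bounds at different supply voltages. The delay values in the example are hypothetical placeholders, not post-layout results from this work, and the function name is illustrative.

```python
def data_rate_window_gbps(n, t_clk_q, t_mux, t_encoder, t_d_pulse, t_skew):
    """Data-rate bounds (Gb/s) implied by (4.1) and (4.2); all delays in ps.

    (4.1): t_logic + 1*UI < n*UI + t_d_pulse + t_skew      -> d < (n - 1) * UI
    (4.2): t_d_pulse + t_skew + (n + 1)*UI < t_logic + 4*UI -> (n - 3) * UI < d
    with t_logic = t_clk_q + t_mux + t_encoder and d = t_logic - t_d_pulse - t_skew.
    """
    if n < 2:
        raise ValueError("this sketch assumes a phase index n >= 2")
    d = (t_clk_q + t_mux + t_encoder) - (t_d_pulse + t_skew)
    if d <= 0:
        raise ValueError("this sketch only covers a positive net logic delay")
    f_min = max(0.0, (n - 3) / d) * 1000.0   # 1/ps converted to Gb/s
    f_max = (n - 1) / d * 1000.0
    return f_min, f_max


# Hypothetical delays in ps, giving a net logic delay d of 100 ps.
delays = dict(t_clk_q=60.0, t_mux=40.0, t_encoder=50.0, t_d_pulse=40.0, t_skew=10.0)
print(data_rate_window_gbps(3, **delays), data_rate_window_gbps(4, **delays))
```

For n = 3 the lower bound vanishes, so the window extends from arbitrarily low rates up to the limit set by (4.1); larger n raises both bounds, narrowing the usable range at low data rates.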










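[Figure 4.3 annotations]
(a) Input multiplexing: C_{TG,on} = 2C_{eff}W + C_gW, C_{TG,off} = 2C_{eff}W, C_{inv} = 3fC_gW, R_{TG} = R,
\tau_{input\_mux} = R_{TG}(C_{TG,on} + 3C_{TG,off} + C_{inv})
(b) Output multiplexing: C_{nand} = 1.5C_{eff}W, C_{inv} = 3fC_gW, R_{nand} = R,
\tau_{output\_mux} = R_{nand}(C_{nand} + C_{inv}) = R(1.5C_{eff}W + 3fC_gW)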
Figure 4.3: Simplified circuit diagrams of VM transmit driver: (a) Input multiplexing; (b) Output
multiplexing.
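The RC expressions annotated in Figure 4.3 can be compared numerically with a short Python sketch. The per-unit-width capacitances, the resistance, the width, and the fan-out used in the example are hypothetical values chosen only to show the ratio of the two time constants; they are not process data from this work.

```python
def tau_input_mux(r, c_eff, c_g, w, f):
    """Input-multiplexing time constant: R_TG * (C_TG,on + 3*C_TG,off + C_inv)."""
    c_tg_on = 2.0 * c_eff * w + c_g * w
    c_tg_off = 2.0 * c_eff * w
    c_inv = 3.0 * f * c_g * w
    return r * (c_tg_on + 3.0 * c_tg_off + c_inv)


def tau_output_mux(r, c_eff, c_g, w, f):
    """Output-multiplexing time constant: R_nand * (C_nand + C_inv)."""
    c_nand = 1.5 * c_eff * w
    c_inv = 3.0 * f * c_g * w
    return r * (c_nand + c_inv)


# Hypothetical values: 1 kOhm, 1 fF per unit width for both capacitances, W = 1, f = 1.
params = dict(r=1e3, c_eff=1e-15, c_g=1e-15, w=1.0, f=1.0)
ratio = tau_input_mux(**params) / tau_output_mux(**params)
print(tau_input_mux(**params), tau_output_mux(**params), ratio)
```

The ratio depends only on the relative size of the junction and gate capacitances and on the fan-out, so the advantage of output multiplexing shrinks as the pre-driver fan-out f grows and the shared inverter load dominates both expressions.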
sumption at lower data rates. Propagation delays over the scalable supply are obtained from post-
layout simulation as shown in Fig. 4.2c. The allowable data-rate with this process node used in
this work is shown in Fig. 4.2d. For n = 3, up to 20 Gb/s NRZ operation is achievable. For n > 3,
however, the maximum data rates are limited by clock bandwidth to below what is mathematically
achievable.
4.3 Differential 4:1 Output-Multiplexing Transmit Driver
A major advantage of output multiplexing is the dynamic power saving made possible by the relaxed
switching speed of time-interleaved parallel switching in the pre-driver stage, which allows the
supply voltage to be reduced [18, 16]. Output multiplexing can also benefit from reduced
deterministic jitter (DJ). Conceptual illustrations of the input and output
multiplexing circuits and timing diagrams are shown in Fig. 4.3. For clarity, only one polarity
of the differential inputs is shown. Equivalent RC models are defined to analyze
the critical node. In input multiplexing, in addition to the time constant formed by the on-resistance
of a transfer gate and its turn-on capacitance (CTG,on), each of the three turned-off transfer gates
contributes junction capacitance (2CeffW), which enlarges the time constant (τinput_mux) and thus
lowers the edge rate, as depicted in Fig. 4.3a. This results in significant ISI at the pre-driver input.
Although the data edge is sharpened by the following inverter-based pre-driver, which has a fan-out
of f, the degraded DJ persists to the output. In output multiplexing, assuming the same resistance as
the transfer gate’s on-resistance, the time-interleaved NAND gates pre-charged by low-speed data
allow each data bit to be captured independently by a pulse clock with relaxed timing margin. Only one
switch segment is activated at each pulse clock, without being affected by prior and successive bit
values, as illustrated in Fig. 4.3b. Separating the predriver segments for different phases reduces the
self-capacitance at internal nodes, which improves the edge rate and thus effectively extends the
bandwidth of the output stage and improves the DJ. Additionally, this NAND gate does not require
complementary pulse clocks or the power to generate them. Without the time-matching requirement
of the differential pulse clocks, the output multiplexing will further improve jitter performance and








































































































































































Figure 4.5: Transmit driver for transitioning and de-emphasized bit states with associated
impedance paths in (a) high-performance (HP) mode (for both NRZ and PAM-4) and (b)
energy-efficient (EE) mode (for NRZ only).
One of the challenges of scalable-supply operation with voltage-mode output drivers is maintaining
a proper match to the characteristic impedance of the channel at a reduced supply without
incurring size overhead in the switch NMOS transistors of the output driver. The level-shifting
pre-driver stage has proven effective: it produces pulse-sampled data boosted up to the scalable
DVDD+Vthn when turned on, so that the switch NMOS transistors can be made smaller [18, 16, 29].
In order to incorporate a 2-tap FFE in the transmit driver, however, the size overhead imposed by
the 3-stack NMOS transistors on the low-impedance path (two switch NMOS transistors and one
impedance-controlled NMOS in series, as shown in Figure 4.4a) constrains the edge rate and power
efficiency because of the large loading capacitance driven by the pre-driver stage [29]. In this
work, encoder logic is introduced to provide performance scalability, and a transmit driver with a
2-NMOS stack is proposed, allowing a dramatic size reduction of the switch NMOS transistors and
thus higher bandwidth and dynamic power savings. In order to allow the serialization and
pre-driver logic to operate at the reduced supply voltage, the final 4:1 serialization is
performed in the output multiplexing driver of Fig. 4.5. This driver consists of three parallel
segments built entirely from NMOS transistors. Each segment has single top/bottom
analog-controlled transistors, which control the output impedance and set either the de-emphasis
level in NRZ mode or the middle symbol levels in PAM-4 mode, and four parallel slices of middle
digitally-switched transistors that perform the output multiplexing.
4.3.1 High-Performance NRZ mode
In many practical wireline I/O transmitters, equalization is used to compensate for
bandwidth-limited channels by linearly pre-distorting the transmit signal to counteract ISI.
However, controlled termination impedance is also of critical concern. Insertion loss deviation
(ILD) due to reflections can become more serious than frequency-dependent loss, as reflections
become another major source of ISI. For consistent electrical performance in digital I/O links
where the tolerance to TX output impedance variation is low (e.g., legacy backplanes, low-cost
connectors, and multi-drop buses (MDB)), the transmit equalizer can be configured to the
high-performance (HP) controlled-impedance setting at a power penalty. Fig. 4.5a shows the VM
differential 4:1 output multiplexing transmit driver circuitry and signal paths for the HP
setting. The low-swing VM driver comprises 2-stack pull-up/pull-down NMOS transistors. In NRZ
mode, the NMOS switches (M1-M3) in each slice are driven by level-shifted pre-driver outputs to
close the signal paths for differential full/de-emphasized swing outputs. Similarly, in PAM-4
mode, the LSB and MSB data are encoded in the logic slices that drive M1-M3 through the
pre-drivers to form the signal paths for full/mid-level PAM-4 symbols. In addition to the M1-M3
switch transistors, NMOS transistors M6-M8 are stacked to control the impedance of the highlighted
branches for 2-tap FFE in NRZ mode or middle symbol generation in PAM-4 mode, with the gate
voltages VXXHP, VXXHPF, and VXXHPD produced by the AICLs that emulate the corresponding signal
paths.
For a transitioning bit in the HP setting in NRZ, or a Gray-coded symbol 2b’00 or 2b’10 in
PAM-4, the maximum output swing is generated by closing the lower-impedance (M1 and M6) and
higher-impedance (M2 and M7) paths, as highlighted by the red dash-dot line. The parallel
combination of these paths terminates the output to the characteristic impedance of the channel
such that






where Z0 and α are the characteristic channel impedance and the peaking ratio between the full
and de-emphasized output voltage swings, respectively. The de-emphasized output swing is produced
by creating a voltage divider path from VREF to GND. In order to prevent glitches when full-swing
bits turn to the de-emphasized level for data run-lengths greater than one in NRZ, only the
high-impedance path (M2 and M7) is de-activated, and another path (M3 and M8) is activated while
the low-impedance path (M1 and M6) remains closed, as indicated by the blue dashed line, such that






As seen in (4.4) and (4.5), a controlled channel match is achieved both during transitions and
for run-lengths greater than one. Similarly, in PAM-4 operation, for the Gray-coded symbols
2b’01 and 2b’11, the mid-level output voltage, where α is set to 0.33, is produced with a low
output impedance, allowing fast transitions to the mid-level symbols thanks to a low time-constant.
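The symbol-to-level mapping described above can be illustrated with a short Python sketch. Only the grouping is taken from the text (2b’00 and 2b’10 at full swing, 2b’01 and 2b’11 at the mid-levels with α = 0.33). The bit order within each symbol, the sign assigned to each symbol, and the relation mid-level = (1 − 2α) × full swing are assumptions made for illustration.

```python
from fractions import Fraction

ALPHA = Fraction(1, 3)          # tap weight, quoted as 0.33 in the text
FULL = Fraction(1)              # normalized full differential swing
MID = FULL * (1 - 2 * ALPHA)    # assumed relation: mid = (1 - 2*alpha) * full

# Assumed polarity; symbols are written as 2b'<MSB><LSB>.
GRAY_TO_LEVEL = {
    "00": -FULL,
    "01": -MID,
    "11": +MID,
    "10": +FULL,
}


def pam4_levels(symbols):
    """Return the normalized differential level for each Gray-coded symbol."""
    return [GRAY_TO_LEVEL[s] for s in symbols]


if __name__ == "__main__":
    for sym in ("00", "01", "11", "10"):
        print(sym, float(GRAY_TO_LEVEL[sym]))
```

With this mapping the four levels are equally spaced by 2/3 of the full swing, and neighbouring levels differ in exactly one bit, which is the property Gray coding is meant to provide.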






Table 4.1: Boolean function logic of the switching signals in the dual-mode NRZ/PAM-4 transmit
driver




































































































LSB MSB Y[n] YD[n] YF[n] Y[n]* YD[n]* YF[n]*
PAM-4














where VREF is the differential peak-to-peak output voltage swing level and Vde−emp is the
de-emphasized voltage level.
4.3.2 Energy-Efficient NRZ mode
A major disadvantage of the HP setting, however, is the signaling inefficiency involved in
generating the lower de-emphasized output swing, as shown in (4.7). For signaling efficiency, it
is desirable that less current is drawn from the voltage regulator output for the de-emphasized
swing when the transmit driver is generating a long stream of 0s/1s. Among the few low-swing VM
output drivers, the impedance-modulating transmit driver introduced in [31] increases the output
impedance to produce the de-emphasized output, so that the induced current scales linearly with
the output voltage swing. As the TX output impedance becomes greater than Z0 for de-emphasis,
however, reflections from the RX end are not absorbed at the TX output, resulting in
uncompensated ISI. Some studies have found that impedance discontinuity is unavoidable due to
the package parasitic capacitance, bonding inductance, and process variation in the transmitter
output stage. Given good termination at the receiver side, however, the impact of impedance
discontinuities becomes less significant over longer channels, because the resulting reflections
experience higher attenuation before bouncing at the transmitter output [31, 29].
Fig. 4.5b shows the transmit driver circuitry with the signaling paths configured to the
energy-efficient (EE) impedance-modulated setting. For impedance control on the associated paths,
the gate voltages VXXEEF and VXXEED on M5 and M9, respectively, are generated by the impedance
control loops for both the maximum and de-emphasized output swings. During a transitioning bit
period in the EE setting, the maximum output swing is achieved with Z0 output impedance when the
higher-impedance M3-M9 and lower-impedance M1-M5 paths are activated in parallel as















For run-lengths greater than one, the output impedance of the transmit driver is increased: only
the higher-impedance (M3-M9) path remains closed to produce the de-emphasized output swing as
































































































Figure 4.6: Global replica-bias loops for output driver impedance and level control: (a) de-
emphasis and PAM-4 opposite polarity path control in HP setting (disabled in EE setting); (b)
de-emphasis primary polarity path control; (c) full-swing impedance control; (d) Mismatch-free
IDAC circuitry used in (a).





whereas the current drawn for full swing in EE mode is the same as that in HP mode, given by
(4.6). From (4.10), the signaling current is reduced linearly with the de-emphasis level.
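The linear scaling of signaling current with de-emphasis level can be illustrated with a simplified resistive model in Python. The model is an assumption, not the thesis circuit: a pull-up and a pull-down path of equal resistance in series with an ideal 2·Z0 differential termination, driven from VREF, with the de-emphasized-to-full swing ratio taken as 1 − 2α (which reproduces the 12dB quoted for α = 0.375). The function name and default values are hypothetical.

```python
import math


def ee_deemphasis(alpha, z0=50.0, vref=0.4):
    """Simplified EE-mode model (assumption): pull-up R and pull-down R in
    series with an ideal 2*z0 differential termination, driven from vref."""
    k = 1.0 - 2.0 * alpha                  # de-emphasized / full swing ratio
    r_path = z0 * (2.0 / k - 1.0)          # per-side driver resistance giving ratio k
    i_sig = vref / (2.0 * r_path + 2.0 * z0)
    i_full = vref / (4.0 * z0)             # full swing corresponds to r_path == z0
    return {
        "deemphasis_db": -20.0 * math.log10(k),
        "r_path_ohm": r_path,
        "i_ratio": i_sig / i_full,         # equals k, i.e. linear in the swing
    }


if __name__ == "__main__":
    for a in (0.0, 0.125, 0.25, 0.375):
        print(a, ee_deemphasis(a))
```

In this idealized model the current ratio equals the swing ratio, so halving the output swing halves the current drawn from the regulator, whereas a constant-impedance divider would not scale in this way.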
Figure 4.7: Impedance mapping of the signal paths in the output driver.
Additionally, dynamic power from the pre-driver is significantly reduced in the EE setting by
disabling all pre-drivers that drive the low-impedance path for run-lengths greater than one. In
the 3-stack NMOS transmit equalizer used in [29], the pre-driver stage is directly toggled by the
data X[n] (or X[n − 1]), so the pre-drivers associated with X[n − 1] (or its complement)
wastefully drive the large M2tn and M2bn even while the entire 3-stack low-impedance path is
disabled by X[n] (or its complement), as shown in Fig. 4.4a. In the proposed transmit driver,
however, where the pre-drivers produce the encoded data Y[n], YD[n], and YF[n] for the 2-stack
NMOS transistors, only the pre-drivers that drive the relevant NMOS transistors are activated
during long streams of 0s/1s, cutting down the number of power-consuming pre-driver transitions,
i.e., effectively reducing the activity factor in (2.1). Table 4.1 summarizes the Boolean
function table of the encoder logic.
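As a rough illustration of why gating pre-drivers on run-length lowers the activity factor, the Python sketch below counts how often a bit differs from its predecessor in a PRBS7 pattern. The classification X[n] != X[n−1] is only a stand-in for the encoder logic of Table 4.1, which is not reproduced here, and the PRBS7 generator is a hypothetical test source rather than the on-chip generator.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1), seed must be nonzero."""
    state = seed & 0x7F
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def transition_fraction(bits):
    """Fraction of bit periods in which X[n] != X[n-1]."""
    transitions = sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)
    return transitions / (len(bits) - 1)


if __name__ == "__main__":
    pattern = prbs7(127 * 8 + 1)
    print(f"transitioning bits: {transition_fraction(pattern):.3f}")
```

Roughly half of the bit periods are transitions, so pre-drivers that only need to switch on transitions are idle for about half of the pattern. How this translates into a power figure depends on the actual encoder and loading, which this sketch does not model.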
4.4 Impedance Control Loops
Impedance control loops running in the background continuously track variations in device
parameters due to environmental changes such as temperature. In order to provide the transmit
driver with the proper NMOS biasing voltages that control the de-emphasized swing level for NRZ
and the middle levels for PAM-4 modulation, replica-based impedance control loops are implemented
as shown in Fig. 4.6. Only two high-precision SMD resistors are used to emulate the receiver-side
termination. By applying the desired reference voltages of the de-emphasized NRZ output swing or
the PAM-4 middle levels to the replica circuits (a) and (b), the nodes that emulate the positive
and negative output polarities are forced to the reference voltages, so that the output driver’s
NMOS gate-biasing voltages are generated through negative feedback loops. The
impedance-controlled NMOS transistors in the two-channel transmit drivers are driven by a set of
inverters powered by supply voltages generated by the impedance control loops, thus eliminating
interaction between different loops [26, 29]. For all settings, the replica switch transistor
bias is generated by a diode-connected NMOS transistor whose source is connected to the scalable
DVDD, producing a voltage level VLS = DVDD+Vthn consistent with the level-shifting pre-driver
output, as shown in Fig. 4.6a. In the EE setting, the nodes at VUPHPD and VDNHPD are tied to GND
to disable the impedance path, and the control loop circuitry in (a), including the error
amplifiers and IDAC, is completely disabled, saving a considerable amount of power. In Fig. 4.6b,
the gate control voltages VUPHPD and VDNHPD for impedance control of the primary polarity path
are obtained by a pair of error amplifiers (B and C) enabled in HP mode only. In HP mode, a
constant channel match is achieved independent of the data pattern. In the EE setting, another
pair of error amplifiers (A and D) produces the bias voltages VUPEED and VDNEED that set the
higher-impedance modulated path, while the pair of error amplifiers used in the HP setting is
disabled. The inverters that produce the gate voltages VUPHPD and VDNHPD are also disabled to
shut off the lower-impedance paths needed in the HP setting only. In both the HP and EE settings,
full-swing control is achieved by the parallel combination of a higher- and a lower-impedance
path emulated by the feedback loop in Fig. 4.6c. The single-ended full-swing levels track the
reference amplitudes at 3/4VREF and 1/4VREF, thus producing the gate-biasing voltages for the
output driver. It is worth noting that the pull-up and pull-down impedances are independently
controlled in all three control loops. High-resolution equalization settings are achievable by
implementing low-frequency global DACs that produce the reference voltage levels at the reference
voltage inputs to the error amplifiers’ bias circuits in the replicas. This compares favorably
with tap value control via a highly nonlinearly segmented output stage, which requires complex
pre-driver circuitry switching at the full data rate [30, 4, 31, 32]. While there is some power
overhead associated with the global analog feedback loops, power amortization in a multi-channel
system minimizes the impact on the overall transmitter energy efficiency.
Shown in Fig. 4.6d is the IDAC that provides current biasing for the replica in Fig. 4.6a of the
opposite polarity path, which is enabled only in the HP setting. Good current matching between
the pull-up and pull-down outputs of the IDAC is essential for PAM-4 linearity. The two cascode
arms, M3-M4 and M5-M6, suppress the current mismatch that would otherwise be caused by the
substantial voltage difference between nodes X and Y, namely VXY [43]. The first current mismatch
arises in the current copied from the pull-up output, due to channel-length modulation (λ ≠ 0).
Thanks to the output impedance boosted by cascoding, this current mismatch is determined by the
voltage mismatch between nodes P and Q, expressed by






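$$\Delta I = \frac{1}{2}\mu_p C_{ox}\frac{W}{L}\left(AVDD - V_{bias} - |V_{THP}|\right)^2 \left(\lambda V_{PQ}\right) \tag{4.11}$$
in which, compared with a simple current mirror, the |VSD| mismatch is suppressed to
$$V_{PQ} = \frac{V_{XY}}{\left(g_{m6} + g_{mb6}\right) r_{O6}} \tag{4.12}$$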
The second current mismatch arises because the NMOS transistors must remain in saturation with
VDS as low as 31.25mV for VREF = 100mV and α = 0.125. In order to provide VDS matching between M8
and M9, negative feedback with an error amplifier is incorporated into a low-voltage cascode
current mirror to properly bias the gate of M7.
4.5 Design Consideration
The main performance goals in the design of the transmitter are the following: 1) low power, 2)
maximum data-rate, 3) wide output swing range, and 4) minimal mismatch between the replica circuit
and the output driver. Depending on the target performance, one or more of these metrics will
affect the design choices and the sizing of the transistors in the output driver.
The width of the output driver NMOS transistors is usually determined by the trade-offs mentioned
above, whereas the channel length of the transistors is set to the minimum. Since the impedance




































Figure 4.8: Non-segmented output driver.
large for its relatively low on-resistance. As discussed earlier, however, a large MOSFET switch
increases the dynamic power consumption due to the fan-out requirement of the pre-driver stage.
Therefore, there is an unfavorable trade-off between the matching requirement and the dynamic
power consumption. A systematic approach to designing for optimal energy efficiency is provided
in the following subsections.
4.5.1 Low Dynamic Power
An energy packet per cycle results in the dynamic power dissipation defined by (2.1), where
half of it is consumed in charging the load capacitance and the other half is dissipated while




In order to minimize the dynamic power in the pre-driver stage that drives non-segmented output





















Figure 4.9: Parasitic elements in a differential termination scheme on transceiver [9].
switched by discrete voltage level as seen in Figure 4.11 such that parasitic capacitance on the
gates seen by predrivers is small.
4.5.2 Maximum Data-rate
The maximum data-rate of the transmitter is defined by the time-constant at the final output
node. For simplicity, when the pre-driver applies an ideal transient input to the transmit
driver, as depicted in Figure 4.10a, the output voltage of the transmit driver can be derived as,











where VTXP(t) is the single-ended output of the positive polarity, RUP,TOT is the total output
resistance of the pull-up path, and C is the capacitance seen at the output node, accounting for
capacitance due













Figure 4.10: Illustrations of the maximum data-rate achieved with the lower-bound settling time at
95% of the steady-state response generated by the transmit equalizer driven by a pre-driver with
(a) an ideal transient response and (b) a realistic transient response.
As a rule of thumb for signal integrity, the output is considered settled when VTXP(t) reaches
95 percent of 3/4VREF; thus,











where t95% should be under 1 UI. Therefore,
$$t_{95\%} = R_{UP,TOT}\, C \ln 13.3 < 1\,\mathrm{UI} \tag{4.16}$$
From (4.16), it is clear that, for a given output resistance, the maximum data-rate can be raised
only by minimizing the capacitance seen at the output. It is therefore critical to minimize the
width of the switching MOSFETs in the output driver, not just to reduce the dynamic power
consumption but also to achieve higher data-rates. In practice, as shown in Figure 4.10b, a
realistic transient response with the finite rise time of the pre-driver extends the settling
time from t95% to t′95%. In the energy-efficient impedance-modulated setting, where RUP,TOT can be
as high as 320 Ω, the maximum data-rate can be reduced by the higher time-constant at the output
node unless the output capacitance is dominated by packaging parasitics.
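The factor ln 13.3 in (4.16) is consistent with a first-order response that starts at 1/4VREF and settles toward 3/4VREF: reaching 95% of 3/4VREF leaves a residual of 3/40 of the 1/2VREF step, so t95% = RC ln(40/3). That starting level is an inference from the constant, not a statement from the text. The Python sketch below evaluates the resulting bound; the output capacitance is a hypothetical value chosen only to show the trend between a 50 Ω and a 320 Ω path.

```python
import math

LN_FACTOR = math.log(40.0 / 3.0)   # ln(13.33...), from the 95% settling criterion


def t95(r_up_tot, c_out):
    """Settling time to 95% of 3/4*VREF for a first-order RC output node."""
    return r_up_tot * c_out * LN_FACTOR


def max_data_rate(r_up_tot, c_out):
    """Upper bound on the NRZ data rate (b/s) from the condition t95 < 1 UI."""
    return 1.0 / t95(r_up_tot, c_out)


if __name__ == "__main__":
    c_out = 200e-15  # hypothetical output-node capacitance, not from the thesis
    for r in (50.0, 320.0):
        print(f"R = {r:5.0f} ohm -> t95 = {t95(r, c_out) * 1e12:6.1f} ps, "
              f"rate < {max_data_rate(r, c_out) / 1e9:5.1f} Gb/s")
```

Because the bound is inversely proportional to the RC product, raising the path resistance from 50 Ω to 320 Ω lowers it by a factor of 6.4 for the same capacitance.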
4.5.3 Maximum Output Swing
A simple analysis is performed to show the maximum differential output swing of the VM
output driver. The minimum and maximum output voltages at each polarity are 1/4VREF and 3/4VREF,
respectively. Because all NMOS transistors are to be kept in the deep-triode region, by
definition,
$$V_{DS} \ll V_{GS} - V_{THN}. \tag{4.17}$$




(VGS − VTHN) (4.18)
where VGS and VTHN are the gate-to-source voltage and NMOS threshold voltage, respectively.
However, this requirement is too stringent to meet; therefore, the linear region can be redefined
with a
















Figure 4.11: Simplified output driver segment with input signal profile.
In order to find the maximum differential output swing in the linear region, the worst case
can be tested on the pull-up path, since its VGS is lower and its VTHN is higher than those of
the pull-down NMOS transistor, whose body is tied to the substrate, resulting in a lower overdrive
voltage. Therefore, larger NMOS transistors are normally used for the upper double-stack NMOS
transistors. Next, with the double-stack NMOS segment in the output driver shown in Figure 4.11,
the analysis is a little more complex, adding a degree of freedom. By defining the condition
based on (4.19), for the














and for the lower transistor,
$$V_{REF} - V_X < \frac{1}{2}\left(V_G - V_X - V_{THN2}\right) \tag{4.21}$$





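Combining the two conditions at VTXP = 3/4VREF yields
$$V_{REF} < \frac{4}{13}\left(2V_G + V_R - V_{THN1} - 2V_{THN2}\right). \tag{4.22}$$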
Assuming VTHN1 = VTHN2 = 400mV with the level-shifted VR = DVDD+VTHN, the theoretical output
swing level with linearity can be calculated to be as high as 523mV. This shows that the
level-shifted pre-driver output not only relaxes the size constraint on the output driver’s
switching transistors but also raises the output swing level, allowing an extended linear range.
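The bound in (4.22) and the quoted 523mV can be cross-checked with a short derivation. The sketch below assumes that the condition for the transistor gated by VR has the same form as (4.21), evaluated at the worst-case output VTXP = 3/4VREF; this form is an assumption chosen to be consistent with (4.21) and with the terms that appear in (4.22).

```latex
\begin{align*}
V_X - \tfrac{3}{4}V_{REF} &< \tfrac{1}{2}\left(V_R - \tfrac{3}{4}V_{REF} - V_{THN1}\right)
  && \text{(assumed form, transistor gated by } V_R\text{)} \\
V_{REF} - V_X &< \tfrac{1}{2}\left(V_G - V_X - V_{THN2}\right)
  && \text{(4.21)} \\
\Rightarrow\quad 2V_{REF} - V_G + V_{THN2} &< V_X
  < \tfrac{1}{2}\left(V_R + \tfrac{3}{4}V_{REF} - V_{THN1}\right) \\
\Rightarrow\quad \tfrac{13}{4}V_{REF} &< 2V_G + V_R - V_{THN1} - 2V_{THN2} \\
\Rightarrow\quad V_{REF} &< \tfrac{4}{13}\left(2V_G + V_R - V_{THN1} - 2V_{THN2}\right)
\end{align*}
```

With VTHN1 = VTHN2 = 400mV, a bound of 523mV corresponds to 2VG + VR = 2.9V, since (4/13)(2.9V − 1.2V) ≈ 0.523V.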
4.5.4 Minimization of Mismatch
The question is how to optimize performance and energy efficiency through the sizing of the final
stage, where the double-stack NMOS transistors in series on each signal path of the output driver
form a relatively low impedance,
$$R_{UP,TOT} = R_{UP1} + R_{UP2} \tag{4.23}$$
where RUP1 and RUP2 are the on-resistances of the lower and upper NMOS double-stack transistors on
the pull-up segments. Again, the worst case is assumed at VTXP = 3/4VREF on the low pull-up












Figure 4.12: Gate tuning voltage needed for varying peaking ratio: (a) Impedance sensitivity with
varying control voltage and (b) Gate-control voltages on all impedance paths.
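the Figure 4.11 can be expressed as,
$$R_{UP1} = \frac{1}{\mu_n C_{ox} \left(W_{UP1}/L_{min}\right) \left(V_{Predrv} - V_{TXP} - V_{THN1}\right)} \tag{4.24}$$
$$R_{UP2} = \frac{1}{\mu_n C_{ox} \left(W_{UP2}/L_{min}\right) \left(V_{G,ctrl} - V_X - V_{THN2}\right)} \tag{4.25}$$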
where VPredrv is the discrete level-shifted pre-driver output voltage when the logic is high, VX
is the node voltage between the upper and lower NMOS transistors, and VG,ctrl is the gate voltage
generated by an inverter whose supply voltage is provided by an error amplifier in the impedance
control loop. For large VDS the deep-triode approximation is no longer valid, and the DC
resistance, which determines the accuracy of the swing and equalization control when utilizing
the impedance control loop, deviates from the AC resistance, which is critical for the
instantaneous change of impedance level by switching [44, 45].
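$$V_{G,ctrl} = V_X + V_{THN2} + \frac{W_{UP1}\left(V_{Predrv} - V_{TXP} - V_{THN1}\right)}{\mu_n C_{ox} W_{UP1} R_{UP,TOT} \left(W_{UP2}/L_{min}\right)\left(V_{Predrv} - V_{TXP} - V_{THN1}\right) - W_{UP2}} \tag{4.26}$$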
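A quick numerical check of (4.26) is sketched below in Python: the gate control voltage is computed for a target series resistance, and the two deep-triode on-resistances are then summed to confirm that they return that target. All device values (the µnCox product, widths, voltages) are hypothetical, and VX is treated as a known input although in the real driver it follows from the resistive divider.

```python
def r_on(kp, w, l_min, v_ov):
    """Deep-triode on-resistance 1 / (kp * (W/L) * Vov), with kp = mu_n * Cox."""
    return 1.0 / (kp * (w / l_min) * v_ov)


def v_g_ctrl(kp, w1, w2, l_min, r_tot, v_predrv, v_txp, v_x, vth1, vth2):
    """Gate control voltage from (4.26) for a target series resistance r_tot."""
    v_ov1 = v_predrv - v_txp - vth1
    denom = kp * w1 * r_tot * (w2 / l_min) * v_ov1 - w2
    return v_x + vth2 + w1 * v_ov1 / denom


if __name__ == "__main__":
    kp, l_min = 300e-6, 60e-9        # hypothetical process values
    w1, w2 = 20e-6, 40e-6            # hypothetical switch / control widths
    vg = v_g_ctrl(kp, w1, w2, l_min, 50.0, 1.1, 0.3, 0.32, 0.4, 0.4)
    r1 = r_on(kp, w1, l_min, 1.1 - 0.3 - 0.4)
    r2 = r_on(kp, w2, l_min, vg - 0.32 - 0.4)
    print(f"VG,ctrl = {vg:.3f} V, RUP1 + RUP2 = {r1 + r2:.2f} ohm")
```

The denominator of (4.26) goes to zero when the switched transistor alone already provides the target resistance, which is one way to see why a very small WUP1 pushes the control voltage toward the limit of the error amplifier range.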
In any given MOSFET technology, the variance of VTH among adjacent transistors is reduced





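where AVTH is a process-dependent constant. If we assume the difference between two random
variables that are independent, its variance is the sum of the individual variances,
$$\sigma^2_{\Delta V_{TH}} = \sigma^2_{V_{TH1}} + \sigma^2_{V_{TH2}}, \tag{4.28}$$
where ∆VTH = VTH1 − VTH2.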
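The area dependence of threshold mismatch can be made concrete with a small Python sketch. It assumes the common form of Pelgrom’s law, σ(VTH) = AVTH/√(W·L), applied per device as the text implies, and combines two independent devices as in (4.28). The AVTH value and device sizes are hypothetical and are not taken from the 65nm process used in this work.

```python
import math


def sigma_vth(a_vth, w, l):
    """Pelgrom's area law: sigma(VTH) = A_VTH / sqrt(W * L)."""
    return a_vth / math.sqrt(w * l)


def sigma_delta_vth(sigma1, sigma2):
    """Std. dev. of VTH1 - VTH2 for two independent devices, as in (4.28)."""
    return math.sqrt(sigma1 ** 2 + sigma2 ** 2)


if __name__ == "__main__":
    a_vth = 3.5          # mV*um, hypothetical matching coefficient
    l_min = 0.06         # um, minimum channel length
    for w in (1.0, 4.0, 16.0):   # um
        s = sigma_vth(a_vth, w, l_min)
        print(f"W = {w:5.1f} um -> sigma(VTH) = {s:5.2f} mV, "
              f"sigma(dVTH) = {sigma_delta_vth(s, s):5.2f} mV")
```

With the channel length fixed at its minimum, the width has to be quadrupled to halve the standard deviation, which is the quadratic width scaling referred to when sizing for low mismatch.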









Figure 4.13: Simulated impedance mismatch between output driver and its replicas with (a) Small





































































Figure 4.14: Microphotograph of the 2-channel transmitter with a detailed layout of the output
stage.
In order to reduce the mismatch, only W is increased, quadratically according to Pelgrom’s law,
while the channel length of all NMOS transistors is kept at the minimum for the highest
bandwidth. The gate voltages of the upper and lower NMOS transistors on the pull-up path are
continuously impedance-controlled and discretely switched, respectively. It is desirable to use
a minimum-size device for the switching NMOS transistor and a large one for the
impedance-controlled NMOS transistor to reduce the mismatch. At the same time, the tuning gate
voltage produced by the impedance control loops should be bounded within the output swing range
of the error amplifier used in the impedance control loops.
It is difficult to keep all NMOS transistors in the transmit driver in the linear region over a
wide range of output impedance values. Figure 4.12 shows the simulated gate voltage for varying
widths of the impedance-controlled upper NMOS transistor and the lower switched NMOS





















Agilent 1169A Active 
Differential Probe
Figure 4.15: Measurement setup.
in the output driver and the replicas will drive the impedance out of specification, because
the tuning voltage at the gate may not settle close to the center of the specified range.
From (4.23), (4.24), and (4.25), it can be observed that when WUP1 is kept small to lower the
parasitic capacitance seen by the pre-driver and the output node, the size burden shifts to the
upper NMOS transistor, whose continuous gate voltage is tuned by the impedance control loop. An
excessively large NMOS transistor used for low mismatch, however, makes the impedance
insensitive to the tuning voltage. Figure 4.13 validates the matching variance between the
replicas and the output driver through Monte Carlo simulations.
4.6 Measurement Results
A prototype chip was fabricated in a 65nm general-purpose CMOS technology, and the chip
micrograph is shown in Fig. 4.14. A differential 1/2-rate external clock source is used for
triggering and as the clock input (Figure 4.15). A built-in PRBS generator is used to facilitate
testing. Two transmitter channels are implemented in order to verify the potential for
high-density I/O






Figure 4.16: S-parameters of test channel.
with forwarded clocking [46, 38]. Each transmitter channel occupies 0.029 mm², and the global
impedance control loops occupy 0.031 mm² in total. In order to minimize the time-constant induced
by wiring parasitics, the four parallel switched transistors (M1-M3 in Fig. 4.5) in the output
stage that perform the output multiplexing are placed adjacent to the output of each pre-driver
and level-shifter bundle. The chip is mounted on the PCB using chip-on-board packaging to
minimize bonding inductance. Four different lengths of PCB channel are used in order to
characterize the transmitter at various insertion loss levels over a wide range of speeds, as
shown in Fig. 4.16. The I/O channels consist of 1.3", 10", 16", and 28" RO4350B grounded coplanar
waveguide (GCPWG) traces and end-launch SMA connectors.
The global impedance modulation loops precisely control the required 2-tap FFE weight over the
configurable TX equalization range of 2∼12dB across the tunable swing range of 100∼400mVppd.
In the EE setting, the transmit driver path impedance for de-emphasis relative to the peaking
ratio (Fig. 4.17b) is obtained by measuring the output voltage level using a low-speed fixed
pattern

















































Figure 4.17: (a) Measured equalization impedance mapping in the EE mode. (b) Transmitter
output overlay of de-emphasis levels between 2∼12dB with fixed pattern running at 8-Gb/s.
with only a short I/O channel on the prototype PCB. The deviation from the theoretical impedance
is confined to 11.4% at α = 0.375 (12dB), as shown in Fig. 4.17a. A time-domain overlay of the
































Figure 4.18: Measured NRZ TX output eyes and jitter performance with FR-4 channel, 2^15 − 1
PRBS, (a) at 16Gb/s in HP mode, (b) at 16Gb/s in EE mode, (c) at 20Gb/s in HP mode, (d) at
20Gb/s in EE mode
its effectiveness in providing high-resolution and wide-range control of de-emphasis with analog
impedance control loops.
Fig. 4.18 shows the equalized eye diagrams and jitter performance for the 16 Gb/s and 20 Gb/s
NRZ 2^15 − 1 PRBS patterns, measured with a real-time oscilloscope after the 16" RO4350B channel
in both HP and EE settings. An eye height of 50 mV and an eye width of 1/2 UI are chosen as the
performance constraints to demonstrate scalable power efficiency over data rate [29]. The determin-




























Figure 4.19: Measured PAM-4 TX output eyes with 1" FR-4 channel, 2^15 − 1 PRBS, (a) with
phase calibration at 16Gb/s (b) with level density histogram at 16Gb/s (c) with phase calibration at
28Gb/s (d) with level density histogram at 28Gb/s
ps and 780 fs in the EE setting in NRZ mode, respectively.
The scope measures the transmitter’s PAM-4 operation at 16 Gb/s and 28 Gb/s over a 1.3"
RO4350B channel, as shown in Fig. 4.19. The DCC and QEC circuitry produces uniform eyes from
the 1/4-rate transmitter. There is much debate within standards study groups on newly defined
measurement protocols for PAM-4 linearity and jitter analysis, and on the corresponding test
equipment configurations [19]. In order to quantify the linearity of PAM-4 operation, we
conducted vertical histogram measurements that yield level separation mismatch ratios (RLM) of
95% and 94.7% at 16 Gb/s and 28 Gb/s, respectively, validating that the analog-control approach
ensures linearity of the PAM-4 levels in the low-swing VM transmitter.
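For reference, the level separation mismatch ratio can be computed from the four mean levels of a vertical histogram as sketched below in Python. The formula follows the RLM definition used in the IEEE 802.3 PAM-4 transmitter specifications as far as it is recalled here, so treat it as an assumption; the level values in the example are hypothetical and are not the measured data behind the 95% and 94.7% figures.

```python
def rlm(v0, v1, v2, v3):
    """Level separation mismatch ratio from the four mean PAM-4 levels
    (v0 is the lowest level, v3 the highest)."""
    v_mid = (v0 + v3) / 2.0
    es1 = (v1 - v_mid) / (v0 - v_mid)     # normalized position of the lower mid-level
    es2 = (v2 - v_mid) / (v3 - v_mid)     # normalized position of the upper mid-level
    return min(3.0 * es1, 3.0 * es2, 2.0 - 3.0 * es1, 2.0 - 3.0 * es2)


if __name__ == "__main__":
    # Hypothetical mean levels in mV read from a vertical histogram.
    print(f"RLM = {rlm(-150.0, -55.0, 50.0, 150.0):.3f}")
```

Perfectly spaced levels give RLM = 1, and any compression or expansion of a mid-level relative to the outer levels lowers the value.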
Table 4.2: Transmitter Power Breakdown
Global Impedance Control, 
IDAC & Voltage Regulator [mW]
(amortized across 2 TX)
Serializer, Encoder, Clock [mW]
Global Clocking Buffer [mW]


























EE HP EE HP
1.37 2.01
(DVDD=0.7V) (DVDD=1.05V) (DVDD=910mV)
Although the power supplies are separated for measurement, the local clock distribution network,
serializer, encoder, and pre-driver stages share the same supply voltage. Table 4.2 shows the
power breakdown for the 20-Gb/s NRZ and 28-Gb/s PAM-4 cases, and for 8-Gb/s operation over the
lossy 28" channel. In return for proper termination to the channel, the HP mode incurs power
overhead from signaling inefficiency and from the additional static power of the replica bias
circuitry for opposite polarity path control (Fig. 4.6a) and its IDAC (Fig. 4.6d), which produces
matching pull-up/down bias currents. For a dc-balanced PRBS sequence, the dynamic power
consumption can be reduced by 25% in the EE setting, as the encoder logic avoids unnecessary
switching of the pre-drivers for data run-lengths greater than one, thereby lowering the activity
factor. Fig. 4.20a compares the power dissipation of the transmitter driving short and long
backplane channels in the EE and HP settings over a wide range of data rates in NRZ operation.
As the data rate increases, the overall power consumption is dominated by dynamic power, whereas
the static power scales not with bandwidth but with output swing and de-emphasis level. The
scalable supply voltage is set to 700mV, 875mV, and 1.05V at 8, 16, and 20 Gb/s, respectively.
The energy saving in the EE mode is more pronounced when driving a lossier channel, where the
required output swing and peaking ratio are higher to meet the aforementioned performance
constraints.
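The benefit of supply scaling on the dynamic component can be estimated with a short Python sketch. It assumes that the dynamic power relation (2.1) has the standard CMOS form P = a·C·V²·f with the same activity factor and switched capacitance at every rate; only the supply voltages and data rates are taken from the text, and the function names are hypothetical.

```python
SUPPLY_BY_RATE = {8: 0.700, 16: 0.875, 20: 1.050}  # Gb/s -> DVDD in volts (from the text)


def relative_dynamic_energy(v, v_ref=1.050):
    """Dynamic energy per bit relative to v_ref, assuming E ~ a*C*V^2."""
    return (v / v_ref) ** 2


def relative_dynamic_power(rate_gbps, ref_rate_gbps=20):
    """Dynamic power relative to the reference rate, assuming P ~ a*C*V^2*f."""
    v = SUPPLY_BY_RATE[rate_gbps]
    v_ref = SUPPLY_BY_RATE[ref_rate_gbps]
    return relative_dynamic_energy(v, v_ref) * rate_gbps / ref_rate_gbps


if __name__ == "__main__":
    for rate in sorted(SUPPLY_BY_RATE):
        v = SUPPLY_BY_RATE[rate]
        print(f"{rate:2d} Gb/s at {v:.3f} V: energy/bit x{relative_dynamic_energy(v):.3f}, "
              f"power x{relative_dynamic_power(rate):.3f}")
```

Under these assumptions, dropping the supply from 1.05V to 700mV cuts the dynamic energy per bit to about 44% of its 20-Gb/s value, on top of the linear reduction in power from the lower data rate. Static signaling power is not covered by this estimate.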
Table 4.3: Transmitter NRZ and PAM-4 Performance Comparison
P-C. Chiang
(ISSCC`14)





















































 PLL is excluded. PRBS generator running at 2.5-Gb/s is not excluded.
2,3

















The impedance control loop for opposite polarity path control and the IDACs are disabled, and
dynamic power is reduced because the pre-driver power falls with the lower activity factor.
However, the static power dissipation due to signaling and the impedance control loops remains
fairly constant regardless of data rate in PAM-4 operation, as shown in Fig. 4.20b. Dynamic
power consumption is kept low in 28-Gb/s PAM-4 operation, with a symbol rate of half the bit
rate and relaxed timing margin allowing a lower supply voltage of 910mV. Sub-mW/Gb/s power
consumption is achieved in PAM-4 operation up to 28 Gb/s. Despite the hardware overhead
introduced by dual-mode reconfigurability and the additional early serialization stage, which
takes sixteen parallel inputs instead of the eight used in [29], the energy efficiency is higher
in the 16-Gb/s EE setting thanks to output multiplexing with reduced supply voltage and smaller
loading capacitance seen by the pre-drivers. The performance of low-swing (VREF < 0.5VDD) NRZ
and PAM-4 transmitters in the literature is summarized in Table 4.3. To the best of our
knowledge, this transmitter is the first low-swing dual-mode NRZ/PAM-4 transmitter for moderate
channel loss compensation.
4.7 Chapter Summary
This chapter presented a low-power reconfigurable NRZ/PAM-4 transmitter. Dual-mode operation on
a single PHY is achieved by reusing a multitude of common building blocks between NRZ and PAM-4.
Performance scalability in NRZ modulation allows higher energy efficiency in applications where
the tolerance to insertion loss variation is high, by shutting off the opposite polarity path
control loop and the associated IDAC. Higher signal integrity can be achieved by providing an
impedance match to the channel in the high-performance NRZ setting. In the energy-efficient
mode, signaling efficiency is coupled with dynamic power savings from the elimination of
wasteful switching in the pre-driver stage.




















Figure 4.20: Power breakdown for scalable bandwidth in, (a) NRZ and (b) PAM-4.
5. CONCLUSIONS AND FUTURE WORK
5.1 Conclusion
This dissertation presented scalable voltage-mode transmitters that offer low static power
dissipation and adopt an impedance-modulated 2-tap equalizer with analog tap control, thereby
obviating driver segmentation and reducing pre-driver complexity and dynamic power. Energy
efficiency is further improved with capacitively driven low-swing global clock distribution and
supply scaling at lower data rates, while output eye quality is optimized at low supply voltages
with automatic DCC/QEC phase calibration of the local ILO-generated rail-to-rail quarter-rate
clocks. A prototype fabricated in a general purpose 65 nm CMOS process includes a 2 mm global
clock distribution network and two transmitters that support an output swing range of 100-300 mV
with up to 12 dB of equalization. The transmitter achieves 8-16 Gb/s operation at 0.65-1.05 pJ/b
energy efficiency.
The second work is a low-power dual-mode NRZ/PAM-4 differential low-swing voltage-mode transmitter
built around a quarter-rate output-multiplexing architecture, which allows low-power operation at
data rates of up to 20 Gb/s in NRZ mode and 28 Gb/s in PAM-4 mode. With double-stack NMOS
transistors used in the output driver and replicas, the capacitance looking into the output driver
is greatly reduced. In NRZ mode, 2-tap feed-forward equalization is realized with analog
replica-bias tap control that is configurable in high-performance controlled-impedance or
energy-efficient impedance-modulated settings. This analog control also allows for efficient
generation of the middle levels in PAM-4 operation. Fabricated in a general purpose 65 nm CMOS
process, the transmitter supports an improved output swing range of 100-400 mVppd with up to 12 dB
of equalization in NRZ mode and achieves energy efficiencies of 1.48 and 0.91 pJ/b at 20-Gb/s NRZ
and 28-Gb/s PAM-4 data rates, respectively. Operation in the NRZ energy-efficient
impedance-modulation setting allows for power savings of up to 32% relative to the
controlled-impedance setting.
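The reported energy efficiencies can be converted to total power with simple arithmetic, since
1 pJ/b at 1 Gb/s equals 1 mW. The short sketch below applies this to the two operating points
quoted above; the function names are illustrative, and the PAM-4 symbol rate assumes two bits per
symbol.

```python
def link_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """(pJ/bit) x (Gb/s) = 1e-12 J/bit x 1e9 bit/s = 1e-3 W, i.e. milliwatts."""
    return energy_pj_per_bit * data_rate_gbps


def symbol_rate_gbaud(data_rate_gbps: float, bits_per_symbol: int) -> float:
    """Symbol rate for a given bit rate (NRZ: 1 bit/symbol, PAM-4: 2 bits/symbol)."""
    return data_rate_gbps / bits_per_symbol


nrz_power_mw = link_power_mw(1.48, 20.0)    # 20-Gb/s NRZ operating point
pam4_power_mw = link_power_mw(0.91, 28.0)   # 28-Gb/s PAM-4 operating point
nrz_baud = symbol_rate_gbaud(20.0, 1)
pam4_baud = symbol_rate_gbaud(28.0, 2)

print(f"NRZ:   {nrz_power_mw:.2f} mW at {nrz_baud:.0f} GBaud")
print(f"PAM-4: {pam4_power_mw:.2f} mW at {pam4_baud:.0f} GBaud")
```

These figures correspond to about 29.6 mW in NRZ mode and about 25.5 mW in PAM-4 mode.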
5.2 Recommendations For Future Work
At the time of completing this dissertation, further research is being carried out as part of a
multi-core interconnect system. A prototype chip is being designed in a 14 nm LP FinFET technology,
targeting 370 fJ/b at 25 Gb/s NRZ. There is still considerable room to improve this work. Although
supply scaling at reduced data rates is one of the largest contributors to power reduction, owing to
the quadratic dependence of dynamic power on supply voltage, the scalable supplies were provided
externally rather than by internal supply regulators or DC-DC converters.
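The quadratic saving can be illustrated with the usual first-order model of dynamic power,
P = a * C * V^2 * f. The sketch below is only a back-of-the-envelope example: the effective
capacitance, the supply voltages, and the assumption that the switching frequency tracks the data
rate are all hypothetical values chosen for illustration, not measured results.

```python
def dynamic_power_w(c_eff_f: float, vdd_v: float, f_hz: float, activity: float = 1.0) -> float:
    """First-order dynamic power model: P = activity * C * Vdd^2 * f."""
    return activity * c_eff_f * vdd_v ** 2 * f_hz


def energy_per_bit_pj(power_w: float, data_rate_bps: float) -> float:
    """Energy per bit in pJ for a given power and data rate."""
    return power_w / data_rate_bps * 1e12


C_EFF_F = 1e-12  # hypothetical effective switched capacitance (1 pF)

# Hypothetical operating points: full rate at 1.0 V, half rate at 0.7 V.
p_full = dynamic_power_w(C_EFF_F, 1.0, 16e9)
p_scaled = dynamic_power_w(C_EFF_F, 0.7, 8e9)

e_full = energy_per_bit_pj(p_full, 16e9)
e_scaled = energy_per_bit_pj(p_scaled, 8e9)

print(f"power ratio      = {p_scaled / p_full:.3f}")
print(f"energy/bit ratio = {e_scaled / e_full:.3f}")
```

In this model, halving the data rate alone halves the power but leaves the energy per bit
unchanged; the improvement in energy per bit comes entirely from the squared supply-voltage term.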
Adaptive equalization of the transmitter is also being investigated. In this scheme, the receiver
provides feedback to the transmitter through a back-channel, when necessary, on how to optimize
the TX equalization settings for signal quality. During training, the TX sends a set of test
patterns and the RX evaluates them.
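One possible form of such a training procedure is sketched below as a behavioral model. It is
not the scheme under investigation: the symbol-spaced channel pulse response, the exhaustive sweep
of the post-cursor weight, and the use of worst-case eye height as the RX quality metric are all
assumptions made for illustration, and every function name is hypothetical.

```python
import random


def tx_ffe(bits, alpha):
    """2-tap FFE: y[n] = (1 - alpha) * d[n] - alpha * d[n-1], with d in {-1, +1}."""
    symbols = [1.0 if b else -1.0 for b in bits]
    out, prev = [], symbols[0]
    for s in symbols:
        out.append((1.0 - alpha) * s - alpha * prev)
        prev = s
    return out


def channel(samples, pulse):
    """Symbol-spaced channel model: convolve with cursor and post-cursor taps."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(pulse):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def rx_eye_height(bits, received, skip):
    """RX metric: lowest received '1' minus highest received '0' (after start-up)."""
    ones = [r for b, r in zip(bits[skip:], received[skip:]) if b]
    zeros = [r for b, r in zip(bits[skip:], received[skip:]) if not b]
    return min(ones) - max(zeros)


def train(pulse, alphas, n_bits=2000, seed=1):
    """TX sends one test pattern per setting; RX reports the eye height back."""
    rng = random.Random(seed)
    bits = [rng.getrandbits(1) for _ in range(n_bits)]
    best_alpha, best_eye = None, float("-inf")
    for alpha in alphas:
        eye = rx_eye_height(bits, channel(tx_ffe(bits, alpha), pulse), skip=len(pulse))
        if eye > best_eye:  # back-channel feedback: keep the best setting so far
            best_alpha, best_eye = alpha, eye
    return best_alpha, best_eye


PULSE = [0.6, 0.3, 0.1]               # hypothetical channel pulse response
ALPHAS = [i / 20 for i in range(9)]   # candidate post-cursor weights 0.00 .. 0.40
best_alpha, best_eye = train(PULSE, ALPHAS)
no_eq_eye = train(PULSE, [0.0])[1]
print(f"best alpha = {best_alpha:.2f}, eye = {best_eye:.2f} (no EQ: {no_eq_eye:.2f})")
```

An exhaustive sweep is used here only because the search space is a single tap; a practical link
would more likely adjust the setting incrementally from the back-channel feedback.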
Power consumption can be further reduced by scaling down the replicas in the analog impedance
control loops. As the scaling factor increases, the static power consumed in the replicas is
reduced by the same factor. However, scaling too aggressively increases the mismatch between the
replicas and the output driver, which results in inaccurate control of the TX equalizer. In
addition, the two high-precision SMD resistors placed off-chip are a significant hardware overhead
in a dense parallel interface. It is recommended to replace them with a programmable resistor bank
that uses a single external resistor as a reference and replicates it with a calibration circuit.
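The trade-off between replica power and matching can be sketched numerically. The example below
assumes that replica static power scales as 1/N, as stated above, and additionally assumes a
Pelgrom-type area dependence in which the mismatch standard deviation grows as sqrt(N) when the
replica is scaled down by N. The mismatch model and all numerical values are hypothetical and
serve only to show how a mismatch budget bounds the usable scaling factor.

```python
import math


def replica_tradeoff(full_power_mw: float, full_sigma_pct: float, scale: float) -> tuple:
    """Return (replica power in mW, mismatch sigma in %) for a replica scaled down by `scale`."""
    if scale < 1:
        raise ValueError("scale must be >= 1")
    power_mw = full_power_mw / scale                   # static power drops by the scaling factor
    sigma_pct = full_sigma_pct * math.sqrt(scale)      # assumed Pelgrom-type area scaling
    return power_mw, sigma_pct


def largest_scale(full_sigma_pct: float, max_sigma_pct: float, candidates):
    """Largest candidate scaling factor whose mismatch stays within the budget."""
    ok = [n for n in candidates if full_sigma_pct * math.sqrt(n) <= max_sigma_pct]
    return max(ok) if ok else None


# Hypothetical numbers: 2 mW full-size replica, 0.5 % sigma, 1.5 % mismatch budget.
chosen = largest_scale(0.5, 1.5, [1, 2, 4, 8, 16])
power_mw, sigma_pct = replica_tradeoff(2.0, 0.5, chosen)
print(f"scale = {chosen}, replica power = {power_mw:.2f} mW, sigma = {sigma_pct:.2f} %")
```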
REFERENCES
[1] “Top 10 energy-saving tips for a greener data centers.” Web, April 2007.
[2] “Toward 400G 56/64G PAM4 bit error rate test solution (IEEE 802.3 and CEI).” Web, February
2017.
[3] K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang, “A 27-mW 3.6-Gb/s I/O
Transceiver,” IEEE J. Solid-State Circuits, vol. 39, pp. 602–612, April 2004.
[4] W. D. Dettloff, J. C. Eble, L. Luo, P. Kumar, F. Heaton, T. Stone, and B. Daly, “A 32mW
7.4Gb/s protocol-agile source-series-terminated transmitter in 45nm CMOS SOI,” in IEEE
Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 370–371, Feb 2010.
[5] R. Sredojević and V. Stojanović, “Fully digital transmit equalizer with dynamic impedance
modulation,” IEEE J. Solid-State Circuits, vol. 46, pp. 1857–1869, Aug 2011.
[6] S. Gondi and B. Razavi, “Equalization and clock and data recovery techniques for 10-Gb/s
CMOS serial-link receivers,” IEEE J. Solid-State Circuits, vol. 42, pp. 1999–2011,
Sept 2007.
[7] O. Elhadidy, A.-R. Zamir, H.-W. Yang, and S. Palermo, “A 32 Gb/s 0.55 mW/Gbps PAM4 1-
FIR 2-IIR tap DFE receiver in 65-nm CMOS,” in Proc. IEEE Symp. VLSI Circuits, pp. C224–
C225, June 2015.
[8] S. A. G. Zhang, H. Zhang and B. Jiao, “A Tutorial on PAM4 Signaling for 56G Serial Link
Applications,” in DesignCon, Jan 2017.
[9] W. Bae and D.-K. Jeong, “A power-efficient 600-mVpp voltage-mode driver with independently
matched pull-up and pull-down impedances,” International Journal of Circuit Theory
and Applications, vol. 43, no. 12, pp. 2057–2071.
[10] “Data Age 2025: The Evolution of Data to Life-Critical.” Web, April 2017.
[11] W.-c. Feng, “The importance of being low power in high performance computing,”
Cyberinfrastructure Technology Watch Quarterly (CTWatch Quarterly), vol. 1, no. 3, pp. 11–20,
2005.
[12] W. Bae, G. Jeong, and D. Jeong, “A 1-pJ/bit, 10-Gb/s/ch forwarded-clock transmitter using a
resistive feedback inverter-based driver in 65-nm CMOS,” IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 63, pp. 1106–1110, Dec 2016.
[13] F. O’Mahony, G. Balamurugan, J. E. Jaussi, J. Kennedy, M. Mansuri, S. Shekhar, and
B. Casper, “The Future of Electrical I/O for Microprocessors,” in Proc. IEEE Int. Symp.
VLSI Design, Automation and Test, pp. 31–34, April 2009.
[14] B. Leibowitz, R. Palmer, J. Poulton, Y. Frans, S. Li, J. Wilson, M. Bucher, A. M. Fuller,
J. Eyles, M. Aleksic, T. Greer, and N. M. Nguyen, “A 4.3 GB/s Mobile Memory Interface
With Power-Efficient Bandwidth Scaling,” IEEE J. Solid-State Circuits, vol. 45, pp. 889–898,
April 2010.
[15] F. O’Mahony, J. E. Jaussi, J. Kennedy, G. Balamurugan, M. Mansuri, C. Roberts, S. Shekhar,
R. Mooney, and B. Casper, “A 47× 10 Gb/s 1.4 mW/Gb/s Parallel Interface in 45 nm CMOS,”
IEEE J. Solid-State Circuits, vol. 45, pp. 2828–2837, Dec 2010.
[16] Y. H. Song, R. Bai, K. Hu, H. W. Yang, P. Y. Chiang, and S. Palermo, “A 0.47-0.66 pJ/bit,
4.8-8 Gb/s I/O Transceiver in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 48, pp. 1276–
1289, May 2013.
[17] G. Balamurugan, J. Kennedy, G. Banerjee, J. E. Jaussi, M. Mansuri, F. O’Mahony, B. Casper,
and R. Mooney, “A Scalable 5-15 Gbps, 14-75 mW Low-Power I/O Transceiver in 65 nm
CMOS,” IEEE J. Solid-State Circuits, vol. 43, pp. 1010–1019, April 2008.
[18] J. Kim and M. A. Horowitz, “Adaptive Supply Serial Links With Sub-1-V Operation and
Per-Pin Clock Recovery,” IEEE J. Solid-State Circuits, vol. 37, pp. 1403–1413, Nov 2002.
[19] “IEEE P802.3ap Task Force Channel Model Material,” Feb 2015.
[20] J. Kim, A. Balankutty, A. Elshazly, Y. Y. Huang, H. Song, K. Yu, and F. O’Mahony, “A
16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 1–3, Feb 2015.
[21] A. Roshan-Zamir, O. Elhadidy, H. Yang, and S. Palermo, “A reconfigurable 16/32 Gb/s
dual-mode NRZ/PAM4 SerDes in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, pp. 2430–2447,
Sept 2017.
[22] A. P. Chandrakasan and R. W. Brodersen, “Minimizing power consumption in digital CMOS
circuits,” Proc. IEEE, vol. 83, pp. 498–523, Apr 1995.
[23] M. Krstic, E. Grass, C. Stahl, and M. Piz, “System integration by request-driven GALS design,”
IEE Proceedings - Computers and Digital Techniques, vol. 153, pp. 362–372, Sept 2006.
[24] J. F. Bulzacchelli, C. Menolfi, T. J. Beukema, D. W. Storaska, J. Hertle, D. R. Hanson,
P. Hsieh, S. V. Rylov, D. Furrer, D. Gardellini, A. Prati, T. Morf, V. Sharma, R. Kelkar,
H. A. Ainspan, W. R. Kelly, L. R. Chieco, G. A. Ritter, J. A. Sorice, J. D. Garlett, R. Callan,
M. Brandli, P. Buchmann, M. Kossel, T. Toifl, and D. J. Friedman, “A 28-Gb/s 4-tap FFE/15-tap
DFE serial link transceiver in 32-nm SOI CMOS technology,” IEEE J. Solid-State
Circuits, vol. 47, pp. 3232–3248, Dec 2012.
[25] Y. Lu, K. Jung, Y. Hidaka, and E. Alon, “Design and Analysis of Energy-Efficient
Reconfigurable Pre-Emphasis Voltage-Mode Transmitters,” IEEE J. Solid-State Circuits, vol. 48,
pp. 1898–1909, Aug 2013.
[26] J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally, and M. Horowitz, “A 14-
mW 6.25-Gb/s Transceiver in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 42, pp. 2745–
2757, Dec 2007.
[27] “White paper: The evolution of high-speed transceiver technology,” Nov 2002.
[28] J. Lee, P. Chiang, P. Peng, L. Chen, and C. Weng, “Design of 56 Gb/s NRZ and PAM4 SerDes
transceivers in CMOS technologies,” IEEE J. Solid-State Circuits, vol. 50, pp. 2061–2073,
Sept 2015.
[29] Y. H. Song, H. W. Yang, H. Li, P. Y. Chiang, and S. Palermo, “An 8-16 Gb/s, 0.65-1.05 pJ/b,
Voltage-Mode Transmitter With Analog Impedance Modulation Equalization and Sub-3 ns
Power-State Transitioning,” IEEE J. Solid-State Circuits, vol. 49, pp. 2631–2643, Nov 2014.
[30] H. Hatamkhani, K.-L. J. Wong, R. Drost, and C.-K. K. Yang, “A 10-mW 3.6-Gbps I/O
Transmitter,” in Proc. IEEE Symp. VLSI Circuits, pp. 97–98, June 2003.
[31] R. Sredojević and V. Stojanović, “Digital link pre-emphasis with dynamic driver impedance
modulation,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), pp. 1–4, Sept 2010.
[32] Y. Lu, K. Jung, Y. Hidaka, and E. Alon, “A 10Gb/s 10mW 2-tap reconfigurable pre-emphasis
transmitter in 65nm LP CMOS,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), pp. 1–4,
Sept 2012.
[33] Y. Song and S. Palermo, “A 6-Gbit/s hybrid voltage-mode transmitter with current-mode
equalization in 90-nm CMOS,” IEEE Transactions on Circuits and Systems II: Express Briefs,
vol. 59, pp. 491–495, Aug 2012.
[34] M. Bassi, F. Radice, M. Bruccoleri, S. Erba, and A. Mazzanti, “A High-Swing 45 Gb/s Hybrid
Voltage and Current-Mode PAM-4 Transmitter in 28 nm CMOS FDSOI,” IEEE J. Solid-State
Circuits, vol. 51, pp. 2702–2715, Nov 2016.
[35] J. Lee, P. C. Chiang, P. J. Peng, L. Y. Chen, and C. C. Weng, “Design of 56 Gb/s NRZ and
PAM4 SerDes Transceivers in CMOS Technologies,” IEEE J. Solid-State Circuits, vol. 50,
pp. 2061–2073, Sept 2015.
[36] C. Menolfi, T. Toifl, R. Reutemann, M. Ruegg, P. Buchmann, M. Kossel, T. Morf, and
M. Schmatz, “A 25Gb/s PAM4 transmitter in 90nm CMOS SOI,” in IEEE Int. Solid-State
Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 72–73 Vol. 1, Feb 2005.
[37] R. Ho, T. Ono, R. D. Hopkins, A. Chow, J. Schauer, F. Y. Liu, and R. Drost, “High Speed
and Low Energy Capacitively Driven On-Chip Wires,” IEEE J. Solid-State Circuits, vol. 43,
pp. 52–60, Jan 2008.
[38] B. Casper and F. O’Mahony, “Clocking analysis, implementation and measurement
techniques for high-speed data links-a tutorial,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56,
pp. 17–39, Jan 2009.
[39] M. Mansuri, J. E. Jaussi, J. T. Kennedy, T. Hsueh, S. Shekhar, G. Balamurugan, F. O’Mahony,
C. Roberts, R. Mooney, and B. Casper, “A Scalable 0.128–1 Tb/s, 0.8–2.6 pJ/bit, 64-Lane
Parallel I/O in 32-nm CMOS,” IEEE J. Solid-State Circuits, vol. 48, pp. 3229–3242,
Dec 2013.
[40] L.-M. Lee, D. Weinlader, and C.-K. K. Yang, “A sub-10-ps multiphase sampling system using
redundancy,” IEEE J. Solid-State Circuits, vol. 41, pp. 265–273, Jan 2006.
[41] L. Xia, J. Wang, W. Beattie, J. Postman, and P. Y. Chiang, “Sub-2-ps, static phase error
calibration technique incorporating measurement uncertainty cancellation for multi-gigahertz
time-interleaved T/H circuits,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, pp. 276–
284, Feb 2012.
[42] A. A. Hafez, M. S. Chen, and C. K. K. Yang, “A 32-48 Gb/s Serializing Transmitter Using
Multiphase Serialization in 65 nm CMOS Technology,” IEEE J. Solid-State Circuits, vol. 50,
pp. 763–775, March 2015.
[43] B. Razavi, Design of Analog CMOS Integrated Circuits (Irwin Electronics & Computer
Engineering). McGraw-Hill Education, 2000.
[44] G. S. Jeong, S. H. Chu, Y. Kim, S. Jang, S. Kim, W. Bae, S. Y. Cho, H. Ju, and D. K. Jeong, “A
20 Gb/s 0.4 pJ/b energy-efficient transmitter driver utilizing constant-Gm bias,” IEEE J.
Solid-State Circuits, vol. 51, pp. 2312–2327, Oct 2016.
[45] K. L. Chan, K. H. Tan, Y. Frans, J. Im, P. Upadhyaya, S. W. Lim, A. Roldan, N. Narang,
C. Y. Koay, H. Zhao, P. Chiang, and K. Chang, “A 32.75-Gb/s voltage-mode transmitter with
three-tap FFE in 16-nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, pp. 2663–2678,
Oct 2017.
[46] G. Balamurugan and N. Shanbhag, “Modeling and mitigation of jitter in multiGbps
source-synchronous I/O links,” in Proceedings 21st International Conference on Computer Design,
pp. 254–260, Oct 2003.
