High Speed Reconfigurable NRZ/PAM4 Transceiver Design Techniques by Roshan Zamir, Ashkan
  
  
HIGH SPEED RECONFIGURABLE NRZ/PAM4 TRANSCEIVER DESIGN  
TECHNIQUES 
 
A Dissertation 
by 
ASHKAN ROSHAN ZAMIR  
 
Submitted to the Office of Graduate and Professional Studies of 
Texas A&M University 
in partial fulfillment of the requirements for the degree of  
DOCTOR OF PHILOSOPHY 
Chair of Committee,  Samuel Palermo 
Committee Members, Sebastian Hoyos 
 Krishna Narayanan 
 Behbood B. Zoghi 
Head of Department, Miroslav M. Begovic 
 
May 2018 
 
Major Subject: Electrical Engineering 
 
Copyright 2018 Ashkan Roshan Zamir
 ii 
 
ABSTRACT 
 
While the majority of wireline standards use simple binary non-return-to-zero (NRZ) 
signaling, four-level pulse-amplitude modulation (PAM4) standards are emerging to 
increase bandwidth density. This dissertation presents efficient implementations for high speed
NRZ/PAM4 transceivers. The first prototype includes a dual-mode NRZ/PAM4 serial I/O 
transmitter which can support both modulations with minimum power and hardware 
overhead. A source-series-terminated (SST) transmitter achieves 1.2Vpp output swing and 
employs lookup table (LUT) control of a 31-segment output digital-to-analog converter 
(DAC) to implement 4/2-tap feed-forward equalization (FFE) in NRZ/PAM4 modes, 
respectively. Transmitter power is improved with low-overhead analog impedance control 
in the DAC cells and a quarter-rate serializer based on a tri-state inverter-based mux with 
dynamic pre-driver gates. The transmitter is designed to work with a receiver that 
implements an NRZ/PAM4 decision feedback equalizer (DFE) that employs 1 finite 
impulse response (FIR) and 2 infinite impulse response (IIR) taps for first post-cursor and 
long-tail ISI cancellation, respectively. Fabricated in GP 65-nm CMOS, the transmitter 
occupies 0.060mm2 area and achieves 16Gb/s NRZ and 32Gb/s PAM4 operation at 10.4 
and 4.9 mW/Gb/s while operating over channels with 27.6 and 13.5dB loss at Nyquist, 
respectively. The second prototype presents a 56Gb/s four-level pulse amplitude 
modulation (PAM4) quarter-rate wireline receiver which is implemented in a 65nm 
CMOS process. The frontend utilizes a single stage continuous time linear equalizer
(CTLE) to boost the main cursor and relax the pre-cursor cancelation requirement, 
requiring only a 2-tap pre-cursor feed-forward equalization (FFE) on the transmitter side. 
A 2-tap decision feedback equalizer (DFE) with one finite impulse response (FIR) tap and 
one infinite impulse response (IIR) tap is employed to cancel first post-cursor and long-
tail inter-symbol interference (ISI). The FIR tap direct feedback is implemented inside the 
CML slicers to relax the critical timing of DFE and maximize the achievable data-rate.  In 
addition to the per-slice main 3 data samplers, an error sampler is utilized for background 
threshold control and an edge-based sampler performs both PLL-based CDR phase 
detection and generates information for background DFE tap adaptation. The receiver 
consumes 4.63mW/Gb/s and compensates for up to 20.8dB loss when operated with a 2-tap
FFE transmitter. The experimental results and comparison with the state of the art show
superior power efficiency of the presented prototypes for similar data-rate and channel
loss. The use of the proposed design techniques is not limited to these specific prototypes;
they can be applied to any wireline transceiver with a different modulation, data-rate and
CMOS technology.
 
 
 
 
 
 
 
 
 iv 
 
To my mother and father, 
Soheila and Manouchehr. 
My inspiration, my motivation, my joy 
 
 v 
 
ACKNOWLEDGEMENTS 
 
First and foremost, I would like to thank my knowledgeable advisor Professor Samuel 
Palermo for believing in me and giving me the opportunity to learn. Thank you for all the
time you spent guiding me throughout my research, putting up with all my shortcomings,
teaching me how to organize research, and exposing me to collaborative research.
I would like to thank all my friends and colleagues in Professor Palermo’s group.
Osama, Noah, Ehsan, Ayman, Kunzhi, Binaho, Shengchang, Shiva, Takayuki, Yang-Hang,
Ankur, and Shashank, thank you for all your help, for all the things I learned from
you, and for all your kind feedback and support during these years.
I would like to thank my friends and colleagues at HP Labs, Cheng Li, Di Liang and
Chong Zhang, for their contributions to my research and the many things that I have learned
from them. I would like to thank my friends and colleagues at Texas Instruments, Arlo
Aude, Soumya Chanramouli, Steven Finn, Lee Sledjeski, John Hamilton, Patrick Crinion,
T. K. Chin, Waqus Haque, Khalid Jakoush and Amit Rane, for supporting my research, for the
many things I have learned from them, for their kindness, and for all their help.
I would like to thank Professor Sanchez-Sinencio, Professor Silva-Martinez, Professor 
Hoyos, and Professor Entesari in Analog and Mixed-Signal Center in Texas A&M 
University for their teachings. 
 Special thanks to Professor Hoyos, Professor Narayanan, and Professor Zoghi for
serving on my committee and for their professional and kind feedback.
I would like to thank NSF, Semiconductor Research Lab, HP labs and Texas Instruments 
for supporting my research. 
I would like to thank all my friends who made my time in College Station enjoyable and 
fun. Thank you Paria, Mohammad, Payman, Pooria, Atrian, Farzad, Babak, Omid, 
Arghavan, Pedram, Tamara, Milad, Ana, and Sajad. Without you, life in college would 
have been much harder.   
I would like to thank my friends from the other side of the world whose endless kindness 
and support, made my life far away from home, Iran, tolerable and less stressful. Thank 
you Mostafa, Alireza, Ali, Morteza, Mohammad, Hamid, Reza, and Roozbeh for 
reminding me friendship is stronger than any border and miles of distance cannot break it. 
Thanks for cheering me up when I was down with your call and texts. I am sorry if I 
couldn’t return the favor. 
Lastly I would like to thank my sister and the only one, Kiana. Thanks for cheering me 
up by visiting me anytime you could, making fun of me and letting me make fun of you. 
Thank you mom and dad for inspiring me, believing in me, supporting me and pushing 
me forward. Your love and support were the only force pushing me forward through all
difficulties. Thanks for giving me everything I could have asked for.  
  
 vii 
 
CONTRIBUTORS AND FUNDING SOURCES 
 
Contributors 
This work was supported by a dissertation committee consisting of Professor Palermo 
(advisor), Professor Hoyos, and Professor Narayanan of Department of Electrical and 
Computer Engineering and Professor Zoghi of Department of Engineering Technology 
and Industrial Distribution. 
All other work conducted for the dissertation was completed by the student 
independently. 
Funding Sources 
This work was supported in part by the Semiconductor Research Corporation under 
Grant 1836.143 through the Texas Analog Center of Excellence and in part by National 
Science Foundation under grant EECS-1202509. 
 
  
 viii 
 
TABLE OF CONTENTS 
 
              Page 
ABSTRACT ..............................................................................................................  ii 
DEDICATION ..........................................................................................................  iv 
ACKNOWLEDGEMENTS ......................................................................................  v 
CONTRIBUTORS AND FUNDING SOURCES .....................................................  vii 
TABLE OF CONTENTS ..........................................................................................  viii 
LIST OF FIGURES ...................................................................................................  x 
LIST OF TABLES ....................................................................................................  xiv 
1. INTRODUCTION ...............................................................................................  1 
2. BACKGROUND ON MIXED SIGNAL TRANSCEIVERS .............................  3 
  2.1 Introduction ..........................................................................................  3 
  2.2 Transceiver Architectures ....................................................................  3 
  2.3 Transmitter Circuits ..............................................................................  12 
   2.3.1   Serializer ....................................................................................  12 
   2.3.2   Output Driver ............................................................................  16 
  2.4 Receiver Circuits ..................................................................................  27 
   2.4.1   CTLE .........................................................................................  27 
   2.4.2   Sampler ......................................................................................  29 
  2.5 Conclusion ............................................................................................  33 
 
3. DUAL-MODE 16/32 GB/S NRZ/PAM4 TRANSMITTER ...............................  34 
  3.1 Introduction ..........................................................................................  34 
  3.2 System Architecture .............................................................................  37 
  3.3 Transmitter Architecture ......................................................................  41 
   3.3.1  4-to-1 Serializer ..........................................................................  42 
   3.3.2  Pseudo-Analog-Controlled Output Driver .................................  44 
  3.4 Experimental Results ............................................................................  49 
  3.5 Conclusion ............................................................................................  57 
4. PAM4 56 GB/S RECEIVER WITH THRESHOLD AND DFE               
ADAPTATION ...................................................................................................  58 
 
  4.1 Introduction ..........................................................................................  58 
  4.2    System Analysis ...................................................................................  61 
  4.3    Receiver Architecture ...........................................................................  63 
  4.4    Threshold and DFE Tap Adaptation ....................................................  72 
  4.5    Experimental Results ............................................................................  76 
  4.6    Conclusion ............................................................................................  84 
 
5. CONCLUSION ...................................................................................................  85 
REFERENCES ..........................................................................................................  87 
 
 
 x 
 
LIST OF FIGURES 
FIGURE                                                                                                                        Page 
 
2.1 Insertion loss of a sample back-plane wireline channel. .............................  4 
2.2  Channel pulse response. .............................................................................  5 
2.3 Transmitter FFE equalization. ....................................................................  6 
2.4 Receiver FFE equalization. .........................................................................  7 
2.5 Receiver DFE equalization. ........................................................................  8 
2.6 Eye diagrams with (a) NRZ and (b) PAM4 data. .......................................  11 
2.7 A CMOS 2:1 T-gate based serializer. .........................................................  13 
2.8 A tri-state based 2:1 serializer.....................................................................  14 
2.9 A CMOS 4:1 T-gate based serializer... .......................................................  15 
2.10 A current mode 2:1 serializer. .....................................................................  16 
2.11 A current mode output driver. .....................................................................  17 
2.12 A single-ended terminated current mode driver. ........................................  18 
2.13 A differentially terminated current mode driver. ........................................  19 
2.14 FFE implementation in a current mode driver. ...........................................  20 
2.15 A high swing voltage mode driver. .............................................................  21 
2.16 A low swing voltage mode driver. ..............................................................  22 
2.17 A single-ended terminated voltage mode driver. ........................................  23 
2.18 A differentially terminated voltage mode driver. .......................................  24 
2.19 A segmented voltage mode driver. .............................................................  26 
2.20 A passive CTLE block diagram. .................................................................  27 
2.21 An active CTLE block diagram. .................................................................  28 
2.22 An active CTLE with shunt peaking. ..........................................................  29 
2.23 A single stage dynamic amplifier sampler. .................................................  30 
2.24 A strong-arm sampler. ................................................................................  31 
2.25 Two-stage double tail sampler block diagram. ...........................................  31 
2.26 A two-stage dynamic amplifier with regeneration......................................  32 
2.27 A current mode sampler block diagram. .....................................................  33 
3.1 Conceptual dual-mode NRZ/PAM4 transceiver architecture with TX FFE     
and RX DFE equalizers. .............................................................................  36 
 
3.2 Refined electrical channel (a) S21 response, (b) 16GS/s pulse response,          
(c) 32Gb/s PAM4 timing margin with 2-tap pre-cursor TX FFE and          
various RX DFE feedback filter configurations, (d) and 16GS/s pulse     
response with 2-tap TX FFE and RX DFE with 1-FIR and 2-IIR feedback   
taps. .............................................................................................................  39 
 
3.3 Dual-mode NRZ/PAM4 transceiver architecture. ......................................  40 
3.4 NRZ/PAM4 transmitter with lookup table-based FFE equalizer and         
pseudo-analog impedance control...............................................................  42 
 
3.5 Dynamic tri-state inverter-based 4-to-1 serializer: (a) schematic, (b)          
timing diagram (PMOS path), and (c) simulated performance comparison    
with a conventional pass-gate design. .........................................................  43 
 
3.6 (a) Conventional SST output driver segment. (b) Proposed output driver 
segment with pseudo-analog impedance control. (c) Simulated output 
impedance vs. process corners. ...................................................................  45 
 
3.7 Impedance control loop: (a) different operation modes, (b) NMOS control  
OTA output voltage VON in different modes, and (c) FSM flow chart. ....  47 
 
3.8 Monte Carlo simulations of the output driver S11 for different process     
corners with ±3σ error bars for mismatch at a given corner included. .......  48 
 
3.9 Chip micrograph of (a) transmitter and (b) receiver. ..................................  49 
3.10 Measured transmitter output impedance versus differential output voltage       
for (a) positive output pin and (b) negative output pin. ..............................  50 
 
3.11 Level separation mismatch ratio (RLM) measurement results for (a)       
nominal PAM4 level settings and (b) optimized PAM4 level settings. ......  50 
 
3.12 Dual-mode NRZ/PAM4 transceiver test setup. ..........................................  52 
3.13 32Gb/s PAM4 eye diagrams over channel 1: (a) without TX equalization,       
(b) with optimal 2-tap TX-only FFE settings, and (c) with the 2-tap TX        
FFE settings co-optimized with the RX DFE to yield maximum timing   
margin. 16Gb/s NRZ eye diagrams over channel 2: (d) without TX 
equalization, (e) with optimal 4-tap TX-only FFE settings, and (f) with           
the 4-tap TX FFE settings co-optimized with the RX DFE to yield       
maximum timing margin.............................................................................  54 
 
3.14 Transceiver equalizer settings and bathtub curves for (a) channel 1 at               
32 Gb/s PAM4 and (b) channel 2 at 16 Gb/s NRZ. ....................................  55 
 
3.15 32 Gb/s power breakdown of (a) transmitter and (b) receiver. ...................  57 
4.1 Conceptual block diagram of a PAM4 transceiver with transmitter and    
receiver side equalization, equalizer and threshold adaptation, and clock 
recovery circuit. .......................................................................  61
 
4.2 Refined electrical channel. (a) S21 response. (b) 28GS/s pulse responses      
with various equalizer configurations, (c) 56 Gb/s PAM4 voltage margin,      
and (d) 56 Gb/s PAM4 timing margin with 2-tap pre-cursor TX FFE and 
various RX equalizer configurations. .........................................................  63 
 
4.3 56Gb/s PAM4 receiver with threshold and DFE tap adaptation.................  64 
4.4 Equalizer data-path. ....................................................................................  66 
4.5 Single stage CTLE (a) block diagram and frequency response with         
different (b) capacitor DAC settings, and (c) resistor DAC settings. .........  67 
 
4.6 (a) CML buffer with DFE FIR-tap and threshold control. (b) Simulated 
normalized FIR-tap offset weight versus differential input amplitude. ......  69 
 
4.7 (a) Simulated comparator offset versus threshold DAC code, (b) simulated   
FIR weight vs FIR DAC code. ....................................................................  69 
 
4.8 Block diagram of IIR MUX, filter and summer. ........................................  70 
4.9 IIR time-constant versus resistor and capacitor DAC settings. ..................  71 
4.10 PAM4 PLL-based CDR. .............................................................................  72 
4.11 Background sampler threshold adaptation algorithm. ................................  74 
4.12 PAM4 DFE FIR and IIR-tap adaptation logic tables. .................................  76 
4.13 Chip micrograph of 56Gb/s PAM4 receiver. ..............................................  77 
4.14 High speed PAM4 receiver test setup. ........................................................  78 
4.15 (a) 56 Gb/s eye-diagram before channel 2 without equalization and (b)         
after channel 2 with 2-tap pre-cursor FFE. .................................................  79 
 
4.16 Measured DFE tap adaptation working over (a) channel 1 and (b)            
channel 2, and measured sampler threshold adaptation working over (c)  
channel 1 and (d) channel 2. Note, edge sampler values are omitted and       
only error sampler#1 is shown for clarity.  .................................................  80 
 
4.17 Measured 56Gb/s receiver timing bathtub curves working over (a)          
channel 1, and (b) channel 2, and receiver voltage bathtub curves working   
over (c) channel 1, and (d) channel 2. .........................................................  81 
 
4.18 Measured PAM4 jitter tolerance working over channel 2. .........................  82 
4.19 56Gb/s power breakdown of the receiver. ..................................................  83 
 xiv 
 
LIST OF TABLES 
TABLE                                                                                                                          Page 
 
3.1 Transceiver Performance Summary ............................................................  56 
4.1 Performance Summary................................................................................  84 
 1 
 
 
1. INTRODUCTION 
 
While most household and personal devices are moving towards wireless
networks and transceivers [1-12], optical interconnects [13-20] and wireline transceivers
[21-25] provide higher data-rate, lower latency and more power efficient solutions for
many applications. Data center networks are the most prominent example of such
applications, where growing zettabyte-range traffic requires ultra-high speed,
low power transceivers.
New standards and applications are emerging every year for wireline communication. The
industry demand is always towards higher data-rate communications. Due to the limited
number of input-output (IO) pins on commercial packages and density constraints,
high-speed links often serialize the data on the transmitter side before sending it over the
channel. The data is de-serialized on the receiver side [21].
Power efficiency and circuit bandwidth benefit from advances in CMOS
process technology. However, the bandwidth of wireline communication channels has not
followed the same trend. Thus, channel impairments such as dielectric loss, skin effect,
reflections and cross-talk have become more prominent, degrading the quality of
communication. This results in bit errors if no countermeasure is taken.
Equalization is employed on both the receiver and transmitter sides to compensate for the
channel impairments. Transceivers can be implemented utilizing fully digital equalizers
with digital-to-analog converter (DAC) based transmitters [22] and analog-to-digital
converter (ADC) based receivers [23-25]. This allows for easy implementation of complex
equalizers in the digital domain, which are robust to process variations and can compensate
for significant channel loss. However, DAC based and ADC based designs, along with the
digital signal processors (DSP) required to implement equalization, can be very power
hungry. In contrast, mixed signal transceivers can provide a more power efficient solution
for low to medium loss channels.
This research targets the design of efficient mixed signal transmitters and receivers
operating at >32Gb/s data-rates with a focus on four-level pulse amplitude modulation
(PAM4). While the designs are done for specific data-rates using a certain process node,
the proposed techniques can be extended to higher data-rates. Most of the proposed
techniques can benefit from CMOS scaling as well.
This dissertation is organized as follows. Section 2 discusses challenges associated with
system level and circuit level high speed mixed-signal transceiver design. The system
level considerations and trade-offs are discussed first. The rest of the
section explains design topologies and trade-offs for some critical transmitter and receiver
circuit blocks. A reconfigurable dual-mode 16/32 Gb/s NRZ/PAM4 transmitter design is
detailed in section 3. A 56 Gb/s PAM4 receiver design is detailed in
section 4, including the clock and data recovery (CDR) circuit and the implementation of
PAM4 threshold and DFE adaptation.
Finally, section 5 concludes this dissertation.
 
 
 
 
 3 
 
 
2. BACKGROUND ON MIXED SIGNAL TRANSCEIVERS* 
 
 
2.1 Introduction 
This section briefly explains challenges and trade-offs in wireline transceiver design.
Section 2.2 discusses the system level design challenges and solutions. Section 2.3 focuses
on trade-offs and circuit level solutions for critical blocks in transmitter design, which
include the final serializer and the output driver. Section 2.4 compares different circuit
implementations for critical receiver circuits including continuous time linear equalizers
(CTLE) and samplers. The main goal of this section is to provide the reader with the
background information about mixed signal transceiver design required for understanding
the remainder of this dissertation.
 
2.2 Transceiver Architectures 
The main target of wireline transceiver design is to achieve bit error free transmission over
a channel with limited bandwidth, such as the one shown in Fig. 2.1, while consuming as
little power as possible. The most common modulation used for wireline applications is
non-return-to-zero (NRZ) modulation, which sends a positive voltage at
the output for a one symbol and a negative voltage for a zero, one bit at a time.
 
 
*© 2018 IEEE. Part of section 2.2 is reprinted, with permission, from A. Roshan-
Zamir, O. Elhadidy, H. W. Yang and S. Palermo, "A Reconfigurable 16/32 Gb/s 
Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS," IEEE Journal of Solid-State 
Circuits, vol. 52, no. 9, pp. 2430-2447, Sept. 2017. 
 4 
 
 
 
Figure 2.1: Insertion loss of a sample back-plane wireline channel. 
 
Skin effect, dielectric loss and reflections caused by impedance discontinuities in the
channel result in dispersion of the data as it travels through the channel. To
characterize this, the pulse response of the channel is often used, as depicted in Fig. 2.2.
The one bit period (Tb) long input pulse spreads over multiple bit periods. This results in
a reduced peak at the output of the channel, and a single bit transmitted at the channel input
affects multiple bits at the channel output. This causes inter-symbol interference (ISI)
at the output of the channel, resulting in detection errors at the receiver side.
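
The spreading of one bit over several bit periods can be illustrated numerically. The following is a minimal sketch that assumes a single-pole low-pass channel, which is a simplification and not the measured back-plane channel of Fig. 2.1; the time constant `TAU` and the helper names are hypothetical. Because this assumed model is causal and first order, it produces post-cursor ISI only.

```python
import math

def step_response(t, tau):
    """Unit-step response of a single-pole low-pass channel (assumed model)."""
    return 1.0 - math.exp(-t / tau) if t > 0 else 0.0

def pulse_response(t, tb, tau, amplitude=1.0):
    """Response to a pulse of width tb, built as the difference of two steps."""
    return amplitude * (step_response(t, tau) - step_response(t - tb, tau))

TB = 1.0         # bit period Tb, normalized (assumption)
TAU = 1.5 * TB   # hypothetical channel time constant

# For this model the output peaks at t = Tb, so take that as the cursor time.
cursor = pulse_response(TB, TB, TAU)
# Samples one to five bit periods after the cursor are the post-cursor ISI.
post_cursors = [pulse_response(TB * (k + 1), TB, TAU) for k in range(1, 6)]

print("cursor:", round(cursor, 3))
print("post-cursor ISI:", [round(p, 3) for p in post_cursors])
```

The cursor is smaller than the transmitted amplitude, and the non-zero samples at later bit periods are the ISI terms that land on the following bits.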
 
[Figure: an input pulse c(t) with amplitude a and width Tb is applied to the channel; the output y(t) is marked at to-Tb, to, to+Tb, to+2Tb, ...]
Figure 2.2: Channel pulse response. 
 
To overcome ISI in wireline systems, equalization is used at the transmitter and receiver
sides. On the transmitter side, feed-forward equalization (FFE) is often used to cancel
post-cursor and pre-cursor ISI [26-29]. Fig. 2.3 shows the block diagram of such an equalizer.
The FFE cancels the channel distortion by pre-distorting the signal before the
channel. As the input data is a digital signal, digital delay units (flip-flops) can be used
to generate the taps. Due to supply voltage limitations, FFE equalization is done by
attenuating the main cursor. This limits the effectiveness of FFE equalizers as the number
of taps increases.
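
The trade-off between main-cursor attenuation and ISI cancellation can be sketched as a discrete convolution. This is an illustration under stated assumptions: the pulse-response samples `h` and the tap weights are hypothetical values, and the supply limitation is modeled by normalizing the taps so that the sum of their magnitudes equals one.

```python
def normalize_taps(taps):
    """Scale taps so sum(|a_i|) = 1, modeling a fixed peak output swing."""
    total = sum(abs(a) for a in taps)
    return [a / total for a in taps]

def convolve(taps, h):
    """Discrete convolution of FFE taps with UI-spaced pulse-response samples."""
    out = [0.0] * (len(taps) + len(h) - 1)
    for i, a in enumerate(taps):
        for j, hj in enumerate(h):
            out[i + j] += a * hj
    return out

def worst_case_margin(samples):
    """Largest sample magnitude minus the sum of all other magnitudes."""
    mags = [abs(s) for s in samples]
    main = max(mags)
    return main - (sum(mags) - main)

# Hypothetical UI-spaced pulse response: [pre-cursor, main, post1, post2]
h = [0.10, 0.60, 0.25, 0.10]
# Hypothetical 3-tap FFE [a-1, a0, a1], then normalized to the swing limit
taps = normalize_taps([-0.1, 1.0, -0.3])
equalized = convolve(taps, h)

print("main cursor before/after:", max(h), round(max(equalized), 3))
print("margin before/after:",
      round(worst_case_margin(h), 3), round(worst_case_margin(equalized), 3))
```

With these example numbers the main cursor shrinks, yet the worst-case margin grows because the ISI shrinks faster. Adding taps under the same swing limit attenuates the main cursor further.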
 
[Figure: block diagram with input In, Z^-1 delay elements, tap weights a-1, a0 and a1, a summer, output Out, and the channel.]

Figure 2.3: Transmitter FFE equalization. 
 
FFE equalization can also be implemented at the receiver side [30-33], as depicted in Fig. 2.4.
As the signal is already attenuated by the channel, the dynamic range is sufficient to
implement high pass filtering by amplifying high frequency content rather than
attenuating low frequency content. However, implementing FFE equalization at the receiver
side can be quite challenging, as the delay elements must be implemented in an analog
manner rather than as simple digital delays.
 
[Figure: block diagram with the channel, input In, Z^-1 delay elements, tap weights a-1, a0 and a1, a summer, and output Out.]

Figure 2.4: Receiver FFE equalization. 
 
To implement receiver side equalization using only digital delay elements, a decision
feedback equalizer (DFE) is used [34-37]. Fig. 2.5 shows the block diagram of such an
equalizer. DFEs can efficiently cancel post-cursor ISI without noise and
crosstalk amplification. However, they cannot cancel pre-cursor ISI, and the feedback loop
must settle within 1 unit interval (UI), which can be very challenging at high data-rates.
Loop unrolling can be utilized to relax the critical timing of a DFE [38], [39]. This increases
the number of samplers by a factor of 2 if only the first tap is unrolled. Two or more
taps can be unrolled, but this causes an exponential increase in the number of slicers
and can be prohibitive in terms of power and area.
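
The decision feedback operation can be sketched behaviorally for NRZ data. This is a simplified illustration, not the circuit implementation: the channel is assumed to have only a main cursor `h0` and a first post-cursor `h1`, there is no noise, and the function names and values are hypothetical.

```python
import random

def nrz_receive(bits, h0, h1):
    """UI-spaced received samples for an assumed 2-sample pulse response."""
    symbols = [1 if b else -1 for b in bits]
    prev = [-1] + symbols[:-1]   # assume a 0 was sent before the burst
    return [h0 * s + h1 * p for s, p in zip(symbols, prev)]

def dfe_1tap(samples, d1):
    """Subtract d1 times the previous decision, then slice at zero."""
    decisions, margins = [], []
    prev_decision = -1           # matches the assumed initial symbol
    for x in samples:
        corrected = x - d1 * prev_decision
        decision = 1 if corrected >= 0 else -1
        decisions.append(decision)
        margins.append(abs(corrected))
        prev_decision = decision
    return decisions, margins

random.seed(0)
bits = [random.randint(0, 1) for _ in range(200)]
rx = nrz_receive(bits, h0=1.0, h1=0.6)

_, margin_no_dfe = dfe_1tap(rx, d1=0.0)   # slicer only
dec, margin_dfe = dfe_1tap(rx, d1=0.6)    # tap matched to the post-cursor

print("worst-case margin without DFE:", round(min(margin_no_dfe), 3))
print("worst-case margin with 1-tap DFE:", round(min(margin_dfe), 3))
```

Each decision depends on the previous one, which is why the subtraction and slicing must complete within one UI in a direct-feedback implementation.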

[Figure: block diagram showing the channel, a summing node, a Z^-1 delay element, and the labels d0 and d1.]

Figure 2.5: Receiver DFE equalization. 
 
Each DFE finite impulse response (FIR) tap can cancel only one post-cursor ISI term.
Thus, multiple FIR taps are required to compensate for high loss channels with multiple
significant post-cursor ISI terms. Alternatively, infinite impulse response (IIR) taps can
be utilized to cancel multiple ISI terms [40], [41]. A continuous time linear equalizer
(CTLE) can also be utilized at the receiver side to compensate for the long-tail ISI. CTLE
implementations are discussed in more detail in section 2.4.
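
The difference between FIR and IIR feedback taps can be sketched by counting the residual ISI each leaves behind. This is an idealized illustration: the long tail is assumed to decay geometrically, the IIR tap is modeled as a first-order discrete-time filter with a matched amplitude and decay ratio, and all values and names are hypothetical.

```python
def geometric_tail(amplitude, ratio, n):
    """Hypothetical long-tail ISI: post-cursor k is amplitude * ratio**(k - 1)."""
    return [amplitude * ratio ** k for k in range(n)]

def residual_after_fir(post_cursors, n_taps):
    """Ideal FIR taps zero the first n_taps post-cursors; later ones remain."""
    return sum(abs(p) for p in post_cursors[n_taps:])

def residual_after_iir(post_cursors, amplitude, ratio):
    """One first-order IIR tap feeds back a geometric sequence of its own."""
    feedback = geometric_tail(amplitude, ratio, len(post_cursors))
    return sum(abs(p - f) for p, f in zip(post_cursors, feedback))

post = geometric_tail(amplitude=0.2, ratio=0.7, n=20)

for n_taps in (1, 2, 4, 8):
    print(n_taps, "FIR taps -> residual ISI",
          round(residual_after_fir(post, n_taps), 4))
print("1 IIR tap  -> residual ISI",
      round(residual_after_iir(post, 0.2, 0.7), 4))
```

In this idealized case a single IIR tap with two parameters removes the whole tail, while the FIR residual only falls as taps are added. A real tail is not exactly geometric, so some residual remains in practice.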
Another method to reduce the effect of the low-pass channel response is to use a more
advanced modulation. Four-level pulse amplitude modulation (PAM4) signaling is the most
popular alternative to conventional NRZ signaling and has been proposed for very high
data-rates in new standards [42], [43]. It allows transmission of 2 bits per symbol, reducing
the required signal bandwidth by a factor of 2 for a given data-rate. However, multiple
challenges are associated with PAM4 signaling. PAM4 transceivers require more stringent
circuit linearity, equalizers which can implement multi-level inter-symbol interference (ISI)
cancellation, and improved sensitivity. While PAM4 modulation allows for a longer unit
interval (UI) time, the reduced voltage margins necessitate increased comparator sensitivity.
Moreover, PAM4 modulation is more sensitive to residual ISI. Thus, multiple equalizer taps
are required to minimize the residual ISI.
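
The halved symbol count and the reduced level spacing can be seen with a small encoder sketch. The level assignment follows the normalized symbol definitions used in the peak distortion analysis of this section (11, 10, 01 and 00 mapped to 1, 1/3, -1/3 and -1); the function names are hypothetical and no particular transmitter implementation is implied.

```python
from fractions import Fraction

# Normalized PAM4 levels for each bit pair (first bit, second bit).
PAM4_LEVELS = {
    (1, 1): Fraction(1),
    (1, 0): Fraction(1, 3),
    (0, 1): Fraction(-1, 3),
    (0, 0): Fraction(-1),
}

def nrz_encode(bits):
    """One symbol per bit: 1 -> +1, 0 -> -1."""
    return [Fraction(1) if b else Fraction(-1) for b in bits]

def pam4_encode(bits):
    """One symbol per bit pair, using the level table above."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def min_level_spacing(symbols):
    """Smallest gap between adjacent distinct levels in a symbol stream."""
    ordered = sorted(set(symbols))
    return min(b - a for a, b in zip(ordered, ordered[1:]))

bits = [1, 1, 1, 0, 0, 1, 0, 0]
nrz = nrz_encode(bits)
pam4 = pam4_encode(bits)

print("symbols sent:", len(nrz), "NRZ vs", len(pam4), "PAM4")
print("level spacing:", min_level_spacing(nrz), "NRZ vs",
      min_level_spacing(pam4), "PAM4")
```

For the same peak swing the PAM4 stream needs half as many symbols, while adjacent levels are three times closer together, which is the source of the tighter linearity and sensitivity requirements.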
 While pessimistic from a BER perspective, peak distortion analysis (PDA) [44] provides a
rapid approach to find the worst-case eye opening and is utilized here to highlight the
differences in ISI sensitivity between NRZ and PAM4 modulation at the same symbol
rate. Fig. 2.2 shows a conceptual pulse response y(t) produced by sending an ideal pulse
c(t) with duration Tb across a channel. This pulse response has the cursor value at t = to
and ISI terms at Tb offsets before and after this cursor instant.
First consider the NRZ modulation case, where there are two symbols, y1 = y(t) and y0
= -y(t). Assuming linearity, the worst-case high and low levels, v1 and v0 respectively, of
the eye diagram at the sampling point, to, are calculated by

$$v_1 = y(t_o) - \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|,$$

$$v_0 = -y(t_o) + \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|. \tag{2.1}$$
Thus, the NRZ PDA eye height, shown in Fig. 2.6(a), is

$$A_{NRZ} = v_1 - v_0 = 2\left( y(t_o) - \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right| \right). \tag{2.2}$$
Note that the $\sum_{i \neq 0} |y(t_o - iT_b)|$ term equals the sum of the absolute values of all post-
and pre-cursor ISI values determined from the pulse response. This represents the
maximum amount of ISI that can be added to or subtracted from a symbol with the worst-case
symbol sequence. In the common case where all ISI values are positive, a lone-pulse
sequence of a single 1 preceded and followed by all 0s is the worst-case pattern that sets
the minimum high level.
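
Eqs. (2.1) and (2.2) can be evaluated directly from UI-spaced samples of a pulse response. The sketch below truncates the infinite sum to the samples provided, which is an approximation; the sample values and function names are hypothetical.

```python
def isi_sum(samples, cursor_index):
    """Sum of |y(to - i*Tb)| over all sampled ISI terms except the cursor."""
    return sum(abs(s) for k, s in enumerate(samples) if k != cursor_index)

def pda_nrz(samples, cursor_index):
    """Return (v1, v0, eye_height) following Eqs. (2.1) and (2.2)."""
    cursor = samples[cursor_index]
    isi = isi_sum(samples, cursor_index)
    v1 = cursor - isi       # worst-case high level
    v0 = -cursor + isi      # worst-case low level
    return v1, v0, v1 - v0

# Hypothetical UI-spaced pulse response: [pre-cursor, cursor, post1, post2]
h = [0.10, 0.60, 0.25, 0.10]
v1, v0, a_nrz = pda_nrz(h, cursor_index=1)

print("v1 =", round(v1, 3), " v0 =", round(v0, 3), " A_NRZ =", round(a_nrz, 3))
```

A positive eye height means the eye remains open even for the worst-case data pattern; a negative value means the worst-case pattern closes the eye.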
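Now consider the PAM4 case, where there are four symbols, y11 = y(t), y10 = 1/3 y(t),
y01 = -1/3 y(t) and y00 = -y(t). As shown in Fig. 2.6(b), assuming linearity this results in
three eyes that are bounded by six levels which can be calculated by

$$\begin{aligned}
v_{11} &= y(t_o) - \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|, \\
v_{10h} &= \tfrac{1}{3} y(t_o) + \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|, \\
v_{10l} &= \tfrac{1}{3} y(t_o) - \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|, \\
v_{01h} &= -\tfrac{1}{3} y(t_o) + \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|, \\
v_{01l} &= -\tfrac{1}{3} y(t_o) - \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|, \\
v_{00} &= -y(t_o) + \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right|.
\end{aligned} \tag{2.3}$$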
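
[Figure: (a) NRZ eye diagram with ideal levels a and -a, worst-case levels v1 and v0, and eye height ANRZ; (b) PAM4 eye diagram with ideal levels a, a/3, -a/3 and -a, worst-case levels v11, v10h, v10l, v01h, v01l and v00, and eye height APAM4.]

Figure 2.6: Eye diagrams with (a) NRZ and (b) PAM4 data. 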
 
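Thus, the PAM4 PDA eye heights are

$$A_{PAM4} = v_{11} - v_{10h} = v_{10l} - v_{01h} = v_{01l} - v_{00} = 2\left( \frac{1}{3} y(t_o) - \sum_{\substack{i=-\infty \\ i \neq 0}}^{\infty} \left| y(t_o - iT_b) \right| \right). \tag{2.4}$$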
Note that although the ideal voltage margin with PAM4 modulation is 1/3 of the ideal 
voltage margin with NRZ modulation, the PAM4 symbols suffer from the same amount 
of ISI, $\sum_{i\neq 0} |y(t_o - iT_b)|$. While for the same data rate a PAM4 pulse response 
will often be much better than its NRZ counterpart for typical wireline channels, this 
heightened PAM4 ISI sensitivity necessitates an increased level of ISI cancellation. Thus, 
although PAM4 has better spectral efficiency than NRZ signaling, this does not make it the 
superior modulation option for all systems. The optimal modulation is a function of the 
target data rate and channel loss profile; power and circuit constraints should also be 
considered when choosing it. 
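
As a rough illustration of Eqs. (2.2) and (2.4), the following sketch computes both PDA eye
heights from a symbol-spaced pulse response. The function name and the pulse-response
samples are hypothetical values chosen only for illustration; they are not taken from the
channels measured in this work.

```python
def pda_eye_heights(pulse, cursor):
    """Return (NRZ, PAM4) worst-case eye heights per Eqs. (2.2) and (2.4).

    pulse  : pulse-response samples spaced by one symbol period (volts)
    cursor : index of the main-cursor sample y(t_o)
    """
    # Sum of absolute pre- and post-cursor ISI terms (all samples except the cursor).
    isi = sum(abs(v) for i, v in enumerate(pulse) if i != cursor)
    main = pulse[cursor]
    a_nrz = 2.0 * (main - isi)
    a_pam4 = 2.0 * (main / 3.0 - isi)
    return a_nrz, a_pam4


# Hypothetical pulse response: one pre-cursor, main cursor, three post-cursors.
a_nrz, a_pam4 = pda_eye_heights([0.01, 0.60, 0.08, 0.04, 0.02], cursor=1)
print(f"NRZ eye height:  {a_nrz:.3f} V")
print(f"PAM4 eye height: {a_pam4:.3f} V")
```

A negative result indicates a closed worst-case eye, which happens for PAM4 as soon as the
total ISI exceeds one third of the main cursor.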
Overall, in order to design a mixed-signal transceiver for a certain application, factors 
including standard requirements, channel profile, circuit constraints, and power efficiency 
should be taken into account. This leads to the selection of an appropriate modulation and 
of the type and complexity of equalization at the transmitter and receiver sides. 
Statistical analysis is a convenient method to plan the architecture based on system 
requirements. Statistical analysis, along with an estimate of how circuit-level complexity 
translates into power consumption, provides a metric for making architectural 
decisions. The system design procedures described in the system analysis subsections of 
Sections 3 and 4 are based on this approach. 
 
2.3 Transmitter Circuits 
In this section, some key circuit blocks of a transmitter are discussed. Different 
implementations of these blocks are investigated, and their trade-offs are discussed to 
guide the choice of the optimal structure for a given architecture. 
 
2.3.1 Serializer 
The final serializer is one of the most critical blocks in a transmitter, as it must maintain 
enough bandwidth to support the full-rate output. Fig. 2.7 shows a CMOS 2:1 
transmission gate (T-gate) based serializer [45]. The CMOS serializer suffers from 
stacking at the full-rate output node, which reduces the drive strength. Both NMOS and 
PMOS transistors are required to pass zeros and ones, so both switches are turned 
on at the same time. However, only one of them actually needs to turn on, depending on 
the polarity of the input data, so switching both translates into extra switching power 
consumption. 
 
Figure. 2.7: A CMOS 2:1 T-gate based serializer. 
 
A tri-state inverter based 2:1 serializer is proposed in [46] to overcome stacking at the 
output node and reduce the switching power, as illustrated in Fig. 2.8. 
 
Figure. 2.8: A tri-state based 2:1 serializer. 
 
The serializer requires a half-rate differential clock. Using a 2:1 serializer also requires 
half-rate flip-flops to retime the data and half-rate clock distribution. This can be very 
challenging and power hungry at very high data rates. 
In many high data-rate applications, a quarter-rate architecture is used instead, where 
the final stage of serialization is a 4:1 serializer. This requires only quarter-rate 
flip-flops for retiming and quarter-rate clock distribution, which can reduce the power 
consumption of the system. Fig. 2.9 shows a 4:1 CMOS T-gate based serializer. 
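
To make the clocking trade-off concrete, the short sketch below computes the clock
frequency needed by the final serializer for a given data rate, modulation, and mux ratio.
The function name is hypothetical, and the 16 Gb/s NRZ and 32 Gb/s PAM4 operating points
are used only as examples.

```python
def serializer_clock_ghz(data_rate_gbps, bits_per_symbol, mux_ratio):
    """Clock frequency (GHz) for the final N:1 serializer.

    A 2:1 mux needs a half-rate clock and a 4:1 mux a quarter-rate clock,
    both relative to the symbol rate.
    """
    symbol_rate_gsps = data_rate_gbps / bits_per_symbol
    return symbol_rate_gsps / mux_ratio


for label, rate, bits in [("16 Gb/s NRZ", 16, 1), ("32 Gb/s PAM4", 32, 2)]:
    half = serializer_clock_ghz(rate, bits, 2)
    quarter = serializer_clock_ghz(rate, bits, 4)
    print(f"{label}: half-rate clock {half} GHz, quarter-rate clock {quarter} GHz")
```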
 
Figure. 2.9: A CMOS 4:1 T-gate based serializer. 
 
In addition to stacking at the output full-rate node, the serializer suffers from significant 
self-loading caused by 4 parallel branches connected to the output node. 
Current mode serializers can also be used instead of CMOS serializers [47] as depicted 
in Fig. 2.10. This will reduce self-loading of the serializer and achieve higher bandwidth. 
The disadvantage is that current mode serializers consume static current which is 
independent of the data-rate. 
 
Figure. 2.10: A current mode 2:1 serializer. 
 
2.3.2 Output Driver 
The output driver is the last stage of a transmitter, connecting the transmitter to the 
channel. Since the output driver must provide the current to drive the channel, the 
termination load, and the pad parasitic capacitance, it can be a power-hungry block, 
especially in very low power applications. 
Fig. 2.11 shows the schematic of a current mode output driver [48]. Here the Rc resistance, 
the parallel termination, is set to match the channel impedance to minimize reflections. In 
order to ensure good channel matching in the presence of fabrication tolerances, a resistor 
DAC is often used to tune the output termination. Note that because a passive resistor 
is used as the termination, the output resistance is relatively independent of the output 
voltage. With a fixed matched resistance at the drain, the output swing of the driver is 
set by the tail current. The output swing of current mode drivers can be quite large, as 
each side can go all the way up to the supply voltage, while the minimum output voltage is 
limited only by the compliance voltage of the tail current source plus the drain-source 
voltage of the differential pair transistors in the triode region, which can be very low. 
This allows a maximum peak-to-peak differential output swing of slightly less than twice 
the supply voltage. 
 
 
Figure. 2.11: A current mode output driver. 
 
Figure. 2.12: A single-ended terminated current mode driver. 
 
Fig. 2.12 shows a current mode driver while connected to the channel with single-ended 
termination on the receiver side. Writing the voltage and current equations we have: 
$$V_{d,1} = \left(\frac{I}{2}\right) R \tag{2.5}$$
$$V_{d,0} = -\left(\frac{I}{2}\right) R \tag{2.6}$$
$$V_{d,pp} = IR \tag{2.7}$$
$$I = \frac{V_{d,pp}}{R} \tag{2.8}$$
where Vd,1 is the differential voltage at the receiver side when transmitting a one, Vd,0 
is the differential voltage when transmitting a zero, Vd,pp is the peak-to-peak differential 
received signal amplitude, and I is the driver tail current. 
 
 
Figure. 2.13: A differentially terminated current mode driver. 
 
Fig. 2.13 shows the same driver when terminated differentially on the receiver side after 
the channel. Writing the voltage and current equations, we have: 
$$V_{d,1} = \left(\frac{I}{4}\right) (2R) \tag{2.9}$$
$$V_{d,0} = -\left(\frac{I}{4}\right) (2R) \tag{2.10}$$
$$V_{d,pp} = IR \tag{2.11}$$
$$I = \frac{V_{d,pp}}{R} \tag{2.12}$$
Feed-forward equalization (FFE) can be easily implemented in current mode drivers by 
adding extra parallel differential pairs connected to the output [49], as illustrated 
in Fig. 2.14. Note that the middle-level generation associated with FFE does not affect 
the driver’s termination matching. Another good characteristic of current mode 
drivers is that the current they draw from the supply is independent of the current 
transmitted symbol and, in the case of FFE, of the previous symbols. 
 
 
Fig. 2.14: FFE implementation in a current mode driver. 
 
Fig. 2.15 shows the schematic of a voltage mode driver. Here, the series combination of 
the Rc passive resistor and the triode resistance of the switch provides a source-series 
termination (SST) matched to the channel. 
 
Figure. 2.15: A high swing voltage mode driver. 
 
In simple voltage mode SST drivers, the supply voltage of the driver sets the output 
swing: the differential peak-to-peak output swing is equal to the supply voltage. The 
minimum supply voltage of the driver in Fig. 2.15 is 
$$V_s = |V_{THP}| + V_{ODP}, \tag{2.13}$$
which ensures that the PMOS transistors turn on when their gates are pulled down to 
ground. As this minimum voltage can be quite high for very low power applications, this 
type of driver is often referred to as a high swing voltage mode driver [50], [51]. 
  
An alternative to using NMOS and PMOS transistors as switches is to use an all-NMOS 
structure [52], as shown in Fig. 2.16. 
 
 
Figure. 2.16: A low swing voltage mode driver. 
 
Here, as both top and bottom switches are NMOS transistors, the top transistors can turn 
on even with very low supply voltages. However, using an all-NMOS structure enforces a 
maximum supply voltage to ensure triode operation of the top transistors. It can be shown 
that the maximum supply voltage is 
$$V_s = \frac{4}{3}\left(V_{DD} - V_{THH} - V_{ODH}\right) \tag{2.14}$$
when differential termination is used at the receiver side, and that it increases to 
$$V_s = 2\left(V_{DD} - V_{THH} - V_{ODH}\right) \tag{2.15}$$
when single-ended termination is used at the receiver side. As this limits the maximum 
driver output swing for high performance applications, this driver is often referred to as 
a low swing driver. 
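
A quick numerical check of Eqs. (2.14) and (2.15) is sketched below. The function name and
the device values (VDD, threshold, and overdrive voltages) are assumptions chosen only to
illustrate the supply ceiling of the all-NMOS driver; they are not design values from this
work.

```python
def max_low_swing_supply(vdd, vth_h, vod_h, termination):
    """Maximum driver supply V_s (volts) for the all-NMOS driver.

    vdd         : pre-driver supply that sets the top-switch gate voltage
    vth_h       : threshold voltage of the top NMOS switch
    vod_h       : overdrive needed to keep the top switch in triode
    termination : "differential" (Eq. 2.14) or "single-ended" (Eq. 2.15)
    """
    headroom = vdd - vth_h - vod_h
    if termination == "differential":
        return 4.0 / 3.0 * headroom
    if termination == "single-ended":
        return 2.0 * headroom
    raise ValueError("termination must be 'differential' or 'single-ended'")


# Assumed example values: 1.0 V gate drive, 0.35 V threshold, 0.10 V overdrive.
for term in ("differential", "single-ended"):
    vs = max_low_swing_supply(1.0, 0.35, 0.10, term)
    print(f"{term}: V_s,max = {vs:.3f} V")
```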
 
 
Figure. 2.17: A single-ended terminated voltage mode driver. 
 
Fig. 2.17 shows a voltage mode driver when connected to a channel and terminated in a 
single-ended manner at the receiver side. Writing the voltage and current equations, we 
have: 
$$V_{d,1} = \frac{V_S}{2} \tag{2.16}$$
$$V_{d,0} = -\frac{V_S}{2} \tag{2.17}$$
$$V_{d,pp} = V_S \tag{2.18}$$
$$I = \frac{V_S}{2R} \tag{2.19}$$
$$I = \frac{V_{d,pp}}{2R} \tag{2.20}$$
 
 
 
Figure. 2.18: A differentially terminated voltage mode driver. 
 
Fig. 2.18 shows the same voltage mode driver when terminated differentially at the 
receiver side. Writing the voltage and current equations, we have: 
$$V_{d,1} = \frac{V_S}{2} \tag{2.21}$$
$$V_{d,0} = -\frac{V_S}{2} \tag{2.22}$$
$$V_{d,pp} = V_S \tag{2.23}$$
$$I = \frac{V_S}{4R} \tag{2.24}$$
$$I = \frac{V_{d,pp}}{4R} \tag{2.25}$$
Compared to a current mode driver at a similar output swing, the single-ended 
terminated voltage mode driver consumes half the current, while the differentially 
terminated voltage mode driver consumes a quarter of the current. 
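
The comparison can be reproduced directly from Eqs. (2.8), (2.12), (2.20), and (2.25). The
sketch below is a minimal example with a hypothetical function name, assuming a 1 Vpp
differential swing and R = 50 Ω purely for illustration.

```python
def driver_current(v_dpp, r, driver, termination):
    """Driver current (A) for a peak-to-peak differential swing v_dpp (V).

    Divisors follow Eqs. (2.8), (2.12), (2.20), and (2.25).
    """
    divisor = {
        ("current", "single-ended"): 1.0,   # I = Vd,pp / R
        ("current", "differential"): 1.0,   # I = Vd,pp / R
        ("voltage", "single-ended"): 2.0,   # I = Vd,pp / (2R)
        ("voltage", "differential"): 4.0,   # I = Vd,pp / (4R)
    }[(driver, termination)]
    return v_dpp / (divisor * r)


# Assumed operating point: 1 Vpp differential swing, 50-ohm termination.
for driver in ("current", "voltage"):
    for term in ("single-ended", "differential"):
        i_ma = 1e3 * driver_current(1.0, 50.0, driver, term)
        print(f"{driver}-mode, {term}: {i_ma:.1f} mA")
```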
FFE is more challenging to implement in voltage mode drivers than in current mode 
drivers because voltage mode drivers are series terminated. The driver must be modified 
to generate the extra levels associated with FFE while maintaining the channel matching. 
One popular approach is to divide the output driver into multiple segments [50]. Together, 
the segments provide an overall termination that matches the channel, while each segment 
can be connected to the current symbol or one of the previous symbols (Fig. 2.19). Here, 
the middle levels are generated by shunting current from the positive output port to 
ground and from the supply voltage to the negative output port. The tap weights are set 
by the number of segments assigned to each tap. Redundant segments, which can be enabled 
or disabled, are often employed for impedance matching. 
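
The relationship between segment allocation and tap weight can be sketched with a simple
behavioral model. This is an idealized illustration, not the circuit used in this work: it
assumes equal-weight segments, symbols of ±1, and a hypothetical 16-segment driver with 12
segments on the main tap and 4 inverted segments on the first post-cursor tap.

```python
def sst_ffe_levels(symbols, segments, signs):
    """Normalized output level of a segmented SST driver with FFE.

    symbols  : transmitted symbols, each +1 or -1
    segments : segments[k] is the number of segments driven by the symbol delayed by k
    signs    : signs[k] is +1 for a non-inverting tap, -1 for an inverting tap
    """
    total = sum(segments)
    levels = []
    for n in range(len(symbols)):
        acc = 0
        for k, (seg, sign) in enumerate(zip(segments, signs)):
            # Assumption: before the sequence starts, the first symbol is held.
            d = symbols[n - k] if n - k >= 0 else symbols[0]
            acc += sign * seg * d
        # Each tap's weight is the fraction of segments assigned to it.
        levels.append(acc / total)
    return levels


levels = sst_ffe_levels([-1, -1, 1, 1, 1, -1], segments=[12, 4], signs=[+1, -1])
print(levels)
```

Transitions are driven at full swing, while repeated symbols settle to a reduced middle
level, which is the de-emphasis behavior expected from a 2-tap FFE.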
It should be noted that implementing segmentation at the output driver requires complex 
logic in the pre-driver, which increases the pre-driver power consumption. In addition, 
the shunt current drawn when generating the middle-level voltages makes the supply 
current data dependent, which causes fluctuation on the supply voltage. 
 
 
Figure. 2.19: A segmented voltage mode driver. 
 
The tap selection and the enabling and disabling of the segments are often implemented 
by tap-select MUXes preceding the SST segments. This allows flexible tap weight 
assignment and convenient impedance matching. However, similar to a T-gate serializer, 
tap-select MUXes cause stacking in the full-rate path, which reduces the system 
bandwidth and increases the power consumption needed to achieve similar driving strength. 
 
 
2.4 Receiver Circuits 
In this section, some key circuit blocks of a receiver are discussed. Different 
implementations of these blocks are investigated, and their trade-offs are discussed to 
guide the choice of the optimal structure for a given architecture. 
2.4.1 CTLE 
A continuous-time linear equalizer (CTLE) is a linear equalizer often used in the receiver 
front end as the first stage of equalization. Its purpose is to create a high-pass filter 
that compensates for the low-pass profile of the channel, creating a close-to-flat overall 
frequency response. CTLEs can be implemented passively, as depicted in Fig. 2.20. 
 
 
Figure. 2.20: A passive CTLE block diagram. 
 
Passive CTLEs are very linear and easy to implement as they only require passive 
elements. But they cannot provide any gain at Nyquist. Resistor and capacitor DACs are 
used to tune the low frequency gain and high frequency peaking to match the target 
channel profile. 
Active CTLEs [53] are more popular as they can provide gain at Nyquist, as illustrated 
in Fig. 2.21. This, however, limits the linear range of the equalizer. Here Rd along with 
the load capacitance sets the bandwidth. The low frequency gain (and input linear range) 
is set by the Rs resistor DAC, while the high frequency peaking can be controlled by the 
Cs capacitor DAC. 
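
The roles of Rd, Rs, and Cs can be illustrated with the first-order model commonly used
for a source-degenerated differential pair. This model and all component values below are
assumptions introduced for illustration (the text does not give a transfer function): it
takes Rs and Cs as the total degeneration resistance and capacitance between the two
sources, gm as the input transconductance, and CL as the load capacitance.

```python
import math


def ctle_parameters(gm, rd, rs, cs, cl):
    """First-order parameters of a source-degenerated active CTLE (assumed model)."""
    degeneration = 1.0 + gm * rs / 2.0
    return {
        "dc_gain": gm * rd / degeneration,           # low-frequency gain, set by Rs
        "peak_gain": gm * rd,                        # ideal gain once Cs shorts out Rs
        "peaking_db": 20.0 * math.log10(degeneration),
        "f_zero_hz": 1.0 / (2.0 * math.pi * rs * cs),
        "f_pole1_hz": degeneration / (2.0 * math.pi * rs * cs),
        "f_pole2_hz": 1.0 / (2.0 * math.pi * rd * cl),  # output pole, set by Rd and CL
    }


# Assumed values: gm = 10 mS, Rd = 200 ohm, Rs = 400 ohm, Cs = 100 fF, CL = 50 fF.
params = ctle_parameters(gm=10e-3, rd=200.0, rs=400.0, cs=100e-15, cl=50e-15)
for name, value in params.items():
    print(f"{name}: {value:.4g}")
```

In this model, increasing Rs lowers the low-frequency gain and increases the peaking,
while Cs moves the zero and first pole together, which matches the tuning roles described
above.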
 
 
Figure. 2.21: An active CTLE block diagram. 
 
As maintaining bandwidth while providing sufficient gain can be challenging at very high 
data rates, shunt peaking is often employed in active CTLEs to increase the equalizer 
bandwidth [37]. However, due to the use of inductors, this implementation consumes 
significant area. As channel losses might exceed the viable peaking of a single-stage 
CTLE and the channel loss profile often does not match a simple R-C type profile, multiple 
CTLE stages are often used to compensate for medium- to high-loss channels. These CTLEs 
occupy significant chip area and can consume a significant amount of power. 
 
 
Figure. 2.22: An active CTLE with shunt peaking. 
 
2.4.2 Sampler 
The sampler is one of the most critical blocks in the receiver design. Its decision time 
limits the maximum data rate at which the receiver can operate. Its gain and noise 
performance are significant contributors to the overall system sensitivity and to the 
maximum channel loss the receiver can handle without bit errors. 
The single stage dynamic amplifier of Fig. 2.22 can be used as a sampler [54]. While 
the single stage implementation can provide a high bandwidth, it suffers from low gain. 
This results in poor sensitivity. The output swing is also dependent on the input 
amplitude. 
 
 
Figure. 2.22: A single stage dynamic amplifier sampler. 
 
The strong-arm sampler of Fig. 2.23 is one of the most popular structures used in high 
speed receivers [55]. It can provide relatively high bandwidth, while the regenerative 
NMOS and PMOS pairs provide high gain. It provides a rail-to-rail output without consuming 
static power, and only a single-ended clock is required for clocking it. 
 
Figure. 2.23: A strong-arm sampler. 
 
Due to stacking in strong-arm samplers, supply voltage scaling in advanced 
processes negatively affects their performance. The modified two-stage double-tail 
implementation of [56], illustrated in Fig. 2.24, reduces stacking. However, 
complementary clocks are required in the first and second stages, which makes the sampler 
sensitive to differential clock misalignment. 
 
Figure. 2.24: Two-stage double tail sampler block diagram. 
 
An alternative implementation is presented in [57] and depicted in Fig. 2.25. It utilizes 
a two stage dynamic amplifier along with a regenerative NMOS and PMOS pair in parallel 
with the final stage to achieve high gain and rail-to-rail output swing. Similar to a strong-
arm sampler, this sampler requires only a single-ended clock. 
 
 
Fig. 2.25: A two-stage dynamic amplifier with regeneration. 
 
Samplers can also be implemented in a current mode manner [58] as depicted in Fig. 
2.26. Here, by setting the Rd resistance a compromise between gain and bandwidth can 
be achieved. It should be noted that the current mode sampler can potentially provide 
higher bandwidth than strong-arm and two-stage counterparts due to single stage 
implementation. However, it suffers from static power consumption which is independent 
of the data-rate. 
 
 
Figure. 2.26: A current mode sampler block diagram. 
 
2.5 Conclusion 
This section has summarized the key challenges associated with system and circuit 
design in wireline mixed-signal transceivers. Effect of frequency dependent channel loss 
on data transmission has been discussed. Equalization approaches at transmitter and 
receiver side are explained to overcome ISI caused by limited channel bandwidth. A short 
summary of critical circuit blocks of transmitter and receiver has been given. Different 
implementation approaches and trade-offs have been discussed. 
  
 
3. DUAL-MODE 16/32 GB/S NRZ/PAM4 TRANSMITTER* 
3.1 Introduction 
Improvements in high-speed serial I/O bandwidth density and energy efficiency are 
necessary to support the dramatic growth in global IP traffic, which is projected to reach 
2 zettabytes per year by 2019 [59]. While high-performance I/O circuitry can leverage 
technology improvements, unfortunately the bandwidth of the electrical channels used for 
inter-chip communication has not scaled in the same manner. This merits serious 
consideration of four-level pulse amplitude modulation (PAM4) which, relative to simple 
binary non-return-to-zero (NRZ) signaling, offers higher spectral efficiency, lower loss at 
the Nyquist frequency, and relaxed clock speeds. These advantages have led to 
implementation of PAM4 modulation in various high-speed I/O standards [42, 43]. In 
order to support PAM4 modulation, there have been recent developments in current-mode 
[22, 49, 60, 61], voltage-mode [62], and hybrid transmitters [63], and in both 
analog-to-digital converter (ADC)-based [61, 64, 65] and mixed-signal receivers [49, 60, 66]. 
Relative to NRZ-based systems, PAM4 transceivers require more stringent circuit 
linearity, equalizers which can implement multi-level inter-symbol interference (ISI) 
cancellation, and improved sensitivity.  
On the transmitter side, source-series-terminated (SST) voltage-mode drivers enable the 
high output swing required for PAM4 modulation with high linearity achieved up to 
differential output swings equal to the nominal output stage supply [39]. Further 
improvements in output swing are possible with advanced hybrid drivers employing 
current boosting [63]. Voltage-mode drivers also offer reduced static power consumption 
relative to current-mode drivers, although at higher data rates this static power advantage 
becomes a smaller percentage of the total transmitter power consumption. Key reasons for 
this include large clocking power and the fact that these voltage-mode drivers often use 
output-stage segmentation to achieve equalization setting and impedance control. The 
presence of equalization tap-select muxes that must pass the full-rate signal in the output 
segments [39] can introduce on-chip ISI, and including digitally-controlled redundant 
segments for impedance control [50] results in increased output stage area and power. 
Another key transmitter bottleneck is the final serializer, where efforts have been made to 
minimize power consumption in both current-mode [67] and voltage-mode [68] 
implementations. 

*© 2018 IEEE. Part of this section is reprinted, with permission, from A. Roshan-
Zamir, O. Elhadidy, H. W. Yang and S. Palermo, "A Reconfigurable 16/32 Gb/s 
Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS," IEEE Journal of Solid-State 
Circuits, vol. 52, no. 9, pp. 2430-2447, Sept. 2017. 

Equalization is often also implemented at the receiver to support higher channel loss, 
with the most common blocks employed being a continuous-time linear equalizer (CTLE) 
and a decision feedback equalizer (DFE). Continuous time linear equalization is effective 
at cancelling both pre-cursor and long-tail ISI. However, CTLE amplifiers must be 
designed with sufficient bandwidth to support the full rate signal and linearity to support 
PAM4 modulation. Decision feedback equalization is often used due to the effectiveness 
of cancelling ISI without amplifying noise or crosstalk [69]. However, a key challenge 
associated with DFE architectures involves optimizing the critical feedback path to allow 
for ISI cancellation beginning at the first post-cursor. While PAM4 modulation allows for 
a longer unit interval (UI) time, the reduced voltage margins necessitate increased 
comparator gain to achieve a symbol decision in one UI. Another issue is that DFEs which 
employ common FIR feedback filters can require a large tap count (>10) to cancel long-
tail ISI [70]. An efficient solution for this is to employ IIR feedback filters which can 
cancel smooth exponentially decaying ISI with a minimal number of taps [41, 57, 68, 71], 
in a manner similar to a continuous time equalizer. Finally, a PAM4 DFE must implement 
the necessary hardware with the required linearity to support multi-level ISI subtraction. 
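
The tap-count argument can be illustrated with an idealized comparison. The sketch below
assumes a purely exponential post-cursor tail (hypothetical amplitude 0.1 and decay ratio
0.8 per UI) and perfect feedback; real channels deviate from this, which is why a hybrid of
FIR and IIR taps is used in practice.

```python
def residual_isi_fir(post_cursors, n_taps):
    """Residual ISI after an ideal FIR DFE that cancels the first n_taps post-cursors."""
    return sum(abs(v) for v in post_cursors[n_taps:])


def residual_isi_iir(post_cursors, amp, ratio):
    """Residual ISI after one ideal IIR tap whose feedback decays as amp * ratio**k."""
    return sum(abs(v - amp * ratio ** k) for k, v in enumerate(post_cursors))


# Hypothetical long-tail ISI: 20 post-cursors decaying exponentially.
tail = [0.1 * 0.8 ** k for k in range(20)]

print("4 FIR taps :", round(residual_isi_fir(tail, 4), 4))
print("10 FIR taps:", round(residual_isi_fir(tail, 10), 4))
print("1 IIR tap  :", round(residual_isi_iir(tail, amp=0.1, ratio=0.8), 4))
```

Under these assumptions a single IIR tap matched to the tail removes the ISI that the FIR
filter can only remove one post-cursor at a time.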
While PAM4 has better spectral efficiency relative to NRZ signaling, this doesn’t make 
it the superior modulation option for all systems. The optimal modulation is a function of 
the target data rate, channel loss profile, and process technology, with the majority of 
standards utilizing simple binary non-return-to-zero (NRZ) signaling. As serial I/O 
transceivers are often designed to support different channels and standards, this motivates 
dual-mode transceivers with flexible equalization (Fig. 3.1) to seamlessly support both 
NRZ and PAM4 modulation with minimal hardware and power overhead. 
 
  
Figure. 3.1: Conceptual dual-mode NRZ/PAM4 transceiver architecture with TX FFE and 
RX DFE equalizers. 
 
This section presents a quarter-rate 16/32Gb/s dual-mode NRZ/PAM4 SerDes datapath 
which can be configured to work in both modes with minimal hardware overhead [52]. 
Section 3.2 investigates the equalization requirements of the proposed transceiver with 
statistical bit error rate (BER) modeling results of transmit-side FFE used with receive-
side DFE structures with either FIR or IIR feedback taps. The high-swing voltage-mode 
SST transmitter which utilizes an efficient tri-state inverter-based mux with dynamic pre-
driver gates, a lookup table (LUT) controlled 31-segment output DAC to implement FFE 
without any full-rate tap-select muxes, and low-overhead analog impedance control is 
detailed in Section 3.3. Section 3.4 discusses the receiver that saves power with a quarter-
rate DFE that directly samples the input from the termination and achieves efficient 
equalization with 1-FIR tap for the large first post-cursor ISI and 2-IIR taps for long-tail 
ISI cancellation [39]. Experimental results from a general purpose (GP) 65nm CMOS 
prototype are presented in Section 3.5. Finally, Section 3.6 concludes this section. 
 
3.2 System Architecture 
High-speed link signal integrity suffers from ISI caused by channel skin effect, dielectric 
loss, and reflections. The proposed transceiver is designed to support refined electrical 
channels with minimal performance degradation due to reflections, such as the one shown 
in Fig. 3.2(a) which displays a smooth low-pass frequency response and 13.5 dB loss at 
8GHz. This causes attenuation and dispersion of a 16GS/s data pulse at the channel output. 
The resultant time-domain ISI in Fig. 3.2(b) is well characterized by a fast rising side with 
only one significant pre-cursor ISI term, a fast-decaying short-tail ISI term that dominates 
through the third post-cursor, and a slow-decaying long-tail ISI term that continues out to 
higher post cursor locations [41, 69, 71, 72]. While the first pre-cursor ISI term is small, 
it can significantly degrade performance in PAM4 systems due to this modulation being 
more sensitive to residual ISI, as further quantified in the Appendix. Thus, transmitter FFE 
should be utilized to cancel this pre-cursor term and RX DFE can compensate for the post-
cursor terms. 
The slow-decaying long-tail ISI can have a large impact and necessitate a large tap count 
in DFEs with conventional FIR feedback filters [70]. Utilizing the 16GS/s pulse response 
in a statistical BER simulator, the 32Gb/s PAM4 timing margin is compared in Fig. 3.2(c) 
assuming a 2-tap TX FFE for pre-cursor cancellation and various configurations of RX 
DFE feedback filters. While 4 FIR DFE taps can achieve a BER<$10^{-12}$, 9 FIR DFE taps 
are required to achieve an eye opening close to 10% at this BER. DFEs with IIR taps have 
been shown to efficiently cancel smooth exponentially decaying ISI, with only one IIR-
tap utilized for signaling over an RC-limited on-chip channel [73]. However, a major issue 
with DFE IIR feedback taps is that the comparator regeneration can limit the time available 
for the IIR filter output to reach the required amplitude to cancel the large first-post cursor 
ISI term. This motivates hybrid DFE architectures which employ one FIR feedback tap 
for the first post-cursor ISI and subsequent IIR taps for long-tail ISI cancellation [71, 74]. 
Fig. 3.2(c) shows that by employing one FIR and one IIR feedback tap, a performance better 
than 5 FIR taps is achieved with 32Gb/s PAM4 modulation. Multiple IIR feedback taps 
provide more flexibility to tailor the tap time constants and post-cursor location to better 
match a given PCB channel [69, 75], with close to 10% eye opening at BER=$10^{-12}$ 
achieved by employing one FIR and two IIR taps. The pulse responses of Fig. 3.2(d) also 
confirm that this equalization configuration is effective in cancelling both pre-cursor and 
post-cursor ISI. 
  
 
Figure. 3.2: Refined electrical channel (a) S21 response, (b) 16GS/s pulse response, (c) 
32Gb/s PAM4 timing margin with 2-tap pre-cursor TX FFE and various RX DFE 
feedback filter configurations, and (d) 16GS/s pulse response with 2-tap TX FFE and RX 
DFE with 1-FIR and 2-IIR feedback taps. 
 
Fig. 3.3 shows the proposed dual-mode NRZ/PAM4 transceiver architecture. At the 
transmitter side, a modulation mode signal selects either a 1/16th or 1/8th symbol-rate 
clock to control the 16-bit wide PRBS15 pattern generator and initial serialization stages 
in NRZ and PAM4 mode, respectively, to generate four sets of four-bit patterns which 
address the LUT equalizer that controls the 31-segment high-swing SST output stage. This 
allows the realization of a 4/2-tap FFE in NRZ/PAM4 mode, respectively. At the receiver 
side, a quarter-rate 3-tap NRZ/PAM4 DFE is utilized with 1 FIR and 2 IIR feedback taps. 
The three output bits per quarter-rate slice, which are all the same value for NRZ and 
thermometer-code for PAM4, are converted to binary and buffered out of the chip for BER 
testing. 
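
The thermometer-to-binary step can be sketched as follows. The function name and the
ordering of the three comparator outputs (low, middle, and high threshold) are assumptions
made for illustration; the mapping assumes valid thermometer codes only.

```python
def thermometer_to_binary(low, mid, high):
    """Convert three slicer outputs (each 0 or 1) into (MSB, LSB).

    low, mid, high are the decisions against the lowest, middle, and highest
    thresholds. Valid PAM4 codes are 000, 100, 110, 111; in NRZ mode all three
    bits are equal, so MSB and LSB both carry the NRZ decision.
    """
    count = low + mid + high          # number of thresholds exceeded (0..3)
    return (count >> 1) & 1, count & 1


for code in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(code, "->", thermometer_to_binary(*code))
```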
 
 
Figure. 3.3: Dual-mode NRZ/PAM4 transceiver architecture. 
 
 
3.3 Transmitter Architecture 
 
Fig. 3.4 shows the detailed transmitter block diagram. The quarter-rate architecture uses 
four sets of four-bit patterns from the on-chip PRBS15 generator to address the 16x5 
element LUT equalizer by controlling four 5-bit 16-to-1 muxes. This allows the realization 
of a 4-tap FFE in NRZ mode, with a main cursor and up to three pre/post cursor taps, and 
a 2-tap FFE in PAM4 mode, with a main cursor and either one pre/post cursor tap for the 
MSB and LSB bits. The LUT provides for 5-bit resolution in the output stage level 
generation, eliminates any full-rate tap-select muxes in the output segments [39], and also 
allows for potential non-linear equalization. After a retiming stage, a final quarter-rate 
dynamic tri-state inverter-based 4-to-1 stage serializes the 5-bit resolution LUT output to 
full rate to drive the 31-segment high-swing SST output stage. Finally, the driver output 
impedance is efficiently set to near 50 Ω with a pseudo-analog control loop. In order to 
compensate for phase mismatches in the critical serialization clocks, per-phase digitally-
controlled delay lines with adjustable duty cycle and delay are inserted in the clock 
distribution network. While not implemented in this prototype, a calibration scheme can 
be utilized similar to [52] to automatically correct phase mismatches and provide uniform 
output eyes. 
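
How a four-bit pattern can address a 16-entry, 5-bit LUT to realize a 4-tap NRZ FFE is
sketched below. This is a simplified illustration, not the programmed contents of the
prototype: the function name, the bit-to-tap ordering, the example tap weights, and the
rounding onto the 0 to 31 code range are all assumptions.

```python
def build_nrz_ffe_lut(taps, bits=5):
    """Build a 2**len(taps)-entry LUT of DAC codes for an NRZ FFE.

    taps : tap weights; taps[k] applies to address bit k (assumed ordering).
           The sum of absolute weights must not exceed 1 (peak-power constraint).
    bits : DAC resolution; 5 bits gives codes 0..31 for a 31-segment DAC.
    """
    assert sum(abs(t) for t in taps) <= 1.0 + 1e-12
    full_scale = (1 << bits) - 1
    lut = []
    for addr in range(1 << len(taps)):
        level = 0.0
        for k, tap in enumerate(taps):
            symbol = 1 if (addr >> k) & 1 else -1   # NRZ bit mapped to +/-1
            level += tap * symbol
        # Map the normalized level in [-1, 1] onto the DAC code range.
        lut.append(round((level + 1.0) / 2.0 * full_scale))
    return lut


# Assumed example: one pre-cursor, main cursor, and two post-cursor taps.
lut = build_nrz_ffe_lut([-0.10, 0.70, -0.15, -0.05])
for addr, code in enumerate(lut):
    print(f"{addr:04b} -> {code:2d}")
```

Because the table stores the final output code for every pattern, the entries do not have
to follow a linear FFE, which is what permits the non-linear equalization mentioned above.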
 
 
 
Figure. 3.4: NRZ/PAM4 transmitter with lookup table-based FFE equalizer and pseudo-
analog impedance control. 
 
3.3.1   4-to-1 Serializer 
The final 4-to-1 serializer is one of the most critical blocks in a quarter-rate transmitter, 
as it must maintain enough bandwidth to support the full-rate output. However, this can 
be difficult to achieve with conventional pass-gate serializers which suffer from reduced 
drive strength due to the effective transistor stacking at the high self-loading output node. 
This transmitter extends the 2-to-1 tri-state inverter-based mux design proposed in [46] to 
perform 4-to-1 serialization and further improves power efficiency by utilizing dynamic 
NAND pre-drivers (Fig. 3.5(a)). Fig. 3.5(b) shows the serializer’s PMOS-path timing 
diagram, with similar waveforms present in the NMOS-path. The dynamic NAND 
predriver gates utilize the input data to qualify a pulse defined by adjacent quarter-rate 
clock edges. This allows the tri-state inverter-based mux to drive the full-rate output node 
through only a single transistor, similar to a simple inverter, with the input data activating 
one of the PMOS/NMOS devices. Dummy gates are present in both the PMOS and NMOS 
paths to enable a uniform eye diagram at the full-rate serializer output. As shown in the 
post-layout simulation results of Fig. 3.5(c), the proposed dynamic tri-state inverter-based 
design has significantly faster transition times relative to a conventional pass-gate 
serializer designed with equal power consumption. Overall, the minimal transistor 
stacking allows the proposed serializer to achieve the same level of deterministic jitter 
with a 40% power reduction relative to a conventional pass-gate design. 
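
The pulse-qualification idea can be captured in a purely behavioral model. The sketch
below assumes ideal 50% duty-cycle quarter-rate clocks and that each lane's select pulse
is the overlap of two adjacent clock phases; the function names and this specific phase
pairing are illustrative assumptions and do not model the transistor-level timing.

```python
def quarter_rate_clock(phase, ui):
    """Ideal quarter-rate clock: high for 2 of every 4 UIs, delayed by `phase` UIs."""
    return ((ui - phase) % 4) < 2


def serialize_4to1(lanes):
    """Serialize four quarter-rate lanes into one full-rate stream.

    lanes[k][m] is the m-th quarter-rate bit on lane k (k = 0..3), assumed to be
    stable during the UI in which lane k is selected.
    """
    out = []
    for ui in range(4 * len(lanes[0])):
        for k in range(4):
            # The overlap of clock phase k and the preceding phase is one UI wide.
            select = quarter_rate_clock(k, ui) and quarter_rate_clock((k + 3) % 4, ui)
            if select:
                out.append(lanes[k][ui // 4])
    return out


print(serialize_4to1([[1, 0], [0, 0], [1, 1], [0, 1]]))
```

Exactly one lane is selected in each UI, so the output node is driven by a single
qualified pulse at a time.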
 
Figure. 3.5: Dynamic tri-state inverter-based 4-to-1 serializer: (a) schematic, (b) timing 
diagram (PMOS path), and (c) simulated performance comparison with a conventional 
pass-gate design. 
 
3.3.2    Pseudo-Analog-Controlled Output Driver 
Fig. 3.6(a) shows a single segment of a conventional high-swing source-series-
terminated output driver. The segment’s output impedance is set by the series combination 
of the passive resistor, Rterm, and the transistor’s triode resistance. As shown in the 
simulation results of Fig. 3.6(c), both elements are affected by process variations, 
which cause deviations in the driver output impedance when no compensation is applied (No 
comp). A straightforward technique to control the output impedance of a high-swing 
voltage-mode driver is to implement redundant segments that can be digitally 
activated to match the channel impedance [50]. However, these redundant 
stages increase the output stage area and pre-driver power.  
This design proposes pseudo-analog control to compensate for large statistical 
variations in driver output impedance. Fig. 3.6(b) shows a schematic of the voltage-mode 
SST driver segment, which supports a 1.2 Vpp output swing. Here the main MP and MN 
switch transistors and Rterm resistors are sized to always yield greater than 50 Ω output 
impedance over corners, and two analog-controlled paths are added for impedance tuning 
via the GP/N gate voltages. While conceivably one additional analog-controlled branch is 
sufficient for impedance control, a trade-off exists in choosing the RP and RN values. As 
shown in Fig. 3.6(c), selecting relatively small RP and RN values to yield near 50 Ω under 
a +3σ variation case (Single leg comp1) results in low overdrive voltages for the MRP and 
MRN transistors under a nominal impedance corner. This causes a large positive deviation 
from the desired 50 Ω value because the transistors enter the saturation region with a 
small-signal output impedance higher than the large-signal value set by a conventional 
analog control loop. Conversely, selecting relatively high RP and RN values to yield near 
50 Ω under a nominal variation case (Single leg comp2) results in insufficient overdrive 
voltage range and a large positive deviation under a +3σ impedance corner. Thus, in order 
to break this trade-off, a two-branch compensation approach is adopted: an analog-controlled 
low-impedance path 1 and a high-impedance path 2 are added, both replica-biased by 
the FSM-controlled pseudo-analog loop. 
 
[Figure 3.6 labels: (a) In, Out, MP, MN, Rterm, VDD; (b) In, Out, MP, MN, Rterm, MRP1/MRP2 with RP1/RP2 and gates GP1/GP2, MRN1/MRN2 with RN1/RN2 and gates GN1/GN2; (c) driver output impedance (Ω) versus Rterm value (-3σ, nominal, +3σ) for No comp, Single leg comp1, Single leg comp2, and Dual leg comp.]
Figure. 3.6: (a) Conventional SST output driver segment. (b) Proposed output driver 
segment with pseudo-analog impedance control. (c) Simulated output impedance vs. 
process corners. 
 
Fig. 3.7 shows the output driver impedance controller that produces the output voltages, 
GP1, GN1, GP2, and GN2, that control the low/high-impedance paths’ pull-up and pull-
down resistances. The impedance controller consists of a replica transmitter stage with a 
precision off-chip 100 Ω resistor load that is placed in two feedback loops. Depending on 
the control-loop mode, the top loop sets the MRP1/2 transistors’ gate voltages either with 
the analog control signal VOP or in a digital fashion to be fully on (VSS) or fully off (VDD) 
in order to force a value of (3/4)*VDD at the replica transmitter positive output. The 
bottom loop works in a similar manner to force a value of (1/4)*VDD at the replica 
transmitter negative output. For corners with low output resistance, the impedance tuning 
circuitry operates with the lower-impedance path 1 in the feedback loops to set analog 
voltages GP1/GN1 with VOP/VON to yield a 50 Ω match, while the higher-impedance 
path 2 is disabled (Mode 1). In Mode 1, both the replica driver and the main output driver 
segments share the same control signals. For corners with high output resistance, path 1 
switches from analog to digital control and is turned fully on, while the higher-impedance 
path 2 is now in the feedback loops to set analog voltages GP2/GN2 with VOP/VON to 
yield a 50 Ω match (Mode 3). In Mode 3, again both the replica driver and the main output 
driver segments share the same control signals. For corners with close-to-nominal output 
resistance, the main output driver is designed to operate with path 1 simply set fully on 
and path 2 disabled, while the replica loop controls either the low- or high-impedance path 
depending on the previous state (Mode 2A/B). Switching between the modes in the replica 
loop without dithering the control signals presented to the main output driver is achieved 
by an asynchronous FSM that monitors the VON voltage. As shown in the Fig. 3.7 
flowchart, in the nominal impedance case the replica driver will be continuously switching 
between the low-impedance (Mode 2A) and high-impedance (Mode 2B) modes without 
disturbing the output driver segments. In Mode 2A with the low-impedance path in 
feedback, the loop checks whether VON is less than a high threshold VH, corresponding 
to deep triode operation of MRN1, minus some margin before transitioning to Mode 1 
with analog control of the low-impedance path 1 in the main output stage segments. In 
Mode 2B with the low-impedance path fully on and the high-impedance path in feedback, 
the loop checks whether VON is greater than a low threshold VL, corresponding to a 
minimum conductance level from MRN2, plus some margin before transitioning to Mode 
3 with analog control of the high-impedance path 2 and the low-impedance path 1 fully 
on in the main output stage segments. The margin introduced in transitioning between 
Modes 1/2 and 2/3 introduces hysteresis which, along with the extra Mode 2 state, prevents 
dithering in the main output segment impedance control signals.  
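
The mode transitions described above can be summarized with a small state-machine sketch.
The transition conditions are a reading of the text and the Fig. 3.7 flowchart labels, in
particular the assumption that the replica loop alternates between Mode 2A and Mode 2B
whenever neither exit condition is met. The function names and threshold values are
hypothetical, and VON here stands for whatever voltage the loop produces in the current
mode.

```python
def next_mode(mode, von, vh, vl, margin):
    """Return the next impedance-control mode given the monitored VON.

    Modes: "1" (path 1 analog), "2A"/"2B" (main driver fixed, replica loop
    alternates between paths), "3" (path 1 fully on, path 2 analog).
    """
    if mode == "1":
        # Path 1 runs out of range once VON exceeds the high threshold.
        return "2A" if von > vh else "1"
    if mode == "2A":
        # Replica has path 1 in feedback; return to Mode 1 only with margin.
        return "1" if von < vh - margin else "2B"
    if mode == "2B":
        # Replica has path 2 in feedback; enter Mode 3 only with margin.
        return "3" if von > vl + margin else "2A"
    if mode == "3":
        # Leave Mode 3 once VON is no longer above the low threshold.
        return "3" if von > vl else "2B"
    raise ValueError(f"unknown mode: {mode}")


def main_driver_controls(mode):
    """Describe the controls seen by the main output segments in each mode."""
    return {
        "1": ("path 1 analog", "path 2 off"),
        "2A": ("path 1 fully on", "path 2 off"),
        "2B": ("path 1 fully on", "path 2 off"),
        "3": ("path 1 fully on", "path 2 analog"),
    }[mode]
```

In this sketch the main driver sees identical controls in Modes 2A and 2B, so the replica
loop can alternate between them near the nominal corner without disturbing the output
segments, and the margin terms provide the hysteresis at the Mode 1/2 and 2/3 boundaries.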
 
 
Figure. 3.7: Impedance control loop: (a) different operation modes, (b) NMOS control 
OTA output voltage VON in different modes, and (c) FSM flow chart.  
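
The replica-loop targets of (3/4)*VDD and (1/4)*VDD can be checked with a simple resistive
divider. The sketch below assumes the replica stage behaves as a pull-up resistance, the
100 Ω off-chip load, and a pull-down resistance in series across the supply, and it uses
VDD = 1.2 V as an illustrative value; the function name is hypothetical.

```python
def replica_output_voltages(vdd, r_pullup, r_pulldown, r_load=100.0):
    """Return (V+, V-) for pull-up, load, and pull-down in series across VDD."""
    current = vdd / (r_pullup + r_load + r_pulldown)
    v_pos = vdd - current * r_pullup      # replica positive output
    v_neg = current * r_pulldown          # replica negative output
    return v_pos, v_neg


# With 50-ohm pull-up and pull-down resistances the outputs sit at
# 3/4 and 1/4 of the supply, which is what the two loops regulate to.
v_pos, v_neg = replica_output_voltages(1.2, 50.0, 50.0)
```

Under these assumptions the differential output is VDD/2 for each polarity, which
corresponds to the 1.2 Vpp differential swing when VDD is 1.2 V.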
 
Robust operation of the replica bias impedance control scheme in the presence of 
mismatch is ensured since the analog-controlled MRP and MRN transistors are always 
biased in the triode region with a large overdrive voltage. In order to quantify the effect of 
both process variation and mismatch between the replica and output driver segments, the 
output driver’s post-layout simulated return loss plot is shown in Fig. 3.8 with ±3σ error 
bars. While there is some slight variation over the process corners, the small error bars 
indicate that the mismatch-induced variation for a given corner is minimal. Overall, the 
simulation results show a worst-case return loss of -27.4 dB and -10.5 dB at 500 MHz and 
8 GHz, respectively. 
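
To put the quoted return-loss numbers in perspective, the following sketch converts an S11
magnitude in dB to a linear reflection coefficient. The second helper assumes a purely
resistive mismatch against a 50 Ω reference, which is an illustrative simplification
(the actual S11 at 8 GHz also reflects pad and device capacitance); the function names
are hypothetical.

```python
def reflection_magnitude(s11_db):
    """Convert an S11 magnitude in dB (a negative number) to linear |Gamma|."""
    return 10 ** (s11_db / 20.0)


def resistive_termination_bounds(gamma, z0=50.0):
    """Return the two real impedances that give |Gamma| against z0."""
    return z0 * (1 - gamma) / (1 + gamma), z0 * (1 + gamma) / (1 - gamma)


gamma_low_freq = reflection_magnitude(-27.4)    # at 500 MHz
gamma_nyquist = reflection_magnitude(-10.5)     # at 8 GHz
z_low, z_high = resistive_termination_bounds(gamma_low_freq)
```

With these inputs the low-frequency reflection coefficient is roughly 0.04, which under
the resistive assumption corresponds to a termination within about 10% of 50 Ω.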
 
[Figure 3.8 legend: ff, -40 °C; tt, 27 °C; ss, 120 °C.]
 
Figure. 3.8: Monte Carlo simulations of the output driver S11 for different process corners 
with ±3σ error bars for mismatch at a given corner included. 
 
 
3.4 Experimental Results 
 
The dual-mode NRZ/PAM4 SerDes was fabricated in a 65-nm CMOS general purpose 
process. As shown in the die micrographs of Fig. 3.9, the total active area for the 
transmitter is 0.06 mm^2 and the DFE receiver core is 0.014 mm^2. The four phase clocks 
for the quarter-rate SerDes are generated on both chips by passing a half-rate differential 
input clock through on-chip CML divide-by-2 blocks followed by CML-to-CMOS 
converters and local clock buffers. 
 
 
Figure. 3.9: Chip micrograph of (a) transmitter and (b) receiver. 
 
Fig. 3.10 shows measurement results of the transmitter positive and negative output pin 
impedance versus output differential voltage for 5 different transmitter chips. The 
impedance control loop ensures that the output stage maintains an output impedance near 
50 Ω over the entire 1.2 Vpp range for both the nominal samples (1-4), which operate in 
Mode 2, and the high-impedance variation sample 5, which operates in the analog-
controlled Mode 3. Fig. 3.11 shows level separation mismatch ratio (RLM) measurements 
which highlight the utility of the LUT-based transmitter. Utilizing the default PAM4 
settings results in a 93% RLM, with the third level being somewhat low in this sample. 
Optimizing the LUT settings allows for an improved 96.7% RLM and more uniform level 
spacing.  
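
As a cross-check on these RLM figures, the sketch below evaluates one common definition of
the level separation mismatch ratio, RLM = 6*Smin/(VD - VA), where Smin is half of the
smallest spacing between adjacent levels. The source does not state which definition was
used, so this choice is an assumption; the level values are the ones annotated in
Fig. 3.11, and the function name is hypothetical.

```python
def level_separation_mismatch_ratio(levels_mv):
    """Return RLM for four PAM4 levels using the minimum-separation form."""
    va, vb, vc, vd = sorted(levels_mv)
    s_min = min(vb - va, vc - vb, vd - vc) / 2.0
    return 6.0 * s_min / (vd - va)


# Level values annotated in Fig. 3.11, in millivolts.
rlm_nominal = level_separation_mismatch_ratio([-599, -211, 162, 604])
rlm_optimized = level_separation_mismatch_ratio([-599, -211, 206, 604])
```

With this definition the nominal levels give approximately 0.930 and the optimized levels
approximately 0.968, consistent with the reported 93% and 96.7% to within the rounding
of the annotated level values.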
 
 
Figure. 3.10: Measured transmitter output impedance versus differential output voltage 
for (a) positive output pin and (b) negative output pin. 
 
[Figure 3.11 annotations: (a) VA = -599 mV, VB = -211 mV, VC = 162 mV, VD = 604 mV, RLM = 93.0%; (b) VA = -599 mV, VB = -211 mV, VC = 206 mV, VD = 604 mV, RLM = 96.7%.]
Figure. 3.11: Level separation mismatch ratio (RLM) measurement results for (a) nominal 
PAM4 level settings and (b) optimized PAM4 level settings. 
 
A block diagram of the link BER test setup and measurements of the two test channels’ 
insertion loss are shown in Fig. 3.12. Eye diagrams are captured at the output of the test 
channels, excluding the RX PCB loss of about 3 dB at 8 GHz, utilizing a high-bandwidth 
sampling scope to characterize the transmitter. Full link testing is performed with two 
synchronized sources to generate the transmitter and receiver clocks. A programmable 
phase shifter is inserted in the receive-side path to manually adjust the phase and generate 
BER bathtub curves. This receive-side clock is also used to clock the BERT. In PAM4 
mode, the on-die quarter-rate data MUX at the receiver output allows for independent 
verification of the MSB or LSB outputs. These results are then combined to produce the 
receiver BER bathtub curves in PAM4 mode. 

[Figure 3.12 blocks: signal source, signal generator, divider, TX chip, TX PCB trace, test channel, RX PCB trace, RX chip, phase shifter, BERT, and oscilloscope, with synchronized 8 GHz clocks and an oscilloscope trigger. Equipment labels: Centellax PCB12500, Agilent N5230A, Agilent E8267D, Agilent DCA-X 86100D. Rate labels: 4Gb/s, 16GS/s.]
 
Figure. 3.12: Dual-mode NRZ/PAM4 transceiver test setup. 
 
The transmitter eye diagrams at the channels’ outputs and the full-link BER timing 
margin bathtub curves are shown in Fig. 3.13 and Fig. 3.14, respectively. Fig. 3.14 also 
includes the utilized TX and RX equalizer settings, with initial values obtained using the 
statistical simulation model discussed in Section 3.2 and further manual fine tuning 
employed to achieve the lowest BER. 32 Gb/s PAM4 operation is achieved over channel 1, 
which has 13.5 dB loss at the 8 GHz Nyquist frequency. The left half of Fig. 3.13 shows that 
without any transmit equalization the output eye diagram is completely closed. As shown in 
Fig. 3.14(a), utilizing only RX equalization in this case allows for only a BER near 10^-10. While 
only optimizing the PAM4 2-tap TX FFE allows for open eyes at the channel output before 
the RX PCB, the additional board loss results in only 0.02 UI timing margin at a BER of 
10^-12 without any receiver equalization. Co-optimizing the 2-tap TX FFE for pre-cursor ISI 
cancellation with the RX DFE for post-cursor cancellation allows this timing margin to 
increase to 0.06 UI. Note that in this co-optimized condition the eye diagram at the RX 
PCB input is completely closed, as shown in Fig. 3.13(c). 16 Gb/s NRZ operation is 
achieved over channel 2, which has 27.6 dB loss at the 8 GHz Nyquist frequency. The 
right half of Fig. 3.13 shows that without any transmit equalization the output eye diagram 
is completely closed. As shown in Fig. 3.14(b), utilizing only RX equalization in this case 
allows for only a BER near 10^-8. Optimizing the NRZ 4-tap TX FFE allows for open eyes 
at the channel output before the RX PCB, as depicted in Fig. 3.13(e). Jitter decomposition 
of this eye yields 34.3 ps of deterministic jitter, with 31.2 ps of residual ISI being the main 
contributor. The random jitter is measured at 830 fs rms using a clock source with 750 
fs rms random jitter. A timing margin of 0.08 UI at a BER of 10^-12 is achieved without any 
receiver equalization. This timing margin is improved to 0.18 UI with co-optimization of 
the 4-tap TX FFE and RX DFE. As in the PAM4 case, in this co-optimized NRZ condition 
the eye diagram at the RX PCB input is completely closed, as shown in Fig. 3.13(f).  
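
The timing margins above are quoted in unit intervals; the sketch below converts them to
picoseconds for the two modes, both of which run at 16 GBd. It also estimates the random
jitter added by the transmitter by subtracting the clock source contribution in quadrature,
which assumes the two jitter sources are independent and Gaussian. That estimate is an
illustration and not a figure reported in the measurements.

```python
import math


def unit_interval_ps(bit_rate_gbps, bits_per_symbol):
    """Return the unit interval in picoseconds."""
    return 1000.0 * bits_per_symbol / bit_rate_gbps


pam4_ui = unit_interval_ps(32.0, 2)     # 32 Gb/s PAM4
nrz_ui = unit_interval_ps(16.0, 1)      # 16 Gb/s NRZ

margins_ps = {
    "PAM4, TX FFE only": 0.02 * pam4_ui,
    "PAM4, TX FFE + RX DFE": 0.06 * pam4_ui,
    "NRZ, TX FFE only": 0.08 * nrz_ui,
    "NRZ, TX FFE + RX DFE": 0.18 * nrz_ui,
}

# Quadrature subtraction of the source jitter (independence assumed).
added_rj_fs = math.sqrt(830.0 ** 2 - 750.0 ** 2)
```

Under these assumptions the co-optimized margins correspond to 3.75 ps for PAM4 and
11.25 ps for NRZ, and the transmitter adds on the order of 0.36 ps rms of random jitter.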
 
Figure. 3.13: 32Gb/s PAM4 eye diagrams over channel 1: (a) without TX equalization, 
(b) with optimal 2-tap TX-only FFE settings, and (c) with the 2-tap TX FFE settings co-
optimized with the RX DFE to yield maximum timing margin. 16Gb/s NRZ eye diagrams 
over channel 2: (d) without TX equalization, (e) with optimal 4-tap TX-only FFE settings, 
and (f) with the 4-tap TX FFE settings co-optimized with the RX DFE to yield maximum 
timing margin. 
[Figure 3.13 panel labels: 32Gb/s PAM4 for (a)-(c) and 16Gb/s NRZ for (d)-(f); all panels use 12.5 ps/div, with vertical scales of 50, 100, or 200 mV/div depending on the panel.]

Figure. 3.14: Transceiver equalizer settings and bathtub curves for (a) channel 1 at 32 Gb/s 
PAM4 and (b) channel 2 at 16 Gb/s NRZ. 
 
Table 3.1 summarizes the multi-mode transceiver performance and compares this work 
against other dedicated NRZ and PAM4 designs. Relative to the mixed-signal PAM4 
designs of [60] and [49], the presented transceiver’s additional equalization functionality 
allows for compensation of higher channel loss. Better power efficiency and significant 
area reduction are also achieved relative to a 16-nm ADC-based PAM4 transceiver [76]. 
Comparing against NRZ designs, the presented dual-mode transceiver achieves a higher 
32 Gb/s data rate in PAM4 mode at a better power efficiency than a 28-nm design 
operating at 28 Gb/s NRZ [39]. Also, superior power efficiency in NRZ operation is 
achieved relative to the 16 Gb/s 40-nm design which utilizes a DFE with 14 FIR feedback 
taps [67]. Fig. 3.15 shows the 32 Gb/s power breakdown. The transmitter consumes 158.6 
mW of power, with the 4-to-1 serializer and local clock buffers contributing the most. 
Only 17.7 mW is consumed at the receiver, with the local clock buffers and 
comparators dominating. 
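
The power figures quoted here and in Table 3.1 can be cross-checked with a few lines of
arithmetic. The sketch below uses only numbers that appear in the text and the table;
the variable names are illustrative.

```python
tx_power_mw = 158.6          # 32 Gb/s PAM4 transmitter power
rx_power_mw = 17.7           # 32 Gb/s PAM4 receiver power

pam4_total_mw = tx_power_mw + rx_power_mw
pam4_efficiency = pam4_total_mw / 32.0      # mW per Gb/s in PAM4 mode

nrz_total_mw = 173.7         # total power listed for 16 Gb/s NRZ
nrz_efficiency = nrz_total_mw / 16.0        # mW per Gb/s in NRZ mode
```

The sum reproduces the 176.3 mW total, and the two ratios round to the 5.5 and 10.9
mW/Gbps entries in the table.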
 
Table 3.1: Transceiver Performance Summary 

| References | This Work | This Work | [60] | [49] | [76] | [39] | [67] |
|---|---|---|---|---|---|---|---|
| Data Rate | 32 Gb/s | 16 Gb/s | 20 Gb/s | 56 Gb/s | 56 Gb/s | 28 Gb/s | 16 Gb/s |
| Equalization | 2-tap TX FFE + 1-tap FIR, 2-tap IIR RX DFE | 4-tap TX FFE + 1-tap FIR, 2-tap IIR RX DFE | 3-tap TX FFE | 3-tap TX FFE + 1-tap RX DFE | 3-tap TX FFE + RX CTLE + ADC-based RX 24-tap FFE, 1-tap DFE | 5-tap TX FFE + RX CTLE + 14-tap RX DFE | 3-tap TX FFE + RX CTLE + 14-tap RX DFE |
| Modulation | PAM4 | NRZ | PAM4 | PAM4 | PAM4 | NRZ | NRZ |
| Total Loss @ Nyquist | 13.5 dB | 27.6 dB | 5 dB | 2 dB | 25 dB | 40 dB for 25.78 Gb/s | 34 dB |
| Eye Width | 6% | 18% | - | - | - | 23% | - |
| BER | 10^-12 | 10^-12 | 10^-12 | 10^-12 | 10^-8 | 10^-12 | 10^-15 |
| Supply (V) | 1.2 TX, 1 RX | 1.2 TX, 1 RX | 1.8 | 1.2 | 0.9 digital, 1.2 analog, 1.8 auxiliary | 1 TX & RX, 1.25 TX driver | 1/1.5 TX, 0.9 RX |
| Power (mW) | 176.3 | 173.7 | 408 | 475 | 550* | 295* | 235* |
| Power Efficiency (mW/Gbps) | 5.5 | 10.9 | 20.4 | 8.5 | 9.8 | 10.5 | 14.7 |
| Area (mm^2) | 0.074 | 0.074 | 0.43 | 2.74 | 1.4 | 0.62 | 2.15 |
| Technology | 65-nm | 65-nm | 90-nm | 65-nm TX, 40-nm RX | 16-nm FinFET | 28-nm | 40-nm |

*Clock generation and CDR power included 
  
[Figure 3.15 data: transmitter, 158.6 mW total: 4-to-1 serializer 35%, local clock buffers 32%, LUT 18%, output driver 7%, low-speed serializers 6%, others 2%. Receiver, 17.7 mW total: local clock buffers 38%, comparators 33%, others 13%, summer 10%, IIR filters 6%.]
Figure. 3.15: 32 Gb/s power breakdown of (a) transmitter and (b) receiver. 
 
3.5 Conclusion 
 
This section has presented a 16/32 Gb/s dual-mode NRZ/PAM4 SerDes which can be 
configured to work in both modes with minimal hardware overhead. The SST transmitter 
achieves 1.2 Vpp output swing and employs lookup table control of a 31-segment output 
DAC to implement 4/2-tap FFE in NRZ/PAM4 modes, respectively. Power efficiency is 
improved in the transmitter with an optimized quarter-rate serializer and a new low-
overhead analog impedance control scheme is employed in the output stage to obviate 
additional impedance control segments. The presented DFE receiver utilizes a new single-
clock phase two-stage regenerative comparator in the 2-bit flash ADCs to allow sufficient 
gain to support PAM4 DFE. Improved sensitivity is achieved in the direct feedback design 
with the multi-level first post-cursor ISI subtracted in the comparators and the remaining 
ISI cancelled in a preceding current integration summer. Overall, leveraging the proposed 
dual-mode SerDes architecture allows the support of multiple channel conditions and 
variable data rates with a single design solution. 
4. PAM4 56 GB/S RECEIVER WITH THRESHOLD AND DFE ADAPTATION 
4.1 Introduction 
Supporting the increasing demand for higher bandwidth in datacenters and 
telecommunication infrastructure requires increased data rate per lane for electrical 
interfaces. While I/O circuit bandwidth and power efficiency can leverage technology 
improvements, channel bandwidths have remained constant because upgrading the current 
infrastructure requires major investment. This prohibits reliable communication beyond 
50 Gb/s with traditional non-return-to-zero (NRZ) signaling due to excessive channel loss 
and reflections. This has motivated employing PAM4 signaling in new high-speed I/O 
standards owing to the higher spectral efficiency it provides [42, 43].  
Adoption of PAM4 signaling, however, increases system complexity at both the 
transmitter and receiver sides. The inherent multi-level signaling increases the complexity 
of the transmitter FFE by a factor of 2, making many-tap equalizers prohibitive in terms of 
power. The receiver is required to make multi-level decisions and implement equalizers 
that can cancel long-tail multi-level ISI while maintaining linearity and power efficiency. 
Fig. 4.1 shows a block diagram of such a system. The focus of this work is on the receiver 
side while trying to minimize the equalization requirement on the transmitter side.  
 
 
*© 2018 IEEE. Part of section 4 is reprinted, with permission, from A. Roshan-
Zamir, T. Iwai, Y.-H. Fan, A. Kumar, H.-W. Yang, L. Sledjeski, J. Hamilton, S. 
Chandramouli, A. Aude, and S. Palermo, “A 56 Gb/s PAM4 Receiver with Low-
Overhead Threshold and Edge-Based DFE FIR and IIR-Tap Adaptation in 65nm 
CMOS,” IEEE Custom Integrated Circuits Conference, expected publication date: 
July 2018. 
While ADC-based receivers are well suited for PAM4 signaling [77], due to their inherent 
multi-level detection and the possibility of implementing robust many-tap DFE and FFE 
equalizers in the digital domain, they generally consume high power. This motivates a 
power-efficient mixed-signal receiver front-end solution for these applications. However, 
several challenges are faced in a mixed-signal PAM4 receiver design. The increased 
sensitivity of PAM4 to noise and residual ISI enforces stringent ISI cancellation 
requirements. This can lead to increased tap counts in transmitter-side FFE equalization, 
multi-stage power-hungry receiver-side CTLE equalization [77-79], and receiver-side DFEs 
that have large tap counts when implemented with FIR feedback filters [78]. While a CTLE is 
effective in canceling the long-tail ISI, the linear amplifiers must be designed with 
sufficient bandwidth to support the full-rate signal, sufficient linearity to support PAM4 
modulation, and enough peaking to compensate for the target channel loss. This often results 
in multi-stage CTLE implementations with excessive area overhead due to the inductive 
peaking required to maintain the bandwidth at high data rates. An alternative to using CTLE 
and many-tap DFEs is to utilize DFEs which combine FIR and IIR feedback filters [57, 69]. 
However, due to main cursor loss, supporting channels with over 15 dB of loss at Nyquist 
requires excessive slicer sensitivity in the front end [29].  
Similar to an NRZ system, DFE taps should be adaptively tuned for robust operation. 
The extra challenge associated with PAM4 operation is that the multi-level thresholds 
depend on the channel and the equalization and must also be adaptively tuned. These 
adaptations should be done with minimal hardware overhead and offer compatibility with 
clock recovery architectures that support PAM4 modulation. A PAM4 threshold 
adaptation based on the difference between the input data and the summation of the input 
data and positive and negative thresholds is proposed in [80]. In addition to the high-speed 
summer and slicer overhead, that method relies heavily on equal spacing of the PAM4 levels, 
which is sensitive to non-linearity of both the transmitter and the receiver front end.  
This section presents a 56 Gb/s mixed-signal PAM4 quarter-rate receiver with 
background threshold and DFE adaptation which utilizes phase-locked loop (PLL) based 
clock and data recovery (CDR). Section 4.2 investigates the optimal equalization solution 
to achieve successful transmission over the target channel using statistical bit error rate 
(BER) modeling. Section 4.3 details the PAM4 receiver architecture, which employs a 
single-stage CTLE and a 1 FIR and 1 IIR-tap DFE to efficiently cancel long-tail ISI and 
utilizes a bang-bang phase detector (BBPD) PLL-based CDR to recover the clock using only 
one per-slice edge sampler. Section 4.4 discusses a background sampler threshold adaptation 
scheme which does not rely on equal spacing of the PAM4 levels and uses only a single 
additional per-slice sampler that periodically scans the top and bottom PAM4 eyes. It also 
describes how the DFE adaptation scheme of [71] is extended for PAM4 operation with 
independent per-slice tap values for mismatch robustness, utilizing the same edge samplers 
used in the CDR. Experimental results from a general purpose (GP) 65-nm CMOS prototype are 
presented in Section 4.5. Finally, Section 4.6 concludes the section. 

[Figure 4.1 blocks: transmitter with serializer (DTX[N:0]), TX FFE equalizer, PLL with REF CLK, and equalization adaptation; channel; receiver with continuous-time linear amplifier, PAM4 summer/DAC, RX DFE equalizer, CDR, equalization and threshold adaptation, and deserializer (DRX[N:0]); low-frequency back channel from receiver to transmitter.]
 
Figure. 4.1:  Conceptual block diagram of a PAM4 transceiver with transmitter and 
receiver side equalization, equalizer and threshold adaptation, and clock recovery circuit.  
 
4.2    System Analysis 
The proposed receiver is designed to support PAM4 transmission over refined electrical 
channels with minimal performance degradation due to reflections. Fig. 4.2(a) shows the 
frequency response of such a channel with 20.8 dB of loss at 14 GHz. Skin effect and 
dielectric loss result in attenuation and dispersion of input pulses at the output of the 
channel, which causes ISI. This results in the pulse response of Fig. 4.2(b) (blue curve), 
which can be well characterized by a fast rising side with one significant pre-cursor ISI 
term, and a slow-decaying long-tail ISI on the other side with significant ISI terms up to 
the tenth post-cursor. Due to the increased sensitivity of PAM4 to ISI compared to NRZ, all 
of these terms must be appropriately cancelled in order to have error-free transmission over 
this channel. A 3-tap DFE with 1 FIR and 2 IIR taps [29] can cancel the post-cursor 
terms appropriately, as illustrated in the pulse response of Fig. 4.2(b) (green curve), while 
avoiding any CTLE. However, due to the significant pre-cursor term, the minimum BER will be 
higher than 10^-3, as depicted in the voltage and timing bathtub curves of Fig. 4.2(c) and (d). A 2-tap 
transmitter-side FFE can be used to cancel the large pre-cursor tap, as illustrated 
in Fig. 4.2(b) (red curve), while using the same 3-tap DFE on the receiver side [72]. This 
plot shows the pulse response with optimal transmitter and receiver equalizer settings. It 
can be noted from the pulse response that the 2-tap FFE with optimal settings 
does not sufficiently cancel the pre-cursor. The reason is that increasing the pre-cursor 
coefficient beyond this point makes the second pre-cursor ISI term larger and the 
main cursor smaller, so the overall BER of the system degrades. As depicted in the 
bathtub curves of Fig. 4.2(c) and (d), this results in a BER worse than 10^-10. In order to 
achieve better pre-cursor cancellation, more pre-cursor taps are required at the transmitter 
side. While a CTLE behaves similarly to a DFE IIR tap, it has some advantages 
and disadvantages in comparison. As discussed in Section 4.1, relying solely on a CTLE for 
long-tail ISI cancellation can be area hungry due to the required peaking. One advantage of 
a CTLE over an IIR tap is that, while both act on the post-cursors, an 
active CTLE also boosts the main cursor, reducing the system sensitivity to residual ISI. 
Replacing one of the DFE IIR taps with a single-stage CTLE with a feasible 6 dB of 
high-frequency peaking breaks the trade-off between using CTLE and DFE IIR taps. Fig. 
4.2(b) (black curve) shows the pulse response of such a system with a single-stage CTLE, 
a 1-tap FIR and 1-tap IIR DFE, and the same 2-tap transmitter FFE. As depicted, 
although similar pre-cursor ISI cancellation is achieved, the boost in the main cursor makes 
the system performance less susceptible to the residual ISI. The resultant bathtub curves 
of Fig. 4.2(c) and (d) show a BER better than 10^-12 with timing and voltage margins 
better than 0.2 UI and 18 mV at this BER. Achieving similar performance with a 1-tap 
FIR, 2-tap IIR DFE without engaging CTLE peaking requires a 4-tap transmitter-side FFE, 
which could be prohibitive in terms of power and area considering the PAM4 
complexity. 
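
The pre-cursor trade-off described above can be reproduced qualitatively with a toy pulse
response. The sketch below applies a 2-tap FFE with one pre-cursor tap to a hypothetical
symbol-spaced pulse response; the sample values, the tap normalization, and the function
names are assumptions chosen for illustration and are not taken from the measured channel.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Hypothetical symbol-spaced channel pulse response (not measured data):
# one pre-cursor, the main cursor at index 2, then a decaying tail.
PULSE = [0.0, 0.15, 0.60, 0.30, 0.15, 0.08]
MAIN = 3  # main-cursor index after the one-symbol delay added by the FFE


def equalized_cursors(pre_weight):
    """Return (second pre-cursor, first pre-cursor, main cursor).

    The 2-tap FFE is [-pre_weight, 1 - pre_weight], so the tap magnitudes
    sum to one (a fixed peak output swing).
    """
    y = convolve([-pre_weight, 1.0 - pre_weight], PULSE)
    return y[MAIN - 2], y[MAIN - 1], y[MAIN]


for a in (0.0, 0.1, 0.2, 0.3):
    second_pre, first_pre, main = equalized_cursors(a)
    print(f"pre tap {a:.1f}: 2nd pre {second_pre:+.3f}, "
          f"1st pre {first_pre:+.3f}, main {main:+.3f}")
```

In this toy case the first pre-cursor is nulled at a tap weight of 0.2, but every increase
in the tap weight also shrinks the main cursor and enlarges the second pre-cursor, which
is the mechanism that limits how far a single pre-cursor tap can be pushed.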
[Figure 4.2 legend: No EQ; RX DFE; TX FFE + RX DFE; TX FFE + RX CTLE & DFE.]
Figure. 4.2: Refined electrical channel. (a) S21 response. (b) 28GS/s pulse responses with 
various equalizer configurations, (c) 56 Gb/s PAM4 voltage margin, and (d) 56 Gb/s 
PAM4 timing margin with 2-tap pre-cursor TX FFE and various RX equalizer 
configurations. 
 
4.3    Receiver Architecture 
Based on the system analysis of Section 4.2, a PAM4 receiver with a single-stage CTLE 
and a 1-tap FIR, 1-tap IIR DFE is proposed as shown in Fig. 4.3. The CTLE output is 
connected to the quarter-rate DFE, whose four slices each consist of 5 samplers. Three data 
samplers implement a 2-bit flash ADC for PAM4 symbol detection, 1 error sampler 
periodically scans the top and bottom eyes for threshold tuning, and 1 edge sampler 
provides information for CDR phase locking and DFE tap adaptation. The outputs of the 
4 receiver slices are first deserialized to 1/8 symbol rate, with the data and edge samples 
driving the CDR’s PAM4 BBPD. At this point the data samples are also probed out for 
external BER testing. All the data, error, and edge samples are then further deserialized to 
1/32 symbol rate for processing by the DFE tap and threshold adaptation logic. 
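
The clock and data rates implied by this architecture follow directly from the 56 Gb/s
PAM4 line rate. The sketch below works them out; the variable names are illustrative and
only the ratios stated in the text are used.

```python
bit_rate_gbps = 56.0
bits_per_symbol = 2                                   # PAM4

symbol_rate_gbaud = bit_rate_gbps / bits_per_symbol   # full-rate symbol rate
unit_interval_ps = 1000.0 / symbol_rate_gbaud

slices = 4
samplers_per_slice = 5                                # 3 data, 1 error, 1 edge
total_samplers = slices * samplers_per_slice

slice_rate_gsps = symbol_rate_gbaud / slices          # quarter-rate sampling
cdr_rate_ghz = symbol_rate_gbaud / 8                  # after first deserializer
adaptation_rate_ghz = symbol_rate_gbaud / 32          # after second deserializer
```

This gives a 28 GBd symbol rate with a unit interval of about 35.7 ps, 7 GS/s per slice,
3.5 GHz at the BBPD input, and 875 MHz for the adaptation logic.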
 
[Figure 4.3 blocks: input CTLE; 1-tap FIR, 1-tap IIR DFE with 2-bit flash ADC, error sampler, and edge sampler (D, ER, and ED outputs); 4:8 and 8:32 deserializers; PAM4 equalizer adaptation logic setting the DFE FIR weight, DFE IIR time constant, DFE IIR amplitude, and slicer thresholds; BBPD; 14 GHz LC-VCO; divider and buffers; output MUX.]
 
Figure. 4.3: 56Gb/s PAM4 receiver with threshold and DFE tap adaptation. 
 
A detailed block diagram of the equalizer data-path is shown in Fig. 4.4. The quarter-
rate architecture reduces the clocking power and relaxes the timing of the CML slicers by 
giving them extra time to recover from the previous decision in the sampling phase. Eight 
clock phases are used for data and edge detection. The three data samplers implement a 
2-bit flash ADC for PAM4 level detection. The middle data slicer threshold is set to zero while the 
top and bottom slicers’ thresholds are set to ±2/3 of the post-equalized amplitude of the 
received signal by the threshold adaptation circuit. The error slicer is used for threshold 
tuning and its threshold is set by the adaptation circuit. The edge sampler is used for 
timing recovery and DFE adaptation with its threshold set to zero. An FIR tap is utilized 
to cancel the first post-cursor ISI. This multi-level FIR tap is efficiently realized by feeding 
back the data samplers’ 3-bit thermometer-coded output bits directly to three equally 
weighted summer inputs embedded in the data, edge, and error comparators to minimize the FIR 
tap critical delay and meet the stringent 1-UI timing. Long-tail ISI is efficiently cancelled 
with the IIR tap, starting from the second post-cursor. In order to minimize the 
comparators’ internal loading, the IIR tap is subtracted from the input with CML 
summers that precede the comparators. The quarter-rate data samplers’ outputs are 
serialized to full-rate and filtered to generate the IIR tap signal. 
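
The thermometer-coded feedback described above can be sketched behaviorally. The example
assumes ideal PAM4 levels at ±A and ±A/3, thresholds at 0 and ±2A/3, and three equally
weighted feedback inputs; the function names and numeric values are hypothetical and the
model ignores timing and offsets.

```python
def slice_pam4(v, amplitude):
    """Return the 3-bit thermometer code from the three data samplers."""
    thresholds = (-2.0 * amplitude / 3.0, 0.0, 2.0 * amplitude / 3.0)
    return tuple(int(v > t) for t in thresholds)


def symbol_level(thermo, amplitude):
    """Map a thermometer code to its ideal PAM4 level."""
    return (2 * sum(thermo) - 3) * amplitude / 3.0


def fir_feedback(thermo, tap_weight):
    """Sum three equally weighted contributions, one per data sampler bit."""
    return sum(tap_weight if bit else -tap_weight for bit in thermo)


code = slice_pam4(0.35, amplitude=0.9)           # a sample near the +A/3 level
feedback = fir_feedback(code, tap_weight=0.05)   # first post-cursor correction
```

Because each bit contributes plus or minus the same weight, the summed feedback takes four
values proportional to the detected level, so no decoding to binary is needed inside the
1-UI loop.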

[Figure 4.4 labels: RXin, CTLE, summers with VIIR input, data samplers (D1[1:3], D2[1:3], D4[1:3]), edge samplers (ED1, ED2), error samplers (ER1, ER2), clocks Clk0, Clk45, Clk90, and Clk135, DFE FIR tap, DFE IIR tap with IIR filter/mux; two more segments are not drawn.]
 
Figure. 4.4: Equalizer data-path. 
 
Fig. 4.5(a) shows the single-stage CTLE with a tunable degeneration resistor and 
capacitor to tune the high-frequency peaking and low-frequency gain. The high-frequency 
peaking is set by the capacitor DAC while the low-frequency gain is set by the resistor 
DAC, both of which have 3-bit resolution and provide 0 to 6 dB of range, as illustrated 
in the frequency responses of Fig. 4.5(b) and (c), respectively, for different settings. 
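
A textbook first-order model of a source-degenerated CTLE helps relate the degeneration
network to the peaking. The sketch below uses the standard small-signal transfer function
with a zero at 1/(Rs*Cs) and a pole at (1 + gm*Rs/2)/(Rs*Cs), ignoring the output pole.
The component values are hypothetical and are chosen only so that the ideal peaking is
about 6 dB; they are not taken from the implemented circuit.

```python
import math


def ctle_gain_db(freq_hz, gm, rd, rs, cs):
    """Magnitude in dB of a first-order degenerated CTLE (output pole ignored)."""
    s = 2j * math.pi * freq_hz
    degeneration = 1.0 + gm * rs / 2.0
    h = (gm * rd / degeneration) * (1 + s * rs * cs) / (1 + s * rs * cs / degeneration)
    return 20.0 * math.log10(abs(h))


# Hypothetical values: gm*Rs/2 = 1 gives an ideal peaking of 20*log10(2) dB.
GM, RD, RS, CS = 10e-3, 200.0, 200.0, 227e-15

dc_gain_db = ctle_gain_db(1e3, GM, RD, RS, CS)
hf_gain_db = ctle_gain_db(1e13, GM, RD, RS, CS)
peaking_db = hf_gain_db - dc_gain_db
```

In this model the degeneration resistor sets the low-frequency gain and the ideal peaking,
while the capacitor sets where the zero begins to lift the response; in a real stage the
output pole limits the peaking actually reached at the frequency of interest.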
 
Figure. 4.5: Single stage CTLE (a) block diagram and frequency response with different 
(b) capacitor DAC settings, and (c) resistor DAC settings. 
 
While strong-arm comparators [76] and modified double-tail versions [73] have 
advantages that include no dc power, small aperture time, high gain, and CMOS-level 
outputs, their multi-stage implementation often results in reduced 
bandwidth and large delay. Single-stage CML slicers [58], on the other hand, provide 
higher bandwidth at the cost of reduced gain and static power consumption. Fig. 
4.6(a) shows the schematic of such a slicer with embedded FIR taps and threshold/offset 
control pairs. In order for the feedback loop to operate as a true DFE, avoiding noise 
propagation in the loop [54], the input data should be sufficiently large to clip the feedback 
pairs. A differential input amplitude of ~14mV is required to achieve 90% of the maximum 
DFE weight at 56Gb/s, as depicted in Fig. 4.6 (b). Thus the minimum sensitivity of the DFE 
loop is 28mVppd. The samplers’ threshold and offset are controlled through the 
DAC-generated Voff/Vth voltage with 7-bit (1-bit sign, 6-bit amplitude) resolution and a 
maximum range of more than 250mV, as illustrated in Fig. 4.7 (a). Independent DFE 
FIR-tap weights set the tail currents with 6-bit resolution on a per-slice basis to 
compensate for mismatch between the receiver slices, achieving more than 150mV of 
range as depicted in Fig. 4.7 (b). The single IIR MUX of Fig. 4.8 combines the 
thermometer-coded quarter-rate data from all the slices and serializes it to full rate using a 
current-mode architecture. A tunable RC load implements the IIR filter, with the 
time constant controlled through a 3-bit resistor DAC for coarse tuning and a 
3-bit capacitor DAC for fine tuning. Fig. 4.9 shows how the time constant can be 
tuned by changing the resistor and capacitor DAC codes. The IIR amplitude is controlled 
by the tunable tail current. The IIR summation is done in the CML summer, as shown in 
Fig. 4.8. The input pair is degenerated to achieve the linear range required to support the 
input signal swing. The gain and linear range of the input pair can be set by the tunable 
degeneration resistor. Because the IIR tap cancellation starts from the 2nd post-cursor, the 
IIR tap does not require degeneration due to its smaller amplitude requirement.   
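
The effect of the IIR tap on the long ISI tail can be sketched with a discrete-time model in which the RC-filtered feedback decays exponentially from the 2nd post-cursor onward. This is a simplified illustration, assuming a single-pole filter sampled once per UI; the function names and the pulse-response values are hypothetical and synthetic, not measured data from this receiver.

```python
import math

def iir_tap_weights(amplitude, tau_ui, n_cursors, first_cursor=2):
    """Feedback weight applied at each post-cursor by a single-pole IIR tap."""
    return {m: amplitude * math.exp(-(m - first_cursor) / tau_ui)
            for m in range(first_cursor, first_cursor + n_cursors)}

def residual_post_cursors(post_cursors, weights):
    """Subtract the IIR feedback from a {cursor_index: value} pulse-response tail."""
    return {m: v - weights.get(m, 0.0) for m, v in post_cursors.items()}

# Synthetic pulse-response tail (illustrative only): a large 1st post-cursor
# followed by an exponentially decaying tail from the 2nd post-cursor onward.
channel = {1: 0.30}
channel.update({m: 0.12 * math.exp(-(m - 2) / 3.0) for m in range(2, 10)})

# Amplitude plays the role of the tail current, tau_ui that of the RC time constant.
weights = iir_tap_weights(amplitude=0.12, tau_ui=3.0, n_cursors=8)
residual = residual_post_cursors(channel, weights)
print({m: round(v, 4) for m, v in residual.items()})
```

When the amplitude and time constant match the tail, only the 1st post-cursor remains, which is the part left to the FIR tap.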
Figure. 4.6: (a) CML buffer with DFE FIR-tap and threshold control. (b) Simulated 
normalized FIR-tap offset weight versus differential input amplitude. 
 
Figure. 4.7: (a) Simulated comparator offset versus threshold DAC code, (b) simulated 
FIR weight vs FIR DAC code.  
Figure. 4.8: Block diagram of IIR MUX, filter and summer. 
Figure. 4.9: IIR time-constant versus resistor and capacitor DAC settings. 
 
Fig. 4.10 shows the PLL-based CDR block diagram. The BBPD receives the 1/8-rate 
data and edge samples and filters out all but the symmetric transitions to avoid asymmetric 
PAM4 transition-induced jitter. In order to reduce loop latency, the BBPD works with 8 
parallel Early/Late signals controlling an 8-segment charge pump. This parallel charge 
pump drives the loop filter to produce the control voltage for a 14GHz LC oscillator [81]. 
Besides the primary resonator tank, additional tanks at the sources of both 
cross-coupled transistor pairs reduce the oscillator phase noise [82]. Quarter-rate clocks are generated 
by a CML divide-by-two and then converted to CMOS level. Static CMOS phase 
interpolators efficiently generate the 8 clock phases for the quarter-rate data and edge 
samplers. Per-phase skew calibration is achieved with tunable delay buffers preceding the 
samplers. 
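
The transition filtering in the BBPD can be sketched as follows. This is a minimal illustration rather than the implemented logic: it assumes PAM4 levels coded as -3, -1, +1, +3, an edge sample sliced at zero, and the usual Alexander convention in which an edge sample that still agrees with the previous symbol indicates an early clock. All function names are hypothetical.

```python
PAM4_LEVELS = (-3, -1, 1, 3)

def is_symmetric_transition(prev_level, curr_level):
    """True for -3<->+3 and -1<->+1, whose zero crossing lies mid-way in time."""
    return prev_level == -curr_level

def bang_bang_vote(prev_level, curr_level, edge_sign):
    """Return +1 (early), -1 (late), or 0 (no vote) for one edge sample.

    edge_sign is +1 or -1, the edge sample sliced at the zero threshold.
    """
    if not is_symmetric_transition(prev_level, curr_level):
        return 0  # asymmetric or missing transition: filtered out
    prev_sign = 1 if prev_level > 0 else -1
    return 1 if edge_sign == prev_sign else -1

def net_charge_pump_votes(votes):
    """Net early-minus-late count from parallel votes, one per charge-pump segment."""
    return sum(votes)

# Eight parallel (previous symbol, current symbol, edge sign) observations.
observations = [(-3, 3, -1), (3, -3, 1), (-1, 1, 1), (1, 3, 1),
                (-1, 3, 1), (3, 3, 1), (1, -1, 1), (-3, 3, 1)]
votes = [bang_bang_vote(p, c, e) for p, c, e in observations]
print(votes, net_charge_pump_votes(votes))
```

Only the symmetric transitions contribute a vote, so level-dependent zero-crossing shifts from transitions such as -1 to +3 do not reach the charge pump.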
Figure. 4.10: PAM4 PLL-based CDR. 
 
4.4    Threshold and DFE Tap Adaptation 
Background sampler threshold adaptation is achieved with an additional error sampler 
to periodically estimate the top/bottom eye height and place the thresholds in the middle. 
An initial foreground calibration step is performed where all 20 samplers are set to zero 
offset/threshold by shorting the input to the common mode and adjusting the per-sampler 
Voff/Vth DAC codes. On a per-slice basis, the top sampler’s threshold (highlighted in 
green) is then incremented by 1 LSB (error offset 1) to reach the initial condition 
shown in Fig. 4.11. The initial coarse adaptation steps are based on uniform symbol 
statistics, with both the top sampler and error thresholds (highlighted in red) increased 
until the error sampler detects ones with 25% probability. Also, in parallel 
the bottom sampler (highlighted in blue) is stepped at the same rate in an open-loop 
manner to improve convergence speed. At the end of State 1, the error sampler resides 
at the bottom of the top eye and the top sampler samples only 1 threshold LSB inside 
the eye. Next, the polarity of the error sampler threshold is inverted and then fine-tuned to 
converge to the top of the bottom eye based on a 25% zero-detection criterion, while the 
bottom sampler follows the error sampler with a -1 LSB difference (State 2). The 
independent top and bottom threshold tuning eliminates errors caused by PAM4 
asymmetry and level spacing mismatch. In order not to rely on uniform statistics, the 
process then transitions to monitoring the relative values of the error samplers and the 
bottom/top samplers to track the eye edges in States 3 and 4. It should be noted that at the 
end of State 4, the top and bottom slicers are in a sub-optimal position inside the eye. While 
ideally the top and bottom samplers should follow the error sampler at ± half the 
eye height in States 4 and 3, respectively, due to the lack of an eye-height estimate at this 
point they follow it with only ± 1 LSB (the minimum possible estimate). 
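
The coarse step of State 1 can be illustrated with a small simulation. The sketch below assumes equiprobable PAM4 symbols and bounded noise, and it sweeps a threshold code upward from zero until at most 25% of the error sampler outputs are ones. The signal levels, noise range, and LSB size are synthetic assumptions, and the function names are hypothetical.

```python
import random

def ones_fraction(samples_mv, threshold_mv):
    """Fraction of samples that a slicer at threshold_mv resolves as 1."""
    return sum(1 for v in samples_mv if v > threshold_mv) / len(samples_mv)

def find_bottom_of_top_eye(samples_mv, lsb_mv, target=0.25, max_code=63):
    """Raise the threshold code from 0 until at most `target` of the outputs are 1."""
    code = 0
    while code < max_code and ones_fraction(samples_mv, code * lsb_mv) > target:
        code += 1
    return code

# Synthetic, equiprobable PAM4 samples in mV with bounded noise (illustrative only).
rng = random.Random(0)
LEVELS_MV = (-90.0, -30.0, 30.0, 90.0)
samples = [lvl + rng.uniform(-10.0, 10.0) for lvl in LEVELS_MV for _ in range(1000)]

error_code = find_bottom_of_top_eye(samples, lsb_mv=4.0)
top_code = error_code + 1   # top data sampler sits 1 LSB inside the eye
print(error_code, top_code)
```

The sweep stops at the first code above the upper edge of the second-highest level, which is the bottom of the top eye. The criterion depends on the symbols being equiprobable, which is why the later states switch to comparing sampler outputs instead.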
Figure. 4.11:  Background sampler threshold adaptation algorithm. 
 
 Next, to estimate the eye height, the data samplers’ thresholds are fixed 
and the error sampler threshold is increased until a discrepancy is detected between the 
error and top sampler outputs, implying that the error sampler has reached the top edge of 
the top eye (State 5); the resulting threshold code difference from the top sampler is error 
offset 3. This is repeated to find the bottom of the bottom eye (State 6). At this point, the 
top and bottom eye heights are known and can be estimated precisely for each eye 
independently. The error offset is then updated to: 
$$\text{error offset} = \frac{\text{error offset 1} + \text{error offset 2}}{2} \qquad (4.1)$$
 
Next, the bottom/top thresholds are placed in the middle of the corresponding eye 
when the process returns to States 3 and 4 to monitor the top of the bottom eye 
and the bottom of the top eye, respectively. The algorithm then periodically rotates through 
States 3-6 to track the eye height and the optimal threshold position. 
The edge-based DFE tap background adaptation logic tables are shown in Fig. 4.12; 
they are modified from [71] to allow for PAM4 operation and independent per-slice DFE 
FIR-tap control. Similar to the BBPD logic, the DFE tap adaptation works with symmetric 
PAM4 data transitions in order to improve convergence. When a symmetric transition is 
detected, the correlation between the edge sample and the sign of a previous symbol 
determines the polarity of the residual ISI from that symbol. As the DFE FIR-tap 
cancels the first post-cursor, if the D-1 symbol polarity matches the edge sample ISI 
polarity, the tap value is too small and the FIR-tap counter is incremented, 
and vice versa. As PAM4 receivers require improved sensitivity, independent per-slice 
adaptation is implemented for the DFE FIR-taps to compensate for mismatch among the 4 
receiver slices. The DFE IIR-tap amplitude is set in a similar manner using the D-2 
polarity, as this IIR tap compensates for long-tail ISI after the first post-cursor. The IIR-tap 
time constant is set using the correlation between the edge sample and either D-3 or D-4. 
The use of one common DFE IIR-tap mux means that only a single set of IIR 
values needs to be adapted. 
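
The update rule described above can be sketched as a sign-sign counter update. This is an illustrative reading of the text and not a reproduction of the logic tables of Fig. 4.12: it assumes PAM4 levels coded as -3, -1, +1, +3, an edge sample taken between D0 and D1, and a unit counter step. The function names are hypothetical, and the time-constant update is left out because its direction is not specified in the text.

```python
def sign(x):
    """Polarity of a value; an edge sample of exactly zero is treated as negative."""
    return 1 if x > 0 else -1

def adapt_tap(counter, history, d0, d1, edge_sample, lag, step=1):
    """Update one DFE tap counter from a single edge sample.

    history[k - 1] holds the PAM4 level of symbol D-k.
    lag=1 updates the FIR tap, lag=2 the IIR-tap amplitude.
    """
    if d0 != -d1:
        return counter              # only symmetric transitions are used
    if sign(edge_sample) == sign(history[lag - 1]):
        return counter + step       # residual ISI follows D-lag: tap too small
    return counter - step           # residual ISI opposes D-lag: tap too large

history = [3, -1]                   # D-1 = +3, D-2 = -1 (illustrative values)
fir_counter = adapt_tap(10, history, d0=-1, d1=1, edge_sample=0.02, lag=1)
iir_counter = adapt_tap(10, history, d0=-1, d1=1, edge_sample=0.02, lag=2)
skipped = adapt_tap(10, history, d0=-3, d1=1, edge_sample=0.02, lag=1)
print(fir_counter, iir_counter, skipped)
```

Keeping one FIR counter per slice, each fed only by that slice's edge samples, would give the independent per-slice adaptation, while a single shared counter pair suffices for the common IIR tap.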
Figure. 4.12. PAM4 DFE FIR and IIR-tap adaptation logic tables. 
 
4.5    Experimental Results 
The PAM4 receiver is fabricated in a GP 65nm process and occupies a total active area of 
0.51mm2, as shown in the chip micrograph of Fig. 4.13. The receiver BER test setup and 
the measured insertion loss of the two test channels are shown in Fig. 4.14. A PAM4 pattern 
generator with 1 main and 1 pre-cursor FFE tap generates PRBS15 data, which then passes 
through channel 1 or channel 2, with 16.1 dB and 20.8 dB of loss at 14GHz, respectively. 
The on-die 1/8-rate data MUX at the receiver output allows for independent verification 
of the MSB or LSB outputs, which are tested using an NRZ pattern checker. In order to 
measure the timing bathtub curves of the receiver, the CDR is bypassed and the receiver is 
clocked by an external clock. In this mode, the half-rate clock is generated by the pattern 
generator, whose phase-shift capability allows the BER to be captured at different sampling 
times.  
 
 
Figure. 4.13: Chip micrograph of 56Gb/s PAM4 receiver. 
[Fig. 4.14 block labels: PAM4 pattern generator (Agilent M8040A, 56Gb/s PAM4), test channel, RX PCB trace, RX chip, 3.5Gb/s NRZ pattern checker (Agilent N4903A), phase shifter, 14GHz external clock (for timing bathtub curves only).]
 
Figure. 4.14: High speed PAM4 receiver test setup. 
 
Fig. 4.15 (a) shows the transmitter PAM4 pre-channel eye diagram with a 600 mVppd swing 
and no equalization. Even with the 2-tap pre-cursor FFE co-optimized with the 
receiver equalization, the eye at the output of channel 2 is completely closed (Fig. 
4.15 (b)). 
Figure. 4.15: (a) 56 Gb/s eye-diagram before channel 2 without equalization and (b) after 
channel 2 with 2-tap pre-cursor FFE. 
 
An on-chip DAC is used to monitor the convergence of the DFE tap coefficients and the 
sampler thresholds. Fig. 4.16 (a) and (b) show the DFE tap convergence for channel 1 and 
channel 2, respectively. In both cases all taps converge within 2µs. The threshold 
convergence for channel 1 and channel 2 is illustrated in Fig. 4.16 (c) and (d). The initial 
threshold procedure completes within 16µs for both channels. The combined MSB/LSB 
BER timing bathtub curves of the receiver are measured using the bypass CDR mode with 
the 2-tap pre-cursor transmitter FFE and different numbers of DFE taps enabled, as depicted in 
Fig. 4.17 (a) and (b) for channel 1 and channel 2, respectively. While an optimized 
combination of CTLE and DFE FIR tap allows a BER better than 10^-7 for channel 1, 
significant ISI from the 2nd post-cursor and beyond keeps the BER worse than 10^-2 for 
channel 2 with this equalization setting. Adding the IIR DFE tap allows for efficient 
cancellation of the higher-order post-cursors, achieving 0.22UI and 0.19UI of timing margin 
at BER=10^-12 for channel 1 and channel 2, respectively. The voltage bathtub curves of 
Fig. 4.17 (c) and (d) are measured with all taps enabled and the CDR locked, by changing the threshold 
code for the top, middle, and bottom samplers from their ideal position. Voltage margins 
of 23mV and 14mV are achieved at BER=10^-12 for channel 1 and channel 2, respectively. 
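
A margin figure of this kind can be read from a bathtub sweep as the width of the region in which the BER stays at or below the target. The sketch below shows one simple way to compute it from sweep points. The data are synthetic placeholders rather than the measured curves of Fig. 4.17, and the function name is hypothetical.

```python
def timing_margin_ui(phases_ui, bers, target_ber):
    """Width (in UI) of the longest run of consecutive sweep points with BER <= target."""
    best = 0.0
    start = None
    for phase, ber in zip(phases_ui, bers):
        if ber <= target_ber:
            if start is None:
                start = phase           # first passing point of a new run
            best = max(best, phase - start)
        else:
            start = None                # run broken by a failing point
    return best

# Synthetic sweep: 21 phase steps across one UI, passing only near the eye center.
phases = [i / 20 for i in range(21)]
bers = [1e-13 if 8 <= i <= 12 else 1e-3 for i in range(21)]
margin = timing_margin_ui(phases, bers, target_ber=1e-12)
print(f"timing margin: {margin:.2f} UI")
```

The same routine applies to a voltage bathtub if the phase axis is replaced by the threshold offset. The result is quantized by the sweep step, so a finer sweep gives a tighter estimate.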
 
 
Figure. 4.16: Measured DFE tap adaptation working over (a) channel 1 and (b) channel 2, 
and measured sampler threshold adaptation working over (c) channel 1 and (d) channel 2. 
Note, edge sampler values are omitted and only error sampler#1 is shown for clarity. 
 
 
Figure. 4.17: Measured 56Gb/s receiver timing bathtub curves working over (a) channel 
1, and (b) channel 2, and receiver voltage bathtub curves working over (c) channel 1, and 
(d) channel 2.  
 
 Fig. 4.18 shows the jitter tolerance of the receiver working over channel 2 at 
BER=10^-9. The CDR shows more than 6 MHz of bandwidth with 0.12UI of high-frequency 
jitter tolerance, exceeding the CEI-56G-VSR requirements. It should also be noted that the 
CEI-56G-VSR spec only requires BER=10^-6.  
Figure. 4.18: Measured PAM4 jitter tolerance working over channel 2. 
 
Fig. 4.19 shows the 56 Gb/s power breakdown of the receiver. The receiver consumes 
259 mW of power, with the CML comparators and clocking circuits contributing the 
most. Table 4.1 summarizes the receiver performance [83] and compares it with 
other PAM4 receivers operating near 56Gb/s. The receiver achieves a power efficiency of 
4.63mW/Gb/s, which is superior to the ADC-based design of [77] and the mixed-signal 
front-end of [79] which utilizes a 2-stage CTLE and an additional TX FFE tap. Employing 
the DFE IIR-tap allows for a reduction in the total tap count relative to [78], while also 
extending the maximum supported channel loss. 
[Fig. 4.19 labels: comparators 61%, CDR and clocking 19%, deserializers and adaptation 11%, CTLE 5%, IIR summer and MUX 2%, others 1%.]
 
Figure. 4.19: 56Gb/s power breakdown of the receiver. 
 
Table 4.1: Performance Summary 
References            [77]                    [78]                    [79]            This Work
Technology            CMOS 16nm FinFET        CMOS 16nm FinFET        40nm CMOS       65nm CMOS
Data-Rate             56Gb/s                  40-56Gb/s               56Gb/s          56Gb/s
Data Format           PAM4                    PAM4                    PAM4            PAM4
Equalization          CTLE,                   CTLE,                   CTLE,           CTLE,
                      ADC-based DFE & FFE     10-tap DFE              3-tap DFE       1-tap FIR & 1-tap IIR DFE
Maximum CTLE Peaking  2-stage, 14dB           2-stage, 16dB           2-stage, 9dB    1-stage, 6dB
Channel Loss          31dB (1)                10dB (2)                24dB (1)        20.8dB (2)
Area                  2.8mm2 (2 TX/RX)        0.364mm2                1.26mm2         0.51mm2
Supply                0.9V/1.2V/1.8V          0.9V/1.2V/1.8V          1V              1.2V
                      (digital/analog/aux)    (digital/analog/aux)
Power Consumption     370mW (RX excl. DSP)    230mW                   382mW           259mW
Power Efficiency      6.6mW/Gb/s              4.1mW/Gb/s              6.82mW/Gb/s     4.63mW/Gb/s
(1) Including 3-tap TX FFE equalization 
(2) Including 2-tap TX FFE equalization 
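
The power-efficiency row of Table 4.1 follows from dividing each power consumption by the data rate. The short check below reproduces that arithmetic, assuming all four designs are evaluated at 56Gb/s (for [78], the upper end of its 40-56Gb/s range); the variable names are illustrative.

```python
DATA_RATE_GBPS = 56.0  # assumed common data rate for the comparison

# Power consumption in mW, copied from Table 4.1.
POWER_MW = {"[77]": 370.0, "[78]": 230.0, "[79]": 382.0, "This work": 259.0}

def efficiency_mw_per_gbps(power_mw, data_rate_gbps=DATA_RATE_GBPS):
    """Power efficiency in mW/Gb/s, numerically equal to energy per bit in pJ/b."""
    return power_mw / data_rate_gbps

EFFICIENCY = {name: efficiency_mw_per_gbps(p) for name, p in POWER_MW.items()}
for name, value in EFFICIENCY.items():
    print(f"{name}: {value:.3f} mW/Gb/s")
```

The unrounded value for this work is 4.625 mW/Gb/s, which the table reports as 4.63 mW/Gb/s.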
 
4.6    Conclusion 
Section 4 presented a 56Gb/s PAM4 quarter-rate receiver that employs a single-stage 
CTLE and a DFE with 1 FIR tap and 1 IIR tap. In addition to the three main samplers for 
PAM4 data detection, one edge sampler per slice is used for PLL-based CDR phase detection 
and DFE tap adaptation, with independent per-slice values for the required PAM4 sensitivity, 
and one error sampler is used to periodically scan the top and bottom of the PAM4 
eyes. Overall, the proposed PAM4 receiver architecture enables transmission over 
channels with up to 20dB of loss at Nyquist while requiring only a 2-tap pre-cursor 
transmitter-side FFE, and improves the power efficiency compared to state-of-the-art 
receivers operating at similar data rates over channels with comparable loss. 
 
 
 
 
 
 
 
 
 
 
 
5. CONCLUSION 
 
Mixed-signal transceivers can provide low-power solutions compared to their ADC-based 
counterparts. However, the additional complexities associated with PAM4 modulation, 
including linearity, sensitivity, and multi-level ISI cancellation issues, can result in a 
significant reduction in transceiver power efficiency. This requires careful consideration 
of both system-level and circuit-level design aspects. This dissertation proposes 
design techniques for power-efficient >32Gb/s mixed-signal transceivers. One PAM4 
transmitter and one receiver prototype have been designed and fabricated as part of this 
research. 
The first prototype is a dual-mode NRZ/PAM4 serial I/O transmitter achieving 
16Gb/s NRZ and 32Gb/s PAM4 operation at 10.4 and 4.9 mW/Gb/s while operating over 
channels with 27.6 and 13.5dB loss at Nyquist, respectively. The source-series-terminated 
(SST) transmitter utilizes a lookup table (LUT) based digital-to-analog converter (DAC) to 
implement 4/2-tap feed-forward equalization (FFE) in NRZ/PAM4 modes, respectively. 
A low-overhead analog impedance control is proposed along with a quarter-rate serializer 
based on a tri-state inverter-based mux with dynamic pre-driver gates. The transmitter is 
fabricated in GP 65-nm CMOS and occupies 0.060mm2 of area. 
 The second prototype is a 56Gb/s four-level pulse amplitude modulation (PAM4) 
quarter-rate wireline receiver, implemented in a 65nm CMOS process, that achieves 
4.63 mW/Gb/s power efficiency while operating over a channel with 20.8dB of loss. The 
proposed receiver utilizes a single-stage continuous-time linear equalizer (CTLE) along with 
a 2-tap decision feedback equalizer (DFE) with one finite impulse response (FIR) tap and one 
infinite impulse response (IIR) tap. The FIR tap direct feedback is implemented inside 
the CML slicers to relax the critical timing of the DFE and maximize the achievable data rate. 
The prototype utilizes only one error sampler and one edge sampler in addition to the 3 main 
samplers to implement PLL-based CDR phase detection, threshold adaptation, and DFE 
tap adaptation. A novel threshold adaptation is employed in this design, while the 
edge-based adaptation of [71] is extended to PAM4 modulation. 
Leveraging the proposed architectures and design techniques, PAM4 transceivers can 
be implemented to compensate for more than 20dB of channel loss while achieving good 
power efficiency. 
 
 
 
 
 
 
 
 
 
 
 
 
 
REFERENCES 
[1]  K. Kibaroglu, M. Sayginer and G. M. Rebeiz, "A Low-Cost Scalable 32-Element 28-
GHz Phased Array Transceiver for 5G Communication Links Based on a 2x2 Beamformer 
Flip-Chip Unit Cell," IEEE Journal of Solid-State Circuits, vol. 53, no. 5, pp. 1260-1274, 
May 2018.  
[2] P. Sepidband and K. Entesari, "A CMOS Wideband Receiver Resilient to Out-of-Band 
Blockers Using Blocker Detection and Rejection," IEEE Transactions on Microwave 
Theory and Techniques, early access, Jan. 2018. 
[3] Y. Liu, P. Roblin, X. Quan, W. Pan, S. Shao and Y. Tang, "A Full-Duplex Transceiver 
With Two-Stage Analog Cancellations for Multipath Self-Interference," IEEE 
Transactions on Microwave Theory and Techniques, vol. 65, no. 12, pp. 5263-5273, Dec. 
2017. 
[4] P. Sepidband and K. Entesari, "A CMOS Real-Time Spectrum Sensor Based on 
Phasers for Cognitive Radios," IEEE Transactions on Microwave Theory and Techniques, 
vol. 66, no. 3, pp. 1440-1451, March 2018. 
[5] T. Fujibayashi et al., "A 76- to 81-GHz Multi-Channel Radar Transceiver," IEEE 
Journal of Solid-State Circuits, vol. 52, no. 9, pp. 2226-2241, Sept. 2017. 
[6] H. N. Nguyen, K. S. Kim, S. H. Han, J. Y. Lee, C. Kim and S. G. Lee, "A Low-Power 
Interference-Tolerance Wideband Receiver for 802.11af/ah Long-Range Wi-Fi With Post-
LNA Active N-Path Filter," IEEE Transactions on Microwave Theory and Techniques, 
early access, Feb. 2018. 
[7] P. Sepidband and K. Entesari, "A CMOS UWB receiver with reconfigurable notch 
filters for narrow-band interferers," IEEE Radio Frequency Integrated Circuits 
Symposium (RFIC), Honolulu, HI, 2017, pp. 356-359. 
[8] P. Sepidband and K. Entesari, "A phaser-based real-time CMOS spectrum sensor for 
cognitive radios," IEEE Radio Frequency Integrated Circuits Symposium (RFIC), San 
Francisco, CA, 2016, pp. 274-277. 
[9] S. Lee, D. Jeong and B. Kim, "Ultralow-Power 2.4-GHz Receiver With All Passive 
Sliding-IF Mixer," IEEE Transactions on Microwave Theory and Techniques, early 
access, Feb. 2018. 
[10] P. Sepidband and K. Entesari, "A CMOS Spectrum Sensor Based on Quasi-
Cyclostationary Feature Detection for Cognitive Radios," IEEE Transactions on 
Microwave Theory and Techniques, vol. 63, no. 12, pp. 4098-4109, Dec. 2015. 
[11] Y. C. Lien, E. A. M. Klumperink, B. Tenbroek, J. Strange and B. Nauta, "Enhanced-
Selectivity High-Linearity Low-Noise Mixer-First Receiver With Complex Pole Pair Due 
to Capacitive Positive Feedback," IEEE Journal of Solid-State Circuits, vol. 53, no. 5, pp. 
1348-1360, May 2018.  
[12] S. Mondal, R. Singh, A. I. Hussein and J. Paramesh, "A 25-30 GHz Fully-Connected 
Hybrid Beamforming Receiver for MIMO Communication," IEEE Journal of Solid-State 
Circuits, vol. 53, no. 5, pp. 1275-1287, May 2018. 
[13] A. Roshan-Zamir, B. Wang, S. Telaprolu, K. Yu, C. Li, M. A. Seyedi, M. Fiorentino, 
R. Beausoleil, and S. Palermo, "A 40 Gb/s PAM4 silicon microring resonator modulator 
transmitter in 65nm CMOS," IEEE Optical Interconnects Conference (OI), San Diego, 
CA, 2016, pp. 8-9. 
[14] A. Roshan-Zamir, B. Wang, S. Telaprolu, K. Yu, C. Li, M. A. Seyedi, M. Fiorentino, 
R. Beausoleil, and S. Palermo, "A two-segment optical DAC 40 Gb/s PAM4 silicon 
microring resonator modulator transmitter in 65nm CMOS," IEEE Optical Interconnects 
Conference (OI), Santa Fe, NM, 2017, pp. 5-6. 
[15] Y. Tsunoda et al., "A 40-Gb/s VCSEL transmitter for optical interconnect with group-
delay compensation pre-emphasis," OSA Optical Fiber Communication Conference, San 
Francisco, CA, 2014, pp. 1-3. 
[16] S. Palermo, K. Yu, A. Roshan-Zamir, B. Wang, C. Li, M. A. Seyedi, M. Fiorentino, 
and R. Beausoleil, "PAM4 silicon photonic microring resonator-based transceiver circuits," 
SPIE Photonics West, Jan. 2017. 
[17] A. Roshan-Zamir, B. Wang, K. Yu, S. Telaprolu, C. Li, M. A. Seyedi, M. Fiorentino, 
R. Beausoleil, and S. Palermo, "A 40Gb/s PAM4 Optical DAC Silicon Microring 
Resonator Modulator Transmitter," IEEE International Midwest Symposium on Circuits 
and Systems, Aug. 2017. 
[18] S. Moazeni et al., "A 40Gb/s PAM-4 transmitter based on a ring-resonator optical 
DAC in 45nm SOI CMOS," IEEE International Solid-State Circuits Conference (ISSCC), 
San Francisco, CA, 2017, pp. 486-487.  
[19] A. Roshan-Zamir, K. Yu, D. Liang, C. Zhang, C. Li, G. Fan, B. Wang, M. Fiorentino, 
R. Beausoleil, and S. Palermo, "A 14 Gb/s Directly Modulated Hybrid Microring Laser 
Transmitter," OSA Optical Fiber Communication Conference, San Diego, CA, 2018, pp. 
1-3.  
[20] D. Liang, C. Zhang, A. Roshan-Zamir, K. Yu, C. Li, G. Kurczveil, Y. Hu, W. Shen, 
M. Fiorentino, S. Kumar, S. Palermo, and R. Beausoleil, "A Fully-integrated Multi-λ 
Hybrid DML Transmitter," OSA Optical Fiber Communication Conference, San Diego, 
CA, 2018, pp. 1-3.  
[21] V. Stojanovic, "Channel-limited high-speed links: Modeling, analysis and design," 
Ph.D. dissertation, Stanford University, Stanford, CA, Sep. 2004. 
[22] A. Nazemi, K. Hu, B. Catli, D. Cui, U. Singh, T. He, Z. Huang, B. Zhang, A. Momtaz, 
and J. Cao, "3.4 A 36Gb/s PAM4 transmitter using an 8b 18GS/s DAC in 28nm 
CMOS," IEEE International Solid-State Circuits Conference - (ISSCC) Digest of 
Technical Papers, San Francisco, CA, 2015, pp. 1-3. 
[23] A. Sheikholeslami, "ADC-based receiver designs: Challenges and opportunities," 
IEEE Compound Semiconductor Integrated Circuit Symposium (CSICS), Miami, FL, 
2017, pp. 1-4. 
[24] S. Palermo, S. Hoyos, A. Shafik, E. Z. Tabasy, S. Cai, S. Kiran, and K. Lee, "CMOS 
ADC-based receivers for high-speed electrical and optical links," IEEE Communications 
Magazine, vol. 54, no. 10, pp. 168-175, October 2016. 
[25] A. Shafik, E. Zhian Tabasy, S. Cai, K. Lee, S. Hoyos and S. Palermo, "A 10 Gb/s 
Hybrid ADC-Based Receiver With Embedded Analog and Per-Symbol Dynamically 
Enabled Digital Equalization," IEEE Journal of Solid-State Circuits, vol. 51, no. 3, pp. 
671-685, March 2016. 
[26] K. L. Chan, K. H. Tan, Y. Frans, J. Im, P. Upadhyaya, S. W. Lim, A. Roldan, N. 
Narang, C. Y. Koay, H. Zhao, P.-C. Chiang, and K. Chang,  "A 32.75-Gb/s Voltage-Mode 
Transmitter With Three-Tap FFE in 16-nm CMOS," IEEE Journal of Solid-State Circuits, 
vol. 52, no. 10, pp. 2663-2678, Oct. 2017. 
[27] G. Steffan, E. Depaoli, E. Monaco, N. Sabatino, W. Audoglio, A. A. Rossi, M. Bassi, 
A. Mazzanti, "6.4 A 64Gb/s PAM-4 transmitter with 4-Tap FFE and 2.26pJ/b energy 
efficiency in 28nm CMOS FDSOI," IEEE International Solid-State Circuits Conference 
(ISSCC), San Francisco, CA, 2017, pp. 116-117. 
[28] H. W. Yang, A. Roshan-Zamir, Y. H. Song and S. Palermo, "A low-power dual-mode 
20-Gb/s NRZ and 28-Gb/s PAM-4 voltage-mode transmitter," IEEE Asian Solid-State 
Circuits Conference (A-SSCC), Seoul, 2017, pp. 261-264. 
[29] A. Roshan-Zamir, O. Elhadidy, H. W. Yang and S. Palermo, "A Reconfigurable 16/32 
Gb/s Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS," IEEE Journal of Solid-State 
Circuits, vol. 52, no. 9, pp. 2430-2447, Sept. 2017. 
[30] A. Agrawal, J. F. Bulzacchelli, T. O. Dickson, Y. Liu, J. A. Tierno and D. J. Friedman, 
"A 19-Gb/s Serial Link Receiver With Both 4-Tap FFE and 5-Tap DFE Functions in 45-
nm SOI CMOS," IEEE Journal of Solid-State Circuits, vol. 47, no. 12, pp. 3220-3231, 
Dec. 2012. 
[31] Y. Wang, Z. Li, J. Zhuang, C. Zhi and C. P. Yue, "A 26-Gb/s 8.1-mW receiver with 
linear sampling phase detector for data and edge equalization," Symposium on VLSI 
Circuits, Kyoto, 2017, pp. C324-C325. 
[32] J. Han, Y. Lu, N. Sutardja and E. Alon, "6.2 A 60Gb/s 288mW NRZ transceiver with 
adaptive equalization and baud-rate clock and data recovery in 65nm CMOS technology," 
IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2017, 
pp. 112-113. 
[33] J. Han, Y. Lu, N. Sutardja, K. Jung and E. Alon, "Design Techniques for a 60 Gb/s 
173 mW Wireline Receiver Frontend in 65 nm CMOS Technology," IEEE Journal of 
Solid-State Circuits, vol. 51, no. 4, pp. 871-880, April 2016. 
[34] J. Han, N. Sutardja, Y. Lu and E. Alon, "Design Techniques for a 60-Gb/s 288-mW 
NRZ Transceiver With Adaptive Equalization and Baud-Rate Clock and Data Recovery 
in 65-nm CMOS Technology," IEEE Journal of Solid-State Circuits, vol. 52, no. 12, pp. 
3474-3485, Dec. 2017. 
[35] M. Hekmat, S. Song, N. Jaffari, S. Sankaranarayanan, C. Huang, M. Han, G. 
Malhotra, J. Kamali, A. Amirkhany, and W. Xiong, "A 6Gb/s 3-tap FFE transmitter and 
5-tap DFE receiver in 65nm/0.18µm CMOS for next-generation 8K displays," IEEE 
International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 402-
403. 
[36] T. Shibasaki, T. Danjo, Y. Ogata, Y. Sakai, H. Miyaoka, F. Terasawa, M. Kudo, H. 
Kano, A. Matsuda, S. Kawai, T. Arai, H. Higashi, N. Naka, H. Yamaguchi, T. Mori, Y. 
Koyanagi, and H. Tamura, "3.5 A 56Gb/s NRZ-electrical 247mW/lane serial-link 
transceiver in 28nm CMOS," IEEE International Solid-State Circuits Conference 
(ISSCC), San Francisco, CA, 2016, pp. 64-65. 
[37] A. Manian and B. Razavi, "A 40-Gb/s 14-mW CMOS Wireline Receiver," IEEE 
Journal of Solid-State Circuits, vol. 52, no. 9, pp. 2407-2421, Sept. 2017. 
[38] S. Hwang, S. Moon, J. Song, and C.  Kim, "A 32 Gb/s Rx only equalization 
transceiver with 1-tap speculative FIR and 2-tap direct IIR DFE," IEEE Symposium on 
VLSI Circuits (VLSI-Circuits), Honolulu, HI, 2016, pp. 1-2. 
[39] B. Zhang, K. Khanoyan, H. Hatamkhani, H. Tong, K. Hu, S. Fallahi, M. Abdul-Latif, 
K. Vakilian, I. Fujimori, and A. Brewster, "A 28 Gb/s Multistandard Serial Link 
Transceiver for Backplane Applications in 28 nm CMOS," IEEE Journal of Solid-State 
Circuits, vol. 50, no. 12, pp. 3089-3100, Dec. 2015. 
[40] K. Y. Chen, W. Y. Chen and S. I. Liu, "A 0.31-pJ/bit 20-Gb/s DFE With 1 Discrete 
Tap and 2 IIR Filters Feedback in 40-nm-LP CMOS," IEEE Transactions on Circuits and 
Systems II: Express Briefs, vol. 64, no. 11, pp. 1282-1286, Nov. 2017. 
[41] O. Elhadidy and S. Palermo, "A 10 Gb/s 2-IIR-tap DFE receiver with 35 dB loss 
compensation in 65-nm CMOS," Symposium on VLSI Circuits, Kyoto, 2013, pp. C272-
C273. 
[42] Optical Internetworking Forum (OIF), CEI-56G-VSR-PAM4 Very Short Reach 
Interface, Contribution, document OIF 2014.230.07, Jun. 2016. 
[43] IEEE P802.3bs 200 Gb/s and 400 Gb/s Ethernet Task Force, accessed on Nov. 2016. 
[Online]. Available: http://www.ieee802.org/3/bs/ 
[44] B. K. Casper, M. Haycock, and R. Mooney, “An accurate and efficient analysis 
method for multi-Gb/s chip-to-chip signaling schemes,” VLSI Circuits Digest of Technical 
Papers, pp. 54–57, June 2002. 
[45] C.-H. Yang, "Design of High-Speed Serial Links in CMOS," Technical Report, 
Stanford University, Stanford, CA, Dec. 1998. 
[46] H. Li, Z. Xuan, A. Titriku, C. Li, K. Yu, B. Wang, A. Shafik, N. Qi, Y. Liu, R. Ding, 
T. Baehr-Jones, M. Fiorentino, M. Hochberg, S. Palermo, and P. Chiang, "A 25 Gb/s 4.4 
V-swing AC-coupled ring modulator-based WDM transmitter with wavelength 
stabilization in 65 nm CMOS", IEEE Journal of Solid-State Circuits, vol. 50, no. 12, pp. 
3145-3159, Dec. 2015. 
[47] C.-K. Yang and M. A. Horowitz, "A 0.8-μm CMOS 2.5 Gb/s oversampling receiver 
and transmitter for serial links," IEEE Journal of Solid-State Circuits, vol. 31, no. 12, pp. 
2015-2023, Dec 1996. 
[48] M. J. E. Lee, W. J. Dally and P. Chiang, "Low-power area-efficient high-speed I/O 
circuit techniques," IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1591-1599, 
Nov. 2000. 
[49] J. Lee, P. C. Chiang, P. J. Peng, L. Y. Chen and C. C. Weng, "Design of 56 Gb/s NRZ 
and PAM4 SerDes Transceivers in CMOS Technologies," IEEE Journal of Solid-State 
Circuits, vol. 50, no. 9, pp. 2061-2073, Sept. 2015. 
[50] M. Kossel, C. Menolfi, J. Weiss, P. Buchmann, G. V. Bueren, L. Rodoni, T. Morf, T. 
Toifl, and M. Schmatz,  "A T-Coil-Enhanced 8.5 Gb/s High-Swing SST Transmitter in 65 
nm Bulk CMOS With <16 dB Return Loss Over 10 GHz Bandwidth," IEEE Journal of 
Solid-State Circuits, vol. 43, no. 12, pp. 2905-2920, Dec. 2008. 
[51] K. Fukuda, H. Yamashita, F. Yuki, M. Yagyu, R. Nemoto, T. Takemoto, T. Saito, N. 
Chujo, K. Yamamoto, H. Kanai, and A. Hayashi, "An 8Gb/s Transceiver with 3x-
Oversampling 2-Threshold Eye-Tracking CDR Circuit for -36.8dB-loss Backplane," 
IEEE International Solid-State Circuits Conference - Digest of Technical Papers, San 
Francisco, CA, 2008, pp. 98-598. 
[52] Y. H. Song, H. W. Yang, H. Li, P. Y. Chiang and S. Palermo, "An 8–16 Gb/s, 0.65–
1.05 pJ/b, Voltage-Mode Transmitter With Analog Impedance Modulation Equalization 
and Sub-3 ns Power-State Transitioning," IEEE Journal of Solid-State Circuits, vol. 49, 
no. 11, pp. 2631-2643, Nov. 2014. 
[53] S. Gondi and B. Razavi, "Equalization and Clock and Data Recovery Techniques for 
10-Gb/s CMOS Serial-Link Receivers," IEEE Journal of Solid-State Circuits, vol. 42, no. 
9, pp. 1999-2011, Sept. 2007. 
[54] Y. Lu and E. Alon, "Design Techniques for a 66 Gb/s 46 mW 3-Tap Decision 
Feedback Equalizer in 65 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 48, no. 
12, pp. 3243-3257, Dec. 2013. 
[55] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, “A current-mode latch sense 
amplifier and a static power saving input buffer for low-power architecture,” in Proc. VLSI 
Circuits Symp. Dig. Technical Papers, June 1992, pp. 28–29. 
[56] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl and B. Nauta, "A Double-Tail 
Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time," IEEE International 
Solid-State Circuits Conference. Digest of Technical Papers, San Francisco, CA, 2007, 
pp. 314-605. 
[57] O. Elhadidy, A. Roshan-Zamir, H. W. Yang and S. Palermo, "A 32 Gb/s 0.55 
mW/Gbps PAM4 1-FIR 2-IIR tap DFE receiver in 65-nm CMOS," Symposium on VLSI 
Circuits (VLSI Circuits), Kyoto, 2015, pp. C224-C225. 
[58] M. Mizuno, M. Yamashina, K. Furuta, H. Igura, H. Abiko, K. Okabe, A. Ono, and 
H. Yamada, "A GHz MOS adaptive pipeline technique using MOS current-mode logic," 
IEEE Journal of Solid-State Circuits, vol. 31, no. 6, pp. 784-791, Jun 1996. 
[59] The zettabyte era: trends and analysis, 2015 [Online]. Available: 
http://www.cisco.com. 
[60] J. Lee, M. S. Chen, and H. D. Wang, “Design and comparison of three 20-Gb/s back-
plane transceivers for duobinary, PAM4, and NRZ data,” IEEE Journal of Solid-State 
Circuits, vol. 43, pp. 2120–2133, Sept 2008. 
[61] K. Gopalakrishnan, A. Ren, A. Tan, A. Farhood, A. Tiruvur, B. Helal, C. F. Loi, C. 
Jiang, H. Cirit, I. Quek, J. Riani, J. Gorecki, J. Wu, J. Pernillo, L. Tse, M. Le, M. Ranjbar, 
P. S. Wong, P. Khandelwal, R. Narayanan, R. Mohanavelu, S. Herlekar, S. Bhoja, and V. 
Shvydun, “A 40/50/100Gb/s PAM-4 ethernet transceiver in 28nm CMOS,” in ISSCC Dig. 
Tech. Papers, pp. 62–63, Jan 2016. 
[62] J. Kim, A. Balankutty, A. Elshazly, Y. Y. Huang, H. Song, K. Yu, and F. O’Mahony, 
“A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS,” in 
ISSCC Dig. Tech. Papers, pp. 60–61, Feb 2015. 
[63] M. Bassi, F. Radice, M. Bruccoleri, S. Erba, and A. Mazzanti, “A 45Gb/s PAM-4 
transmitter delivering 1.3Vppd output swing with 1V supply in 28nm CMOS FDSOI,” in 
ISSCC Dig. Tech. Papers, pp. 66–67, Jan 2016. 
[64] H. Yueksel, L. Kull, A. Burg, M. Braendli, P. Buchmann, P. A. Francese, C. Menolfi, 
M. Kossel, T. Morf, T. M. Andersen, D. Luu, and T. Toifl, “A 3.6pJ/b 56Gb/s 4-PAM 
receiver with 6-Bit TI-SAR ADC and quarter-rate speculative 2-tap DFE in 32 nm 
CMOS,” in European Solid-State Circuits Conference, pp. 148–151, Sept 2015. 
[65] D. Cui, H. Zhang, N. Huang, A. Nazemi, B. Catli, H. G. Rhew, B. Zhang, A. Momtaz, 
and J. Cao, “A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with 
programmable gain control and analog peaking in 28nm CMOS,” in ISSCC Dig. Tech. 
Papers, pp. 58–59, Jan 2016. 
[66] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M. Kossel, T. Morf, J. 
Weiss, and M. L. Schmatz, “A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI 
technology,” IEEE Journal of Solid-State Circuits, vol. 41, pp. 954–965, April 2006. 
[67] A. A. Hafez, M. S. Chen, and C. K. K. Yang, “A 32-to-48Gb/s serializing transmitter 
using multiphase sampling in 65nm CMOS,” in ISSCC Dig. Tech. Papers, pp. 38-39, Feb. 
2013. 
[68] B. Dehlaghi and A. C. Carusone, “A 0.3 pJ/bit 20 Gb/s/wire parallel interface for die-
to-die communication,” IEEE Journal of Solid-State Circuits, vol. 51, no. 11, pp. 2690–
2701, Nov. 2016. 
[69] S. Shahramian and A. C. Carusone, “A 0.41 pJ/Bit 10 Gb/s hybrid 2 IIR and 1 
discrete-time DFE tap in 28 nm-LP CMOS,” IEEE Journal of Solid-State Circuits, vol. 
50, pp. 1722–1735, July 2015. 
[70] J. F. Bulzacchelli, C. Menolfi, T. J. Beukema, D. W. Storaska, J. Hertle, D. R. 
Hanson, P. H. Hsieh, S. V. Rylov, D. Furrer, D. Gardellini, A. Prati, T. Morf, V. Sharma, 
R. Kelkar, H. A. Ainspan, W. R. Kelly, L. R. Chieco, G. A. Ritter, J. A. Sorice, J. D. 
Garlett, R. Callan, M. Brandli, P. Buchmann, M. Kossel, T. Toifl, and D. J. Friedman, “A 
28-Gb/s 4-tap FFE/15-Tap DFE serial link transceiver in 32-nm SOI CMOS technology,” 
IEEE Journal of Solid-State Circuits, vol. 47, pp. 3232–3248, Dec 2012. 
[71] S. Shahramian, B. Dehlaghi, and A. C. Carusone, “A 16Gb/s 1 IIR + 1 DT DFE 
compensating 28dB loss with edge-based adaptation converging in 5us,” in ISSCC Dig. 
Tech. Papers, pp. 410–411, Jan 2016. 
[72] A. Roshan-Zamir, O. Elhadidy, H.-W. Yang, and S. Palermo, “A 16/32 Gb/s dual-
mode NRZ/PAM4 SerDes in 65nm CMOS,” in Proc. IEEE Compound Semi. IC Symp., 
pp. 1-4, Oct. 2016. 
[73] E. Mensink, D. Schinkel, E. A. M. Klumperink, E. van Tuijl, and B. Nauta, “Power 
efficient gigabit communication over capacitively driven RC-limited on-chip 
interconnects,” IEEE Journal of Solid-State Circuits, vol. 45, pp. 447–457, Feb 2010. 
[74] B. Kim, Y. Liu, T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, “A 10-Gb/s 
compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS,” IEEE 
Journal of Solid-State Circuits, vol. 44, pp. 3526–3538, Dec 2009. 
[75] S. Shahramian, H. Yasotharan, and A. C. Carusone, “Decision feedback equalizer 
architectures with multiple continuous-time infinite impulse response filters,” IEEE 
Transactions on Circuits and Systems II: Express Briefs, vol. 59, pp. 326–330, June 2012. 
[76] J. Montanaro, R. T. Witek, K. Anne, A. J. Black, E. M. Cooper, D. W. Dobberpuhl, 
P. M. Donahue, J. Eno, W. Hoeppner, D. Kruckemyer, T. H. Lee, P. C. M. Lin, L. Madden, 
D. Murray, M. H. Pearce, S. Santhanam, K. J. Snyder, R. Stephany, and S. C. 
Thierauf, “A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor,” IEEE Journal of 
Solid-State Circuits, vol. 31, pp. 1703–1714, Nov 1996. 
[77] Y. Frans, J. Shin, L. Zhou, P. Upadhyaya, J. Im, V. Kireev, M. Elzeftawi, H. Hedayati, 
T. Pham, S. Asuncion, C. Borrelli, G. Zhang, H. Zhang, and K. Chang, "A 56-Gb/s 
PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm 
FinFET," IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 1101-1110, April 2017. 
[78] J. Im, D. Freitas, A. B. Roldan, R. Casey, S. Chen, C.-H. A. Chou, T. Cronin, K. 
Geary, S. McLeod, L. Zhou, I. Zhuang, J. Han, S. Lin, P. Upadhyaya, G. Zhang, Y. Frans, 
and K. Chang, "A 40-to-56 Gb/s PAM-4 Receiver With Ten-Tap Direct Decision-
Feedback Equalization in 16-nm FinFET," IEEE Journal of Solid-State Circuits, vol. 52, 
no. 12, pp. 3486-3502, Dec. 2017. 
[79] P. J. Peng, J. F. Li, L. Y. Chen and J. Lee, "A 56Gb/s PAM-4/NRZ transceiver in 
40nm CMOS," in IEEE ISSCC Dig. Tech. Papers, Feb. 2017, pp. 110-111. 
[80] L. Tang, W. Gai, L. Shi, X. Xiang, K. Sheng, and A. He, “A 32Gb/s 133mW PAM4 
transceiver with DFE based on adaptive clock phase and threshold voltage in 65nm 
CMOS,” in ISSCC Dig. Tech. Papers, Feb 2018. 
[81] N. Qi, Y. Kang, Q. Lin, J. Ma, J. Shi, B. Yin, C. Liu, R. Bai, S. Hu, J. Wang, J. Du, 
L. Ma, Z. He, M. Liu, F. Zhang, and P. Y. Chiang, "A 51Gb/s, 320mW, PAM4 CDR with 
baud-rate sampling for high-speed optical interconnects," in IEEE Asian Solid-State 
Circuits Conference (A-SSCC), Nov. 2017, pp. 89-92. 
[82] E. Hegazi, H. Sjoland and A. A. Abidi, "A filtering technique to lower LC oscillator 
phase noise," IEEE Journal of Solid-State Circuits, vol. 36, no. 12, pp. 1921-1930, Dec 
2001. 
[83] A. Roshan-Zamir, T. Iwai, Y.-H. Fan, A. Kumar, H.-W. Yang, L. Sledjeski, J. 
Hamilton, S. Chandramouli, A. Aude, and S. Palermo, “A 56 Gb/s PAM4 Receiver with 
Low-Overhead Threshold and Edge-Based DFE FIR and IIR-Tap Adaptation in 65nm 
CMOS,” IEEE Custom Integrated Circuits Conference, Apr. 2018. 
