Modeling and Design of Architectures for High-Speed ADC-Based Serial Links by Shiva Kiran, FNU





Submitted to the Office of Graduate and Professional Studies of 
Texas A&M University 
in partial fulfillment of the requirements for the degree of 
DOCTOR OF PHILOSOPHY 
Chair of Committee,  Sebastian Hoyos 
Co-Chair of Committee, Samuel Palermo 
Committee Members, Sunil Khatri 
Jorge Alvarado 
Head of Department, Miroslav M. Begovic 
December 2018 
Major Subject: Electrical Engineering 
Copyright 2018 Shiva Kiran
ABSTRACT 
There is an ongoing dramatic rise in the volume of internet traffic. Standards such as the 56Gb/s
OIF very short reach (VSR), medium reach (MR) and long reach (LR) specifications for chip-to-chip
communication over channels with up to 10dB, 20dB and 30dB insertion loss at the PAM-4 Nyquist
frequency, respectively, are being adopted. These standards favor spectrally efficient PAM-4
signaling over NRZ signaling. PAM-4 signaling poses challenges such as a reduced SNR at the
receiver, susceptibility to nonlinearities and increased sensitivity to residual ISI. Equalization
provided by traditional mixed-signal architectures can be insufficient to achieve the target BER
for very long reach channels. ADC-based receiver architectures for PAM-4 links take advantage of
more powerful equalization techniques, which lend themselves to easier and more robust digital
implementations, to extend the amount of insertion loss that the receiver can handle. However,
ADC-based receivers can consume more power than mixed-signal implementations. Techniques that
model receiver performance are therefore necessary to understand the various system trade-offs.
This research presents a fast and accurate hybrid modeling framework to efficiently
investigate system trade-offs for an ADC-based receiver. The key contribution is the addition of
ADC-related non-idealities, such as quantization noise in the presence of integral and differential
nonlinearities, and time-interleaving mismatch errors such as gain mismatch, bandwidth mismatch,
offset mismatch and sampling skew.
The research also presents a 52Gb/s ADC-based PAM-4 receiver prototype employing
a 32-way time-interleaved, 2-bit/stage, 6-bit SAR ADC and a DSP with a 12-tap FFE and a 2-tap
DFE. A new DFE architecture that reduces the complexity of a PAM-4 DFE to that of an NRZ
DFE while simultaneously nearly doubling the maximum achievable data rate is presented. The
receiver architecture also includes an analog front-end (AFE) consisting of a programmable
two-stage CTLE. A digital baud-rate CDR utilizing a Mueller-Muller phase detector sets the sampling
phase. Measurement results show that a BER < 10^-9 is achieved for a 30dB loss channel at 32Gb/s,
while 52Gb/s operation achieves a BER < 10^-6 for a 31dB loss channel with a power efficiency of
8.06pJ/bit.
ACKNOWLEDGEMENTS 
As my long PhD journey winds up, I would like to thank all the people who have been part 
of this journey and have contributed to it in their own unique ways. 
First and foremost I would like to thank my advisor, Dr. Sebastian Hoyos, for giving me
the tremendous opportunity of working under his exceptional guidance. I have benefitted greatly
from both his technical intuition and his warm, supportive nature. This work would not have been
possible without Dr. Samuel Palermo’s passionate, hands-on involvement. His technical expertise,
his passion for perfection and his meticulously organized approach to work are an inspiration to all
his students. Perhaps the most important lesson I have learnt from him during my time here is to
always take pride in one’s work.
I would like to express my sincere gratitude to Ayman and Ehsan for being example 
students to emulate when I first joined this group. I thank them for all the discussions we have had 
over the years and the direct and indirect contributions they have made to my work. I would also 
like to thank Ashkan for being a great companion when we travelled to San Diego in 2018. 
I would like to thank James Jaussi, Tzu-Chien, Frank, and Ajay of Intel for the wonderful
learning experiences they provided me during my internships at Intel in 2015 and 2017. Special
thanks go to them for the lab support they provided for testing our chip.
I consider myself very fortunate to have had Shengchang as my project partner. While his 
work on the project was outstanding, his resourcefulness and his calmness when things went wrong 
were something I could always bank on. The stressful tape-out deadlines and long testing nights
actually turned out to be quite enjoyable thanks to him. I thank him for being the best project
partner one could hope to have.
Finally, I would like to thank my parents and my sister for being the pillars on which I can
stand securely. I am indebted to them for their unconditional love and support, and I dedicate
this dissertation to them.
CONTRIBUTORS AND FUNDING SOURCES 
Contributors 
This work was supported by a dissertation committee consisting of Professor Sebastian 
Hoyos (advisor), Professor Samuel Palermo (co-advisor) and Professor Sunil Khatri of the 
Department of Electrical and Computer Engineering and Professor Jorge Alvarado of the
Department of Engineering Technology and Industrial Distribution.
Funding Sources 
Graduate study was supported in part by Intel under Task.2583.001. 
TABLE OF CONTENTS 
ABSTRACT
ACKNOWLEDGEMENTS
CONTRIBUTORS AND FUNDING SOURCES
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. BACKGROUND
2.1 Channel Components and Characteristics
2.2 Receiver Equalization Techniques
2.2.1 Linear Equalizers
2.2.2 Non-Linear Equalizers
2.3 Statistical Modeling of High-Speed Serial Receivers
3. HYBRID STATISTICAL ADC-BASED RECEIVER MODELING
3.1 Statistical BER Modeling
3.2 ADC Quantization Noise
3.3 Radix Errors
3.4 Time-Interleaving Mismatch Errors
3.5 Conclusion
4. A 52GB/S ADC-BASED PAM-4 RECEIVER WITH REFERENCE SCALED 2BIT/STAGE SAR ADC AND PARTIALLY-UNROLLED DFE
4.1 PAM-4 DFE Challenges
4.1.1 PAM-4 DFE Loop-Unrolling
4.2 Receiver Architecture
4.3 ADC Design
4.3.1 Time-Interleaved ADC Architecture
4.3.2 Unit ADC Architecture




4.4.1 Main and Parallel FFE
4.4.2 Partially-Unrolled DFE
4.4.3 Critical Path Optimization
4.4.4 Advantages of PU-DFE
4.5 Measurement Results
4.5.1 ADC Characterization
4.5.2 Analog Front-End Characterization
4.5.3 Receiver Characterization
4.6 Conclusion
5. CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future Work
5.2.1 Single Parity Check Code for ADC-Based Serial Link
















Figure 1.1. ADC-based serial link.
 
Figure 2.1. A typical backplane link.
Figure 2.2. Link system model showing a transmitter, an effective channel and a receiver. Also shown is the dispersion of a pulse at the receiver as it travels through the channel.
Figure 2.3. (a) Passive and (b) active realization of a CTLE.
Figure 2.4. (a) A channel with 37.2dB loss at 14GHz, 3 different CTLE-based analog front-end frequency responses, and their impact on voltage margin for a BER of 10^-4 for various (b) ADC resolutions and (c) digital FFE tap counts.
Figure 2.5. Conceptual full-rate implementation of a digital FFE.
Figure 2.6. A P-way parallel implementation of a digital FFE.
Figure 2.7. N-tap conceptual full-rate DFE implementation.
Figure 2.8. (a) A channel with 31dB loss at 16GHz is utilized to show the impact of (b) the number of digital FFE taps on BER and (c) the improvement achieved with a digital DFE.
Figure 2.9. A P-way parallel implementation of a 2-tap DFE.
Figure 2.10. A loop-unrolled N-tap PAM-2 DFE.
Figure 2.11. 2-way look-ahead transformation of a 2:1 multiplexer loop.
 
Figure 3.1. 10Gb/s NRZ modulation modeling results with a conventional mixed-signal statistical modeling framework.
Figure 3.2. Nonlinear CTLE decomposed into a linear CTLE followed by a nonlinear transfer function.
Figure 3.3. Transformation of the PDF of a random variable as a result of applying a monotonic one-to-one nonlinear transfer function.




Figure 3.5. (a) Decomposed PDF at the output of the linear CTLE with the four PDFs centered at $\pm p_k \pm p_0$, (b) nonlinearity applied to the 4 PDFs and the resulting PDF at the output of the nonlinear transfer function block, and (c) the combined PDF obtained by setting $p_k = 0$ at the DFE summer output.
Figure 3.6. Statistical model and transient simulation results showing good matching both with and without nonlinearity.
Figure 3.7. Input PDF and equivalent quantization noise PDF construction.
Figure 3.8. Modeling of quantization noise amplification through the digital FFE.
Figure 3.9. Comparison of 10Gb/s NRZ voltage bathtub curves for the 25dB loss channel of Fig. 3.1 produced with the quantization noise modeling technique and transient simulations. The ADC resolution is varied from 3-6 bits with an ADC input random noise of (a) 1mVrms and (b) 2mVrms. (c) The DSP resolution is reduced to 6 bits for the 2mVrms case to show the impact of DSP round-off error.
Figure 3.10. Decomposition of the ISI PDF and convolution with the corresponding quantization noise PDF.
Figure 3.11. (a) Ideal and compressive 6-bit ADC characteristics. (b) 20Gb/s PAM-4 voltage bathtub curves for an ideal ADC, (c) an ADC with 2-LSB INL and small post-equalization residual ISI, and (d) an ADC with 2-LSB INL and a significant first postcursor ISI component after equalization.
Figure 3.12. Modeling of time-interleaving ADC errors.
Figure 3.13. Pulse response through the digital equalizer for the M time-interleaved channels (left) and ISI contributions from different channels at one particular time instant (right).
Figure 3.14. Comparison of 10Gb/s NRZ voltage bathtub curves for the 25dB loss channel of Fig. 2 produced with the time-interleaving errors modeling techniques and transient simulations. A 4-way time-interleaved ADC with 6-bit resolution and 1mVrms input noise is utilized with a subsequent 5-tap digital FFE. Simulation results for (a) gain, (b) timing skew, (c) bandwidth, and (d) offset errors.
 
Figure 4.1. (a) Channel with 31dB loss at a frequency of 13GHz and (b) voltage margin improvement with a 2-tap DFE at a BER of 10^-6 for 52Gb/s operation.
Figure 4.2. DFE loop-unrolling process for a 2-tap PAM-4 DFE.
Figure 4.3. Timing paths in a 2-tap PAM-4 DFE implementation with P parallel paths.




Figure 4.5. 2:1 mux count obtained by digital synthesis flow for a 52Gb/s PAM-4 DFE in 65nm technology for various DFE tap lengths.
Figure 4.6. The complete receiver architecture.
Figure 4.7. 4-stage CTLE-VGA analog front-end.
Figure 4.8. Schematic of the track-and-hold circuit consisting of a bootstrapped switch and gain-boosted FVF.
Figure 4.9. Simulated THD comparison for a simple source voltage follower, an FVF, and a gain-boosted FVF.
Figure 4.10. Phase generation block that generates the 8 1-UI spaced phases by dividing an external 13GHz clock, and the timing diagram showing the 8 phases.
Figure 4.11. Block diagram of the 32-way time-interleaved SAR ADC.
Figure 4.12. Unit ADC architecture and timing diagram.
Figure 4.13. Shared-input stage 2-bit flash ADC schematic.
Figure 4.14. Improved 6-bit resolution embedded FFE coefficient coverage map with non-binary FFE DAC.
Figure 4.15. DSP architecture showing the 12-tap main FFE, 4-tap parallel FFE and the 2-tap loop-unrolled PAM-4 DFE.
Figure 4.16. PDF of the signal at the output of the parallel FFE and the CDF for 0.33 symbol.
Figure 4.17. Partially-unrolled 2-tap PAM-4 look-ahead multiplexing with a look-ahead factor of 4.
Figure 4.18. (a) Straightforward and (b) optimized implementation of mux selection logic.
Figure 4.19. Gate count comparison between a conventional DFE and PU-DFE for a 2-tap PAM-4 DFE in 65nm technology.
Figure 4.20. Power consumption comparison between a conventional DFE and PU-DFE for a 2-tap PAM-4 DFE in 65nm technology.
Figure 4.21. Maximum achievable data rate comparison between a conventional DFE and PU-DFE for a 2-tap PAM-4 DFE in 65nm technology.
Figure 4.22. ADC-based PAM-4 chip micrograph.




Figure 4.24. SNDR and SFDR vs. input frequency at fs = 26GHz.
Figure 4.25. Measured INL/DNL plot.
Figure 4.26. Measured CTLE magnitude response.
Figure 4.27. Measured channel responses.
Figure 4.28. Receiver characterization setup for 32Gb/s operation and 52Gb/s operation.
Figure 4.29. Measured timing bathtub curves for (a) 32Gb/s operation and (b) 52Gb/s operation.
Figure 4.30. Measured recovered clock jitter histogram.
 
Figure 5.1. Transceiver architecture for implementing a single parity check code based architecture. Interleaving and de-interleaving are employed to avoid burst errors.
Figure 5.2. BER vs. ADC resolution with and without SPC code based erasure filling scheme.
Figure 5.3. Analog multi-tone modulation reconfigurable receiver.
Figure 5.4. Channel response and 5-channel multi-toned analog modulated signal.
Figure 5.5. Jitter robustness comparison for different modulation schemes.













Table 4.1. Complexity comparison between conventional PAM-4 DFE and PU-DFE.









1. INTRODUCTION  
 
With IoT and cloud computing becoming ubiquitous, there has been an explosion in the
bandwidth demand placed on servers and routers operating over legacy channels. In order to tackle
the significant frequency dependent attenuation encountered on these legacy channels, PAM-4
signaling is emerging as the modulation format of choice, particularly for medium reach (MR)
and long reach (LR) links. Several recent designs [1-6] employ ADC-based receiver architectures
for PAM-4 links to take advantage of process scaling and the process, voltage and temperature (PVT)
robustness of CMOS digital circuits in the DSP. These architectures employ an ADC followed by a
DSP, as shown in Fig. 1.1. The more powerful equalization techniques that lend themselves to
easier digital implementations extend the amount of insertion loss that the receiver can handle in
comparison to mixed-signal implementations. Despite these advantages, ADC-based serial links
are generally more complex and consume more power than mixed-signal serial links. Both the
ADC and the DSP can consume a significant amount of power, which is often prohibitive for systems
where link power efficiency is the key metric. This motivates the development of a fast and reliable
modeling framework to efficiently investigate system trade-offs and determine the optimal ADC































 Moreover, efficient modeling approaches can quantify the effectiveness of system and
circuit techniques used to save power in ADC-based links, such as partial analog pre-equalization
embedded in the ADC. Time-domain Monte Carlo (transient) simulations are impractical for
predicting system performance at typical BER targets below 10^-12, since the required simulation
time and memory would be prohibitively large. Worst-case methodologies like peak distortion
analysis are pessimistic and result in over-design [7]. For these reasons, today’s high-speed links
employ statistical tools for predicting system performance [8-10]. Since these statistical tools were
developed for mixed-signal links, they do not model several ADC-related non-idealities. Efficient
modeling techniques that capture ADC quantization noise, integral and differential non-linearity
(INL/DNL), and time-interleaving mismatches are necessary to determine key ADC specifications
and digital equalization complexity.
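To make the simulation-time argument concrete, the sketch below uses the standard zero-error confidence relation N = -ln(1 - CL)/BER, which gives the number of bits that must be observed without a single error to claim a BER below the target at confidence level CL. The 95% confidence level and the 1Mb/s transient-simulator throughput are assumptions chosen only for illustration, not figures from this work.

```python
import math

def bits_for_ber_confidence(target_ber, confidence=0.95):
    """Bits that must be observed with zero errors to claim BER < target_ber
    at the given confidence level (from (1 - BER)^N <= 1 - CL, small-BER limit)."""
    return -math.log(1.0 - confidence) / target_ber

def simulation_days(n_bits, bits_per_second):
    """Wall-clock days needed to simulate n_bits at an assumed simulator throughput."""
    return n_bits / bits_per_second / 86400.0

# About 3e12 error-free bits are needed for BER < 1e-12 at 95% confidence.
n = bits_for_ber_confidence(1e-12)
# Hypothetical transient-simulator throughput of 1 Mb/s.
days = simulation_days(n, bits_per_second=1e6)
print(f"{n:.2e} bits, {days:.0f} days")
```

Even with this optimistic throughput, a single BER point takes roughly five weeks, and every point on a bathtub curve or every design-space sweep multiplies that cost, which is why statistical methods are preferred.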
In ADC-based links, a baud-rate linear feed-forward equalizer (FFE) is traditionally
cascaded with a non-linear decision feedback equalizer (DFE) to equalize the channel [1, 11]. NRZ
modulation is commonly employed, but as data rates continue to increase, more bandwidth efficient
modulation schemes such as PAM-4 have to be employed in order to compensate for the severe
attenuation introduced by the channel. For the same bit rate, the Nyquist frequency of PAM-4
modulation is half that of NRZ modulation, making it more bandwidth efficient. However, PAM-4
modulation has its own drawbacks in terms of receiver complexity. While PAM-4 modulation relaxes
the bandwidth requirement of the analog front-end and halves the ADC sampling rate requirement,
the DSP complexity can grow significantly. The linear FFE is modulation-format agnostic, and it is
primarily the DFE complexity that suffers when PAM-4 modulation is employed. The DFE has a
timing loop whose timing requirements can be very challenging to meet. Traditional




issue in the DFE. This necessitates the use of techniques such as loop-unrolling and look-ahead
multiplexing. While these techniques successfully solve the critical timing path issue in the DFE,
their complexity grows exponentially with each additional DFE tap. The number of summers and
digital comparators in a PAM-4 DFE grows as 4^N, where N is the number of DFE taps, as opposed
to 2^N in an NRZ DFE implementation. The number of multiplexers needed for a look-ahead
implementation also grows as 4^N. This motivates the investigation of a new PAM-4 DFE
architecture with reduced complexity.
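Two quantities underpin this discussion: the Nyquist frequency for a given bit rate and modulation, and the exponential growth in speculative branches when a DFE is loop-unrolled. The sketch below computes both at the 52Gb/s rate targeted later in this work. The function names are hypothetical. Counting one branch per combination of previous decisions assumes a fully unrolled DFE and does not describe the prototype's partially-unrolled design.

```python
def nyquist_frequency_hz(bit_rate_bps, bits_per_symbol):
    """Nyquist frequency is half the symbol (baud) rate."""
    return bit_rate_bps / bits_per_symbol / 2.0

def unrolled_dfe_branches(num_taps, num_levels):
    """Speculative branches in a fully loop-unrolled DFE: one per possible
    combination of the previous num_taps decisions (num_levels ** num_taps)."""
    return num_levels ** num_taps

for name, bits, levels in [("NRZ", 1, 2), ("PAM-4", 2, 4)]:
    f_nyq_ghz = nyquist_frequency_hz(52e9, bits) / 1e9
    branches = [unrolled_dfe_branches(n, levels) for n in (1, 2, 3)]
    # Each branch needs (levels - 1) slicer thresholds: 1 for NRZ, 3 for PAM-4.
    slicers = [b * (levels - 1) for b in branches]
    print(f"{name}: Nyquist {f_nyq_ghz:.0f} GHz, branches {branches}, slicers {slicers}")
```

At 52Gb/s, PAM-4 halves the Nyquist frequency from 26GHz to 13GHz. For 1, 2 and 3 taps, the unrolled branch count grows as 4, 16 and 64 for PAM-4 versus 2, 4 and 8 for NRZ.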
Forward error correction (FEC) is of interest in serial links employing PAM-4 modulation
to relax the stringent equalization and non-linearity requirements. However, decoders for
well-known error correction codes such as RS and BCH codes can be power hungry and, more
importantly, introduce large latency.
ADC-based serial links employ some equalization in the analog front-end in the form of a
CTLE or embed low-overhead equalization inside the ADC. Previous work [11] has utilized the
information in the partially equalized signal to determine whether a symbol requires further
equalization or if a decision can be reliably made with no further equalization. This allows the
digital equalizer power to be gated on a symbol-by-symbol basis, leading to considerable power
savings in the DSP. This principle can also be utilized to employ simple error correction
techniques without the power and latency overhead introduced by more complex coding schemes.
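The gating idea can be illustrated with a short sketch. A partially equalized PAM-4 sample is treated as reliable when it lies farther than an assumed margin from every slicer threshold, and only the remaining symbols are forwarded to the full digital equalizer. Several details are hypothetical choices for illustration and are not the circuit of [11] or of this work:

- the normalized levels (±1, ±1/3) and the thresholds they imply,
- the margin value,
- the function name.

```python
import numpy as np

# Assumed normalized PAM-4 levels {-1, -1/3, +1/3, +1}, giving thresholds at 0 and +/-2/3.
THRESHOLDS = np.array([-2.0 / 3.0, 0.0, 2.0 / 3.0])

def needs_full_equalization(partial_eq, margin):
    """Boolean mask: True where a partially equalized sample lies within `margin`
    of any PAM-4 decision threshold, so an early decision is not trusted."""
    x = np.asarray(partial_eq, dtype=float)
    dist = np.min(np.abs(x[:, None] - THRESHOLDS[None, :]), axis=1)
    return dist < margin

samples = [0.95, 0.05, -0.30, 0.70, -0.98]
mask = needs_full_equalization(samples, margin=0.15)
print(mask)              # True entries are sent to the full FFE/DFE path
print(1.0 - mask.mean()) # fraction of symbols whose full equalizer path could be gated off
```

In a real receiver, the margin would be chosen from the statistics of the partially equalized signal so that early decisions meet the BER target. The fixed value here only shows the mechanism.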
Timing errors in the form of jitter are another major impairment as data rates continue to scale.
While time-interleaved ADCs relax the sampling frequency for each track-and-hold, each
time-interleaved channel still samples the full-bandwidth signal. Hence, the jitter-induced noise in a
time-interleaved system is the same as in a single-channel system. Frequency channelized receivers are




This dissertation is organized as follows. Chapter 2 presents background material on
high-speed serial link systems, including the impairments introduced by the various components that
make up the channel and the techniques available at the receiver to combat these impairments. It
also introduces the existing mixed-signal statistical modeling techniques used to predict system
performance.
Chapter 3 presents a hybrid statistical framework for ADC-based serial link receivers. The
framework builds upon existing statistical modeling techniques for mixed-signal receivers and
adds support for ADC quantization noise, radix errors (INL/DNL) and time-interleaving
mismatch errors. The results of the statistical model are compared with transient simulations to
verify the modeling technique.
Chapter 4 presents a 52Gb/s PAM-4 ADC-based receiver that employs a 32-way
time-interleaved 6-bit SAR ADC and a DSP architecture that utilizes the information in the partially
equalized signal to reduce the loop-unrolled PAM-4 DFE complexity to that of an NRZ DFE
while simultaneously nearly doubling the data rate. Circuit design details and measurement results
that verify the performance of a prototype in a GP 65nm process are presented.
Chapter 5 presents a bit erasure filling architecture that can relax both the analog front-end
and the DSP design by targeting a higher raw (pre-erasure-filling) BER. Chapter 5 also presents a
frequency domain ADC-based receiver architecture that can support multi-tone modulation to








This chapter introduces the topic of modeling and design of high-speed ADC-based serial
link receivers. The primary challenge of high-speed data transmission through a lossy channel
using non-ideal circuit blocks, and the consequent necessity of accurately modeling the channel
and circuit imperfections, are presented. The available choices of modulation techniques,
equalization architectures and their implementation styles, along with their relative trade-offs,
are also presented.
2.1 Channel Components and Characteristics 
A typical backplane serial link, along with its constituent components such as the IC
package, connector, vias, and the backplane trace, is shown in Fig. 2.1 [12]. Each of these
components can introduce dispersion and reflections, which lead to symbols transmitted in
different time intervals interfering with each other. This is known as inter-symbol interference












Figure 2.2. Link system model showing a transmitter, an effective channel and a receiver. Also 




While the IC package, connectors, and vias are primarily responsible for reflections due to
the impedance discontinuities that these components introduce, the backplane trace introduces
dispersion due to its frequency dependent loss and non-linear phase characteristics. The frequency
dependent loss of the backplane trace in turn results from physical phenomena such as skin effect
and dielectric absorption. The entire link can be modeled as a transmitter and receiver connected
through a lossy, reflective equivalent channel comprising all the components previously
described, as shown in Fig. 2.2. A pulse at the transmitter occupying one symbol interval, or unit
interval (UI), of time, which is 100ps in the example of Fig. 2.2, spreads out over several UIs at the
receiver as it travels over the channel, which has a loss of 25dB at 5GHz in this example. The signal
at the receiver can be written as
$$z(t) = \sum_{k} b_k \, c(t - kT) + w(t), \tag{2.1}$$
where $b_k$ is the transmitted symbol, $k$ is the symbol index, $T$ is the symbol period, and $c(t)$ is the




pulse similar to the pulse on the transmitter side in Fig. 2.2, and $w(t)$ is an additive white Gaussian
noise (AWGN) term. After sampling, $t$ is replaced by $nT$ and the received signal can be written as
$$z(n) = \sum_{k} b_k \, c(n - k) + w(n). \tag{2.2}$$





Equation 2.2 can be re-written to separate the desired received signal from the ISI as in
Equation 2.3, where the second term represents the ISI and the third term represents the additive
noise.
$$z(n) = b_n \, c(0) + \sum_{k \neq n} b_k \, c(n - k) + w(n). \tag{2.3}$$
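The sketch below makes the decomposition in Equation 2.3 concrete. It computes the sampled channel output for a short symbol sequence and splits it into the desired term $b_n c(0)$ and the ISI sum over $k \neq n$. The three-sample pulse response (one precursor, the main cursor and one postcursor) is a hypothetical example rather than a channel from this work, and the function name is illustrative.

```python
import numpy as np

def received_samples(b, c, main_idx, noise_std=0.0, rng=None):
    """Sampled channel output z(n) = sum_k b_k c(n - k) + w(n).

    b         : transmitted symbols b_k
    c         : sampled pulse response; c[main_idx] plays the role of c(0),
                earlier entries are precursors and later entries are postcursors
    noise_std : standard deviation of the AWGN term w(n)
    Returns (z, desired, isi), with desired = b_n c(0) and isi = sum over k != n.
    """
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    full = np.convolve(b, c)                        # full[m] = sum_k b[k] * c[m - k]
    noiseless = full[main_idx:main_idx + len(b)]    # align so sample n uses c(0) for b_n
    desired = b * c[main_idx]
    isi = noiseless - desired
    if noise_std > 0:
        rng = rng if rng is not None else np.random.default_rng()
        w = rng.normal(0.0, noise_std, size=len(b))
    else:
        w = np.zeros(len(b))
    return noiseless + w, desired, isi

# Hypothetical pulse response: precursor 0.1, main cursor 1.0, postcursor 0.3.
z, desired, isi = received_samples([1, -1, 1], [0.1, 1.0, 0.3], main_idx=1)
print(z, isi)
```

For the middle symbol, the ISI is the postcursor from the previous symbol (+0.3) plus the precursor from the next symbol (+0.1), for a total of +0.4, which matches the second term of Equation 2.3.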
Having identified ISI as the primary impairment introduced by the channel, the next section
introduces techniques to combat it at the receiver.
2.2 Receiver Equalization Techniques 
Receiver architectures can be classified as mixed-signal architectures and ADC-based 
architectures. Both mixed-signal receivers and ADC-based receivers employ equalization that can 
be broadly divided into linear equalization and non-linear equalization. Examples of linear 
equalization are continuous time linear equalizers (CTLE) and discrete time finite impulse 
response (FIR) equalizers. Examples of non-linear equalizers are decision feedback equalizers 
(DFE) and maximum-likelihood sequence estimation (MLSE). It should be noted that while the 
FIR equalizer and the DFE are symbol-by-symbol equalizers, the MLSE equalizer operates on
sequences of received symbols and hence can be thought of as belonging to a different class of 
equalizers compared to the FIR equalizer and the DFE. While mixed-signal receivers mainly 
employ CTLE and DFE (both FIR and IIR feedback), ADC-based receivers employ some analog 




followed by a powerful linear feedforward equalizer (FFE) and DFE in the digital domain in the 
DSP.  
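The sketch below gives a behavioral illustration of the digital FFE mentioned above. It implements a full-rate FFE, $y(n) = \sum_i h_i x(n - i)$ (taps $h_i$ are `taps` in the code), and a P-way parallel version in which each lane computes one output per block of P samples. It then checks that the two produce identical outputs. Several choices are hypothetical:

- the tap values,
- the PAM-4 stand-in input,
- the function names.

A hardware DSP would use fixed-point arithmetic and pipelining rather than floating-point NumPy.

```python
import numpy as np

def ffe_full_rate(x, taps):
    """Full-rate FFE: y(n) = sum_i taps[i] * x(n - i), one output per input sample."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, taps)[:len(x)]

def ffe_parallel(x, taps, P):
    """P-way parallel FFE: input is processed in blocks of P samples; lane j of each
    block computes output start + j, reusing the last L-1 samples of the previous block."""
    x = np.asarray(x, dtype=float)
    taps = np.asarray(taps, dtype=float)
    L = len(taps)
    assert len(x) % P == 0, "this sketch assumes the length is a multiple of P"
    history = np.zeros(L - 1)                     # last L-1 samples of the previous block
    y = np.empty(len(x))
    for start in range(0, len(x), P):
        block = np.concatenate([history, x[start:start + P]])
        for lane in range(P):                     # each lane is an independent L-tap dot product
            window = block[lane:lane + L][::-1]   # x(n), x(n-1), ..., x(n-L+1)
            y[start + lane] = np.dot(taps, window)
        history = block[-(L - 1):] if L > 1 else np.zeros(0)
    return y

rng = np.random.default_rng(0)
x = rng.choice([-3.0, -1.0, 1.0, 3.0], size=64)   # PAM-4 symbols as stand-in ADC codes
taps = [1.0, -0.35, 0.12, -0.04]                  # hypothetical FFE coefficients
print(np.max(np.abs(ffe_full_rate(x, taps) - ffe_parallel(x, taps, P=8))))
```

Because every lane only needs the current block plus the last L-1 samples of the previous one, the parallel form runs each lane at 1/P of the symbol rate. This property lets a digital FFE keep up with a time-interleaved ADC.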
2.2.1 Linear Equalizers 
A commonly employed linear equalizer is the CTLE, which can effectively cancel both the
precursor and postcursor ISI components by having a frequency response that inverts the channel’s
frequency dependent loss. Further, the CTLE can also be employed to cancel the long ISI tail
caused by the channel’s loss at relatively low frequencies [13]. The CTLE can be implemented either
through passive elements [4] or as an active circuit, as shown in Fig. 2.3(a) and Fig. 2.3(b),
respectively, with the passive CTLE providing better linearity than the active CTLE. The zero of
the passive CTLE is given by $\omega_z = 1/(R_1 C_1)$, while the pole is given by
$\omega_p = 1/((R_1 \parallel R_2) C_1)$, leading to a gain peaking of $1 + R_1/R_2$. The passive CTLE,
however, attenuates the signal at low frequencies instead of providing gain at high frequencies.
The active CTLE is commonly realized using a source-degenerated differential amplifier, with the
zero and the poles given by $\omega_z = 1/(R_s C_s)$, $\omega_{p1} = 1/(R_D C_D)$ and
$\omega_{p2} = (1 + g_m R_s)/(R_s C_s)$, where $g_m$ is the transconductance of the differential pair.
Discrete time FIR filters can be realized using analog delay elements as in [14-15] or, more
efficiently, using an embedded capacitive DAC in a SAR ADC [16]. When analog linear equalization
is provided through a CTLE, a discrete time FIR equalizer or an embedded FFE, it can potentially
lead to lower resolution requirements for the ADC or fewer digital FFE taps in an ADC-based
receiver. Simulation results in Fig. 2.4 show the receiver voltage margin for 56Gb/s operation over
a channel with 37.2dB loss at 14GHz for 3 different CTLE-based analog front-end frequency
responses (Fig. 2.4(a)) as a function of ADC
Figure 2.4. (a) A channel with 37.2dB loss at 14GHz and 3 different CTLE-based analog front-end frequency responses, and their impact on voltage margin for a BER of 10⁻⁴ for various (b) ADC
These simulation results assume 56Gb/s operation utilizing PAM-4 modulation with a 1V peak-to-peak swing and a 3-tap FIR equalizer at the transmitter. A random jitter of 300fs rms is also assumed.
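As a quick numerical check of the CTLE expressions given above, the sketch below evaluates the zero and pole locations of the passive and active CTLEs. It assumes the usual passive topology (R1 in parallel with C1 in series, R2 as the shunt element), which is consistent with the stated expressions; all component values are hypothetical and chosen only for illustration, and this is a first-order calculation rather than a circuit simulation.

```python
import math

def passive_ctle(r1, r2, c1):
    """Zero, pole (rad/s), peaking and DC gain of the passive RC CTLE."""
    wz = 1.0 / (r1 * c1)                 # w_z = 1 / (R1 C1)
    r_par = r1 * r2 / (r1 + r2)          # R1 || R2
    wp = 1.0 / (r_par * c1)              # w_p = 1 / ((R1 || R2) C1)
    peaking = 1.0 + r1 / r2              # gain peaking, equal to w_p / w_z
    dc_gain = r2 / (r1 + r2)             # low-frequency attenuation (assumed divider topology)
    return wz, wp, peaking, dc_gain

def active_ctle(rs, cs, rd, cd, gm):
    """Zero and poles (rad/s) of the source-degenerated CTLE, per the text's expressions."""
    wz = 1.0 / (rs * cs)
    wp1 = 1.0 / (rd * cd)
    wp2 = (1.0 + gm * rs) / (rs * cs)
    return wz, wp1, wp2

def ghz(w):
    """Convert rad/s to GHz."""
    return w / (2 * math.pi) / 1e9

# Hypothetical component values
wz, wp, pk, g0 = passive_ctle(r1=1e3, r2=250.0, c1=100e-15)
print(f"passive: fz={ghz(wz):.2f} GHz, fp={ghz(wp):.2f} GHz, "
      f"peaking={20 * math.log10(pk):.1f} dB, DC gain={20 * math.log10(g0):.1f} dB")

wz2, wp1, wp2 = active_ctle(rs=400.0, cs=200e-15, rd=300.0, cd=50e-15, gm=20e-3)
print(f"active: fz={ghz(wz2):.2f} GHz, fp1={ghz(wp1):.2f} GHz, fp2={ghz(wp2):.2f} GHz")
```

Note that the passive peaking and DC gain multiply to one: the passive network does not add gain at high frequencies, it only attenuates at low frequencies, as stated in the text.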
FFEs lend themselves well to digital implementation and are, as such, the most commonly employed form of equalization in ADC-based receivers. As mentioned earlier, FFEs are symbol-by-symbol linear equalizers, and their conceptual full-rate implementation is shown in Fig. 2.5. FFEs can be effectively pipelined and implemented in a parallel architecture as shown in Fig. 2.6, leading to relaxed timing requirements. The relaxed timing requirements can be exploited
A well-known drawback of linear equalizers is that they not only cancel ISI but also amplify noise and crosstalk. Since linear equalizers invert the channel response, frequencies at which the channel has notches in its magnitude response can produce large gain peaks in the linear equalizer's frequency response, significantly amplifying noise and crosstalk. Hence, while linear equalizers are effective at cancelling ISI in well-behaved channels, they are inadequate in channels with spectral notches, which can result from reflections due to connectors and via stubs. A non-linear equalizer such as the DFE can overcome this drawback and is hence commonly used in both mixed-signal and ADC-based receivers.
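The noise-amplification argument can be illustrated with a small sketch. It assumes a hypothetical two-tap channel $1 + a z^{-1}$, whose magnitude response has a notch at Nyquist that deepens as $a \to 1$, and a simple zero-forcing FFE obtained by truncating the channel inverse, $\alpha[k] = (-a)^k$. This is not the MMSE-optimized FFE used in the simulations; it only shows that the white-noise variance gain $\sum_k \alpha[k]^2$ grows as the notch deepens.

```python
def convolve(x, h):
    """Full linear convolution of two tap lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def truncated_inverse_ffe(a, n_taps):
    """Zero-forcing FFE for the hypothetical channel 1 + a*z^-1: alpha[k] = (-a)**k."""
    return [(-a) ** k for k in range(n_taps)]

for a in (0.5, 0.95):                  # a -> 1 deepens the channel notch at Nyquist
    taps = truncated_inverse_ffe(a, 8)
    g = convolve([1.0, a], taps)       # equalized pulse response
    noise_gain = sum(t * t for t in taps)
    print(f"a={a}: residual ISI={max(abs(v) for v in g[1:]):.3f}, "
          f"noise variance gain={noise_gain:.2f}")
```

For the notched channel the same 8-tap FFE both leaves a large residual term at the end of its span and multiplies the noise variance several times over, which is the behavior the paragraph above describes.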
2.2.2 Non-Linear Equalizers 
One example of a non-linear equalizer is the DFE. The DFE offers a significant advantage over a linear equalizer since it can cancel postcursor ISI without noise and crosstalk amplification. Once the decision of a symbol becomes available at the output of the slicer, it is fed back into a
The decision is multiplied by a DFE coefficient and this value is subtracted from the newly arrived symbol. If the DFE coefficient is equal to the first postcursor ISI value, then the first postcursor ISI is essentially removed from the newly arrived symbol. Several decisions can be fed back, as shown in Fig. 2.7, to cancel multiple postcursor ISI terms. Since the DFE can only cancel postcursor ISI, an FFE-DFE combination is frequently used with their taps co-optimized. The impact of having a DFE is illustrated in the simulation results of Fig. 2.8. A data rate of 64Gb/s and PAM-4 modulation are assumed, with a transmitter swing of 0.9Vppd. A CTLE with 15dB of peaking at 16GHz and 3 taps of embedded FFE provide analog equalization. A receiver random jitter of 300fs rms and a random noise of 3mV rms at the ADC input are also assumed. The ADC has a resolution of 6 bits and is followed by a DSP with an FFE-DFE equalizer. Comparing Fig. 2.8(b) and Fig. 2.8(c), it can be seen that increasing the number of FFE taps beyond 19 brings very little improvement in the BER. At this point, the BER is limited by the SNR.
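The DFE mechanism described above (multiply past decisions by the DFE coefficients and subtract them from the newly arrived sample before slicing) can be sketched for PAM-2 as follows. The sampled pulse response is hypothetical and noise-free, and the DFE coefficients are simply set equal to the postcursor ISI values; this is an illustration of the principle, not the receiver simulated in Fig. 2.8.

```python
import random

def dfe_pam2(rx, taps):
    """Full-rate PAM-2 DFE: subtract taps[k] * d[n-1-k] from each sample, then slice."""
    decisions, slicer_in = [], []
    for n, y in enumerate(rx):
        fb = sum(c * decisions[n - 1 - k] for k, c in enumerate(taps) if n - 1 - k >= 0)
        z = y - fb                          # summer output
        slicer_in.append(z)
        decisions.append(1 if z >= 0 else -1)
    return decisions, slicer_in

random.seed(1)
pulse = [1.0, 0.6, 0.3]                    # hypothetical cursor + 2 postcursors
bits = [random.choice((-1, 1)) for _ in range(1000)]
rx = [sum(pulse[k] * bits[n - k] for k in range(len(pulse)) if n - k >= 0)
      for n in range(len(bits))]
decisions, z = dfe_pam2(rx, taps=pulse[1:])  # DFE taps equal to the postcursors
print("decisions correct:", decisions == bits)
print("worst-case |sample| without DFE:", round(min(abs(v) for v in rx[2:]), 3))
print("worst-case |slicer input| with DFE:", round(min(abs(v) for v in z), 3))
```

Without the DFE the worst-case pattern leaves only $1 - 0.6 - 0.3 = 0.1$ of eye opening; with correct decisions fed back, the slicer input returns to the full cursor value.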
Figure 2.8. (a) A channel with 31dB loss at 16GHz is utilized to show the impact of (b) the
The DFE, however, is difficult to implement due to its challenging feedback timing path. In each cycle of operation, the decision at the output of the slicer has to propagate to the output of the delay element, get scaled in the multiplier by the DFE coefficient, and pass through multiple summers where it is subtracted from the newly arrived symbol, which is then sliced to generate the new decision for the next cycle of operation. All of the above operations need to be completed within 1 symbol period, or 1 unit interval (UI). The conceptual full-rate implementation of Fig. 2.7 has one decision element, all the summers and a multiplier in the critical feedback timing path. This is often a very difficult timing path to meet. The critical feedback timing path is also significantly longer than the iteration bound, since the combinational logic delay is very unevenly distributed between the delay elements. While the parallel implementation shown in Fig. 2.9 brings the critical path delay closer to the iteration bound, additional techniques are necessary to reduce the critical path delay for high data rate wireline receivers. Two commonly employed techniques that can significantly reduce the critical path delay are loop-unrolling and look-ahead multiplexing.
A. Loop-Unrolling 
The idea of loop-unrolling [17] is to precompute all possible equalized values before a decision is actually fed back. The precomputed possible equalized values are then sliced to obtain all the possible decisions for the input symbol. These decisions are fed to a multiplexer, which selects the correct decision using the past decisions that are fed back. Hence, the decision is fed back only to the final multiplexer, forming a multiplexer loop. This final multiplexer is now in the 1UI critical path, and the summer and the slicer have been removed from this critical timing path. A general N-tap PAM-2 loop-unrolled architecture is shown in Fig. 2.10. The summers and slicers can be thought of as a precomputation section that can be pipelined and hence has relaxed timing requirements. The cost of reducing the critical path delay is an increase in circuit complexity. In a PAM-2 modulation system with N taps of DFE, the number of summers and slicers required for
Figure 2.9. A P-way parallel implementation of a 2-tap DFE. 
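The loop-unrolling idea can be sketched behaviorally for an N-tap PAM-2 DFE: for every possible pattern of the N past decisions, a summer and slicer compute a speculative decision without any feedback, and only the final selection depends on the actual past decisions. The pulse response, noise level and +1 preamble are hypothetical; the check simply confirms that the unrolled structure produces exactly the same decisions (including error propagation) as the direct DFE while using $2^N$ speculative slicers.

```python
import itertools
import random

def direct_dfe(rx, taps, history):
    """Conventional PAM-2 DFE: feedback passes through the summer and slicer every UI."""
    d = list(history)                                  # past decisions, oldest first
    for y in rx:
        fb = sum(c * d[-1 - k] for k, c in enumerate(taps))
        d.append(1 if y - fb >= 0 else -1)
    return d[len(history):]

def unrolled_dfe(rx, taps, history):
    """Loop-unrolled PAM-2 DFE: 2**N speculative summer/slicer pairs plus a selection mux."""
    patterns = list(itertools.product((-1, 1), repeat=len(taps)))   # (d[n-1], ..., d[n-N])
    d = list(history)
    for y in rx:
        # Precomputation: needs no feedback, so it can be pipelined.
        candidates = {p: (1 if y - sum(c * b for c, b in zip(taps, p)) >= 0 else -1)
                      for p in patterns}
        # Feedback loop: only the mux select depends on past decisions.
        select = tuple(d[-1 - k] for k in range(len(taps)))
        d.append(candidates[select])
    return d[len(history):]

random.seed(7)
pulse = [1.0, 0.5, 0.25, 0.1]                          # hypothetical cursor + 3 postcursors
N = len(pulse) - 1
bits = [1] * N + [random.choice((-1, 1)) for _ in range(2000)]   # known +1 preamble
rx = [sum(pulse[k] * bits[n - k] for k in range(len(pulse))) + random.gauss(0.0, 0.2)
      for n in range(N, len(bits))]
ref = direct_dfe(rx, pulse[1:], history=bits[:N])
unrolled = unrolled_dfe(rx, pulse[1:], history=bits[:N])
print(ref == unrolled, "speculative slicers:", 2 ** N,
      "bit errors:", sum(a != b for a, b in zip(ref, bits[N:])))
```

In hardware the dictionary lookup corresponds to the multiplexer tree of Fig. 2.10, which is the only logic left inside the 1UI loop.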
B. Look-Ahead Multiplexing 
Since the final decision selection multiplexer is now in the critical feedback timing path, 
techniques are necessary to reduce delay in this multiplexer loop. Pipelining and parallel 
implementations cannot be used to speed up a multiplexer loop. The authors in [17] introduce a 
technique known as look-ahead multiplexing to speed up multiplexer loops in high speed DFE 
implementations. The technique is further illustrated in [18]. The 2:1 look-ahead multiplexer loop 
can be described using the following equations: 
$$Y_n = A_n Y_{n-1}' + B_n Y_{n-1}. \tag{2.4}$$
Equation 2.4 describes a 2:1 multiplexer loop, where the prime denotes the logical complement. The previous output $Y_{n-1}$ is fed back as the select line for generating the current output $Y_n$ from the current inputs $A_n$ and $B_n$. The previous output $Y_{n-1}$ can be written as
$$Y_{n-1} = A_{n-1} Y_{n-2}' + B_{n-1} Y_{n-2}. \tag{2.5}$$
Substituting (2.5) into (2.4),
$$Y_n = \left(A_n A_{n-1}' + B_n A_{n-1}\right) Y_{n-2}' + \left(A_n B_{n-1}' + B_n B_{n-1}\right) Y_{n-2}. \tag{2.6}$$
From Equation 2.6, we see that the dependence of $Y_n$ on $Y_{n-1}$ has been removed. Hence, the delay tolerated in the feedback path has been doubled. This technique can be extended to further relax the feedback timing path. Fig. 2.11 shows the look-ahead technique for a 2:1 multiplexer loop, and it can be seen that the relaxation in the feedback timing path comes at the cost of additional multiplexers. The number of 2:1 multiplexers needed for an N-tap multiplexer loop with a look-ahead factor of LF is $2^N(LF - 1) + \sum_{i=1}^{N} 2^{N-i}$.
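The look-ahead transformation can be checked exhaustively in a few lines. The sketch below implements the original 2:1 multiplexer loop of Equation 2.4 and the transformed loop of Equation 2.6, in which the two inner multiplexers use only the inputs and the feedback select is $Y_{n-2}$. The first output is produced with one step of Equation 2.4, an initialization choice of this sketch. The `mux_count` helper simply evaluates the multiplexer-count formula quoted above; for one tap and a look-ahead factor of 2 it gives the 3 multiplexers of Fig. 2.11.

```python
import itertools

def mux_loop(A, B, y_init):
    """Eq. (2.4): Y_n = A_n Y'_{n-1} + B_n Y_{n-1}; the select is the previous output."""
    y = [y_init]
    for a, b in zip(A, B):
        y.append(b if y[-1] else a)
    return y[1:]

def lookahead_mux_loop(A, B, y_init):
    """Eq. (2.6): Y_n selected by Y_{n-2}; the two inner muxes use only the inputs."""
    y = [y_init, B[0] if y_init else A[0]]     # Y_{-1}, then Y_0 from one step of Eq. (2.4)
    for n in range(1, len(A)):
        if_zero = B[n] if A[n - 1] else A[n]   # A_n A'_{n-1} + B_n A_{n-1}
        if_one = B[n] if B[n - 1] else A[n]    # A_n B'_{n-1} + B_n B_{n-1}
        y.append(if_one if y[-2] else if_zero) # y[-2] is Y_{n-2}: two-cycle feedback
    return y[1:]

def mux_count(n_taps, lf):
    """2:1 mux count from the text: 2**N * (LF - 1) + sum_{i=1..N} 2**(N - i)."""
    return 2 ** n_taps * (lf - 1) + sum(2 ** (n_taps - i) for i in range(1, n_taps + 1))

# Exhaustive check over all length-5 input sequences and both initial states.
ok = all(mux_loop(A, B, y0) == lookahead_mux_loop(A, B, y0)
         for A in itertools.product((0, 1), repeat=5)
         for B in itertools.product((0, 1), repeat=5)
         for y0 in (0, 1))
print(ok, mux_count(1, 2), mux_count(2, 2))
```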
2.3 Statistical Modeling of High-Speed Serial Receivers 
The growing complexity of high-speed link systems makes it impractical to rely on time-domain Monte Carlo (transient) simulations alone to predict system performance, since the number of bits required to validate typical bit error rate (BER) requirements (< 10⁻¹²) becomes prohibitive; hence, statistical techniques become necessary. While these statistical tools are maturing for binary links, conventional modeling approaches for ADC-based receivers and digital equalization use ADC performance metrics based on mean-square error (MSE), such as signal-to-noise and distortion ratio (SNDR) or effective number of bits (ENOB) [19], [20]. Since these metrics characterize the ADC performance for a single tone, they are unsuitable for a broadband system such as an ADC-based receiver. The existing statistical techniques for mixed-signal receivers and their extension to ADC-based receivers are presented in the next chapter.
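To make "prohibitive" concrete, a standard back-of-the-envelope estimate assumes independent bit errors (Poisson statistics): observing zero errors in $N$ bits supports $\text{BER} < \text{BER}_{target}$ with confidence $CL$ when $N = -\ln(1 - CL)/\text{BER}_{target}$. The simulator throughputs below are hypothetical round numbers used only to convert bit counts into run time.

```python
import math

def bits_for_zero_error_confidence(ber_target, confidence=0.95):
    """Error-free bits needed to claim BER < ber_target at the given confidence.

    Assumes independent errors, so P(0 errors in N bits) = exp(-N * BER)."""
    return -math.log(1.0 - confidence) / ber_target

n_bits = bits_for_zero_error_confidence(1e-12)
print(f"{n_bits:.3g} error-free bits needed")      # roughly 3e12
for rate in (1e6, 1e8):                            # hypothetical simulated bits per second
    print(f"{rate:.0e} bit/s -> {n_bits / rate / 86400:.1f} days of simulation")
```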
Figure 2.10. A loop-unrolled N-tap PAM-2 DFE. 
Figure 2.11. 2-way look-ahead transformation of a 2:1 multiplexer loop. 
3. HYBRID STATISTICAL ADC-BASED RECEIVER MODELING*
Accurate modeling of serial link receivers with high-speed analog-to-digital converters (ADCs) is necessary to understand the various trade-offs involved and to prevent over-designed, power-hungry systems. This chapter presents a hybrid statistical modeling framework for ADC-based serial link receivers. The framework builds upon existing statistical modeling techniques [8-10] for mixed-signal receivers and adds support for ADC quantization noise, radix errors (integral and differential nonlinearity (INL/DNL)), and time-interleaving mismatches. A rapid, purely statistical simulation mode is utilized to model systems with small front-end nonlinearity and ADC INL/DNL, while a hybrid approach is utilized to model systems with significant INL/DNL.
3.1 Statistical BER Modeling 
Statistical simulation techniques that are often used to model mixed-signal serial link receivers serve as the base for including additional ADC-based receiver non-idealities. In the absence of jitter and random noise, the signal at the channel output can be written as in Equation 2.1, repeated here for convenience:
$$z(t) = \sum_{k} b_k\, c(t - kT), \tag{3.1}$$
where $b_k$ is the transmitted symbol, $k$ is the symbol index, $T$ is the symbol period, and $c(t)$ is the channel pulse response, which is extracted by convolving the channel impulse response with a single pulse. An example of a 10Gb/s NRZ pulse response is shown in Fig. 3.1. Further improvements to this pulse response with linear equalization (CTLE, FFE) are modeled by convolving the equivalent equalizers' impulse responses with this channel pulse response.
* Part of this work is reprinted with permission from “Modeling of ADC-Based Serial Link Receivers with Embedded and Digital Equalization” by S. Kiran, A. Shafik, E. Z. Tabasy, S. Cai, K. Lee, S. Hoyos and S. Palermo, IEEE Transactions on Components, Packaging and Manufacturing Technology, July 4, 2018, copyright 2018 by IEEE.
The DFE is modeled by subtracting the DFE coefficients from the sampled postcursor ISI components, with error propagation generally ignored. After sampling, $t$ is replaced with $nT$, and the sampled equalized channel output $y(n)$ can be written as
$$y(n) = \sum_{k} b_k\, p(n-k), \tag{3.2}$$
where 𝑝(𝑛) is the sampled equalized pulse response obtained by sampling the equalized pulse 
response  𝑝(𝑡). The first step in statistical analysis is to find the probability density function (PDF) 
of 𝑦(𝑛) due to the channel's inter-symbol interference (ISI). In order to calculate this, the 
individual channel ISI components' PDFs are convolved together and the final ISI PDF can be 
written as 
$$f_{ISI}(v) = f_{ISI,-i} * f_{ISI,-i+1} * \cdots * f_{ISI,j-1} * f_{ISI,j}, \tag{3.3}$$
where for binary NRZ modulation the individual ISI components’ PDFs are written as 
$$f_{ISI,m}(v) = \frac{1}{2}\left(\delta(v - p[m]) + \delta(v + p[m])\right). \tag{3.4}$$
Figure 3.1. 10Gb/s NRZ modulation modeling results with a conventional mixed signal statistical 
modeling framework.  Reprinted with permission from S. Kiran et al, IEEE ©July 4, 2018
Here δ(.) is the Dirac delta impulse function with 𝑚 ∈  [−𝑖, 𝑗] and 𝑚 ≠ 0, and it is 
assumed that the channel has 𝑖 pre-cursor and 𝑗 post-cursor ISI taps. The statistical ISI PDF in Fig. 
3.1 results from the residual ISI in an equalized pulse response, where the equalized pulse response 
is constructed by convolving the channel's sampled pulse response with a 5-tap FFE. Once this 
PDF is obtained, additional voltage noise components such as random Gaussian noise and uniform 
power supply noise can be included in the system by convolving the voltage noise PDFs with the 
ISI PDF since these components are independent. The final step is to shift the total PDF to the 
main cursor position at ±𝑝[0] and perform an integration. This results in a bathtub curve which 
plots the BER as a function of the voltage margin. 
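The statistical flow above (convolve the per-tap two-point PMFs of Equation 3.4 into the ISI PMF of Equation 3.3, add independent noise, shift to $\pm p[0]$ and integrate) can be sketched for NRZ as follows. The sampled pulse response and noise level are hypothetical. As a simplification of this sketch, the convolution with Gaussian noise is carried out analytically through the Gaussian tail function $Q(\cdot)$ rather than on a grid; jitter is not included.

```python
import math

def isi_pmf(pulse, cursor, dv):
    """Eqs. (3.3)-(3.4): convolve the two-point PMFs of every ISI tap (equiprobable NRZ).

    Keys are voltage-bin indices on a grid of step dv; values are probabilities."""
    pmf = {0: 1.0}
    for m, pm in enumerate(pulse):
        if m == cursor:
            continue                                   # the main cursor p[0] is not ISI
        step = round(pm / dv)
        new = {}
        for k, p in pmf.items():
            for s in (step, -step):                    # deltas at +p[m] and -p[m]
                new[k + s] = new.get(k + s, 0.0) + 0.5 * p
        pmf = new
    return pmf

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def voltage_bathtub(pulse, cursor, sigma, thresholds, dv=1e-4):
    """BER versus decision threshold: ISI PMF shifted to +/-p[0], Gaussian noise added."""
    p0 = pulse[cursor]
    pmf = isi_pmf(pulse, cursor, dv)
    curve = []
    for vth in thresholds:
        ber = 0.0
        for k, p in pmf.items():
            v = k * dv
            ber += 0.5 * p * q_func((p0 + v - vth) / sigma)   # '+1' sent, lands below vth
            ber += 0.5 * p * q_func((p0 - v + vth) / sigma)   # '-1' sent, lands above vth
        curve.append(ber)
    return curve

# Hypothetical sampled pulse response (V): one precursor, cursor, three postcursors.
pulse = [0.05, 0.6, 0.2, 0.08, 0.03]
vths = [i * 0.05 for i in range(-12, 13)]              # -0.6 V ... +0.6 V
for vth, ber in zip(vths, voltage_bathtub(pulse, cursor=1, sigma=0.01, thresholds=vths)):
    print(f"{vth:+.2f} V  BER = {ber:.2e}")
```

Plotting BER on a log scale against the threshold produces the voltage bathtub; the eye opening at a target BER is the width of the region where the curve stays below that target.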
In order to include jitter, the ISI PDF with receiver jitter can be written as 
𝑓𝐼𝑆𝐼
𝑛 , (3.5) 
is the ISI PDF with the sampling instant offset from the ideal position by + 𝑛 𝑅𝑋. A family of ISI PDFs including the jitter effect is constructed at different sampling instants by
], and voltage and timing margin 
plots similar to the ones shown in Fig. 3.1 that include receiver jitter may be obtained. The techniques described so far are very similar to those implemented in statistical simulation engines such as StatEye [12]. It has also been assumed that the entire link is perfectly linear, allowing the channel impulse response to be convolved with the impulse responses of linear equalizers such as the CTLE. However, the CTLE can introduce nonlinearity that can significantly degrade BER performance, particularly in PAM-4 receivers. Modeling CTLE nonlinearity in a mixed-signal link having only a CTLE as the equalizer before the slicer is relatively straightforward [21].
Consider the link system shown in Fig. 3.2, where a nonlinear CTLE has been decomposed into a linear CTLE followed by a nonlinear transfer function. The input $X$ and the output $Y$ of the nonlinearity block are related by
$$y = a_1 x + a_3 x^3, \tag{3.6}$$
where $a_1$ and $a_3$ are the linear gain and the third-order nonlinearity coefficient, respectively. It is assumed that the second-order nonlinearity is negligible. If the PDF of the signal before the nonlinearity is known, which can be determined through the techniques described previously for a perfectly linear system, the PDF at the output of the nonlinear block can simply be written as
$$f_Y(y) = \frac{f_X(x)}{\left| dy/dx \right|} = \frac{f_X(x)}{\left| a_1 + 3 a_3 x^2 \right|}, \qquad x = g^{-1}(y), \tag{3.7}$$
where $g(\cdot)$ is the transfer function of Equation 3.6, and $f_Y(y)$ and $f_X(x)$ are the PDFs of $Y$ and $X$, respectively. It is also assumed that the nonlinear transfer function is one-to-one and monotonically increasing, which generally holds for a CTLE nonlinear transfer function. The impact of this nonlinearity is illustrated in Fig. 3.3. The simulation result of Fig. 3.3 assumes that the 1-dB gain compression point of the CTLE transfer function is at 350mV. Mixed-signal receivers commonly employ a DFE for additional equalization, and hence it is important to model the impact of the DFE on the nonlinearity introduced by the CTLE. Fig. 3.4 shows a mixed-signal receiver with a CTLE followed by a DFE, with the summer in the DFE explicitly shown to aid the discussion of the technique employed to model the impact of the DFE on the CTLE nonlinearity. The key point in this discussion is how the PDF at the input of the summer is modified at the output of the summer due to the feedback from the FIR filter. If the DFE cancels the $k^{th}$ postcursor component of the sampled equalized pulse response $p(n)$, then the signal at the output of the DFE summer, represented by the random variable $Z$, has $p(k) = 0$, while the signals before the summer, both $X$ and $Y$, have $p(k) \neq 0$.
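The PDF transformation through the cubic nonlinearity can be sketched with a change of variables on a sampled PDF. Two assumptions of this sketch are labeled in the code: the 1-dB compression point is interpreted as the amplitude of a sinusoidal input (which gives $a_3 = -\tfrac{4}{3}(1 - 10^{-1/20})\,a_1/A_{1dB}^2$), and the PDF at the linear CTLE output is taken to be a hypothetical Gaussian. The transformed PDF is evaluated on the non-uniform output grid $y_i = g(x_i)$, and a `ValueError` is raised if the input range violates the monotonicity assumption stated above.

```python
import math

def a3_from_p1db(a1, a_in_1db):
    """Third-order coefficient giving 1-dB compression at a sinusoidal input amplitude a_in_1db.

    For y = a1*x + a3*x**3 the fundamental gain is a1 + 0.75*a3*A**2 (assumed interpretation)."""
    return -(4.0 / 3.0) * (1.0 - 10 ** (-1.0 / 20.0)) * a1 / a_in_1db ** 2

def transform_pdf(xs, fx, a1, a3):
    """Change of variables: f_Y(y) = f_X(x) / g'(x) at y = g(x), g monotonically increasing."""
    ys, fy = [], []
    for x, f in zip(xs, fx):
        slope = a1 + 3.0 * a3 * x * x
        if slope <= 0:
            raise ValueError("g(x) is not monotonically increasing over this input range")
        ys.append(a1 * x + a3 * x ** 3)
        fy.append(f / slope)
    return ys, fy

def trapz(xs, fs):
    """Trapezoidal integral on a (possibly non-uniform) grid."""
    return sum((xs[i + 1] - xs[i]) * (fs[i] + fs[i + 1]) / 2 for i in range(len(xs) - 1))

a1 = 1.0
a3 = a3_from_p1db(a1, 0.35)             # 1-dB compression at 350 mV, as in Fig. 3.3
mu, sigma = 0.30, 0.03                  # hypothetical Gaussian PDF at the linear CTLE output (V)
xs = [mu + sigma * (i / 100.0) for i in range(-600, 601)]
fx = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi)) for x in xs]
ys, fy = transform_pdf(xs, fx, a1, a3)
mean_y = trapz(ys, [y * f for y, f in zip(ys, fy)])
print(f"a3 = {a3:.3f} V^-2, area = {trapz(ys, fy):.4f}, mean {mu:.3f} V -> {mean_y:.4f} V")
```

The output PDF still integrates to one, but its center moves toward zero and its shape is skewed, which is the compression effect that shrinks the eye in Fig. 3.3.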
Figure 3.2. Nonlinear CTLE decomposed into a linear CTLE followed by a nonlinear transfer function.
Figure 3.3. Transformation of the PDF of a random variable as a result of applying a monotonic one-to-one nonlinear transfer function.
Figure 3.5. (a) Decomposed PDF at the output of the linear CTLE with the four PDFs centered at ±p(k) ± p(0), (b) nonlinearity applied to the 4 PDFs and the resulting PDF at the output of the nonlinear transfer function block, and (c) the combined PDF obtained by setting p(k) = 0 at the DFE summer output.
Figure 3.6. Statistical model and transient simulation results showing good matching both with and without nonlinearity.
Therefore, the technique to model CTLE nonlinearity in the presence of a DFE is to decompose the ISI PDF, $f_{ISI}(v)$, into 2 PDFs in which the $k^{th}$ postcursor is $\pm p(k)$, apply the PDF transformation through the nonlinearity of Equation 3.6 to both PDFs, and then merge the PDFs together by setting $p(k) = 0$. With reference to Equation 3.3 and Equation 3.4, the decomposition is achieved by restricting $m \neq 0$ and $m \neq k$. The resulting ISI PDF is then shifted to $\pm p(k) \pm p(0)$. Note that this results in 4 PDFs instead of the 2 PDFs that result from Equation 3.3 and Equation 3.4. All of the above discussion applies to a 1-tap DFE in a link employing PAM-2 modulation, but it can be easily extended to multi-tap DFEs and PAM-4 modulation. The nonlinearity modeling technique is illustrated in Fig. 3.5. The simulation results assume a transmitter swing of 0.6Vppd with no TX equalization. The receiver comprises a CTLE and a 1-tap DFE, with the nonlinear transfer function yielding a 1-dB compression point of 250mV. The statistical model is compared with transient simulation in Fig. 3.6, and excellent matching is obtained, verifying the technique. Various ADC-specific impairments are then added to this mixed-signal simulation framework; these are detailed in the sections that follow.
3.2 ADC Quantization Noise 
Since quantization error is formally deterministic for a given input signal, additional justification is necessary to include an equivalent quantization noise PDF that is treated like independent random noise. It was shown in [22] that, under certain conditions that are generally met with the ADC resolutions used in high-speed link applications, quantization error can be treated as an uncorrelated random variable with a white spectrum, i.e., as an additive noise term known as quantization noise. The uniform PDF for quantization noise is obtained through an area summation process, as shown in Fig. 3.7. The probability of a particular quantization error $q$ is the probability of the input signal $x$ falling at $x_i + q$, where $x_i$ is the quantized level of any of the quantization intervals. This is equivalent to cutting the input signal's PDF into strips of width $\Delta$, the quantization interval, and summing them up. From another perspective, the quantization process concentrates the input signal PDF within each quantization interval into a single quantized level at the center of that interval. This is mathematically equivalent to convolving the input signal PDF with a uniform PDF whose width equals the quantization interval and then sampling the resulting PDF. Therefore, the quantizer's output PDF is very similar to the PDF of the sum of the input signal and an independent, uniformly distributed noise quantity. This approach has been called the pseudo quantization noise (PQN) model. An important issue in ADC-based high-speed link receivers is that the quantized input signal is generally equalized with a digital FFE, resulting in quantization noise amplification.
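The area-summation construction can be sketched directly: sample the input PDF finely, find each sample's position inside its own quantization interval, and accumulate its probability into the corresponding error bin (equivalently, cut the PDF into strips of width $\Delta$ and add the strips). The Gaussian input and its parameters are hypothetical, and quantized levels are placed at interval centers as in the text. A broad input PDF yields an essentially uniform error PMF, while a PDF narrower than one LSB does not, which is why the PQN justification depends on the input statistics.

```python
import math

def quantization_error_pmf(mean, sigma, delta, bins_per_lsb=50, span=8.0):
    """Area-summation construction of the quantization-error PMF for a Gaussian input.

    The input PDF is sampled at (j + 0.5)*dv; levels sit at the centers of the intervals
    [k*delta, (k+1)*delta). Each sample's probability is added to the error bin given by
    its position inside its own interval. Error bin r corresponds to
    q = (r + 0.5)*dv - delta/2."""
    dv = delta / bins_per_lsb
    err = [0.0] * bins_per_lsb
    j_lo = math.floor((mean - span * sigma) / dv)
    j_hi = math.ceil((mean + span * sigma) / dv)
    for j in range(j_lo, j_hi + 1):
        x = (j + 0.5) * dv
        pdf = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        err[j % bins_per_lsb] += pdf * dv        # strip position (Python % is non-negative)
    total = sum(err)
    return [e / total for e in err]              # renormalize the truncated Gaussian tails

lsb = 1.0
for sigma in (3.0, 0.1):                         # broad vs. narrow input PDF, in LSBs
    pmf = quantization_error_pmf(mean=0.37, sigma=sigma, delta=lsb)
    worst = max(abs(p * len(pmf) - 1.0) for p in pmf)
    print(f"sigma = {sigma} LSB: max deviation from uniform = {worst:.2e}")
```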
Figure 3.7. Input PDF and equivalent quantization noise PDF construction. Reprinted with 
permission from S. Kiran et al, IEEE ©July 4, 2018. 
Figure 3.8. Modeling of quantization noise amplification through the digital FFE.  
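In order to include this effect, the quantization noise PDF at the ADC output, $f_Q$, is scaled by the different FFE coefficients and the resulting PDFs are convolved together, as shown in Fig. 3.8, to arrive at the final quantization noise PDF $f_{Q,ffe}$, which is written for a K-tap FFE as
$$f_{Q,ffe}(v) = f_{Q,0}(v) * f_{Q,1}(v) * \cdots * f_{Q,K-1}(v), \qquad f_{Q,k}(v) = \frac{1}{|\alpha[k]|}\, f_Q\!\left(\frac{v}{\alpha[k]}\right), \tag{3.8}$$
where $\alpha[k]$ is the $k^{th}$ FFE coefficient.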
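The scale-and-convolve step described above can be sketched on a discrete grid: the uniform quantization-noise PDF over one LSB is approximated by equiprobable points, each copy is scaled by one FFE coefficient, and the scaled PMFs are convolved under the independence assumption of the PQN model. The ADC range, resolution and FFE coefficients are hypothetical. As a sanity check, the resulting variance should match $\frac{\Delta^2}{12}\sum_k \alpha[k]^2$, i.e. the PQN variance multiplied by the FFE noise gain.

```python
def ffe_quantization_noise_pmf(alphas, delta, n_points=64, grid=None):
    """Scale the uniform quantization-noise PMF by each FFE coefficient and convolve
    the scaled PMFs (independent errors assumed). Returns {voltage: probability}."""
    grid = grid or delta / 1000.0
    q = [((r + 0.5) / n_points - 0.5) * delta for r in range(n_points)]   # uniform over 1 LSB
    pmf = {0: 1.0}                                    # bin index -> probability
    for a in alphas:
        new = {}
        for k, p in pmf.items():
            for v in q:
                idx = k + round(a * v / grid)         # add alpha * q, snapped to the grid
                new[idx] = new.get(idx, 0.0) + p / n_points
        pmf = new
    return {k * grid: p for k, p in pmf.items()}

def variance(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    return sum((v - mean) ** 2 * p for v, p in pmf.items())

lsb = 2.0 / 2 ** 6                                     # hypothetical 6-bit ADC over a 2 V range
alphas = [1.0, -0.45, 0.2, -0.08, 0.03]                # hypothetical 5-tap FFE
pmf = ffe_quantization_noise_pmf(alphas, lsb)
predicted = lsb ** 2 / 12 * sum(a * a for a in alphas)  # PQN variance times FFE noise gain
print(f"rms = {variance(pmf) ** 0.5 * 1e3:.3f} mV, predicted {predicted ** 0.5 * 1e3:.3f} mV")
```

Unlike the variance alone, the full PMF keeps the bounded, non-Gaussian shape of the FFE-shaped quantization noise, which is what gets convolved with the ISI PDF in the statistical model.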
The ADC quantization noise modeling approach is verified by comparing transient simulations and statistical modeling results for 10Gb/s NRZ operation over the channel from Fig. 3.1, which has 25dB loss at the Nyquist frequency. These simulations utilize a 0.6Vppd transmit swing without TX equalization, and the receiver employs a 5-tap digital FFE with a coefficient resolution of 8 bits. Utilizing 14-bit DSP resolution to make round-off errors negligible, the simulation results of Fig. 3.9(a) and (b) show that good matching is achieved as the ADC resolution is varied from 3 to 6 bits, for ADC input random noise of 1mV rms and 2mV rms. For the channel under test, at least 5 bits of ADC resolution are necessary to obtain an open eye with a BER less than 10⁻¹⁰. While the BER performance is poor, the model still displays good matching with a 3-bit ADC, which is generally the lowest resolution used in ADC-based wireline receivers.
DSP round-off errors are also included in the model using a similar approach. This round-off generally occurs at the multiplier outputs to reduce the resolution required in the subsequent adders and pipelining sequential elements. The probability of a particular round-off error is the probability of the corresponding ADC output occurring, which is obtained by computing the ISI PDF at the ADC output. This is essentially the same as the area summation process used to compute the ADC quantization error PDF. The resulting round-off error PMFs at the different multiplier outputs are assumed independent and are convolved with the ISI PDF, similar to how
simulation results when the FFE multiplier resolution is limited to a relatively low 6-bit value, which is the same as the ADC resolution. Even with this low DSP resolution, good matching is obtained between the statistical model and the transient simulations. Assuming the 2mV rms noise case used in Fig. 3.9(b), the voltage margin at BER = 10⁻¹⁰ reduces from 31.3mV to 9.3mV when the DSP resolution is reduced from 14 bits to 6 bits. Note that, while these purely statistical techniques work well for an ADC with low INL/DNL and allow rapid simulation, this approach breaks down with appreciable nonlinearity, and the technique described in the next section is required.
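The round-off modeling step can be sketched as follows: for each multiplier, every possible ADC output is multiplied by the fixed-point coefficient, the product is rounded to the reduced output resolution, and the resulting round-off error is weighted by the probability of that ADC output. The ADC-output PMF, coefficients and output LSB below are hypothetical placeholders for the ISI-derived PMF used in the model; the per-multiplier PMFs are then convolved under the same independence assumption as in the text.

```python
def roundoff_error_pmf(code_pmf, coeff, out_lsb):
    """Round-off error PMF at one FFE multiplier output.

    code_pmf maps each ADC output value to its probability (in the model, from the ISI
    PDF at the ADC output). The product coeff * code is rounded to a multiple of out_lsb."""
    err = {}
    for code, p in code_pmf.items():
        exact = coeff * code
        e = round(exact / out_lsb) * out_lsb - exact
        key = round(e, 12)                       # merge errors that differ only by float noise
        err[key] = err.get(key, 0.0) + p
    return err

def convolve_pmfs(a, b):
    """PMF of the sum of two independent discrete random variables."""
    out = {}
    for va, pa in a.items():
        for vb, pb in b.items():
            key = round(va + vb, 12)
            out[key] = out.get(key, 0.0) + pa * pb
    return out

# Hypothetical example: 3-bit ADC codes -4..3 with a non-uniform (ISI-shaped) PMF.
code_pmf = {-4: 0.05, -3: 0.15, -2: 0.10, -1: 0.20, 0: 0.20, 1: 0.10, 2: 0.15, 3: 0.05}
coeffs = [13 / 32, -5 / 32]                     # hypothetical fixed-point FFE taps
total = {0.0: 1.0}
for c in coeffs:                                # per-multiplier PMFs assumed independent
    total = convolve_pmfs(total, roundoff_error_pmf(code_pmf, c, out_lsb=0.25))
for e, p in sorted(total.items()):
    print(f"{e:+.5f}: {p:.3f}")
```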
3.3 Radix Errors 
The most commonly used quantizers are the uniform radix-2 type that map the input to one 
Figure 3.9. Comparison of 10Gb/s NRZ voltage bathtub curves for the 25dB loss channel of Fig. 3.1 produced with the quantization noise modeling technique and transient simulations. The ADC resolution is varied from 3 to 6 bits with an ADC input random noise of (a) 1mVrms and (b) 2mVrms. (c) The DSP resolution is reduced to 6 bits for the 2mVrms case to show the impact of DSP round-off error. Reprinted with permission from S. Kiran et al, IEEE ©July 4, 2018
However, quantizers exhibit non-idealities due to process variations and circuit mismatches, resulting in modified quantizer thresholds. Due to the modified thresholds, quantization is non-uniform, and this causes the quantization noise to no longer be uncorrelated with the input [16]. This introduces 2 difficulties for the quantization noise model described in Section 3.2. First, in order to find the quantization noise PDF at the output of the digital equalizer, quantization noise PDFs that are scaled by the FFE coefficients cannot be convolved together as shown in Fig. 3.8. This necessitates a transient simulation to find the quantization noise PDF at the output of the digital equalizer. The second difficulty is that the quantization noise PDF at the output of the digital equalizer extracted through a transient simulation cannot be convolved with the ISI PDF. This second challenge to including radix errors in the modeling framework is addressed by generating level-dependent quantization noise PDFs. Recall that the ISI PDFs obtained from Equation 3.3 are centered at $+p[0]$ and $-p[0]$. Consider first a scenario where, at the output of the equalizer, there is no residual ISI and the two ISI PDFs are simply delta functions at $+p[0]$ and $-p[0]$. Two quantization noise PDFs, denoted as $Q_{pdf}^{+}$ and $Q_{pdf}^{-}$, are extracted through transient simulations when the current symbol is 1 and -1, respectively. This requires tracking the current symbol from the ADC input to the digital equalizer output. Hence, these quantization noise PDFs are said to be extracted with 1-symbol tracking in this case. For every quantization noise value in $Q_{pdf}^{+}$ and $Q_{pdf}^{-}$, the signal value at the equalizer output is the deterministic value $+p[0]$ and $-p[0]$, respectively. The resulting signal with the quantization noise added is simply the sum of $+p[0]$ and all the quantization noise values in $Q_{pdf}^{+}$, and the sum of $-p[0]$ and all the quantization noise values in $Q_{pdf}^{-}$. The resulting signal PDF is obtained by shifting $Q_{pdf}^{+}$ to be centered on $+p[0]$ and shifting $Q_{pdf}^{-}$ to be centered on $-p[0]$. This is done through a convolution operation, where a delta function at $+p[0]$ is convolved with $Q_{pdf}^{+}$ and a delta function at $-p[0]$ is convolved with $Q_{pdf}^{-}$. Note that it is not necessary for the residual ISI to be zero; this technique can be employed as long as the residual ISI at the output of the equalizer is small.
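The 1-symbol-tracking extraction can be sketched with a toy transient run. Everything here is hypothetical and much smaller than the 6-bit, 12-tap setup used in the text: a 4-bit ADC whose thresholds are spread outward to mimic a compressive INL, a 3-tap pulse response and a 3-tap FFE. The quantization noise at each FFE output is computed as the FFE applied to the per-sample ADC errors, and it is binned according to the current symbol to form sample sets for $Q_{pdf}^{+}$ and $Q_{pdf}^{-}$, which are then shifted to $\pm p[0]$.

```python
import bisect
import random

def nonuniform_adc(x, thresholds, levels):
    """Quantizer with arbitrary sorted thresholds; len(levels) == len(thresholds) + 1."""
    return levels[bisect.bisect_right(thresholds, x)]

def level_dependent_qnoise(bits, pulse, ffe, thresholds, levels, noise_rms, seed=0):
    """Transient run with 1-symbol tracking: quantization noise at the FFE output,
    returned as (Q+ samples for bits[n] = +1, Q- samples for bits[n] = -1).
    pulse and ffe both have one pre-tap, i.e. their main tap is at index 1."""
    rng = random.Random(seed)
    n_sym = len(bits)
    x = [sum(pulse[m] * bits[n - m + 1] for m in range(len(pulse)) if 0 <= n - m + 1 < n_sym)
         + rng.gauss(0.0, noise_rms) for n in range(n_sym)]             # ADC input samples
    err = [nonuniform_adc(v, thresholds, levels) - v for v in x]        # per-sample ADC error
    q_plus, q_minus = [], []
    for n in range(2, n_sym - 2):
        # FFE output n targets bits[n]; its quantization noise is the FFE applied to err.
        qn = sum(ffe[k] * err[n - k + 1] for k in range(len(ffe)))
        (q_plus if bits[n] > 0 else q_minus).append(qn)
    return q_plus, q_minus

lsb = 0.125                                          # hypothetical 4-bit ADC, +/-1 V range
ideal_t = [lsb * k for k in range(-7, 8)]            # 15 uniform thresholds
comp_t = [t * (1 + 0.15 * t * t) for t in ideal_t]   # hypothetical compressive INL
levels = [-1 + lsb * (i + 0.5) for i in range(16)]
rng = random.Random(3)
bits = [rng.choice((-1, 1)) for _ in range(20000)]
pulse = [0.1, 0.6, 0.15]                             # hypothetical precursor, cursor, postcursor
ffe = [-0.2, 1.1, -0.25]                             # hypothetical 3-tap FFE
p0 = sum(ffe[k] * pulse[2 - k] for k in range(3))    # equalized main cursor p[0]
for name, th in (("ideal", ideal_t), ("compressive", comp_t)):
    qp, qm = level_dependent_qnoise(bits, pulse, ffe, th, levels, noise_rms=0.02)
    print(f"{name}: mean Q+ = {1e3 * sum(qp) / len(qp):+.1f} mV, "
          f"mean Q- = {1e3 * sum(qm) / len(qm):+.1f} mV")
# Hybrid step (last run, compressive ADC): the samples around +p[0] are Q+ shifted by +p[0],
# and those around -p[0] are Q- shifted by -p[0]; each set can then be histogrammed.
upper_level = [p0 + q for q in qp]
lower_level = [-p0 + q for q in qm]
```

With the compressive thresholds, $Q_{pdf}^{+}$ and $Q_{pdf}^{-}$ have means of opposite sign, i.e. the quantization noise depends on the transmitted level, which is why a single FFE-shaped noise PDF cannot simply be convolved with the ISI PDF in this case.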
From the discussion so far, in the presence of non-uniform quantization, the signal with ISI and the quantization noise are made conditionally independent, with the condition being that the current symbol has a specific value of +1 or -1. To handle scenarios where there is significant residual ISI at the equalizer output, this technique is further extended, with the condition being specific symbol patterns. For example, if the significant residual ISI term is the first postcursor ISI term, then the conditions are the symbol patterns "-1-1", "-1+1", "+1-1", and "+1+1" for the previous and the current symbol, respectively. In general, referring to Equation 3.3 and Equation 3.4, if a significant residual ISI term at position $i$ is present at the equalizer output, then the ISI PDF is computed with $m \neq 0$ and $m \neq i$. The resulting ISI voltage PDFs are then shifted to $\pm p[0] \pm p[i]$, yielding 4 ISI PDFs in place of the 2 ISI PDFs that are obtained without this additional decomposition. Quantization noise PDFs are then extracted through transient simulations when the current received symbol is ±1 and the received symbol giving rise to the $i^{th}$ ISI component is ±1. This technique is illustrated in Fig. 3.10, where 2 of the 4 decomposed ISI PDFs, at $p[0] \pm p[i]$, and their corresponding quantization noise PDFs are shown. Note that if a large number of significant residual ISI terms are present, then running multiple transient simulations can increase the total simulation time. Thus, the number of symbols that are tracked and the quantization noise PDFs utilized can be varied to trade off accuracy against simulation time. This radix error modeling technique is verified for the case of a 6-bit ADC with a compressive INL profile with a maximum INL of 2 LSB, as shown in Fig. 3.11(a). The INL profile is said to be compressive because, for certain inputs, the ADC with nonlinearity has a smaller slope in the output code than an ideal ADC without nonlinearity. The analog front-end nonlinearity can also be merged with the ADC nonlinearity to obtain an effective ADC transfer characteristic. 20Gb/s PAM-4 modulation with a transmit swing of 900mVppd over the channel from Fig. 3.1, which has 25dB loss at the Nyquist frequency, is utilized to more clearly illustrate the impact of any receiver nonlinearity. A longer 12-tap digital FFE and a 1-tap DFE are also employed because of the sensitivity of PAM-4 modulation to residual ISI.
Figure 3.10. Decomposition of the ISI PDF and convolution with the corresponding quantization noise PDF. Reprinted with permission from S. Kiran et al, IEEE ©July 4, 2018
Fig. 3.11(b) shows the PAM-4 voltage bathtub curves for an ADC without nonlinearity, which allows rapid simulation via direct convolution of the FFE-shaped quantization noise PDF with the ISI PDF. Fig. 3.11(c) shows the PAM-4 voltage bathtub curves for the ADC with 2 LSB peak INL. Utilizing the 12-tap FFE and 1-tap DFE leaves only a small amount of residual ISI and allows only one-symbol tracking to be used in producing the level-dependent quantization noise PDFs. Excellent matching is obtained between the hybrid statistical model and the transient results. In the simulation of Fig. 3.11(d), one significant ISI term is introduced by having a suboptimal DFE coefficient equal to 70% of its ideal value. With this residual ISI, tracking of two symbols is required to achieve good matching of the hybrid statistical model with transient simulations.
Figure 3.11. (a) Ideal and compressive 6-bit ADC characteristics. (b) 20Gb/s PAM-4 voltage 
bathtub curves for an ideal ADC, (c) an ADC with 2-LSB INL and small post-equalization 
residual ISI, and (d) an ADC with 2-LSB INL and a significant first postcursor ISI 
component after equalization. Reprinted with permission from S. Kiran et al, IEEE ©July 4, 
2018
It is interesting to note that with one significant residual ISI term, the middle eye is worse than the outer two eyes. This is because, for the middle eye, the worst-case patterns in terms of ISI and quantization noise are the same, and their impacts combine with the same polarity. For the outer eyes, however, the worst-case patterns for ISI and quantization noise are different. The outer eyes' worst-case ISI patterns and the corresponding quantization error terms combine with opposite polarities and reduce each other's impact. Similarly, the worst-case patterns for quantization error and the corresponding residual ISI errors again combine with opposite polarities. It should be noted that this is not a general result but represents one possibility of how quantization noise and ISI can interact. Different transfer functions and FFE tap values can produce different results.
3.4 Time-Interleaving Mismatch Errors 
Time-interleaving is utilized to enable the use of ADC-based receivers for multi-Gb/s operation. With time-interleaving, $M$ parallel converters, each at a sampling rate of $f_s/M$, give an aggregate sampling rate of $f_s$. However, new types of errors are generated due to mismatches between the different parallel channels. These mismatches can be divided into four types: gain, bandwidth, sampling time skew, and offset errors.
Each time-interleaved channel can experience a different static gain due to mismatches in the parallel track-and-holds (T/Hs), reference circuitry, and other gain stages in the signal path. These device mismatches, along with layout asymmetries, also cause the different parallel ADCs to display different effective bandwidths. While the multiple clock phases used to sample the input signal are ideally uniformly spaced, device mismatches and layout asymmetries in the clock generation and distribution to the parallel T/H blocks cause skew errors that shift the sampling instant of each parallel ADC. Finally, static offsets in the T/Hs, reference circuitry and comparators shift each parallel ADC's transfer characteristic.
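The gain, bandwidth and skew mismatches listed above can be folded into per-sub-ADC pulse responses, and then into the set of equalized pulse responses seen by a digital FFE whose consecutive taps are fed by consecutive sub-ADCs. The sketch below is only an illustration under stated assumptions: the channel plus each sub-ADC is modeled as a single hypothetical RC pole, the mismatch values and FFE taps are hypothetical, and the indexing convention (sub-ADC $l$ feeds FFE tap 0 and tap $k$ takes its sample from sub-ADC $((l-k))_M$) is this sketch's own choice rather than the thesis's equation. Offset mismatch shifts each sub-ADC's transfer characteristic rather than its pulse response, so it is not included here.

```python
import math

def rc_pulse(t, T, tau, gain):
    """Response of a hypothetical one-pole (RC) system to a rectangular pulse of width T."""
    if t <= 0:
        return 0.0
    if t < T:
        return gain * (1.0 - math.exp(-t / tau))
    return gain * (1.0 - math.exp(-T / tau)) * math.exp(-(t - T) / tau)

def lane_pulse_responses(lanes, T, n_taps, phase):
    """Sampled pulse response h_m[d] of each sub-ADC m; lanes = [(gain, tau, skew), ...]."""
    return [[rc_pulse((d + phase) * T + skew, T, tau, gain) for d in range(n_taps)]
            for gain, tau, skew in lanes]

def equalized_lane_responses(h, alpha):
    """g_l[d] = sum_k alpha[k] * h_((l - k) mod M)[d - k], for d = 0 .. N + K - 2."""
    M, N, K = len(h), len(h[0]), len(alpha)
    return [[sum(alpha[k] * h[(l - k) % M][d - k] for k in range(K) if 0 <= d - k < N)
             for d in range(N + K - 1)] for l in range(M)]

T = 1.0                                               # normalized unit interval
alpha = [1.0, -0.45, -0.1]                            # hypothetical 3-tap FFE
matched = [(1.0, 1.2, 0.0)] * 4
mismatched = [(1.00, 1.20, 0.00), (0.97, 1.30, 0.03),   # hypothetical (gain, tau, skew)
              (1.02, 1.15, -0.02), (0.99, 1.25, 0.01)]
for name, lanes in (("matched", matched), ("mismatched", mismatched)):
    g = equalized_lane_responses(lane_pulse_responses(lanes, T, 8, phase=1.0), alpha)
    print(name, [[round(v, 3) for v in row[:4]] for row in g])
```

With matched lanes all $M$ equalized responses collapse to the single convolution of the FFE with the pulse response; with mismatches they differ, and each one would be used to build its own ISI PDF.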
The effect of time-interleaving errors on ADC performance has been previously studied in 
terms of degradation in SNDR [23]. However, since the most important metric in ADC-based 
receivers is the BER, techniques to analyze the effect of different ADC mismatches on BER are 
necessary. The procedure shown in Fig. 3.12 is utilized to include the time-interleaving errors in 
the statistical model. For an 𝑀-channel time-interleaved system, 𝑀 different pulse responses are 
extracted utilizing the transmission channel, receiver front-end, and parallel ADC channel 
responses. This results in 𝑀 different continuous pulse responses that capture the parallel ADC 
channels' specific gain and bandwidth. These pulse responses are then sampled, taking into account 
the timing skew errors, to yield 𝑀 different sampled pulse responses. The new pulse responses are 
finally equalized through the digital FFE/DFE and the 𝑀 equalized pulse responses are used to 
generate the ISI PDFs required for obtaining BER curves. As an example, consider the case with no channel mismatches present and a digital FFE. The equalized pulse response, 𝑔[𝑛], is then given by the convolution

$$g[n] = \sum_{k=0}^{K-1} \alpha[k]\, h[n-k], \qquad (3.9)$$
where ℎ[𝑛] is the ideal channel pulse response with 𝑁 ISI taps and 𝛼[𝑘] is a digital FFE with 𝐾 
coefficients such that 𝑁 ≥ 𝐾. Note that Equation 3.9 can be rewritten using a Toeplitz matrix 
representation [24]. Now, consider an 𝑀-way time-interleaved ADC with the previously described 
mismatches present. Each new input to the FFE comes from the next time-interleaved channel in 
relation to the previous input. In this scenario, 𝑀 different pulse responses are obtained depending 
on which of the 𝑀 channels provides the sample for the first FFE tap. The pulse responses are 
Figure 3.12. Modeling of time-interleaving ADC errors. Reprinted with permission from S. Kiran et al, IEEE ©July 4, 2018
Here $l$ ranges from 0 to $M-1$, $L$ is equal to $N+K-1$, and the $((\cdot))_M$ operator represents the modulo-$M$ operation. Fig. 3.13 shows the procedure of combining the residual ISI components from the individual pulse responses of the different time-interleaved channels. Let the $M$ pulse responses in the left half of Fig. 3.13 each have $i$ non-zero precursor values and $j$ non-zero postcursor values. When the main cursor of the $l^{th}$ pulse response, $g_l[i]$, is at the FFE output, the residual precursor ISI components contributed by the $M$ channel responses are $g_{((l+1))_M}[i-1]$, $g_{((l+2))_M}[i-2]$, and so on up to $g_{((l+i))_M}[0]$. The residual postcursor ISI components contributed by the $M$ channel responses are $g_{((l-1))_M}[i+1]$, $g_{((l-2))_M}[i+2]$, and so on up to $g_{((l-j))_M}[i+j]$. This is illustrated in the right half of Fig. 3.13, which collects the ISI contributions from the different channels at one particular time instant of interest. The PDFs of these ISI components are convolved together as shown in Equation 3.11. This results in $M$ different ISI PDFs, $f_{ISI_l}$ with $l \in [0, M-1]$, and the PDF with all the mismatches accounted for is the average of all these $M$ PDFs.

$$f_{ISI_l}(v) = f_{ISI,g_{((l+i))_M}[0]} * f_{ISI,g_{((l+i-1))_M}[1]} * \cdots * f_{ISI,g_{((l+1))_M}[i-1]} * f_{ISI,g_{((l-1))_M}[i+1]} * \cdots * f_{ISI,g_{((l-j))_M}[i+j]}, \qquad (3.11)$$
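The per-channel bookkeeping behind Equation 3.11 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the original modeling framework: data are NRZ (±1), each ISI PDF is a probability mass function on a uniform voltage grid, noise and jitter are omitted, and the $M$ equalized pulse responses use the hypothetical indexing $g_l[n] = \sum_k \alpha[k]\, h_{((l+n-k))_M}[n-k]$, chosen so that the precursor and postcursor terms line up with the ones listed above. All function names and numeric values are illustrative.

```python
import numpy as np

def equalized_pulses(h_ch, alpha):
    """Equalized pulse responses g[l, n] for an M-way time-interleaved ADC.

    h_ch: (M, N) sampled pulse responses, one per TI channel, with that channel's
    gain, bandwidth and timing skew already applied. alpha: K FFE taps.
    Assumed indexing: g[l, n] = sum_k alpha[k] * h_ch[(l + n - k) % M, n - k].
    """
    h_ch = np.asarray(h_ch, dtype=float)
    M, N = h_ch.shape
    K = len(alpha)
    L = N + K - 1
    g = np.zeros((M, L))
    for l in range(M):
        for n in range(L):
            for k in range(K):
                if 0 <= n - k < N:
                    g[l, n] += alpha[k] * h_ch[(l + n - k) % M, n - k]
    return g

def shift(p, s):
    """Shift a PMF on the voltage grid by s bins, zero-filling (mass pushed off the grid is lost)."""
    out = np.zeros_like(p)
    if abs(s) >= len(p):
        return out
    if s >= 0:
        out[s:] = p[:len(p) - s]
    else:
        out[:s] = p[-s:]
    return out

def isi_pmf(g, l, i, dv, nbins):
    """Residual-ISI PMF when g[l, i] is the main cursor, assuming NRZ (+/-1) data."""
    M, L = g.shape
    pmf = np.zeros(nbins)
    pmf[nbins // 2] = 1.0                          # delta at 0 V
    for m in range(L):
        if m == i:
            continue
        c = g[(l + i - m) % M, m]                  # m < i: precursor term, m > i: postcursor term
        s = int(round(c / dv))
        pmf = 0.5 * shift(pmf, s) + 0.5 * shift(pmf, -s)   # convolve with {+c, -c}
    return pmf

def averaged_isi_pmf(g, i, dv=1e-3, nbins=2001):
    """Average of the M per-channel ISI PMFs."""
    return np.mean([isi_pmf(g, l, i, dv, nbins) for l in range(g.shape[0])], axis=0)

h = [0.1, 1.0, 0.3]                 # hypothetical sampled pulse response
alpha = [1.0, -0.25]                # hypothetical 2-tap FFE
gains = [1.0, 1.1, 0.9, 1.05]       # hypothetical per-channel gain mismatch
g_ideal = equalized_pulses([h] * 4, alpha)
g_mm = equalized_pulses([[a * x for x in h] for a in gains], alpha)
pmf = averaged_isi_pmf(g_mm, i=1)   # index 1 holds the main cursor here
```

With identical channels every row of `g_ideal` equals the single-channel convolution of Equation 3.9; with the gain mismatch the $M$ rows differ, and averaging the $M$ per-channel PMFs is what captures that mismatch in the final ISI distribution.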
So far, the effects of gain, bandwidth, and timing skew mismatch errors have been considered. Offset error results in an $M$-periodic signal at the ADC output that is filtered by the high-pass nature of the FFE. Assuming the FFE has $u$ precursor taps and $w$ postcursor taps, such that $K = u + w + 1$, the filtered offset values at the output of the FFE are

$$O_{out}[l] = \sum_{k=-u}^{w} \alpha[k+u]\, O[((l+k))_M], \qquad (3.12)$$

where $O[l]$ are the offsets at the ADC time-interleaved channel outputs. These offset values effectively shift the ISI PDFs. The $M$ shifted ISI PDFs are then averaged to yield a final PDF,

$$f_{ISI}(v) = \frac{1}{M}\sum_{l=0}^{M-1} f_{ISI_l}(v) * \delta(v - O_{out}[l]). \qquad (3.13)$$
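Equations 3.12 and 3.13 can be sketched in a few lines of Python with NumPy. The sketch assumes the $M$ per-channel ISI PDFs $f_{ISI_l}$ are already available as PMFs on a uniform voltage grid with spacing `dv`; the function names are hypothetical, and because `np.roll` wraps around, the grid is assumed wide enough that no probability mass reaches its edges.

```python
import numpy as np

def ffe_filtered_offsets(offsets, alpha, u):
    """Eq. 3.12: O_out[l] = sum_{k=-u}^{w} alpha[k+u] * O[((l+k))_M], with K = u + w + 1."""
    M, K = len(offsets), len(alpha)
    w = K - u - 1
    return np.array([sum(alpha[k + u] * offsets[(l + k) % M] for k in range(-u, w + 1))
                     for l in range(M)])

def offset_shifted_average(pmfs, o_out, dv):
    """Eq. 3.13 on a grid: shift each per-channel ISI PMF by O_out[l], then average over l."""
    total = np.zeros(len(pmfs[0]))
    for pmf, o in zip(pmfs, o_out):
        total += np.roll(pmf, int(round(o / dv)))   # convolving with delta(v - O_out[l])
    return total / len(pmfs)

# Hypothetical 3-tap FFE (u = 1 precursor, w = 1 postcursor) and a common 2 mV offset
o_out = ffe_filtered_offsets([2e-3] * 4, [-0.2, 1.0, -0.3], u=1)

delta = np.zeros(101)
delta[50] = 1.0                                       # stand-in ISI PMF: all mass at 0 V
avg = offset_shifted_average([delta] * 4, [1e-3, 2e-3, 0.0, -1e-3], dv=1e-3)
```

A offset common to all channels is scaled by the FFE's DC gain (the sum of the taps, 0.5 in this hypothetical example), which is how the high-pass FFE attenuates it; channel-to-channel offset differences instead produce different shifts $O_{out}[l]$ that spread the averaged PDF.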
In order to verify the time-interleaving error statistical modeling techniques, transient 
simulations and statistical modeling results are compared for 10Gb/s NRZ operation over the 
channel with 25dB loss at the Nyquist frequency from Fig. 3.1. A 6-bit, 4-way time-interleaved 
ADC architecture is assumed, with the results shown in Fig. 3.14. These simulations utilize a 
0.6Vppd transmit swing with no transmit equalization. The receiver employs a 5-tap digital FFE 
and 2mVrms random noise at the ADC input is assumed. Overall, the statistical model matches well 
with transient simulations for time-interleaved gain, bandwidth, timing skew, and offset errors. 
Figure 3.13. Pulse response through the digital equalizer for the M time-interleaved 
channels (left) and ISI contributions from different channels at one particular time instant 
(right). Reprinted with permission from S. Kiran et al, IEEE ©July 4, 2018
Figure 3.14. Comparison of 10Gb/s NRZ voltage bathtub curves for the 25dB loss channel of Fig. 3.1 produced with the time-interleaving error modeling techniques and transient simulations. A 4-way time-interleaved 6-bit ADC with 1mVrms input noise is utilized with a subsequent 5-tap digital FFE. Simulation results for (a) gain, (b) timing skew, (c) bandwidth, and (d) offset errors. Reprinted with permission from S. Kiran et al, IEEE ©July 4, 2018
3.5 Conclusion 
Chapter 3 presented a hybrid statistical modeling framework for ADC-based serial links. 
It builds on existing mixed-signal statistical models that capture ISI and jitter in a perfectly linear 
link. A new method to model CTLE nonlinearity in the presence of DFE based on a PDF 
decomposition technique was introduced. Quantization noise was then included by modeling it as a uniformly distributed independent noise source. The second contribution of this chapter was the modeling of non-uniform quantization in the presence of residual ISI utilizing the PDF decomposition technique previously introduced. The final contribution of this chapter is the modeling of time-interleaving mismatches by treating the TI-ADC as a linear periodically time-varying (LPTV) system. The entire hybrid statistical model has been verified by comparing its results with a transient bit-by-bit 




4. A 52GB/S ADC-BASED PAM-4 RECEIVER WITH REFERENCE SCALED 2BIT/STAGE 
SAR ADC AND PARTIALLY-UNROLLED DFE† 
 
This chapter presents an ADC-based PAM-4 receiver employing a 32-way time-
interleaved, 2-bit/stage, 6-bit SAR ADC with a single capacitive reference DAC and a DSP with 
a 12-tap FFE and a 2-tap DFE. A new DFE architecture that reduces the complexity of a PAM-4 
DFE to that of an NRZ DFE while simultaneously nearly doubling the maximum achievable data 
rate is presented. Partial analog equalization is provided in the form of a programmable two stage 
CTLE and a 3-tap FFE that is embedded in the ADC using a non-binary FFE DAC to improve the 
FFE coefficient coverage space. Utilizing the partial analog equalization, a digital baud-rate 
CDR’s Mueller-Muller phase detector is placed directly at the ADC output to avoid excessive loop 
delay. Architecture and circuit details, along with measurement results, are presented next. 
4.1 PAM-4 DFE Challenges 
The decision feedback equalizer (DFE) is a powerful nonlinear equalization technique that 
can offer a significant advantage over a linear equalizer, such as an FFE, since it can cancel post-
cursor ISI without noise and crosstalk amplification. The impact of including 2 DFE taps is shown in Fig. 4.1 by plotting the receiver voltage margin at a BER of 10⁻⁶ for a given number of 
FFE taps under 2 different noise assumptions. The simulation assumes a data rate of 52Gb/s with 
the channel having a loss of 30dB at the Nyquist frequency of 13 GHz. No Tx equalization is 
included with the transmitter having a peak to peak differential swing of 0.8Vppd. The receiver is 
                                                 
† Part of this work is reprinted with permission from “S. Kiran, S. Cai, Y. Luo, S. Hoyos, and S. Palermo, “A 32 
Gb/s ADC-based PAM-4 receiver with 2-bit/stage SAR ADC and partially-unrolled DFE,” in IEEE Custom 




assumed to employ a 2-stage CTLE and a 3-tap FFE embedded in the ADC with the ADC having 
a resolution of 6 bits. A random noise of 3mVrms is added to the signal at the input of the ADC. 
This represents the total output noise from the analog front-end and the input referred noise of the 
comparator itself. Fig. 4.1 shows that an FFE-only receiver needs about 18 taps to achieve a voltage margin greater than 10mV, with very small improvements on further increasing the number of FFE taps. At this point, the voltage margin is mainly limited by random noise and timing uncertainty. However, by including 2 DFE taps, a BER of 10⁻⁶ with a voltage margin of 19mV can be achieved with 12 FFE taps. In order to achieve a comparable voltage margin with the FFE-only receiver, 16 FFE taps and a very low random noise of 1mVrms are necessary. Designing an analog front-end with 
Figure 4.1. (a) Channel with 31dB loss at a frequency of 13GHz and (b) Voltage margin 







4.1.1 PAM-4 DFE Loop-Unrolling 
The concept of loop-unrolling was introduced in section 2.2.2. It was shown that while 
loop-unrolling significantly reduced the critical path delay, it comes at the cost of increased circuit 
complexity. This increase in complexity is particularly severe in the case of PAM-4 modulation, as the number of summers and slicers for each additional DFE tap scales by 4 times instead of 2 times as in a PAM-2 DFE. Fig. 4.2 shows a 2-tap loop-unrolled PAM-4 DFE that requires 20 summers and 16 slicers. In general, for an N-tap DFE, the number of summers and slicers needed are $\sum_{i=1}^{N} 4^i$ and $4^N$, respectively. While the dramatic increase in gate count is one major challenge in a loop-unrolled PAM-4 DFE, the other major challenge is the longer critical path delay in a PAM-4 DFE multiplexer loop compared to the critical path delay in a PAM-2 DFE multiplexer loop. 
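The two counting formulas above can be tabulated with a short Python sketch; the function name is hypothetical, and it simply evaluates $\sum_{i=1}^{N} 4^i$ summers and $4^N$ slicers for a fully loop-unrolled $N$-tap PAM-4 DFE.

```python
def unrolled_pam4_dfe_counts(num_taps):
    """Summers and slicers in a fully loop-unrolled N-tap PAM-4 DFE."""
    summers = sum(4 ** i for i in range(1, num_taps + 1))   # sum_{i=1}^{N} 4^i
    slicers = 4 ** num_taps                                  # 4^N
    return summers, slicers

for n in range(1, 5):
    print(n, unrolled_pam4_dfe_counts(n))
```

The 2-tap case reproduces the 20 summers and 16 slicers of Fig. 4.2, and each additional tap multiplies the slicer count by 4.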




In a PAM-4 DFE, the critical path has to pass through multiple 4:1 multiplexers, as shown 
in Fig. 4.3, instead of 2:1 multiplexers as in a PAM-2 DFE. Fig. 4.3 shows the timing paths for a 
2-tap PAM-4 DFE implemented with P parallel paths. Either of these two paths or a path that 
switches between these two paths can be the critical timing path depending on the actual routing 
and gate delays resulting from digital auto place and route flow. This increased delay necessitates 















Figure 4.5. 2:1 mux count obtained by digital synthesis flow for a 52Gb/s PAM-4 DFE in 65nm 





Fig. 4.4 shows a 1-tap PAM-4 look-ahead multiplexer with a look-ahead factor of 4. The look-ahead multiplexing technique also results in a dramatic rise in multiplexer count with an increase in DFE tap count, as shown in Fig. 4.5. The number of 2:1 multiplexers needed for an $N$-tap DFE implemented with $P$ parallel paths and a look-ahead factor of $LF$ is $6P\left((LF-1)\,4^N + \sum_{i=1}^{N-1} 4^i\right)$.
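The multiplexer-count expression above can be evaluated directly; the sketch below (function name hypothetical) shows how the count grows with the look-ahead factor for a 2-tap PAM-4 DFE and a single parallel path.

```python
def lookahead_mux_count(num_taps, parallel_paths, lookahead):
    """2:1 muxes for an N-tap PAM-4 look-ahead DFE: 6 P ((LF - 1) 4^N + sum_{i=1}^{N-1} 4^i)."""
    unrolled = sum(4 ** i for i in range(1, num_taps))
    return 6 * parallel_paths * ((lookahead - 1) * 4 ** num_taps + unrolled)

for lf in (2, 4, 8):
    print(lf, lookahead_mux_count(num_taps=2, parallel_paths=1, lookahead=lf))
```

The $4^N$ factor multiplying $(LF-1)$ is what makes the count explode with tap count, since the count is also linear in the number of parallel paths $P$.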
4.2 Receiver Architecture 
Fig. 4.6 shows the complete ADC-based receiver architecture. The key components of the 
architecture are an analog front-end consisting of a CTLE, a 32-way time-interleaved 6-bit SAR 
ADC, a 64-way parallel DSP consisting of a 12-tap FFE and a 2-tap DFE with an additional parallel 
FFE for simplifying the DFE architecture, a phase generation block that generates 8 1-UI spaced 
clocks needed by the track and hold circuits by dividing an external clock by 4, and a baud rate 
digital phase interpolator based CDR loop employing a Mueller-Muller phase detector.  Fig. 4.7 
shows the analog front-end consisting of a 2 stage CTLE, a gain stage and a source follower stage. 
Programmable capacitor banks in the CTLE provide 8.5-15 dB of gain peaking at a frequency of 
13GHz. The resistor in the second stage CTLE allows for variable DC gain and is used to ensure 
the CTLE-VGA front end output swing spans the full scale range of the ADC. By reducing the 
strength of the ISI components and boosting the main cursor, the CTLE increases the ratio of the 
main cursor to the quantization noise without having to design a higher resolution ADC. The output 
of the CTLE drives an 8-way time-interleaved track and hold (T/H) circuit. The schematic of the 
T/H circuit is shown in Fig. 4.8. The T/H circuit consists of a bootstrapped switch followed by a 
source follower. Due to the presence of a high-pass path comprising the resistor R and the capacitor C, the source follower reduces to a flipped voltage follower (FVF) at high frequencies, leading to bandwidth extension [25] while providing some flexibility in setting the bias conditions. The cross-connected NMOS transistors Mgb1 and Mgb2 provide a DC gain boost. The T/H buffer 
has a gain of -2.2dB without gain boosting while with gain boosting the buffer is designed to have 
a gain of 1.6dB. 
Figure 4.6. The complete receiver architecture. 


















For a constant ADC full scale range, this causes the output swing of the stages preceding 
the T/H buffer to be lower and hence improves their linearity. Fig. 4.9 shows the post-layout 
linearity in terms of total harmonic distortion (THD) for the analog front-end with the T/H buffer 
being a simple source follower, the bandwidth extended FVF and the gain-boosted FVF. The gain-
boosted FVF shows an improvement of more than 3dB up to 10GHz. 
Eight of these T/H blocks are arranged in a time-interleaved fashion with each T/H block 
sampling at 3.25 GS/s using 8 1-UI spaced 3.25 GHz clocks to give an aggregate sampling rate of 
26 GS/s. Bias current control is provided in the T/H buffer to calibrate gain mismatches between 
the interleaved T/H blocks. The 8 critical sampling phases are generated from a differential 13 
GHz clock by dividing it with a CML latch based divide by 4 block as shown in Fig. 4.10 [26]. 
The 38.4ps spaced phases then pass through a bank of 64 current-mode phase interpolators controlled by the CDR loop [27-28]. The outputs of the phase interpolators are skew calibrated by a digitally controlled variable delay line with a delay step of 90 fs. The sampling phase is set by a baud-rate CDR loop that employs a Mueller-Muller phase detector and a proportional-integral loop filter [29]. A 3-tap FFE embedded in the ADC provides additional analog equalization before quantization noise is added to the signal. The equalization provided by the CTLE and the embedded FFE allows the placement of the CDR's Mueller-Muller phase detector directly after the ADC to avoid excessive loop delay [30]. The ADC employed in this design is a 32-way time-interleaved 6-bit SAR ADC with a 2-bit/stage, loop-unrolled unit ADC. The time-interleaved ADC has 8 sub-ADCs, leading to 8 critical sampling phases, with each sub-ADC having 4 unit ADCs. The DSP, which has 64 parallel slices, employs an FFE-DFE combination with an additional parallel FFE to significantly reduce the DFE complexity. The main FFE has 12 taps with 3 pre-cursor and 8 post-cursor taps, while the DFE has 2 loop-unrolled taps canceling the first two post-cursor ISI components. The parallel FFE has 4 taps with 1 pre-cursor and 2 post-cursor taps. All the coefficients in the DSP are set through the sign-sign LMS (SS-LMS) algorithm.
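A generic SS-LMS adaptation loop for an FFE can be sketched as follows. This is a textbook sign-sign LMS with known PAM-4 training symbols, not the chip's implementation: the channel (0.3 precursor, 0.5 postcursor), the 5-tap FFE with one precursor tap, the step size and all function names are hypothetical, and the co-optimization of the main FFE with the DFE described in Section 4.4.1 is not modeled.

```python
import random

def sgn(v):
    return (v > 0) - (v < 0)

def ffe_out(c, rx, n, pre):
    # y[n] = sum_k c[k] * rx[n + pre - k]; tap 0 sees the precursor sample
    return sum(c[k] * rx[n + pre - k] for k in range(len(c)))

def ss_lms(rx, ref, num_taps=5, pre=1, mu=2e-3, passes=3):
    c = [0.0] * num_taps
    c[pre] = 1.0                                     # start as a pass-through on the main cursor
    for _ in range(passes):
        for n in range(num_taps, len(rx) - num_taps):
            e = ffe_out(c, rx, n, pre) - ref[n]
            for k in range(num_taps):
                # sign-sign update: only the signs of the error and the tap input are used
                c[k] -= mu * sgn(e) * sgn(rx[n + pre - k])
    return c

def mse(c, rx, ref, pre=1):
    idx = range(len(c), len(rx) - len(c))
    return sum((ffe_out(c, rx, n, pre) - ref[n]) ** 2 for n in idx) / len(idx)

# Hypothetical channel: 0.3 precursor, 1.0 main cursor, 0.5 postcursor; PAM-4 training data
rng = random.Random(1)
s = [rng.choice((-1.0, -1 / 3, 1 / 3, 1.0)) for _ in range(2002)]
ref = s[1:-1]
rx = [0.3 * s[n + 1] + s[n] + 0.5 * s[n - 1] for n in range(1, len(s) - 1)]
c0 = [0.0, 1.0, 0.0, 0.0, 0.0]
c = ss_lms(rx, ref)
print(mse(c0, rx, ref), mse(c, rx, ref))
```

Sign-sign updates need no multipliers in the adaptation loop, which is one reason this algorithm is popular in high-speed DSP; the trade-off is slower, noisier convergence than full LMS.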
Figure 4.10. Phase generation block that generates the 8 1-UI spaced phases by dividing an 
external 13GHz clock and the timing diagram showing the 8 phases. 
4.3 ADC Design 
Time-interleaved SAR ADCs are a power efficient way of implementing multi-GS/s ADCs 
in CMOS technology. However, it is desirable to keep the time-interleaving factor low in order to reduce the loading on the analog front-end and to simplify the time-interleaved channel mismatch calibration. This requires the design of high-speed unit SAR ADCs, which can be challenging. In order to speed up the unit SAR ADC conversion, techniques such as 2-bit/stage conversion [31] and loop-unrolling [32] can be employed. Previous 2-bit/stage ADCs employ multiple reference DACs and 3 comparators in each 2-bit/stage flash ADC [31,33]. This leads to increased loading and power consumption for the track-and-hold stage and a decrease in the FFE coefficient range when the FFE is embedded in the ADC using a binary FFE DAC [6]. This design introduces techniques to address these issues, as described in the sections that follow. 
4.3.1 Time-Interleaved ADC Architecture 
Fig. 4.11 shows the block diagram of the 32-way 6 bit 26 GS/s converter with 3-tap 
embedded FFE. The front-end T/H consists of 8 sub channels working at fs/8 = 3.25 GS/s, and 
each sub T/H drives 4 unit asynchronous 2b/stage SARs operating at fs/32 = 812.5 MS/s. Each 
unit ADC digitizes the difference between its current sample that is sampled on the reference DAC 
and the scaled and summed immediately preceding and succeeding symbols that are sampled on a 
dedicated non-binary FFE DAC, to implement a 3-tap FFE with 1 precursor and 1 postcursor tap. 
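The rate bookkeeping of this interleaving hierarchy, together with the divide-by-4 phase generation described in Section 4.2, can be checked with a few lines of Python (variable names are illustrative):

```python
fs = 26e9                 # aggregate ADC sampling rate, S/s
sub_rate = fs / 8         # each of the 8 T/H sub-channels
unit_rate = fs / 32       # each of the 32 unit SAR ADCs (4 per sub-channel)
ui_ps = 1e12 / fs         # spacing of adjacent sampling phases (1 UI), ps
clk_div = 13e9 / 4        # external 13 GHz clock after the divide-by-4
data_rate = 2 * fs        # PAM-4 carries 2 bits per symbol
print(sub_rate / 1e9, unit_rate / 1e6, round(ui_ps, 2), clk_div / 1e9, data_rate / 1e9)
```

The divided clock frequency equals the T/H sub-channel rate, which is why 8 one-UI-spaced phases of that clock are enough to drive the 8 T/H blocks.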
4.3.2 Unit ADC Architecture 
Fig. 4.12 shows the loop-unrolled 2-bit/stage unit ADC that has three stages for the 6-bit 
conversion. Each stage employs a 2-bit flash ADC as the quantizing block with the reference levels 
internally generated by intentionally skewed comparator regeneration stages. The reference levels 
for the flash ADC of each stage scale with the stage, from 0 and ±Vref/4 for the first stage to 0 and ±Vref/64 for the third stage, and hence do not need to be set dynamically. This allows the use of a 
single DAC and removes the overhead of multiple DACs present in other multi-bit/stage 
implementations [31,33]. The main cursor signal is top-plate sampled to avoid any signal 
attenuation caused by the comparator input capacitance and routing parasitics. 
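A behavioral sketch of the 2-bit/stage conversion with fixed, stage-scaled flash thresholds is shown below. It is an idealized model under stated assumptions: the input range is taken as [−Vref/2, +Vref/2), the comparators and DAC are ideal, and the residue update stands in for the single capacitive DAC. The thresholds scale by 4X per stage (±Vref/4, ±Vref/16, ±Vref/64 plus 0), which is the scaling implied by resolving 2 bits per stage. In the actual circuit the thresholds come from intentionally skewed comparator regeneration stages; here they are explicit comparisons, and the function name is hypothetical.

```python
def sar_2b_per_stage(v, vref=1.0, stages=3):
    """Behavioral 2-bit/stage SAR conversion with fixed, stage-scaled flash thresholds.

    Assumes an input range of [-vref/2, +vref/2) and ideal comparators and DAC.
    """
    code, residue, half_range = 0, v, vref / 2
    for _ in range(stages):
        step = half_range / 2                               # vref/4, vref/16, vref/64
        d = sum(residue >= t for t in (-step, 0.0, step))   # 3-comparator flash -> digit 0..3
        code = (code << 2) | d
        residue -= -half_range + (d + 0.5) * step           # DAC removes the selected segment's center
        half_range = step / 2                               # residue now lies within +/- step/2
    return code

# Mid-code inputs of a 6-bit converter map to codes 0..63
codes = [sar_2b_per_stage(-0.5 + (k + 0.5) / 64) for k in range(64)]
```

Because each stage's thresholds depend only on the stage index and not on earlier decisions, they can be fixed at design time, which is what lets the converter use a single reference DAC.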
Figure 4.11. Block diagram of the 32-way time-interleaved SAR ADC. 
Figure 4.12. Unit ADC architecture and timing diagram. 
The unit SAR ADCs operate on an 812.5MHz 25% duty cycle clock, leading to a track time of 307.7ps and a conversion time of 923.07ps, as shown in Fig. 4.12. The loop-unrolled architecture leads to a significant reduction in the logic delay in the feedback path from the comparator to the DAC switches [32]. It also means that the comparators do not need to be reset until all the stages have completed evaluation. The flash ADC stages 1 and 2 generate the RDY signal, which triggers the comparison in the subsequent stage. Delay lines are provided to delay the RDY signal to ensure sufficient time for DAC settling (tDAC1 and tDAC2). Each of the 2-bit stages generates a 3-bit 
thermometer code that is fed back to a segmented thermometer DAC directly without an additional 
decoder to reduce logic delay. The 4b custom DAC is constructed with Cu = 1fF unit capacitors 
which allows for adequate matching for 6-bit resolution and the merged switch capacitor scheme 
(MCS) [34] is employed for good switching efficiency. At the end of conversion of all the three 
stages, the comparators are reset while the signal is being tracked and their outputs are retimed 
using the same clock. 
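The track and conversion times quoted above follow from the unit ADC clock rate and duty cycle. The sketch below reproduces them; the per-stage figure assumes, purely for illustration, that the conversion window is split evenly among the three 2-bit stages.

```python
f_unit = 812.5e6                     # unit SAR ADC sampling rate (26 GS/s / 32)
period_ps = 1e12 / f_unit            # about 1230.8 ps
track_ps = 0.25 * period_ps          # 25% duty-cycle clock high: track phase
convert_ps = period_ps - track_ps    # remaining 75%: three 2-bit conversion stages
per_stage_ps = convert_ps / 3        # average budget per stage if split evenly (assumption)
print(f"{track_ps:.1f} ps track, {convert_ps:.1f} ps convert, {per_stage_ps:.1f} ps/stage")
```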
In order to embed a 3-tap FFE, samples from T/H blocks sampling the pre and post signal 
values with respect to the current signal value are sampled onto the bottom-plate of a differential 
FFE DAC. These pre and post samples are scaled and summed on the FFE DAC before they are 
effectively subtracted from the main cursor sample due to the differential connection at the flash 
ADC input. The use of 2-bit flash ADCs as the quantization block presents the challenge of the 
FFE DAC being loaded with 9 comparator input stages. There is the potential for significant 
attenuation from the comparator input capacitance due to the pre and post samples being bottom-
plate sampled, which could lead to a decreased range for the FFE coefficients. 










In order to reduce the loading of the comparators on the FFE DAC, the 3 comparators of each flash ADC share a single input stage (Fig. 4.13). This reduces the effective DAC loading by 3X. Since the pre and post FFE samples are sampled on the same DAC, not all FFE coefficient combinations are possible. As shown in Fig. 4.14, improved coefficient coverage is achieved with the implementation of a non-binary FFE DAC. Bottom-plate sampling on the FFE DAC results in a gain of 0.48, which sets the range for the tap coefficient values. 
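Why a non-binary FFE DAC can improve coefficient coverage can be illustrated with a simple counting sketch. It assumes a hypothetical abstraction in which every DAC capacitor is connected to the pre sample, the post sample, or neither, and it compares a hypothetical binary weighting [8, 4, 2, 1] with a hypothetical non-binary weighting [5, 4, 3, 2, 1] of the same total capacitance; the actual capacitor weights of this design are not given here.

```python
from itertools import product

def coverage(weights):
    """(pre, post) weight pairs reachable when each DAC capacitor is assigned
    to the pre sample, the post sample, or neither (hypothetical abstraction)."""
    pairs = set()
    for assign in product((0, 1, 2), repeat=len(weights)):   # 0: unused, 1: pre, 2: post
        pre = sum(w for w, a in zip(weights, assign) if a == 1)
        post = sum(w for w, a in zip(weights, assign) if a == 2)
        pairs.add((pre, post))
    return pairs

binary = coverage([8, 4, 2, 1])          # hypothetical binary-weighted DAC, 15 units total
non_binary = coverage([5, 4, 3, 2, 1])   # hypothetical non-binary DAC, also 15 units total
print(len(binary), len(non_binary))
```

With binary weights the pre and post coefficients must use disjoint bits, so pairs such as (1, 3) are unreachable; the redundant non-binary weights offer more ways to split the shared capacitance and therefore more reachable coefficient pairs.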
4.4 DSP Design 
The key architectural details of the DSP are presented in this section. As shown in Fig. 4.15, the main path through the DSP consists of a 12-tap FFE and a 2-tap DFE. In addition, a low-complexity parallel FFE with 4 taps is utilized to significantly reduce the DFE complexity. 
4.4.1 Main and Parallel FFE 
In comparison to an NRZ receiver, a PAM-4 receiver is more sensitive to residual ISI [30]. 
This necessitates the use of a longer span FFE than generally used in an NRZ receiver. However, 
the implementation of the FFE taps itself can be modulation format independent. Canonical signed digit (CSD) representation, which represents numbers with digits from {−1, 0, +1} using the fewest possible nonzero digits, is a well-known power-efficient way of representing numbers when performing multiplications with a constant number, as in the case of an FFE. 
Figure 4.15. DSP architecture showing the 12-tap main FFE, 4-tap parallel FFE and the 2-
tap loop-unrolled PAM-4 DFE.  Reprinted with permission from S. Kiran et al, IEEE ©April 
2018
In this work, the FFE coefficients are represented using CSD representation with the additional restriction of at most 2 nonzero digits per coefficient. This restriction ensures that each multiplication operation reduces to 2 shifts and 1 addition. Additional simplification of the FFE is achieved by having a reduced range for taps 9 to 12 of the main FFE, since these tap values are typically small. The taps of the main FFE are co-optimized with the DFE according to the MMSE solution using the SS-LMS algorithm. This can leave two significant ISI terms at the first two post-cursor positions, which are cancelled later by the DFE. Hence, at the output of the main FFE, there is very little useful information that can be further exploited for simplifying the DFE architecture. The taps of the parallel FFE are set to achieve the lowest BER possible at its output, again using the MMSE criterion. The resulting partially equalized signal at the parallel FFE output can now be exploited to simplify the DFE architecture, as described next. 
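The CSD recoding and the two-nonzero-digit restriction can be sketched in Python. The sketch works on integer tap codes; the 6-bit signed tap range, the nearest-value quantization rule, and all function names are hypothetical and not taken from the design.

```python
def csd_terms(c):
    """Nonzero canonical-signed-digit terms of integer c as (sign, shift) pairs, LSB first."""
    terms, shift = [], 0
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)          # +1 if c = 1 (mod 4), -1 if c = 3 (mod 4)
            terms.append((d, shift))
            c -= d
        c >>= 1
        shift += 1
    return terms

# Tap codes representable with at most two nonzero CSD digits (6-bit signed range assumed)
ALLOWED = [c for c in range(-32, 32) if len(csd_terms(c)) <= 2]

def quantize_tap(c):
    """Nearest allowed tap code (ties broken toward the smaller magnitude)."""
    return min(ALLOWED, key=lambda a: (abs(a - c), abs(a)))

def mul_two_term(x, c):
    """x * c using only shifts and one add/subtract (c must be in ALLOWED)."""
    acc = 0
    for sign, shift in csd_terms(c):
        acc += sign * (x << shift)
    return acc

print(csd_terms(7), mul_two_term(5, 7), quantize_tap(11))
```

Because CSD is the minimum-weight signed-digit form, filtering on its digit count yields exactly the set of codes that can be built from at most two signed powers of two.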
4.4.2 Partially-Unrolled DFE 
The key idea of the partially-unrolled DFE is as follows. As described earlier, during the 
loop-unrolling process in the DFE, all four possible values for the previous symbols are used to 
precompute all the possible equalized values for the current symbol. For example, in a 2-tap PAM-
4 DFE, to compute all the possible equalized values for the symbol at time 𝑛, 4 possible choices 
for the symbol at time 𝑛 − 1 and 4 possible choices for the symbol at time 𝑛 − 2 are used, resulting 
in 16 possible equalized symbol values. If the previous symbols have been partially equalized, as 
it is at the output of the parallel FFE, then 2 of the most unlikely values for each of the previous 
symbols can be safely discarded during the loop-unrolling process without incurring an error that 
impacts the overall target BER. Hence, the loop is only partially unrolled. With reference to Fig. 4.16, which shows the PDF of the received signal at the output of the parallel FFE and the CDF for the normalized transmitted symbol 0.33, if the previous symbol falls in region 1, then the transmitted symbol most likely corresponds to the normalized values of -1 or -0.33. Hence, during the loop-unrolling process, DFE coefficients corresponding to symbols 0.33 and 1 are not used in precomputing the possible equalized symbols for the current symbol. Similarly, for symbols falling in region 2, symbols -1 and +1 are eliminated in the loop-unrolling process, and for symbols falling in region 3, symbols -1 and -0.33 are eliminated during the loop-unrolling 
process. Continuing with the example of the 2-tap DFE, since there are now 2 choices for both the 
symbols at time instants 𝑛 − 1 and 𝑛 − 2, there are only 4 possible equalized values for the current 
symbol. The number of sums to be computed has therefore been reduced from 16 to 4 for a 2-tap 
DFE as shown in Fig. 4.16. 
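The reduction from 16 to 4 precomputed sums can be sketched in Python. The region boundaries at ±1/3 on the normalized parallel-FFE output are an assumption for illustration (the design derives its regions from the PDF and CDF of Fig. 4.16), and the tap values, inputs and function names are hypothetical.

```python
LEVELS = (-1.0, -1 / 3, 1 / 3, 1.0)        # normalized PAM-4 levels

def candidates(y_pffe, t=1 / 3):
    """Two most likely levels for a past symbol given its parallel-FFE output.
    The region boundaries at +/-t are an illustrative assumption."""
    if y_pffe < -t:
        return (-1.0, -1 / 3)              # region 1
    if y_pffe > t:
        return (1 / 3, 1.0)                # region 3
    return (-1 / 3, 1 / 3)                 # region 2

def precompute(x_n, a1, a2, y1, y2, partial=True):
    """Candidate equalized values x_n - a1*s1 - a2*s2 for the current symbol.
    y1, y2: parallel-FFE outputs for symbols n-1 and n-2."""
    c1 = candidates(y1) if partial else LEVELS
    c2 = candidates(y2) if partial else LEVELS
    return {(s1, s2): x_n - a1 * s1 - a2 * s2 for s1 in c1 for s2 in c2}

full = precompute(0.5, 0.2, 0.1, -0.6, 0.0, partial=False)
pu = precompute(0.5, 0.2, 0.1, -0.6, 0.0)
print(len(full), len(pu))                  # 16 vs 4 sums
```

Once the final decisions for symbols $n-1$ and $n-2$ resolve, the DFE picks the matching entry; as long as those decisions fall within the candidate pairs, the selected value equals the one a fully unrolled DFE would produce.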
Apart from reducing the gate count due to a lower number of summers and slicers required, 
PU-DFE also reduces the number of multiplexers required to implement the look-ahead multiplexer 
loop. The critical feedback timing path in a conventional PAM-4 DFE implementation consists of 
a 4:1 multiplexer whereas, in PU-DFE it consists of a 2:1 multiplexer. This reduces the critical 
path length by nearly half and doubles the maximum data rate achievable for a given look-ahead 
factor in the multiplexer loop. The look-ahead multiplexer loop in a PU-DFE is shown in Fig. 4.17. 
4.4.3 Critical Path Optimization 
For each of the past symbols that have been loop-unrolled, a 2-bit past decision has to 
choose the correct choice among the 2 options generated during the loop-unrolling process. With 
reference to Fig. 4.18(a), when the parallel FFE output falls in Region 1, the inputs to the summing elements in the DFE are -1*a1 and -0.33*a1, signifying that the partial decision for the symbol is 
00 and 01 respectively. Now if the actual decision on the previous symbol, which is fed back to 




LF, is 00, then the select line S of the multiplexer has to be 0, and if it is 01, then the select line has to be 1. This is shown in the table on the right. Note that the outputs of the decision elements are assumed to be Gray coded. Similar analysis can be carried out for symbols falling in Region 2 
and Region 3. The key point to note from the table in Fig. 4.18(a) is that for the same value of the 
delayed previous symbol 𝑌𝑛−𝐿𝐹, the select line for the multiplexer can be either a 0 or 1 depending 
on the region where the parallel FFE output happens to fall in. This ambiguity is resolved by using 
an additional variable R to indicate which region the parallel FFE output fell in. The select line 
can be written as a function of R and 𝑌𝑛−𝐿𝐹 as shown in Equation 4.1. The additional combinational 
logic delay of 3 gates in the path of the select line of the final multiplexer increases the critical 
path delay and therefore any advantage that can be obtained with the PU-DFE architecture in terms 
of reduced delay in the critical path might be lost due to this overhead. 
 𝑆 = 𝑅𝑌𝑛−𝐿𝐹[0]̅̅ ̅̅ ̅̅ ̅̅ ̅̅ ̅ + 𝑌𝑛−𝐿𝐹[1]𝑌𝑛−𝐿𝐹[0]̅̅ ̅̅ ̅̅ ̅̅ ̅̅ ̅ + 𝑅𝑌𝑛−𝐿𝐹[1],  (4.1) 
This overhead logic can be removed by 2 simple modifications to the DFE architecture. 
First, the PAM-4 levels are binary coded, and second, the inputs to the summing elements in the DFE selected through the 4:2 mux are swapped when the parallel FFE output falls in region 2. This 
results in the implementation as shown in Fig. 4.18(b) and from the resulting table it is clear that 
the select line of the multiplexer, S, is simply 𝑌𝑛−𝐿𝐹[0]. Hence, no additional combinational logic 
is necessary to generate the select line. The additional delay introduced in the 4:2 mux for 
swapping its output in Region 2 can be easily tolerated as this delay is not in the timing critical 
multiplexer loop. This simple trick can nearly double the maximum achievable data rates. 
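The effect of the two modifications on the select logic can be checked exhaustively with a short Python sketch. It assumes the standard binary and Gray level-to-code mappings shown, ascending mux input ordering outside region 2, and hypothetical function names; it only verifies the selection logic, not timing.

```python
# Decision code -> normalized PAM-4 level (mappings assumed for illustration)
BINARY = {0b00: -1.0, 0b01: -1 / 3, 0b10: 1 / 3, 0b11: 1.0}
GRAY = {0b00: -1.0, 0b01: -1 / 3, 0b11: 1 / 3, 0b10: 1.0}
REGION_LEVELS = {1: (-1.0, -1 / 3), 2: (-1 / 3, 1 / 3), 3: (1 / 3, 1.0)}

def mux_inputs(region, swap_region2=True):
    """Levels presented on mux inputs 0 and 1 (ascending, unless swapped in region 2)."""
    lo, hi = REGION_LEVELS[region]
    return (hi, lo) if (region == 2 and swap_region2) else (lo, hi)

def select_is_lsb(coding, swap_region2):
    """True if S = Y[0] (the decision LSB) always selects the decided level."""
    for region, levels in REGION_LEVELS.items():
        inputs = mux_inputs(region, swap_region2)
        for code, level in coding.items():
            if level in levels and inputs[code & 1] != level:
                return False
    return True

print(select_is_lsb(BINARY, True), select_is_lsb(GRAY, True), select_is_lsb(BINARY, False))
```

Only the combination of binary coding and the region-2 swap lets the decision LSB drive the select line directly; dropping either modification brings back the need for extra select logic.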
Figure 4.16. PDF of the signal at the output of the parallel FFE and the CDF for 0.33 symbol.
Reprinted with permission from S. Kiran et al, IEEE ©April 2018





Figure 4.18. (a) Straightforward and (b) optimized implementation of mux selection logic. 
Fig. 4.19 compares the gate count for a 2-tap conventional PAM-4 DFE and PU-DFE in 
65nm technology for various look-ahead factors implemented using a digital synthesis flow. The 
PU-DFE architecture has about 3X lower gate count compared to the traditional architecture. Fig. 
4.20 shows the power consumption comparison for the two architectures with the overhead from 
the parallel FFE included in the case of the PU-DFE architecture. PU-DFE architecture has about 
2X lower power consumption. Fig. 4.21 shows the maximum achievable data rate in 65nm 
technology for the two architectures with the PU-DFE architecture achieving about 2X higher data 
rate compared to the conventional architecture.  
4.4.4 Advantages of PU-DFE 
The advantages of PU-DFE are summarized in Table 4.1. 
Table 4.1. Complexity comparison between conventional PAM-4 DFE and PU-DFE. 
Figure 4.19. Gate count comparison between a conventional DFE and PU-DFE for a 2-tap PAM-4 DFE in 65nm technology.
Figure 4.20. Power consumption comparison between a conventional DFE and PU-DFE for a 2-
tap PAM-4 DFE in 65nm technology. 
Figure 4.21. Maximum achievable data rate comparison between a conventional DFE and PU-
DFE for a 2-tap PAM-4 DFE in 65nm technology. 
4.5 Measurement Results 
Fig. 4.22 shows the chip micrograph of the PAM-4 ADC based prototype fabricated in GP 
65nm process. The total chip area is 2.61mm² with the core ADC and the DSP occupying 0.41mm² and 1.17mm², respectively. A set of 6 high-speed output buffers with a multiplexer at the input that 
can select either the ADC output or the DSP output help characterize the ADC and the DSP 
separately. The receiver is characterized at 2 data rates of 32Gb/s and 52Gb/s corresponding to 
ADC sampling rates of 16GS/s and 26GS/s respectively. 
4.5.1 ADC Characterization 
Before the ADC characterization begins, termination resistor tuning and CTLE-VGA 
analog front-end offset calibration is performed. Each of the 284 comparator offsets in the TI-ADC is calibrated by applying a differential DC input equal to the desired threshold for that particular comparator and tuning the threshold of the comparator until an equal distribution of ones and zeroes is obtained at the output. Skew between the different sampling phases is calibrated using a foreground technique where a sinusoid at a frequency equal to the sampling rate is applied and the outputs from the 8 sub-ADCs are made equal by digitally tuning a bank of variable delay lines. Gain calibration is performed by tuning the bias current of the T/H buffer. 
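The comparator offset calibration described above can be sketched as a binary search over a trim code that stops at an equal split of ones and zeroes. The trim DAC (signed 6-bit code, 0.5 mV step), the 1 mVrms comparator noise, the sample count, and the function names are all hypothetical; the sketch only illustrates the principle of driving the output ones-density to 50% with the desired threshold applied at the input.

```python
import random

def ones_fraction(v_in, eff_threshold, noise_rms, n, rng):
    """Fraction of 1 decisions from a noisy comparator over n trials."""
    return sum(v_in + rng.gauss(0.0, noise_rms) > eff_threshold for _ in range(n)) / n

def calibrate(threshold, offset, trim_step=0.5e-3, trim_bits=6, noise_rms=1e-3, n=2000, seed=0):
    """Binary-search a hypothetical signed trim code until ~50% ones at DC input = threshold."""
    rng = random.Random(seed)
    lo, hi = -(1 << (trim_bits - 1)), (1 << (trim_bits - 1)) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        eff = threshold + offset - mid * trim_step      # trim subtracts from the offset
        if ones_fraction(threshold, eff, noise_rms, n, rng) >= 0.5:
            hi = mid                                    # enough ones: try a smaller trim
        else:
            lo = mid + 1
    return lo

code = calibrate(threshold=0.1, offset=7.3e-3)
print(code, "residual offset (mV):", round((7.3e-3 - code * 0.5e-3) * 1e3, 2))
```

The comparator noise is what makes the 50% criterion well defined: with the input held exactly at the intended threshold, the ones-density crosses 0.5 where the residual offset crosses zero, so the search lands within one trim step of it.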















Fig. 4.23 and Fig. 4.24 show the SNDR and SFDR as a function of the input frequency for sampling rates of 16GS/s and 26GS/s, respectively. A low-frequency SNDR of 30.29dB, corresponding to an ENOB of 4.74 bits, is achieved for both sampling rates. At the sampling rate of 16GS/s, a Nyquist-frequency SNDR of 27.58dB (ENOB of 4.29 bits) is achieved, while at the sampling rate of 26GS/s, a Nyquist-frequency SNDR of 26.14dB (ENOB of 4.05 bits) is achieved. The high-frequency SNDR is limited by jitter and residual timing skew. Fig. 4.25 shows that the maximum DNL and INL values for the ADC are -0.22LSB and -0.53LSB. 
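The ENOB values above follow from the standard full-scale-sine relation $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76\,\mathrm{dB})/6.02\,\mathrm{dB}$, which can be checked against the measured SNDR figures:

```python
def enob(sndr_db):
    """Effective number of bits from SNDR for a full-scale sine input."""
    return (sndr_db - 1.76) / 6.02

for label, sndr in [("low frequency", 30.29), ("Nyquist, 16GS/s", 27.58), ("Nyquist, 26GS/s", 26.14)]:
    print(f"{label}: {enob(sndr):.2f} bits")
```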
Figure 4.25. Measured INL/DNL plot. 
4.5.2 Analog Front-End Characterization 
The frequency response of the analog front-end is characterized by sweeping the input 
signal frequency while looking at the ADC output. The resulting frequency response for various 
CTLE settings is shown in Fig. 4.26. The analog front-end provides a gain peaking of 8.5dB to 
15dB at 13GHz. 
Figure 4.26.  Measured CTLE magnitude response. 
4.5.3 Receiver Characterization 
The measured channel responses for both 32Gb/s and 52Gb/s characterization are shown in Fig. 4.27. The complete receiver performance is characterized at two data rates of 32Gb/s and 52Gb/s using the test setup in Fig. 4.28 (top and bottom, respectively). For 32Gb/s operation, two NRZ bitstreams from a Xilinx FPGA board are power combined to produce PAM-4 data. The data is then transmitted over two channels with effective losses of 27dB and 30dB. The measured timing bathtub curve is shown in Fig. 4.29(a). For the 27dB loss channel, a BER < 10⁻¹¹ is achieved, while for the 30dB loss channel, a BER < 10⁻⁹ is achieved. For testing at 52Gb/s, PRBS-15 PAM-4 data from a custom Tx chip with no transmit equalization is passed over two channels with losses of 26dB and 31dB. The resulting BER performance is shown in Fig. 4.29(b). A BER less than 10⁻⁸ is achieved for the 26dB loss channel, and a BER less than 10⁻⁶ is achieved for the 31dB loss channel. For both 32Gb/s and 52Gb/s operation, the timing bathtub curves are obtained by stepping the phase interpolator codes with the CDR in open loop. Results with the CDR activated are also shown for both 32Gb/s and 52Gb/s operation for channel losses of 27dB and 26dB, respectively, verifying that the CDR locks near the optimal BER point. The recovered clock jitter histogram, showing a random jitter of 939fs rms, is shown in Fig. 4.30. 
Table 4.2 summarizes the receiver performance and compares it with other ADC-based 
receivers operating at data rates above 25Gb/s. The complete 52Gb/s ADC-based receiver achieves a power 
efficiency of 8.05pJ/bit, including all front-end, ADC, and DSP power. The combination of the CTLE 
front-end, the 3-tap FFE embedded in the ADC, and the DSP with the PU-DFE compensates channel 
loss comparable to that handled by the other PAM-4 receivers, without employing any 
transmit equalization. 
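The power efficiency can be related to total receiver power by unit arithmetic alone: pJ/bit multiplied by Gb/s gives mW (10^-12 J/bit x 10^9 bit/s = 10^-3 W). The sketch below only illustrates that conversion; the roughly 419 mW it yields for 8.05 pJ/bit at 52 Gb/s is derived here, not a figure quoted in this section.

```python
def total_power_mw(efficiency_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Total power in mW implied by an energy efficiency (pJ/bit) at a data rate (Gb/s).

    pJ/bit * Gb/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W, i.e. the product is in mW.
    """
    return efficiency_pj_per_bit * data_rate_gbps


if __name__ == "__main__":
    # Front-end + ADC + DSP power implied by the reported 8.05 pJ/bit at 52 Gb/s.
    print(f"{total_power_mw(8.05, 52):.1f} mW")
```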
Figure 4.27. Measured channel responses. 




Figure 4.29. Measured timing bathtub curves for (a) 32Gb/s operation and (b) 52Gb/s operation.
Reprinted with permission from S. Kiran et al., IEEE ©April 2018
Figure 4.30. Measured recovered clock jitter histogram. Reprinted with permission from 
S. Kiran et al., IEEE ©April 2018 
Table 4.2. Performance summary. 
4.6 Conclusion 
This chapter presented a 52Gb/s PAM-4 receiver with a time-interleaved 6-bit SAR ADC 
built from 2-bit/stage unit ADCs and an embedded 3-tap FFE. A new PU-DFE architecture reduces the 
PAM-4 DFE complexity to that of an NRZ DFE, providing both a reduced gate count and higher data-rate 
operation. The receiver achieves a measured BER < 10^-6 over a 31dB loss channel without any 
transmit equalization.




5. CONCLUSION AND FUTURE WORK 
 
5.1 Conclusion 
ADC-based receivers enable powerful digital equalization techniques that can handle higher loss 
than mixed-signal receivers can. However, both the ADC and the DSP can be power hungry, leading 
to worse power efficiency than mixed-signal receivers. Multi-level modulation schemes 
such as PAM-4 further exacerbate this problem by introducing additional design challenges. 
To tackle the power efficiency problem, accurate modeling of link systems and design 
techniques specific to multi-level modulation schemes are necessary. This dissertation first 
presented a hybrid statistical modeling framework that incorporates ADC-specific errors, such as 
quantization noise in the presence of nonlinearity and residual ISI, as well as time-interleaved ADC 
mismatch errors, and that can be used to study the trade-offs among different design choices. The statistical 
techniques were verified against bit-by-bit transient simulations, which showed excellent agreement. 
In the second part of this dissertation, a complete 52Gb/s PAM-4 receiver was presented 
that introduced a new DFE architecture, the partially-unrolled DFE (PU-DFE), which reduces 
the PAM-4 DFE complexity to that of a PAM-2 DFE while nearly doubling the 
maximum achievable data rate. The receiver achieved a BER below 10^-6 while 
operating over a channel with 31dB loss with no transmit equalization. The receiver was also able 








5.2 Future Work  
In this section, two additional ADC-based receiver architectures, based on error correction 
and on analog multi-tone modulation, are described. They have the potential to achieve low-power 
operation and increased jitter robustness as the unit interval shrinks, respectively. 
5.2.1 Single Parity Check Code for ADC-Based Serial Link 
Taking advantage of the partial equalization afforded by embedding a low-overhead linear 
equalizer in the ADC, previous work [11] has proposed dividing received symbols into two 
categories: reliable and unreliable. Symbols falling close to a decision threshold are classified as 
unreliable, and symbols far away from the decision thresholds as reliable. A significant amount of 
power can be saved in the DSP by enabling the equalizer only for the unreliable 
symbols and power gating it for the reliable symbols. 
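The reliable/unreliable split described above can be sketched as a distance test against the PAM-4 decision thresholds. This is a minimal illustration, not the implementation of [11]: the normalized levels {-3, -1, +1, +3}, the thresholds at -2, 0 and +2, and the `margin` parameter are all hypothetical assumptions chosen for readability.

```python
from typing import List, Sequence, Tuple

# Hypothetical normalized PAM-4 levels {-3, -1, +1, +3}; thresholds lie midway.
THRESHOLDS: Tuple[float, ...] = (-2.0, 0.0, 2.0)


def is_unreliable(y: float, margin: float) -> bool:
    """True if sample y lies within `margin` (hypothetical) of any decision threshold."""
    return min(abs(y - t) for t in THRESHOLDS) < margin


def slice_pam4(y: float) -> int:
    """Hard-decision PAM-4 slicer: number of thresholds exceeded (0..3)."""
    return sum(1 for t in THRESHOLDS if y > t)


def partition(samples: Sequence[float], margin: float) -> Tuple[List[int], List[int]]:
    """Split sample indices into (reliable, unreliable) groups.

    Reliable samples are sliced directly; unreliable ones would be routed to the
    digital equalizer, which can otherwise remain power gated.
    """
    reliable: List[int] = []
    unreliable: List[int] = []
    for i, y in enumerate(samples):
        (unreliable if is_unreliable(y, margin) else reliable).append(i)
    return reliable, unreliable


if __name__ == "__main__":
    y = [-2.9, -1.9, 0.1, 1.2, 2.05, 3.1]  # partially equalized ADC outputs
    rel, unrel = partition(y, margin=0.25)
    print("reliable:", rel, "decisions:", [slice_pam4(y[i]) for i in rel])
    print("unreliable (sent to DSP equalizer):", unrel)
```

Widening `margin` routes more symbols to the equalizer, trading DSP power for fewer errors among the symbols declared reliable.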
Building on this idea, if the transmitter inserts one parity check bit for every N-1 
transmitted bits, the received symbols that are classified as unreliable can be discarded, i.e., treated 
as erasures, and then filled in using the single parity check (SPC) condition. Thus, any 
single-bit erasure in a block of N bits can be filled in. If there are two or more erasures in a block, 
the block is not modified, and the probability of having an error in this block is the same as the 
pre-SPC BER. For this system, the post-SPC BER is given by 

$$P_{error} = \left(1 - \left[(1 - P_{erase})^{N} + N P_{erase}(1 - P_{erase})^{N-1}\right]\right) P + P_2. \qquad (5.1)$$

Here $P_{error}$ is the post-SPC BER, $P_{erase}$ is the probability of an unreliable symbol or 
erasure, $P$ is the pre-SPC BER, and $P_2$ is the probability that a symbol classified as reliable is 
erroneous. The bracketed term is the probability that a block contains zero or one erasure and can therefore 
be repaired. Equation (5.1) assumes that the probability of erasure for any symbol is independent of 
the other symbols. This is in general not true since the channel introduces correlation among the 




becomes necessary as shown in Fig. 5.1 [35]. Finally, the impact of SPC on ADC resolution is 
shown in Fig. 5.2, which plots the BER improvement achieved by the proposed erasure-filling 
scheme for different ADC resolutions. For a target BER of 10^-12 at a data rate of 64Gb/s, up to 2 
bits of ADC resolution can be saved using the proposed scheme. The simulation results in Fig. 5.2 
assume a transmit swing of 0.9Vppd and a noise standard deviation of 2.5mV at the ADC input. Equalization 
in the analog domain is provided by a CTLE and a 3-tap embedded FFE. Equalization in the digital 




Figure 5.1. Transceiver architecture for implementing a single parity check code based 










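The SPC erasure-filling step and the post-SPC BER expression can be sketched together. This assumes even parity (the parity convention is not specified in the text) and independent erasures, as in Eq. (5.1); the repairable-block probability is written out as the probability of zero or one erasure among N symbols, $(1 - P_{erase})^N + N P_{erase}(1 - P_{erase})^{N-1}$. Function names and the operating point in the usage lines are hypothetical.

```python
from typing import List, Optional


def spc_encode(data_bits: List[int]) -> List[int]:
    """Append one even-parity bit to N-1 data bits, forming an N-bit block."""
    parity = 0
    for b in data_bits:
        parity ^= b
    return data_bits + [parity]


def spc_fill(block: List[Optional[int]]) -> List[Optional[int]]:
    """Fill a single erasure (marked None) using the even-parity condition.

    Blocks with no erasures, or with two or more, are returned unchanged.
    """
    erased = [i for i, b in enumerate(block) if b is None]
    if len(erased) != 1:
        return list(block)
    parity = 0
    for b in block:
        if b is not None:
            parity ^= b
    filled = list(block)
    filled[erased[0]] = parity  # the missing bit must restore even parity
    return filled


def post_spc_ber(p_erase: float, p: float, p2: float, n: int) -> float:
    """Post-SPC BER for block length n, assuming independent erasures.

    Blocks with 0 or 1 erasures are repaired; the rest keep the pre-SPC BER p.
    p2 is the error probability of symbols classified as reliable.
    """
    p_repairable = (1 - p_erase) ** n + n * p_erase * (1 - p_erase) ** (n - 1)
    return (1 - p_repairable) * p + p2


if __name__ == "__main__":
    block = spc_encode([1, 0, 1, 1])      # -> [1, 0, 1, 1, 1]
    received = [1, 0, None, 1, 1]         # third bit flagged as unreliable
    print(spc_fill(received))             # -> [1, 0, 1, 1, 1]
    # Hypothetical operating point, not taken from the dissertation:
    print(post_spc_ber(p_erase=1e-3, p=1e-4, p2=1e-12, n=64))
```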
5.2.2 Frequency Domain ADC-Based Serial Link Receiver Architecture 
As high-speed data symbol times shrink, transmission over both severe low-pass 
electrical channels and dispersive optical channels results in significant intersymbol interference (ISI). 
This necessitates increased equalization complexity, consideration of more bandwidth-efficient 
modulation schemes, such as baseband PAM-4 and coherent QAM, and the use of forward error 
correction. Serial links that utilize an analog-to-digital converter (ADC) receiver front-end offer a 
potential solution, as they enable more powerful and flexible digital signal processing (DSP) for 
equalization and symbol detection and can easily support advanced modulation schemes. 
Unfortunately, sampling clock jitter places fundamental performance limitations on common 
time-interleaved ADC architectures, necessitating clock generation and distribution circuitry that 
achieves rms jitter of a few hundred femtoseconds. Jitter degrades performance in these systems 
because, although the sampling clocks of a time-interleaved ADC run at a fraction of the 
Nyquist-rate frequency, every channel still produces jitter-induced noise from sampling the full 
input-signal bandwidth, whose power is given by $\sigma_n^2 = \left(\sqrt{2}\,\pi B A \sigma_j\right)^2$, 
where $B$ and $A$ are the signal bandwidth and amplitude, respectively, and $\sigma_j$ is the rms 
sampling jitter. Therefore, although the sampling rate in each channel of a time-interleaved ADC is 
relaxed by the number of channels, the jitter requirement is the same as in a single-channel ADC. 
By limiting the signal bandwidth at the sampler, as in an OFDM system, significant jitter robustness 
can be achieved, since for an N-channel receiver the jitter-induced noise is 
$\sigma_n^2 = \left(\sqrt{2}\,\pi B A \sigma_j / N\right)^2$ [36]. 
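The two jitter-noise expressions can be evaluated directly. For a full-scale sinusoid of amplitude A (signal power A^2/2), the single-channel expression reduces to the familiar jitter-limited SNR of -20 log10(2 pi B sigma_j), and the N-channel expression improves it by 20 log10(N) dB. The sinusoidal signal model and the 32 GHz / 300 fs values below are illustrative assumptions, not parameters taken from this work.

```python
import math


def jitter_noise_power(bandwidth_hz: float, amplitude: float,
                       sigma_j_s: float, n_channels: int = 1) -> float:
    """Jitter-induced noise power sigma_n^2 = (sqrt(2)*pi*B*A*sigma_j / N)^2.

    n_channels = 1 corresponds to a conventional (time-interleaved) receiver,
    where every sampler sees the full input bandwidth B.
    """
    return (math.sqrt(2) * math.pi * bandwidth_hz * amplitude * sigma_j_s / n_channels) ** 2


def jitter_limited_snr_db(bandwidth_hz: float, sigma_j_s: float, n_channels: int = 1) -> float:
    """SNR of a sinusoid (power A^2/2) limited only by jitter-induced noise."""
    amplitude = 1.0  # the SNR does not depend on A, since the noise power scales with A^2
    signal_power = amplitude ** 2 / 2
    noise = jitter_noise_power(bandwidth_hz, amplitude, sigma_j_s, n_channels)
    return 10 * math.log10(signal_power / noise)


if __name__ == "__main__":
    B, sigma_j = 32e9, 300e-15  # illustrative values only
    for n in (1, 3):
        print(f"N = {n}: jitter-limited SNR = {jitter_limited_snr_db(B, sigma_j, n):.1f} dB")
```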
This section presents architectures that are robust to jitter, either by employing multiple 
channels with longer symbol periods or by splitting a conventional PAM-M baseband signal 




Fig. 5.3 shows the proposed frequency-domain ADC-based receiver. The input CTLE 
drives the front-end channels, each of which has a mixer for down-conversion, a Bessel low-pass filter, and 
an ADC for sampling and digitization. The digitized samples are then processed by FIR 
filters in the DSP, and their outputs are combined either to perform symbol estimation in PAM-4 
baseband mode or to perform both inter-channel interference (ICI) and ISI cancellation in 
multi-tone mode. The receiver can handle either a baseband modulated signal such as PAM-4 modulation 








When the input signal is baseband modulated, the receiver acts as a frequency-channelizing 
receiver. For a 128Gb/s receiver employing PAM-4 modulation with a symbol rate of 64GS/s, the 
receiver is configured as a 3-channel frequency-channelizing receiver. The ADC pairs in two 




dummy mixers and Bessel low-pass filters pass only the low-frequency content of the signal 
through to the ADC to be digitized. The next set of mixers, in channel 3 and channel 4, employs an 
LO frequency of 21.33GHz, which is one-third of the baud rate. The final set of mixers, in channel 5 and 
channel 6, employs an LO frequency of 32GHz, which is one-half of the baud rate. The LO frequencies are 
integer fractions of the baud rate in order to make the sampled system LTI [37]. The architecture is similar to 
a hybrid filter bank ADC, with the key difference that the digital reconstruction filter 
performs direct symbol estimation and does not implement perfect-reconstruction FIR filters 
[38]. Each mixer is followed by a low-pass filter, and the jitter robustness increases with the filter 
order; a 5th-order low-pass filter is a practical choice. 
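One simple way to see what choosing LO frequencies at integer fractions of the baud rate buys is to look at the LO phase at the symbol instants nT: with f_LO = (1/T)/k, the phase is n/k cycles, so it repeats every k symbols. The sketch below only illustrates that periodicity for k = 3 (channels 3 and 4) and k = 2 (channels 5 and 6); the formal LTI argument for the sampled system is the one given in [37], and the helper name is hypothetical.

```python
from fractions import Fraction
from typing import List

BAUD_RATE_GHZ = 64  # PAM-4 symbol rate of the 128 Gb/s configuration


def lo_phases(divisor: int, num_symbols: int) -> List[Fraction]:
    """LO phase (cycles, modulo 1) at symbol instants n*T for f_LO = baud / divisor.

    With T = 1 / baud, the phase is n * f_LO * T = n / divisor, which repeats
    every `divisor` symbols.
    """
    return [Fraction(n, divisor) % 1 for n in range(num_symbols)]


if __name__ == "__main__":
    for divisor in (3, 2):  # channels 3/4 and channels 5/6
        f_lo = BAUD_RATE_GHZ / divisor
        phases = [str(p) for p in lo_phases(divisor, 2 * divisor)]
        print(f"f_LO = {f_lo:.2f} GHz, LO phase per symbol (cycles): {phases}")
```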
A 128Gb/s multi-tone signaling scheme can employ five 12.8GS/s channels, consisting of one 
baseband PAM-4 channel, one QAM-64 channel centered at 25.6GHz and one QAM-4 channel 
centered at 51.2GHz, with each QAM channel occupying an in-phase/quadrature pair. In this scenario, the receiver 
employs channel 1 for receiving the baseband PAM-4 signal, channels 3 and 4 for the QAM-64 signal, and 
channels 5 and 6 for the QAM-4 signal. Channel 2 is disabled in this operating mode. Fig. 5.4 
shows the signal PSD at the transmitter and a channel with 30dB loss at 32GHz, which is used 
to present the simulation results. 
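The 128 Gb/s figure can be checked by summing bits per symbol across the sub-channels at the common 12.8 GS/s symbol rate (2 + 6 + 2 = 10 bits per symbol period). The sketch also counts receiver channels, assuming, as the channel assignment above suggests, that each QAM sub-channel occupies an I/Q pair of receiver channels. The data-structure layout is purely illustrative.

```python
import math
from typing import Dict, Tuple

SYMBOL_RATE_GBAUD = 12.8

# Sub-channel -> (constellation size, number of real receiver channels it occupies).
SUBCHANNELS: Dict[str, Tuple[int, int]] = {
    "PAM-4, baseband": (4, 1),      # receiver channel 1
    "QAM-64 @ 25.6 GHz": (64, 2),   # receiver channels 3 and 4
    "QAM-4 @ 51.2 GHz": (4, 2),     # receiver channels 5 and 6
}


def multitone_rate_gbps() -> float:
    """Aggregate data rate: total bits per symbol period times the symbol rate."""
    bits = sum(int(math.log2(m)) for m, _ in SUBCHANNELS.values())
    return bits * SYMBOL_RATE_GBAUD


def receiver_channels_used() -> int:
    """Number of real receiver channels needed (channel 2 stays idle)."""
    return sum(k for _, k in SUBCHANNELS.values())


if __name__ == "__main__":
    print(f"multi-tone aggregate rate: {multitone_rate_gbps():.1f} Gb/s")
    print(f"receiver channels used: {receiver_channels_used()}")
    print(f"baseband PAM-4 reference: {2 * 64:.1f} Gb/s at 64 GBd")
```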
Fig. 5.5 shows the jitter robustness comparison for a traditional time-interleaved system, a 
PAM-4 baseband modulation system, the 5-channel multi-tone signal described above, 
and a QAM-32 system. For a target BER of 10^-12, Fig. 5.5 shows that the 128Gb/s 
frequency-channelized PAM-4 receiver provides a 1.6X improvement in jitter robustness compared 
to the time-interleaved system. The 5-channel multi-tone signal provides close to a 6X improvement, 
while the QAM-32 system provides about a 3X improvement. The improvement in jitter robustness 




into 3 channels before it is sampled. The high-frequency attenuation provided by the 5th-order 
Bessel low-pass filter reduces the signal bandwidth and improves the jitter robustness. For the 
5-channel multi-tone signal, the improvement in jitter robustness comes from a combination of two 
effects. First, the lower symbol rate of 12.8GS/s results in a wider pulse compared to the PAM-4 
time-interleaved system, which leads to reduced timing sensitivity. This is illustrated in Fig. 5.6. 
Second, mixer self-equalization [39] results in the sampler's input signal being equalized, and 
hence the residual ISI contribution from a sub-optimal sampling point is greatly reduced. This is 
evident in the multi-tone pulse responses for channel 3 and channel 5 in Fig. 5.6. The jitter 
robustness of the QAM-32 system has the same origin as that of the 5-channel multi-tone system. 
However, the QAM-32 system has a higher symbol rate of 25.6GS/s and is hence more susceptible 
to jitter. It should be noted that the simulation results of Fig. 5.5 also assume an equal jitter on the 



















In summary, an ADC-based frequency-domain receiver can be configured either for a 
baseband PAM-4 modulated signal or for a 5-channel multi-tone modulation scheme 
employing one PAM-4 channel and two QAM channels. Simulation results show improved jitter 
robustness compared with a traditional time-interleaved receiver, with up to 6X 
























1. Y. Frans et al., "A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved 
SAR ADC in 16-nm FinFET," in IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 
1101-1110, April 2017. 
2. D. Cui et al., "3.2 A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with 
programmable gain control and analog peaking in 28nm CMOS," 2016 IEEE International 
Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 58-59. 
3. S. Rylov et al., "3.1 A 25Gb/s ADC-based serial line receiver in 32nm CMOS SOI," 2016 
IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 
56-57. 
4. Aurangozeb, A. D. Hossain, M. Mohammad and M. Hossain, "Channel-Adaptive ADC 
and TDC for 28 Gb/s PAM-4 Digital Receiver," in IEEE Journal of Solid-State Circuits, 
vol. 53, no. 3, pp. 772-788, March 2018. 
5. P. Upadhyaya et al., "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a 
configurable ADC in 16nm FinFET," 2018 IEEE International Solid-State Circuits 
Conference (ISSCC), San Francisco, CA, 2018, pp. 108-110. 
6. L. Wang, Y. Fu, M. LaCroix, E. Chong and A. C. Carusone, "A 64Gb/s PAM-4 transceiver 
utilizing an adaptive threshold ADC in 16nm FinFET," 2018 IEEE International 
Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 110-112. 
7. B. Casper, M. Haycock, and R. Mooney, “An accurate and efficient analysis method for 
multi-Gb/s chip-to-chip signaling schemes,” in Proc. IEEE Symp. VLSI Circuits, June 




8. K. S. Oh, F. Lambrecht, S. Chang, Q. Lin, J. Ren, C. Yuan, J. Zerbe, and V. Stojanovic, 
“Accurate System Voltage and Timing Margin Simulation in High-Speed I/O System 
Designs,” IEEE Trans. Adv. Pack., vol. 31, no. 4, pp. 722–730, Nov. 2008. 
9. G. Balamurugan, B. Casper, J. Jaussi, M. Mansuri, F. O’Mahony, and J. Kennedy, 
“Modeling and analysis of high-speed I/O links,” IEEE Trans. Adv. Pack., vol. 32, no. 2, 
pp. 237–247, May 2009. 
10. A. Sanders, M. Resso, and J. D'Ambrosia, “Channel compliance testing utilizing novel 
statistical eye methodology,” presented at the DesignCon, Santa Clara, CA, 2004. 
11. A. Shafik, E. Zhian Tabasy, S. Cai, K. Lee, S. Hoyos and S. Palermo, "A 10 Gb/s Hybrid 
ADC-Based Receiver With Embedded Analog and Per-Symbol Dynamically Enabled 
Digital Equalization," in IEEE Journal of Solid-State Circuits, vol. 51, no. 3, pp. 671-685, 
March 2016. 
12. X. Gu, K. J. Han, M. Cracraft, R. R. Donadio and Y. Kwark, "Efficient parametric 
modeling and analysis for backplane channel characterization," 2012 IEEE 62nd 
Electronic Components and Technology Conference, San Diego, CA, 2012, pp. 1880-1885. 
13. Y. Frans et al., "A 0.5–16.3 Gb/s Fully Adaptive Flexible-Reach Transceiver for FPGA in 
20 nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 50, no. 8, pp. 1932-1944, 
Aug. 2015. 
14. A. Momtaz and M. M. Green, "An 80 mW 40 Gb/s 7-Tap T/2-Spaced Feed-Forward 
Equalizer in 65 nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 45, no. 3, pp. 




15. R. Boesch, K. Zheng, and B. Murmann, “A 0.003 mm² 5.2 mW/tap 20 GBd inductor-less 
5-tap analog RX-FFE,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, June 2016, pp. 170–
171. 
16. E. Zhian Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, “A 6 bit 10 GS/s TI-SAR 
ADC with low-overhead embedded FFE/DFE equalization for wireline receiver 
applications,” IEEE J. Solid-State Circuits, vol. 49, no. 11, pp. 2560–2574, Nov. 2014. 
17. S. Kasturia and J. H. Winters, “Techniques for high-speed implementation of nonlinear 
cancellation,” IEEE Journal on Selected Areas in Communications, vol. 9, no. 5, pp. 
711–717, 1991. 
18. K. Parhi, “Design of Multigigabit Multiplexer-Loop-Based Decision Feedback 
Equalizers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, pp. 489-
493, Apr. 2005. 
19. P. Newaskar, R. Blazquez, and A. Chandrakasan, “A/D precision requirements for an ultra-
wideband radio receiver,” in IEEE Workshop on Signal Processing Systems, Oct 2002, pp. 
270–275. 
20. A. Hadji-Abdolhamid and D. Johns, “A 400-MHz 6-bit ADC with a partial analog equalizer 
for coaxial cable channels,” in Proc. European Solid-State Circuits Conference, Sept 2003, 
pp. 237–240. 
21. G. Malhotra, “Method for analytically calculating BER (bit error rate) in presence of non-
linearity,” DesignCon, Santa Clara, Jan. 28-31, 2014. 
22. B. Widrow, I. Kollar, and M.-C. Liu, “Statistical theory of quantization,” IEEE Trans. 




23. C. Vogel, “The impact of combined channel mismatch effects in time-interleaved ADCs,” 
IEEE Trans. Instrum. Meas., vol. 54, no. 1, pp. 415–427, Feb. 2005. 
24. R. M. Gray, “Toeplitz and circulant matrices: A review,” Foundations and Trends in 
Communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006. 
25. R. G. Carvajal et al., "The flipped voltage follower: a useful cell for low-voltage low-power 
circuit design," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, 
no. 7, pp. 1276-1291, July 2005. 
26. R. Nonis, E. Palumbo, P. Palestri and L. Selmi, "A Design Methodology for MOS Current-
Mode Logic Frequency Dividers," in IEEE Transactions on Circuits and Systems I: 
Regular Papers, vol. 54, no. 2, pp. 245-254, Feb. 2007. 
27. R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama and H. Siedhoff, "A 10-Gb/s 
CMOS clock and data recovery circuit with an analog phase interpolator," in IEEE Journal 
of Solid-State Circuits, vol. 40, no. 3, pp. 736-743, March 2005. 
28. J. L. Sonntag and J. Stonick, "A Digital Clock and Data Recovery Architecture for Multi-
Gigabit/s Binary Links," in IEEE Journal of Solid-State Circuits, vol. 41, no. 8, pp. 1867-
1875, Aug. 2006. 
29. K. Mueller and M. Muller, "Timing Recovery in Digital Synchronous Data Receivers," 
in IEEE Transactions on Communications, vol. 24, no. 5, pp. 516-531, May 1976. 
30. K. Gopalakrishnan et al., "3.4 A 40/50/100Gb/s PAM-4 Ethernet transceiver in 28nm 
CMOS," 2016 IEEE International Solid-State Circuits Conference (ISSCC), San 




31. Z. Cao, S. Yan and Y. Li, "A 32mW 1.25GS/s 6b 2b/step SAR ADC in 0.13μm 
CMOS," 2008 IEEE International Solid-State Circuits Conference - Digest of Technical 
Papers, San Francisco, CA, 2008, pp. 542-634. 
32. T. Jiang, W. Liu, F. Y. Zhong, C. Zhong, K. Hu and P. Y. Chiang, "A Single-Channel, 
1.25-GS/s, 6-bit, 6.08-mW Asynchronous Successive-Approximation ADC With 
Improved Feedback Delay in 40-nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 
47, no. 10, pp. 2444-2453, Oct. 2012. 
33. C. H. Chan, Y. Zhu, S. W. Sin, S. P. U and R. P. Martins, "A 3.8mW 8b 1GS/s 2b/cycle 
interleaving SAR ADC with compact DAC structure," 2012 Symposium on VLSI Circuits 
(VLSIC), Honolulu, HI, 2012, pp. 86-87. 
34. V. Hariprasath, J. Guerber, S. H. Lee and U. K. Moon, "Merged capacitor switching based 
SAR ADC with highest switching energy-efficiency," in Electronics Letters, vol. 46, no. 
9, pp. 620-621, April 29 2010. 
35. R. L. Narasimha and N. R. Shanbhag, “Design of energy-efficient high-speed links via forward 
error-correction (FEC),” IEEE Transactions on Circuits and Systems II, vol. 57, no. 5, pp. 
359-363, May 2010. 
36. S. Hoyos et al., "Clock-Jitter-Tolerant Wideband Receivers: An Optimized Multichannel 
Filter-Bank Approach," in IEEE Transactions on Circuits and Systems I: Regular Papers, 
vol. 58, no. 2, pp. 253-263, Feb. 2011. 
37. A. Amirkhany et al., "A 24 Gb/s Software Programmable Analog Multi-Tone 





38. W. Namgoong, "A channelized digital ultrawideband receiver," in IEEE Transactions on 
Wireless Communications, vol. 2, no. 3, pp. 502-510, May 2003. 
39. W. H. Cho et al., "10.2 A 38mW 40Gb/s 4-lane tri-band PAM-4 / 16-QAM transceiver in 
28nm CMOS for high-speed Memory interface," 2016 IEEE International Solid-State 
Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 184-185. 
 
