Hybrid NRZ/Multi-Tone Signaling for High-Speed Low-Power Wireline Transceivers
Thesis No. 6965 (2016)
Presented on 15 April 2016
at the Faculté des Sciences et Techniques de l'Ingénieur
Laboratoire de Systèmes Microélectroniques
Doctoral Program in Microsystems and Microelectronics
École Polytechnique Fédérale de Lausanne
for the degree of Docteur ès Sciences
by
Kiarash Gharibdoust
Accepted on the recommendation of the jury:
Dr J.-M. Sallese, jury president
Prof. Y. Leblebici, Dr S. A. Tajalli, thesis directors
Prof. P. K. Hanumolu, examiner
Dr T. Toifl, examiner
Prof. A. Shokrollahi, examiner
Switzerland
2016

The only thing greater than the power of the mind
is the courage of the heart.
— John Nash
To my wife Haniyeh, my mother Parvaneh
and
to the memory of my father . . .

Acknowledgements
This thesis work would not have been possible without all the people who supported me
throughout these years. I have been especially lucky to have had so many great people
helping me with different aspects of my life at EPFL.
I would first like to express my sincere gratitude to my advisor, Prof. Yusuf Leblebici, for
his endless support, patience, motivation, and immense knowledge. His guidance
helped me to overcome the biggest obstacles I faced during my PhD, on both the technical
and non-technical sides. I would like to thank him for his tolerance and patience in letting
me explore different ideas, and for always encouraging me to think in revolutionary ways about
the issues that I faced. I am extremely thankful to my co-advisor, Dr. Armin Tajalli, for
his advice and the fruitful discussions throughout my PhD studies. I do
appreciate his interest in, commitment to, and sense of responsibility for my work. He has always been
kind in answering my questions during the course of my PhD research.
I am very grateful to the members of my thesis committee: Prof. Pavan Kumar
Hanumolu, Dr. Thomas Toifl, Prof. Amin Shokrollahi and Dr. Jean-Michel Sallese, for
taking the time to read this manuscript and provide feedback. I would like to thank Dr.
Alain Vachoux for his constant support on CAD tools and Dr. Alexandre Schmid for his
support with computing infrastructure.
I am indebted to my master’s thesis advisor, Prof. Mehrdad Sharif Bakhtiar, for always
encouraging me to find new solutions and for giving me the best advice. Most of my
PhD research relies on the analog design techniques that I learned from him. I
am grateful to Prof. Sadegh Jamali, my undergraduate professor, for always supporting
me and helping me through the most difficult times of my undergraduate studies.
I am thankful to my colleagues, discussions with whom have been a source of new
ideas and solutions to difficult problems: Dr. Vahid Majidzadeh, Dr. Alessandro Cevrero,
Dr. Hossein Afshari and Dr. Nikola Katic.
Many people at EPFL have contributed to this work. I acknowledge Gain Kim for his
assistance with system-level analysis and simulation. I thank Masoud Shahshahani
for his assistance with silicon measurements. I acknowledge Cosimo Aprile, Jonathan Narinx,
Elmira Shahrabi and Jury Sandrini for their positive attitude and the useful discussions
during all the coffee breaks. Having good friends like them made everything nicer at
EPFL for me. Thanks to Sylvain Hauser for his patience and kind support in providing
the test setups for prototype measurements. I would like to thank Dr. Mahbod Heidari
and Dr. Omid Talebi for their great friendship and support.
I am grateful to my colleagues in the Microelectronic Systems Laboratory (LSM) for their
friendship, fruitful discussions and collaborations: Dr. Radisav Cojbasic, Dr. Vladan
Popovic, Dr. Masha Shoaran, Reza Ranjandish, Sebastian Rodriguez, Seniz Kucuk and
the amazing Kerem Seyid.
I am hugely indebted to my parents, my brother and other family members who always
believed in me. Their constant support was always motivating throughout all the
years of my education. I am very thankful to my brother Kianoosh for his constant love,
and for filling in for me at home while I have been away.
My special thanks and enormous gratitude go to my mother Parvaneh for always believing
in me, and supporting me with her infinite motherly love. Her unconditional love has
always been the major source of motivation for me throughout my life.
I would like to greatly thank my beloved wife Haniyeh for giving me the reason to be
happy, and the strength to work and live. I owe special thanks to her for always supporting
me with her patience and everlasting love, and for bearing with my times of mental and
physical absence. Without any doubt, she is the secret of my success. I am also very
grateful to Haniyeh’s family, and especially my mother-in-law Farideh, for being extremely
kind and supportive to me in these years.
Throughout my life, I have had many great advisors and teachers. Without any
doubt, my father was one of the greatest teachers and also my best friend among them all.
Although I lost him many years ago, I still draw enormous motivation from his memory. I
am hugely indebted to him and I just hope that some day I will repay the debt by trying
to approach the parenting ideal that he set by his example.
Lausanne, 11 March 2016 Kiarash Gharibdoust.
Abstract
Over the past few decades, the incessant growth of Internet networking traffic and
High-Performance Computing (HPC) has led to a tremendous demand for data bandwidth.
Digital communication technologies combined with advanced integrated circuit scaling
trends have enabled the semiconductor and microelectronic industry to dramatically scale
the bandwidth of high-loss interfaces such as Ethernet, backplane, and Digital Subscriber
Line (DSL). The key to achieving higher bandwidth is to employ equalization techniques
to compensate for channel impairments such as Inter-Symbol Interference (ISI), crosstalk,
and environmental noise. Therefore, today’s advanced inputs/outputs (I/Os) have been
equipped with sophisticated equalization techniques to push beyond the uncompensated
bandwidth of the system.
To this end, process scaling has continually increased the data processing capability
and improved the I/O performance over the last 15 years. However, since the channel
bandwidth has not scaled at the same pace, the required signal processing and
equalization circuitry has become more and more complicated. As a result, the energy efficiency
improvements are largely offset by the energy needed to compensate for channel impairments.
Moreover, as supply voltage scaling has saturated in finer technology nodes, the
power dissipation in digital circuits cannot benefit from voltage scaling; hence, the overall
energy-efficiency improvement has diminished.
In this design paradigm, rethinking the design strategies, not only to
satisfy the bandwidth requirements but also to improve power-performance, has become a
necessity. It is well known in communication theory that coding and signaling
schemes have the potential to provide superior performance over band-limited channels.
However, the choice of the optimum data communication algorithm should be made
by accounting for the circuit-level power-performance trade-offs.
In this thesis we have investigated the application of new algorithms and signaling schemes
in wireline communications, especially for communication between microprocessors,
memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has
been developed during the course of this research. The system-level and circuit-level
analysis, design, and implementation of the proposed signaling method have been performed
within the framework of this work, and the silicon measurement results have demonstrated the efficiency
and the robustness of the proposed signaling methodology for wireline interfaces.
In the first part of this work, a 7.5 Gb/s hybrid NRZ/MT transceiver (TRX) for multi-drop
bus (MDB) memory interfaces is designed and fabricated in 40 nm CMOS technology.
By reducing the complexity of the equalization circuitry on the receiver (RX) side, the
proposed architecture achieves 1 pJ/bit link efficiency for an MDB channel bearing 45 dB
loss at 2.5 GHz. The measurement results of the first prototype confirm that the NRZ/MT
serial data TRX can offer an energy-efficient solution for MDB memory interfaces. The
core area is 85 × 60 μm² and 150 × 60 μm² for the transmitter (TX) and receiver
(RX), respectively.
Motivated by the satisfying results of the first prototype, in the second phase of this
research we have exploited the properties of multi-tone signaling, especially
orthogonality among different sub-bands, to reduce the effect of crosstalk in high-density wireline
interconnects. A four-channel transceiver has been implemented in a standard 40 nm
CMOS technology in order to demonstrate the performance of NRZ/MT signaling in
the presence of high channel loss and strong crosstalk noise. The proposed system achieves
1 pJ/bit power efficiency, while communicating over an MDB memory channel at 36 Gb/s
aggregate data rate. The MT nature of the proposed transceiver helps to control the
ISI and reduce the far-end crosstalk (FEXT), which results in a very energy-efficient
implementation. The core area is 80 × 60 μm² and 130 × 60 μm² for the TX and
RX blocks (including the clock unit), respectively.
Keywords: Decision-feedback equalizer (DFE), differential signaling, inter-symbol
interference (ISI), multi-drop bus (MDB), multi-tone signaling, nonreturn-to-zero (NRZ)
signaling, far-end crosstalk (FEXT), source-synchronous architecture, dual in-line memory
module (DIMM), double data rate (DDR), Internet of Things (IoT), high-performance
computing (HPC).
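As a rough sanity check on the figures quoted in the abstract, the following minimal sketch converts energy per bit and data rate into link power, and splits a lane's data rate across three sub-bands (baseband, I, and Q). The helper names are hypothetical, the even three-way split follows the per-sub-band rates given in the figure captions, and the 9 Gb/s per-lane rate is inferred from the 36 Gb/s aggregate over four channels. The resulting power values are derived estimates, not measured results reported by the thesis.

```python
# Back-of-the-envelope check of the headline numbers quoted in the abstract.
# The helper names and the even three-way split per lane are illustrative
# assumptions, not code or definitions taken from the thesis.

def link_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Return link power in mW.

    1 pJ/bit * 1 Gb/s = 1e-12 J * 1e9 1/s = 1e-3 W, i.e. exactly 1 mW,
    so the product of the two arguments is already in milliwatts.
    """
    return energy_pj_per_bit * data_rate_gbps


def sub_band_rate_gbps(lane_rate_gbps: float, sub_bands: int = 3) -> float:
    """Per-sub-band rate, assuming the lane rate is split evenly."""
    return lane_rate_gbps / sub_bands


def aggregate_rate_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Aggregate data rate of several identical lanes."""
    return lanes * lane_rate_gbps


if __name__ == "__main__":
    # First prototype: one 7.5 Gb/s lane at 1 pJ/bit.
    print(sub_band_rate_gbps(7.5), link_power_mw(1.0, 7.5))
    # Second prototype: four lanes; 9 Gb/s per lane is inferred from the
    # 36 Gb/s aggregate rate and the four-channel design.
    total = aggregate_rate_gbps(4, 9.0)
    print(sub_band_rate_gbps(9.0), total, link_power_mw(1.0, total))
```

Under these assumptions, 1 pJ/bit corresponds to roughly 7.5 mW of link power at 7.5 Gb/s and roughly 36 mW at 36 Gb/s, which is simply the product of energy per bit and bit rate.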
Résumé
Over the past few decades, the incessant growth of Internet network traffic and of
high-performance computing has led to an enormous demand for bandwidth. Digital
communication technologies, combined with the scaling trends of high-performance
integrated circuits, have enabled the semiconductor and microelectronics industry to
considerably increase the bandwidth of high-loss interfaces such as Ethernet, backplane,
and Digital Subscriber Line (DSL). The key to achieving a radical increase in bandwidth
is to employ equalization techniques to compensate for channel impairments such as
inter-symbol interference, crosstalk, and environmental noise. Consequently, today’s
inputs/outputs (I/Os) have been equipped with sophisticated equalization techniques to
push back the bandwidth limit of systems.
To this end, technological advances have continually increased data processing capability
and improved I/O performance over the last 15 years. However, since the channel
bandwidth has not widened at the same pace, the required signal processing and
equalization circuits have become more and more complicated. Thus, the improvements in
energy efficiency are largely offset by the energy used to compensate for channel impairments.
Moreover, since supply voltage scaling has saturated in the finest technology nodes,
power dissipation in digital circuits cannot benefit from it. As a result, the overall
improvements in energy efficiency have weakened. In this design paradigm, a re-evaluation
of design strategies has become an important necessity, with the aim of satisfying
performance not only in terms of bandwidth but also in terms of energy efficiency. In
communication theory, it is well known that coding and signaling schemes have the
potential to provide superior performance over band-limited communication channels.
However, the choice of the optimal data communication algorithm must also take into
account the energy-performance trade-offs at the circuit level.
In this thesis, we have examined the application of new algorithms and signaling schemes
to wireline communication, particularly for communication between microprocessors,
memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has
been developed during this research. The design, analysis, and implementation of this
signaling method at the system and circuit levels have been carried out within the
framework of this work, and the measurement results have demonstrated the efficiency and
robustness of this signaling methodology for wireline interfaces.
In the first part of this work, a hybrid NRZ/MT transceiver (TRX) operating at 7.5 Gb/s
for multi-drop bus (MDB) memory interfaces was designed and fabricated in 40 nm CMOS
technology. While reducing the complexity of the equalization circuitry on the receiver
(RX) side, the proposed architecture achieves a link efficiency of 1 pJ/bit for an MDB
channel with 45 dB of loss at 2.5 GHz. The measurement results of the first prototype
confirm that the NRZ/MT serial data TRX can offer an energy-efficient solution for MDB
memory interfaces. The core area is 85 × 60 μm² and 150 × 60 μm² for the transmitter
(TX) and the receiver (RX), respectively.
Motivated by the satisfying results of the first prototype, in the second phase of this
research we exploited the properties of multi-tone signaling, particularly the
orthogonality among the different sub-bands, to reduce the effect of crosstalk in
high-density wireline interconnects. A four-channel transceiver was implemented in a
standard 40 nm CMOS technology in order to demonstrate the performance of NRZ/MT
signaling in the presence of a high-loss channel with strong crosstalk. The proposed
system achieves an energy efficiency of 1 pJ/bit while communicating over an MDB memory
channel at a data rate of 36 Gb/s. The MT nature of the proposed transceiver helps to
control inter-symbol interference and reduce far-end crosstalk, which results in a highly
energy-efficient implementation. The core area is 80 × 60 μm² and 130 × 60 μm² for the
TX and RX blocks (including the clock unit), respectively.
Keywords: Decision-feedback equalizer (DFE), differential signaling, inter-symbol
interference (ISI), multi-drop bus (MDB), multi-tone signaling, nonreturn-to-zero (NRZ)
signaling, far-end crosstalk (FEXT), source-synchronous architecture, dual in-line memory
module (DIMM), double data rate (DDR), Internet of Things (IoT), high-performance
computing (HPC).
Contents
Acknowledgements
Abstract (English/Français)
Table of Contents
List of figures
List of tables
1 Introduction
1.1 Thesis Goal
1.2 Organization and Content of the Thesis
2 State-of-the-Art Link Systems and Preliminaries
2.1 Contemporary Baseband Link Systems
2.1.1 State-of-the-art DFE Architecture
2.2 Multi-Tone Link Systems
2.3 Preliminaries of Hybrid NRZ/Multi-Tone System-level Design
2.4 Conclusion
3 Hybrid NRZ/Multi-Tone Signaling
3.1 Multi-Drop Memory Interfaces: Overview
3.1.1 Multi-Drop Channel Characteristics
3.1.2 Baseband Signaling in Multi-Drop Interfaces
3.2 Hybrid NRZ/MT Signaling: System Design Overview
3.2.1 Hybrid NRZ/MT TRX Statistical System-level Modeling
3.2.2 Hybrid NRZ/MT TRX Detailed System Design
3.3 Hybrid NRZ/MT Signaling: Circuit Design
3.3.1 Transmitter
3.3.2 Receiver
3.3.2.1 AFE Circuit Design
3.3.2.2 Downconverting Mixer/Filter Unit
3.3.3 Clock Generation Unit
3.4 Measurement Results
3.5 Conclusion
4 Multi-Phase Clock Generation for Hybrid NRZ/MT Transceiver
4.1 MDLL-Based Clock and Data Recovery: Overview
4.2 MDLL Circuit Design
4.2.1 Input Clock Buffer
4.2.2 Phase-Frequency Detector (PFD)
4.2.3 Charge Pump (CP)
4.2.4 Voltage-Controlled Delay Line (VCDL)
4.2.4.1 Delay cell (D-cell)
4.2.4.2 V-to-I converter
4.2.4.3 Edge combiner
4.2.5 Loop Filter Design
4.3 Measurement Results
4.4 Conclusion
5 Hybrid NRZ/MT Signaling for Controlling ISI and Crosstalk in Dense Interconnects
5.1 Analysis of ISI and FEXT for BB and NRZ/MT Signaling
5.1.1 Signal Integrity in Dense Interconnects: Introduction
5.1.2 Channel Configuration and BB Signaling
5.1.3 Hybrid NRZ/MT Signaling for Controlling ISI and FEXT
5.1.3.1 ISI Controlling Analysis
5.1.3.2 FEXT Controlling Analysis
5.2 System Design Overview
5.3 Circuit Design
5.3.1 Transmitter
5.3.2 Receiver
5.3.2.1 Downconverting Mixer/Filter Unit
5.3.3 Clock and Data Recovery
5.4 Measurement Results
5.5 Conclusion
6 Conclusion
6.1 Achievements
6.2 Future Works
A System-Level Statistical BER Modeling for Hybrid NRZ/MT Link
B CDR Techniques for Hybrid NRZ/MT Link System
Bibliography
Curriculum Vitae

List of Figures
1.1 Data traffic forecast [1]. (a) Network traffic from 2014 to 2019. (b) Various application data traffic from 2014 to 2019.
1.2 Different serial link applications. (a) A data center in Oklahoma, USA [2]. (b) A blade server containing different links [3].
1.3 Wireline data rates over the years [4].
1.4 (a) Energy efficiency versus publication year [5]. (b) Data rate versus technology node [6]. (c) Energy efficiency versus channel loss [6].
1.5 DRAM data bandwidth trends [6].
1.6 (a) 7.5 Gb/s NRZ/multi-tone transceiver first prototype. (b) The COB used for testing the first prototype. (c) The 36 Gb/s NRZ/multi-tone second prototype.
2.1 A conventional state-of-the-art baseband transceiver with an FFE equalizer at the transmitter, and a CTLE and a DFE at the receiver.
2.2 Serial link trend for the last 15 years. (a) Data rate versus year of publication. (b) Energy-per-bit versus year of publication.
2.3 (a) A half-rate DFE architecture with speculative first (H1) tap; the dashed red line shows a new critical timing path [7]. (b) DFE+Demux slice presented in [8].
2.4 8-lane single-ended RX architecture with XDFE and cross CTLE reported in [9].
2.5 Block diagram of the DMT system studied in [10].
2.6 (a) Conceptual multi-tone system with low-pass filters and mixers at the transmitter and receiver to create band-limited sub-channels. (b) AMT architecture with per-sub-channel linear N-times over-sampled equalizers at the transmitter, and mixer and integrate-and-dump at the receiver [11].
2.7 (a) The proposed BB+RF architecture with forwarded-clock for simultaneous bidirectional signaling. (b) Dual-band signaling in frequency domain [12]. (c) TRX chip die photo.
2.8 (a) Proposed system architecture in [13]. (b) Proposed frequency planning with aggregate data rate of 20 Gb/s.
2.9 Proposed mixed NRZ/MT signaling scheme. (a) NRZ spectrum over a multi-drop channel. (b) Hybrid NRZ/MT spectrum over a multi-drop channel.
3.1 (a) Controller to DRAM MDB interface. (b) Simplified block diagram of interface in Fig. 3.1 (a) showing the multi-path fading.
3.2 (a) The fabricated MDB channel. (b) The frequency response of a sample MDB showing first notch at 2.5 GHz.
3.3 The measured frequency response and the 3 × Fnotch b/s single-bit pulse response for different stub lengths. (a) Fnotch = 660 MHz. (b) 2 GHz single-bit response. (c) Fnotch = 1.25 GHz. (d) 3.75 GHz single-bit response. (e) Fnotch = 2.5 GHz. (f) 7.5 GHz single-bit response. (g) Fnotch = 3.7 GHz. (h) 11 GHz single-bit response.
3.4 (a) Conventional BB transceiver block diagram for communicating at 7.5 Gb/s over MDB interface. System-level simulation results for the link shown in Fig. 3.2 (b). (b) Eye diagram for 7.5 Gb/s data rate, (c) bathtub curve, both after optimizing the CTLE and DFE response.
3.5 System-level simulation results for the MDB channel of Fig. 3.2 (b) having 0.5% jitter on both TX and RX. (a) Eye diagram for 7.5 Gb/s data rate, (b) bathtub curve for 7.5 Gb/s data rate, both after optimizing the CTLE and DFE response. System-level simulation results for the same interface. (c) Eye diagram for 5 Gb/s data rate, (d) bathtub curve for 5 Gb/s data rate, both after optimizing the CTLE and DFE response.
3.6 Transmitted spectrum in an MDB channel. (a) Conventional NRZ signaling. (b) Hybrid NRZ/MT signaling.
3.7 Proposed mixed NRZ/MT system architecture.
3.8 System-level statistical simulation for the proposed architecture in Fig. 2.9. (a) Baseband, (b) I sub-band, and (c) Q sub-band statistical eye diagram and corresponding bathtub.
3.9 Q sub-band eye diagram for different amounts of clock phase mismatch. (a) 5°, (b) 10°, (c) 15°, and (d) 20° I/Q phase mismatch.
3.10 Horizontal eye-opening at BER = 10⁻¹⁵ versus I/Q phase mismatch.
3.11 Proposed hybrid NRZ/MT transceiver.
3.12 The architecture of the proposed mixed NRZ/MT transmitter.
3.13 (a) Eye diagram of the summer/driver input signals. (b) Simulated TX output spectrum.
3.14 (a) Upconversion passive mixer schematic. (b) Output driver schematic.
3.15 Output power of the LVDS driver circuit with respect to the 5 GHz input signal power level.
3.16 The architecture of the proposed mixed NRZ/MT receiver.
3.17 Schematic of the BPF to pass QPSK sub-bands.
3.18 Transfer function of the BPF for different settings of the capacitor value C2. (a) TT corner at 27 °C. (b) FF corner at -40 °C. (c) SS corner at 27 °C. (d) SS corner at 80 °C.
3.19 BPF transfer function for different gain settings.
3.20 Schematic of the LPF used to select NRZ sub-band.
3.21 Transfer function of the BPF for different settings of the capacitor value C2. (a) FF corner at -40 °C. (b) SS corner at 100 °C. (c) TT corner at 27 °C.
3.22 LPF transfer function for different gain settings in TT corner.
3.23 Half-circuit implementation of SCMF and baseband amplifier units.
3.24 The timing diagram of the proposed SCMF.
3.25 Clock generation system block diagram.
3.26 Chip die photo.
3.27 Micrograph and layout of the chip. The die size is 1.3 × 2.8 mm². Two identical TRXs are placed at the top and bottom of the chip.
3.28 (a) Test setup with MDB channel. (b) Measured channel frequency response. (c) Measured channel 7.5 Gb/s single-bit pulse response.
3.29 (a) Measured TX output spectrum at different Fref settings. (b) Measured spectrum at the input of RX.
3.30 Measured RX eye diagram at 7.5 Gb/s data rate. (a) Q sub-band. (b) I sub-band. (c) BB sub-band. Corresponding bathtub curve for (d) Q sub-band, (e) I sub-band, (f) BB sub-band, each operating at 2.5 Gb/s data rate.
3.31 Measured RX eye diagram sensitivity to phase-error. (a) ±5°, and (b) ±10° phase mismatch.
3.32 (a) Power breakdown for the whole TRX. (b) TX power specification. (c) RX power specification.
3.33 I sub-band bathtub curve for different channel notches.
4.1 Proposed MDLL-based CDR architecture.
4.2 Input clock buffer and single-to-differential stage schematic.
4.3 The schematic of the PFD circuit.
4.4 (a) CP conceptual schematic. (b) Effect of mismatch on the control voltage.
4.5 Proposed CP schematic with reduced mismatch.
4.6 (a) Schematic of the VCDL. (b) Schematic of the D-cell.
4.7 (a) Schematic of the proposed V-I converter. (b) D-cell tuning curve using the proposed V-I converter.
4.8 (a) Edge combiner circuit. (b) Symmetrical NAND gate.
4.9 (a) DLL discrete-time model. (b) Gate leakage current for 0.6 pF MOS capacitor in 40 nm.
4.10 Micrograph and layout of the test chip.
4.11 (a) Measured I/Q clocks at 5 GHz output. (b) Measured long-term I/Q phase mismatch at 5 GHz output.
4.12 (a) Measured long-term jitter histogram at 6 GHz MDLL output. (b) Measured long-term jitter histogram at 6 GHz MDLL output with 200 mVpp external supply noise at 250 MHz (worst case) noise frequency.
4.13 Measured peak-to-peak jitter degradation versus supply noise and reference offset noise frequency for 6 GHz output. The supply has 200 mVpp sinusoidal noise. The reference clock has -28 dBc single-tone sideband.
4.14 (a) Measured phase noise at 6 GHz output. (b) Measured MDLL reference and spurs with 200 mVpp supply noise at jitter-peaking frequency.
5.1 Overall block diagram of the 4-channel TRX.
5.2 (a) Stylized view of the 4 differential channels, side-by-side. (b) Simplified block diagram of a DIMM interface showing the multi-path fading.
5.3 (a) Measured channel frequency response. (b) Measured channel 9 Gb/s single-bit pulse response.
5.4 (a) Conventional BB transceiver block diagram for communicating at 9 Gb/s over MDB interface. System-level channel simulation eye diagrams for: (b) without, (c) with crosstalk, both after optimizing the CTLE and DFE blocks.
5.5 (a) Hybrid NRZ/MT transceiver architecture. (b) Output TX spectrum when Fref = 3 GHz.
5.6 (a) Hybrid NRZ/MT simplified block diagram. (b) The channel frequency response used in C2C communications. (c) PB channel construction.
5.7 (a) 9 Gb/s single-bit pulse response for the C2C channel. (a), (b) BB and PB 3 Gb/s single-bit pulse response. (c) The eye diagram for 9 Gb/s NRZ signaling. (d), (e) BB and PB eye diagram for aggregate data rate of 9 Gb/s.
5.8 FEXT generation mechanism in hybrid NRZ/MT signaling.
5.9 Proposed 4×9 Gb/s TX architecture.
5.10 Proposed 4×9 Gb/s mixed NRZ/MT receiver.
5.11 Half-circuit implementation of SCMF and baseband amplifier units.
5.12 The timing diagram of SCMF in Fig. 5.11.
5.13 (a) Test setup with MDB channel. (b) Chip die photo and layout in 40 nm CMOS.
5.14 (a) Measured TX output spectrum at different Fref settings. (b) Measured spectrum at the input of RX.
5.15 Measured RX eye diagram at 9 Gb/s data rate. (a) Q sub-band. (b) I sub-band. (c) BB sub-band. Corresponding bathtub curve for: (d) Q sub-band, (e) I sub-band, (f) BB sub-band, each operating at 3 Gb/s data rate.
5.16 Measured FEXT on Channel 3 (a) in frequency, and (b) in time domain.
5.17 Measured RX eye diagram at RX side. The aggregate data rate is 36 Gb/s while each sub-band operates at 3 Gb/s data rate.
5.18 (a) Power breakdown for the whole TRX. (b) TX power specification. (c) RX power specification.
A.1 Transmitter and receiver jitter models.
A.2 Flowchart of BER calculation using statistical eye.
B.1 Forwarded clock CDR architecture for proposed hybrid NRZ/MT link system.
B.2 The timing diagram of the clock signals for recovering I sub-band, which is presented in Fig. 3.23.
B.3 The proposed CDR architecture for pass-band data.

List of Tables
2.1 Required Transceiver Performance presented in Fig. 2.8 [13]. . . . . . . . . 19
3.1 The hybrid NRZ/MT performance presented in Fig. 3.7. . . . . . . . . . . 34
3.2 TRX Performance Comparison with State-of-the-art Memory Transceivers. 54
3.3 Silicon Performance Summary. . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1 Comparison of MDLL performance for frequency generation. . . . . . . . . 70
5.1 TRX Performance Comparison with State-of-the-art Memory Transceivers. 94

1 Introduction
Internet traffic has grown tremendously in recent years, driven primarily by cloud
computing, the Internet of Things (IoT), high-performance computing systems, enterprise
servers, the flourishing of social media, the recent explosive advancement of smart
phones, etc. As estimated in [1], mobile IP traffic in 2019 will be 10 times heavier than
it was in 2014, and the total Internet traffic will nearly triple from 2014 to 2019.
Fig. 1.1 (a) illustrates these predictions for different application types, whereas
Fig. 1.1 (b) highlights the network traffic growth more specifically [1]. Among different
applications, the greatest growth belongs to consumer video, which is indeed a result of
the dramatic increase in Internet networking traffic. The heart of the Internet is the
service providers' data centers, which must appropriately provide the content, files,
data storage, and switching harbors [14]. Inside data centers, there are large racks
containing many servers, storage units, switches, and routers, which are all connected
with cables. Fig. 1.2(a) shows an image of a data center located in Pryor, Mayes County,
Oklahoma, USA, which consists of such gigantic infrastructure. In order to connect all
modules together, a variety of cables with different specifications are employed, and
they can be categorized into several serial link standards, as shown in Fig. 1.2 (a).
Likewise, in a PC blade server there are several units that must communicate with each
other over copper channels, and depending on the channel length and other specifications
of the corresponding link, they can be classified into various wireline standards, as
shown in Fig. 1.2 (b) for a sample blade server [3].
In order to increase network capacity, raising the per-lane data rate has been the
standard approach, and many standards have been developed by the Optical Internetworking
Forum (OIF), Common Electrical I/O (CEI), Joint Electron Device Engineering Council
(JEDEC), and Institute of Electrical and Electronics Engineers (IEEE) to cope with
the incessant data bandwidth demand. Owing to the bandwidth demand explosion in
Figure 1.1: Data traﬃc forecast [1]. (a) Network traﬃc from 2014 to 2019. (b) Various
application data traﬃc from 2014 to 2019.
Figure 1.2: Different serial link applications. (a) A data center in Oklahoma, USA [2]. (b) A
blade server containing different links [3].
data centers and telecommunication infrastructures, the data rate of wireline
transceivers has been doubling roughly every four years, as shown in
Fig. 1.3 for the most-used standards. For instance, Peripheral Component Interconnect
Express (PCIe) is used ubiquitously in almost all chip-to-chip and chip-to-module
interfaces, and OIF, CEI, IEEE, and Ethernet usually come hand-in-hand for different
serial link applications. Each standard has its own logic layer; however, the physical
layer can be specified by several common parameters, such as data rate, reach (either
distance or channel loss in dB), crosstalk, reflections, etc. As an example, multiple
25 Gb/s backplane standards such as CEI-25G long reach (LR) [15], very short reach
(VSR) [16], short reach (SR) [17], etc., have been developed for various cable lengths.
The specification is mainly a bridge among
Figure 1.3: Wireline data rates over the years [4].
Figure 1.4: (a) Data rate versus technology node [6]. (b) Energy efficiency versus
publication year [5]. (c) Energy efficiency versus channel loss [6].
transceiver and module vendors, connector vendors, and system houses so all parties can
design their portion ahead of time, which is critical as most of the system and transceiver
Figure 1.5: DRAM Data Bandwidth Trends [6].
development takes months or even years [4]. In this paradigm, I/O performance
becomes a major specification in practically any high-performance electronic system,
from consumer products to enterprise servers. Any future progress in integrated circuit
computational capability must naturally be matched with progress in I/O performance.
As a result, much research in recent years has been devoted to improving I/O
performance, and it has led to tremendous progress in the data rates communicated
over different wireline links. Indeed, over the past decade the I/O speed improvement has
been fueled by advanced CMOS technology scaling. Fig. 1.4 (a) highlights the important
role technology scaling plays in supporting this trend, which has been reported over
the past decade at the annual International Solid-State Circuits Conference (ISSCC) [6].
Recently, a 40/50/100 Gb/s Ethernet transceiver has been published in [18]. However,
a higher data rate is not the only I/O characteristic that matters in the IoT era, and
energy efficiency should also be properly considered while the industry scales up the data
bandwidth.
Power consumption of I/O circuits has been a first-order design constraint for systems
ranging from cell phones to servers. As the pin count and per-pin data rate of I/Os have
increased on a die, so has the percentage of total power that they consume. Over the past
decade, voltage and process scaling have been the key contributors to improving I/O energy
efficiency and providing higher per-lane data rates. The ratio of I/O power consumption
to data rate has become a popular figure of merit for evaluating I/O performance. This
power efficiency ratio is expressed in mW/Gb/s, or equivalently pJ/bit. It is plotted in
Fig. 1.4 (b) for recent publications [5]. Moreover, interconnect channel loss
has a great impact on the energy efficiency since advanced equalization techniques become
inevitable for proper link operation. Simply increasing per-pin baseband data rates with
existing circuit architectures and channels is not always a viable path given ﬁxed system
power limits. Fig. 1.4 (c) plots the energy eﬃciency as a function of channel loss for
recently reported transceivers. According to the data in Fig. 1.4 (c), the scaling
factor between link power and channel loss¹ is about unity, corresponding to a tenfold
increase in power consumption for a 30 dB increase in channel loss. Therefore, as the
uncompensated part of the channel bandwidth is employed for higher data rate
communication, more sophisticated equalization circuits are required, increasing both
power dissipation and occupied silicon die area. Moreover, since the supply voltage does
not scale with transistor size in finer CMOS technology nodes, the I/O energy-efficiency
improvement has been undermined in recent years, as shown in Fig. 1.4 (b).
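The figure of merit discussed above can be illustrated with a minimal Python sketch. It converts power and data rate into pJ/bit and extrapolates energy per bit with channel loss using the roughly tenfold-per-30-dB trend quoted from Fig. 1.4 (c). The function names, the 30 dB default, and the numerical operating points are assumptions chosen for illustration only; they are not fitted to the data of [5] or [6].

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Return energy per bit in pJ/bit.

    mW/(Gb/s) is numerically equal to pJ/bit, since 1e-3 W / 1e9 b/s = 1e-12 J/b.
    """
    return power_mw / data_rate_gbps


def scale_energy_with_loss(e_ref_pj: float, loss_ref_db: float, loss_db: float,
                           db_per_decade: float = 30.0) -> float:
    """Extrapolate energy per bit to a different channel loss.

    Assumes (illustratively) a 10x energy increase per `db_per_decade` dB of
    additional channel loss at the Nyquist frequency.
    """
    return e_ref_pj * 10.0 ** ((loss_db - loss_ref_db) / db_per_decade)


if __name__ == "__main__":
    # Hypothetical operating point: 36 mW at 36 Gb/s.
    print(energy_per_bit_pj(36.0, 36.0), "pJ/bit")
    # Hypothetical extrapolation from a 10 dB channel to a 40 dB channel.
    print(scale_energy_with_loss(1.0, 10.0, 40.0), "pJ/bit")
```

Such a first-order model is only useful for quick budgeting; actual transceivers deviate from the trend depending on the equalization architecture.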
Furthermore, from the vantage point of storage devices, the ever-increasing data traffic
necessitates great progress in the computer memory business. Such progress is only
possible through constant improvements in the area, power, and performance of volatile and
non-volatile memories. In order to reduce the bandwidth gap between main memory and
processor performance, DRAM data rates continue to increase at the memory interface,
and various standards such as double-data rate (DDR), low-power DDR (LPDDR) and
graphics DDR (GDDR) have been developed. Fig. 1.5 plots the data rate for different
memory applications published at ISSCC in recent years [6]. Currently, DDR4 and
GDDR5 memory I/Os operate around 3 Gb/s/pin and 7 Gb/s/pin, respectively, which
represent aggregate rates of 6 GB/s and 28 GB/s, respectively. Likewise, a 9 Gb/s/pin
interface for GDDR5 applications was recently reported in [19]. Nevertheless, channel
impairment in memory interfaces has always been a bottleneck in improving computing
system throughput. This limiting factor has been the major reason for introducing new
DDR and GDDR standards to avoid the channel impairments; however, employing a
point-to-point interface instead of a multi-drop channel increases cost and restricts
the system storage capacity.
1.1 Thesis Goal
Generally speaking, it is known in wireline communication that the channel capacity
predicted by Shannon theory for wireline interfaces is an order of magnitude (in some
cases two orders of magnitude) higher than what contemporary baseband transceivers
(TRXs) can practically achieve [20]. In order to bridge this gap, one should
reappraise the traditional wireline techniques, on both the system and circuit levels, and
employ a more efficient system-level architecture that is realizable within a
reasonable die area, circuit complexity, and power budget.
¹ Channel loss represents the channel attenuation at the Nyquist frequency, i.e., half of
the transmitted bit rate. More explanation is provided in Chapter 3 of this thesis.
Figure 1.6: (a) 7.5 Gb/s NRZ/multi-tone transceiver first prototype. (b) The COB used for
testing the first prototype. (c) The 36 Gb/s NRZ/multi-tone second prototype.
Therefore, the goal of this research is to develop a new family of wireline TRXs that can
operate at high speed and low power over communication channels for which the
conventional baseband TRX cannot provide a power-efficient solution. The fundamental
challenge is to cope with severe channel impairments while simultaneously meeting
the stringent speed and power requirements. Moreover, the proposed coding scheme and
system architecture should be optimized so that the required circuit specifications become
feasible on silicon without compromising the advantages of the signaling scheme. Fig. 1.6
shows the die photos of two prototype serial data TRXs, along with a chip-on-board (COB)
package used for measurement purposes, which have been fabricated within the framework of
this work. The first prototype includes a hybrid NRZ/multi-tone TRX that can efficiently
communicate over a multi-drop memory channel at 7.5 Gb/s. The second prototype
incorporates the hybrid NRZ/multi-tone core in order to deliver an aggregate 36 Gb/s
data rate over four differential lanes, while the power efficiency remains at 1 pJ/b
for both prototypes. The prototypes have been implemented in 40 nm bulk CMOS, and
COB packaging has been used for testing purposes.
1.2 Organization and Content of the Thesis
Chapter Two
In Chapter 2, we provide a brief review of state-of-the-art contemporary baseband
link systems in order to highlight today's advanced SerDes trends and to study
previously published work that applies multi-tone signaling to wireline
communication. Then, the preliminaries for our proposed signaling scheme are presented.
Moreover, system-level modeling and analysis of the proposed hybrid NRZ/multi-tone
scheme are presented. The goal of this MATLAB-level modeling and simulation is to
evaluate the proposed system performance in the presence of different noise sources, and
to optimize it appropriately. Having modeled the system in MATLAB, the building block
specifications that are required for error-free link operation are extracted in this
chapter.
Chapter Three
In Chapter 3, we present a new signaling scheme, called hybrid NRZ/multi-tone (MT),
which can shape the transmitted spectrum of the transmitter (TX) and be customized to
the characteristics of the channel; thus, it provides a power-efficient solution. Based on
the proposed method, the design and implementation of a 7.5 Gb/s TRX for communicating
over a multi-drop memory interface are explained in this chapter. Silicon measurements
of the aforementioned TRX are presented at the end of this chapter.
Chapter Four
Chapter 4 presents in detail the design and analysis of the clocking unit employed
in our hybrid NRZ/multi-tone TRX. The measurement results of the clock unit,
which is realized on silicon as a block independent of our TRX, are provided in this
chapter.
Chapter Five
Chapter 5 studies the properties of multi-tone signaling for controlling the effect of
crosstalk in high-density and compact links constructed using low-cost materials such
as FR-4. The crosstalk reduction property of NRZ/multi-tone signaling is
described in this chapter, and a new TRX is designed that can communicate at
an aggregate 36 Gb/s over a four-lane memory channel. Moreover, the inter-symbol
interference reduction property of the proposed signaling method is explained in
detail, which can be useful for very lossy backplane interfaces.
Chapter Six
In the final chapter, the achievements and main contributions of this research are
summarized and future work is described.
Appendix A
Appendix A explains in detail the theoretical background for statistical eye diagram
evaluation. This link estimation method has been employed in Chapter 2 for MATLAB
simulations, and can provide a great advantage over time-domain simulation.
Appendix B
The clock and data recovery (CDR) algorithm and circuit design suitable for
the proposed hybrid NRZ/multi-tone system of Chapter 3 and Chapter 5 are suggested in
Appendix B.
2 State-of-the-Art Link Systems and
Preliminaries
Over the past decade, the widespread adoption of data-intensive applications such as video
streaming and cloud-based computing, which have ushered in the Internet of Things (IoT)
era, has led to an explosive demand for data bandwidth. In order to satisfy this demand,
the input/output (I/O) speed of communication systems such as routers and backplane-
based servers should grow accordingly. Recent studies indicate that the I/O bandwidth
of link systems must increase by 2-3 times every two years [21] so as to cope with the
ever-increasing bandwidth requirement of IT systems. Moreover, to manage such drastic
bandwidths with reasonable power dissipation, a power eﬃciency of around 1 pJ/b has
been a long-held goal [22].
While the process technology scaling continues to improve on-chip circuit bandwidth,
oﬀ-chip interconnect remains the bandwidth bottleneck. As the channel loss increases, the
link power eﬃciency degrades due to the need for complex equalization, larger transmit
swing, and low-jitter clock requirements. Additionally, I/O power (and consequently
total system power dissipation) will grow if bandwidth demand is not accompanied
by a proportional I/O power efficiency scaling [23]. In this paradigm, the majority of
link systems traditionally employ simple baseband signaling (e.g., NRZ, PAM-4, and
duobinary) with limited feed-forward and feedback equalization schemes to compensate
for the dispersive nature of the communication channel. A study of baseband signaling
in wireline communication systems shows that the channel capacity predicted by Shannon
theory is one to two orders of magnitude higher than what can be offered by
contemporary baseband transceivers [20]. From a communication system perspective,
coding schemes and multi-tone signaling are the key to bridging the aforementioned
gap [20, 11, 24, 25, 26].
In this chapter, we first provide a brief review of state-of-the-art contemporary
baseband link systems. Then, recent research on employing multi-tone signaling
in wireline transceivers is reviewed, and the preliminaries for our proposed signaling
scheme are presented.
[Figure 2.1 block diagram. TX: serializer, FFE (taps W-1, W0, W1, ..., Wn with Z^-1 delays), driver. RX: CTLE, DFE (taps W1, W2, ..., Wn with Z^-1 delays), de-serializer. Inset: magnitude (dB) versus frequency, with ticks at 0, 1/Tsym, and 2/Tsym.]
Figure 2.1: A conventional state-of-the-art baseband transceiver with an FFE at the
transmitter, and a CTLE and a DFE at the receiver.
2.1 Contemporary Baseband Link Systems
Fig. 2.1 presents the architecture of a baseband wireline transceiver, which is employed
in the majority of state-of-the-art link systems. This link uses non-return-to-zero (NRZ)¹
signaling with a Feed-Forward Equalizer (FFE) at the transmitter (TX) to cancel
pre-cursor Inter-Symbol Interference (ISI), a Continuous-Time Linear Equalizer (CTLE) at
the receiver front-end to increase sensitivity and compensate for the high-frequency loss of
the channel, and a Decision Feedback Equalizer (DFE) at the receiver to cancel post-cursor
ISI. In this architecture, the FFE and DFE do not require an additional analog-to-digital
converter (ADC), which can be very power hungry at high sampling rates.
¹ Generally speaking, it can be duobinary, ENRZ, or PAM-N signaling. The sub-blocks might
have different architectures, depending on the type of baseband signaling.
Figure 2.2: Serial link trend for the last 15 years. (a) Data rate versus year of publication. (b)
Energy-per-bit versus year of publication.
Moreover, having a limited number of voltage levels at the TX, the linearity requirements
of the TX output driver and the RX input CTLE can be kept at a reasonable level so
as to avoid power-hungry topologies. Furthermore, error correction and detection coding
is generally avoided in high-speed link design because of the considerable power consumption
and data latency that it can add to the system. From the frequency spectrum perspective,
the transmitted output of all types of baseband transceiver (e.g., NRZ, PAM-4,
ENRZ, duobinary, etc.) has a sinc-shaped² spectrum, as shown in the inset of Fig. 2.1,
with the first frequency null at f = 1/Tsym, where Tsym is the symbol period of
the transmitted data. Indeed, Tsym depends on the bit rate and the adopted baseband
modulation scheme. For example, for simple NRZ signaling Tsym = Tb, where Tb is
the bit period, whereas for duobinary and PAM-4 modulations Tsym = 2Tb. It can be shown
that 90% of the TX energy spectrum is located below the first frequency null, and
77% of it is located below the Nyquist rate, i.e., f = 0.5/Tsym.
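The two energy fractions quoted above can be checked numerically. The following Python sketch integrates a sinc²-shaped power spectrum, using the fact that the integral of sinc²(x) over all x equals one, so no further normalization is needed. The function names and the midpoint-rule integration are illustrative choices, and the spectrum is assumed to be that of ideal rectangular pulses with uncorrelated symbols.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def energy_fraction(f_norm: float, n: int = 20000) -> float:
    """Fraction of sinc^2 spectral energy within |f| <= f_norm / Tsym.

    Uses the midpoint rule on [0, f_norm] and doubles the result by symmetry.
    The total energy (integral over all frequencies) is exactly 1.
    """
    h = f_norm / n
    return 2.0 * h * sum(sinc((k + 0.5) * h) ** 2 for k in range(n))


if __name__ == "__main__":
    print("below first null (f = 1/Tsym):", round(energy_fraction(1.0), 4))
    print("below Nyquist (f = 0.5/Tsym): ", round(energy_fraction(0.5), 4))
```

Under these assumptions the sketch yields approximately 0.90 and 0.77, in agreement with the percentages stated in the text.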
Given the architecture of Fig. 2.1, the first challenge for baseband TRXs in the IoT era
is that higher bandwidth requires an increased signaling rate; thus, the data must
be transmitted over lossy frequency regions of the communication channel. As a result,
the baseband (BB) transceiver not only needs to have wider bandwidth but must
also satisfy a more challenging sensitivity requirement. Moreover, jitter becomes more
important in lossy channels [27], necessitating a more sophisticated clock and data
recovery (CDR) circuit. These requirements push on both ends of the gain-bandwidth
trade-off for circuits, which only scales linearly (to the first order) with technology
scaling. A study of serial link specifications shows that although process technology
scaling improves energy-per-bit efficiency, this improvement has started to taper off in
recent years [5]. Fig. 2.2 (a) and (b) present the per-pin data-rate and the
energy-per-bit trends³, respectively [5].
This can be explained, to the first order, as a consequence of supply voltage scaling
saturation in finer CMOS technology nodes.
² Sinc(x) = sin(πx)/(πx)
³ The SerDes data is collected from papers published in ISSCC, VLSI symposium, CICC,
ESSCIRC, and A-SSCC [5].
Another issue for the conventional BB transceiver in high-loss channels is the
complicated equalization required to reach the target bandwidth. In such channels,
in addition to a CTLE, it is also necessary to use a DFE in order to
properly compensate for the frequency-dependent channel loss [27]. The DFE has to subtract
a weighted sum of the previously received symbols from the incoming signal. In particular,
to close the feedback loop for the first tap (as shown in Fig. 2.1), the entire operation of
detecting the current symbol, multiplying it by the appropriate weight, and subtracting
it from the incoming symbol must be performed in less than one symbol period. A
higher symbol rate reduces the safe margin for closing this feedback loop, and increases
the required circuit complexity. Although several system and circuit
techniques (e.g., loop-unrolling [28], half-rate/quarter-rate DFE [29], and charge-steering
topology [29, 30]) have been proposed to alleviate this problem, it remains one of the
major concerns in DFE design for today's advanced CMOS technology. It is instructive to
briefly review state-of-the-art DFE topologies in order to better understand the design
challenges arising in today's IoT and high-performance computing (HPC) era.
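The feedback operation described above can be sketched behaviorally in a few lines of Python. This is a minimal, noise-free illustration rather than a model of the circuit in Fig. 2.1: the pulse response values, the function names, and the assumption that the DFE tap weights exactly match the post-cursor samples are all hypothetical choices made for the example.

```python
def channel(symbols, pulse):
    """Convolve +/-1 symbols with a symbol-spaced pulse response (main cursor first)."""
    return [
        sum(pulse[k] * symbols[n - k] for k in range(len(pulse)) if n - k >= 0)
        for n in range(len(symbols))
    ]


def slicer(x):
    """Binary decision: +1 if the sample is non-negative, otherwise -1."""
    return 1 if x >= 0 else -1


def dfe(received, taps):
    """Decision feedback equalizer.

    For each sample, subtract the weighted sum of previous decisions
    (taps[0] weights the most recent decision), then slice the result.
    """
    decisions = []
    for n, r in enumerate(received):
        isi = sum(w * decisions[n - 1 - k]
                  for k, w in enumerate(taps) if n - 1 - k >= 0)
        decisions.append(slicer(r - isi))
    return decisions


if __name__ == "__main__":
    tx = [-1, -1, 1, 1, -1, 1, -1, -1, 1]
    pulse = [1.0, 0.7, 0.4]           # hypothetical main cursor and two post-cursors
    rx = channel(tx, pulse)
    print("slicer only:", [slicer(r) for r in rx])
    print("with DFE:   ", dfe(rx, pulse[1:]))
```

In this example the post-cursors sum to more than the main cursor, so a plain slicer makes errors while the DFE recovers the transmitted sequence. Note that each decision depends on the previous one, which is exactly the one-symbol-period timing loop discussed in the text.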
2.1.1 State-of-the-art DFE Architecture
Unlike a linear equalizer (e.g., CTLE and FFE), a DFE is able to compensate for ISI
without amplifying noise or crosstalk, and it is non-linear in nature. A DFE is also more
eﬀective than linear equalizers in dealing with reﬂections from impedance discontinuities,
provided that the post-cursor ISI due to reﬂections falls within the time span of the DFE,
i.e., the bit period multiplied by the number of DFE taps⁴. A key challenge in designing
a high-speed DFE is ensuring that the feedback signals are accurately established at the
slicer input by the time the data decision is made. The critical path, marked with a
red line in Fig. 2.1, is the feedback loop, whose delay must be lower than 1 UI. Meeting
this timing constraint becomes diﬃcult at data rates above 20 Gb/s [31]. The timing
constraint on this feedback path can be relaxed by adopting a technique known as
speculation or loop-unrolling [32, 33]. Fig. 2.3(a) presents the block diagram of a half-rate
DFE employing one tap of speculation. In this half-rate 5-tap DFE architecture, the
previous bit decisions are weighted, fed back, and summed with the input signal such that
post-cursor ISI is removed from the received data. The DFE employs a 1-tap loop-unrolled
(or speculative) architecture and requires parallel paths in each half of the DFE [7].
⁴ This means that if the channel shows a long-tail pulse response, more DFE taps are
required to cancel the post-cursor ISI; hence, the circuit complexity and power
dissipation increase proportionally.
Figure 2.3: (a) A half-rate DFE architecture with speculative ﬁrst (H1) tap, here the dashed
red line shows a new critical timing path [7]. (b) DFE+Demux slice presented in [8].
Employing this architecture, the timing constraint for the first tap is relaxed; however,
the second feedback loop becomes the critical path, whose delay must be lower than 2
bit periods [or 2 unit intervals (UI)]. Therefore, it can be challenging to meet its timing
constraint at higher bandwidths. Although other DFE taps may be speculated to achieve
higher speed operation, the additional circuitry creates significant hardware cost,
as the number of parallel slicing paths grows exponentially (2^S) with the number (S)
of speculative taps. The largest number of speculative taps reported to date is three
[see Fig. 2.3(b)], which was used in implementing a 30-Gb/s 15-tap DFE, and it has
0.1 mW/Gb/tap power efficiency [8].
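The idea behind speculation can be illustrated with a behavioral Python sketch of a 1-tap loop-unrolled DFE. Both possible outcomes of the previous decision are pre-computed, and the previous decision only selects between them, so the subtraction is removed from the feedback loop. The function names, the tap value, and the initial decision are assumptions for illustration; this is not the half-rate circuit of [7].

```python
def slicer(x):
    """Binary decision: +1 if the sample is non-negative, otherwise -1."""
    return 1 if x >= 0 else -1


def direct_dfe_1tap(received, h1, init=1):
    """Conventional 1-tap DFE: subtract h1 * previous decision, then slice."""
    decisions, prev = [], init
    for r in received:
        prev = slicer(r - h1 * prev)
        decisions.append(prev)
    return decisions


def unrolled_dfe_1tap(received, h1, init=1):
    """Loop-unrolled (speculative) 1-tap DFE.

    Two slicers with offsets -h1 and +h1 evaluate every sample in parallel,
    without waiting for the previous decision; a multiplexer then picks the
    candidate that corresponds to the actual previous decision.
    """
    cand_if_prev_plus = [slicer(r - h1) for r in received]
    cand_if_prev_minus = [slicer(r + h1) for r in received]
    decisions, prev = [], init
    for plus, minus in zip(cand_if_prev_plus, cand_if_prev_minus):
        prev = plus if prev == 1 else minus
        decisions.append(prev)
    return decisions


def parallel_slicer_paths(speculative_taps):
    """Number of parallel slicing paths for binary signaling: 2^S."""
    return 2 ** speculative_taps


if __name__ == "__main__":
    samples = [0.3, -0.2, 1.4, -1.6, 0.1, -0.05]
    print(direct_dfe_1tap(samples, 0.5))
    print(unrolled_dfe_1tap(samples, 0.5))
    print([parallel_slicer_paths(s) for s in (1, 2, 3)])
```

The two functions always return the same decisions; the difference lies only in what sits inside the timing-critical loop, which is a multiplexer in the unrolled version.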
Figure 2.4: 8-lane single-ended RX architecture with XDFE and cross CTLE reported in [9].
The DFE concept can be equally applied to high-density parallel I/Os to cancel crosstalk
noise in the same way that it removes ISI. In [9], the cross DFE (XDFE) was
introduced in order to cancel the crosstalk in an 8-lane single-ended parallel I/O, as
shown in Fig. 2.4. In this work, 7 × 8 XDFE taps are used for each lane, and they operate
alongside the main 8-tap DFE block, resulting in a 64-tap DFE per lane. First-in first-out
(FIFO) data from the 7 aggressor lanes drive the 56 XDFE taps. Providing 7 Gb/s/pin in a
closely-spaced 8-lane single-ended interface, the RX consumes 5.9 mW/Gb/s, of which
around 4.4 mW/Gb/s is used in the DFE and XDFE circuits.
Overall, with conventional baseband signaling, the energy efficiency improvements
provided by CMOS technology scaling have reached a plateau in recent years, since higher
channel loss and shorter timing margins for critical circuits necessitate more power
consumption. This has largely offset the advantages of technology scaling. Bearing in mind
that in high-loss and multi-drop channels the number of DFE/XDFE taps must increase
in proportion to the post-cursor ISI, the energy efficiency cannot scale with
higher data rates. Thereby, design strategies must be rethought to not only
increase the bandwidth, but also improve the energy efficiency. From a communication
system perspective, coding schemes and multi-tone signaling can further improve
the energy efficiency by making optimal use of the channel capacity [20]. In the next
section, we provide a brief review of research that has applied a coding scheme
or multi-tone signaling to wireline communications.
Figure 2.5: Block Diagram of the DMT system studied in [10].
2.2 Multi-Tone Link Systems
An alternative approach for wired communications is to employ multi-tone signaling, as
originally performed in Digital Subscriber Line (DSL) systems and later used in wireless
communication in the form of orthogonal frequency-division multiplexing (OFDM) [13].
The key advantage of this method is that each sub-channel communicates over a narrower
frequency band; hence, less equalization circuitry is required, and most of the critical
transceiver building blocks (e.g., DFE, FFE, CDR) operate at lower speeds. Multi-tone
signaling has promising characteristics for different types of interfaces [11, 24, 25,
26, 12, 13, 34]. However, many other aspects of the transceiver performance that are
typically unimportant in binary systems become critical, and the link design requires
major modifications to the well-known analysis methods applied to wireline communication
systems.
The first work that thoroughly studied Multi-Tone (MT) techniques in the communication
literature is [34], which was proposed in 1975 and was analog in nature. An analog
parallel quadrature AM transmission system with overlapping orthogonal carriers and
offset quadrature phase shift keying (OQPSK) was proposed in this work. Interestingly,
it employs a frequency plan in which the sub-bands have 50% bandwidth overlap,
and it shows that if the symbol period is an integer multiple of the sub-carrier period,
it is possible to recover each sub-band appropriately. However, the performance of
the system was found to be very sensitive to communication channel variations, which
degrade the orthogonality between the system sub-channels. Moreover, such a modulation
scheme would dictate high quadrature precision and very low phase noise so as to retain
orthogonality of the sub-channels. Therefore, it would not be a power-efficient and
practical solution for high data rate links.
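The integer-multiple condition mentioned above can be illustrated numerically. The Python sketch below builds one symbol period of a multi-tone waveform over an ideal (distortion-free) channel and recovers each sub-channel by mixing with its carrier and integrating over the symbol period. It is a simplified illustration with plain cosine carriers and binary symbols, not the OQPSK system of [34]; the function names, the sample count, and the carrier choices are assumptions.

```python
import math


def transmit(symbols, cycles, n_samples=1000):
    """One symbol period of a multi-tone waveform.

    symbols[k] (+1 or -1) modulates a cosine that completes cycles[k] periods
    per symbol; cycles[k] == 0 denotes the baseband sub-channel.
    """
    return [
        sum(a * math.cos(2.0 * math.pi * c * n / n_samples)
            for a, c in zip(symbols, cycles))
        for n in range(n_samples)
    ]


def mix_and_integrate(waveform, cycles_k):
    """Mix with a cosine of cycles_k periods per symbol and integrate over the symbol."""
    n_samples = len(waveform)
    acc = sum(x * math.cos(2.0 * math.pi * cycles_k * i / n_samples)
              for i, x in enumerate(waveform))
    gain = 1.0 if cycles_k == 0 else 2.0  # cos^2 averages to 1/2 over full periods
    return gain * acc / n_samples


if __name__ == "__main__":
    # Carriers at integer multiples of the symbol rate: sub-channels separate cleanly.
    wave = transmit([1, -1, 1], [0, 1, 2])
    print([round(mix_and_integrate(wave, c), 6) for c in (0, 1, 2)])
    # A carrier at 1.25 cycles per symbol is not orthogonal to the 1-cycle carrier.
    print(round(mix_and_integrate(transmit([1], [1]), 1.25), 3))
```

When the carriers complete an integer number of periods per symbol, the cross terms integrate to zero and each output equals the transmitted symbol; with a non-integer relation, a clearly non-zero leakage term appears. Channel dispersion and phase errors, which the text identifies as the practical difficulty, are not modeled here.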
The digital implementation of MT systems has been the subject of much research.
In [35], a communication system based on frequency-division multiplexing (FDM) is
presented, in which the discrete Fourier transform (DFT) is computed as part of the
modulation and demodulation process. This work is known today as OFDM or
Discrete Multi-Tone (DMT) in communication systems. A more elaborate
digital implementation of an orthogonally multiplexed QAM (O-QAM) system was
later presented in [36], where a combination of poly-phase filtering and N/2-point DFT
processing was employed to perform the necessary filtering and mixing.
More recently, a study of the application of DMT to high-speed links has been presented
in [10]. The simplified block diagram of the DMT system proposed in this work is shown
in Fig. 2.5. The system-level simulation in this research, which was performed for a
sample 20" FR-4 interface, demonstrates that MT signaling has the potential to achieve
high data rates in lossy channels; however, it would require high-speed DACs and ADCs
with resolutions on the order of 6-7 bits, which add enormous power consumption to
the link system. Moreover, this research indicates that spectrum shaping at the TX is
the essence of the MT approach, and that having a cyclic prefix in the DFT/IDFT algorithm
is just a clever way to simplify the implementation. The inset of Fig. 2.5 shows the TX
spectrum when the optimum bit loading is adopted, and it illustrates that the TX spectrum
has a different shape from the conventional sinc-shaped TX spectrum of Fig. 2.1. Therefore,
if one can reduce the DMT block size (i.e., decrease the number of orthogonal carriers)
while employing spectrum shaping, then an MT architecture becomes the most efficient
solution for the links in which dispersion and channel loss can be mitigated by independent
number of tones.
An analog implementation of the MT link system has been studied in [11], where the
insight into TX spectral shaping is leveraged and power-hungry ADCs and DACs are
avoided. The TRX architecture proposed in this work is shown in Fig. 2.6. In this
research, a 24 Gb/s transmitter employing analog multi-tone (AMT) signaling with
18 pJ/b energy efficiency is introduced. The proposed TX can be customized to the
link characteristics and has the potential to achieve superior performance compared
to conventional baseband (BB) transceivers. With an identical symbol rate for all
sub-channels, it has been shown that since the carrier frequencies are integer multiples
of the symbol rate, ICI can be canceled in the same way as ISI, namely through equalization.
Moreover, in the MT architecture the equalization building blocks (e.g., DFE, XDFE,
FFE, etc.) run at the sub-stream symbol rate, which is a fraction of the total system bit
rate; therefore, the timing constraints are relieved. However, in this research, the receiver
is not implemented on silicon, and mixing and integration were performed in MATLAB to
generate the eyes.
Figure 2.6: (a) Conceptual multi-tone system with low-pass ﬁlters and mixers at the transmitter
and receiver to create band-limited sub-channels. (b) AMT architecture with per-sub-channel
linear N-times over-sampled equalizers at the transmitter, and mixer and integrate-and-dump at
the receiver [11].
An 8.4 Gb/s transceiver with 2.5 pJ/b energy efficiency for mobile memory I/Os is presented
in [12]. The proposed system architecture and the employed frequency planning are shown
in Fig. 2.7(a) and (b), respectively. This work employs mixed BB signaling and amplitude
shift-keying (ASK) to achieve better performance over a 10 cm FR-4 channel,
which has a smooth frequency response (i.e., an exponentially decaying impulse
response) with about 3 dB and 8 dB loss at 5 GHz and 23 GHz, respectively. Achieving
2.5 pJ/b efficiency over a smooth channel, the whole TRX is implemented in 65 nm
CMOS. However, the link system does not demonstrate superior performance compared
to a conventional BB link system for the same channel. Furthermore, due to the high
carrier frequency (23 GHz), inductors have been employed in the design for filtering
and for reducing ICI, thereby resulting in a bulky design, as shown in Fig. 2.7 (c).
Figure 2.7: (a) The proposed BB+RF architecture with forwarded-clock for simultaneous
bidirectional signaling. (b) Dual-band signaling in frequency domain [12]. (c) TRX chip die
photo.
A spectrum-shaping coding scheme is proposed in [37]. In this work, it has been shown
that by repeating the transmitted pattern every M bits, the TX spectrum is shaped
and it bears an intrinsic null at fb/(2M), where fb is the bit rate. Therefore, knowing the
approximate location of the notch frequency, one can adapt the output spectrum to the
channel frequency notches by changing the repetition frame length. Hence, the proposed
spectrum shaping technique can be employed to improve the data rate of high-speed
SerDes communication systems over multi-drop bus interfaces. However, the effective
data rate of the link system remains at fb/M while the whole TRX must operate at fb,
which is higher than the effective data rate.
A system-level analysis of a 20 Gb/s multi-tone serial link for communicating over FR-4
traces with 30 dB attenuation at 10 GHz is presented in [13]. Fig. 2.8 shows a
representative architecture composed of four 5 Gb/s binary streams; each
stream is modulated on a sub-carrier, fcj, and the resulting four sub-channels are summed
at the output. In this architecture, the sub-band spectra, shown in Fig. 2.8 (b), have
little overlap, which avoids the need for high quadrature precision and very low phase
noise to retain orthogonality of the sub-channels. Moreover, to reduce the overall bandwidth,
16-QAM modulation is employed for each of the sub-streams; thereby, the effective
bandwidth is reduced by a factor of four, and the equalization complexity is relieved.
Table 2.1 summarizes the required TRX performance for the proposed system, which is
Figure 2.8: (a) Proposed system architecture in [13]. (b) Proposed frequency planning with
aggregate data rate of 20 Gb/s.
Table 2.1: Required Transceiver Performance presented in Fig. 2.8 [13].
Transceiver
  Number of Sub-channels    4
  Modulation                16-QAM
  Baseband Data Rate        5 Gb/s
  Phase Noise               −117 dBc/Hz @ 10-MHz offset
Transmitter
  Output Swing              1.2 Vpp (at summing node)
  Pulse Shaping             Raised-cosine filter with roll-off factor = 0.8
  I/Q Mismatches            1% gain imbalance, 1° phase imbalance
Receiver
  Noise Figure              23.5 – 43 dB
  IIP3                      −8 – 15.3 dBm
  Gain                      −1.4 – 28 dB
  Equalization              13 dB pre-emphasis at maximum
evaluated by MATLAB time-domain simulation. Although the specifications are not too
stringent to be realized on silicon, the use of 16-QAM modulation severely limits the RX
equalized eye-opening (less than 20% horizontal eye-opening is illustrated in this work).
Moreover, due to the frequency planning, which places data at odd carrier harmonics, the
ICI is mainly affected by clock harmonics. Hence, a pure sinusoidal carrier is required for
upconverting the TX sub-streams, which can require power-hungry clock buffers and
complicated clock generation circuits.
[Figure 2.9 plots: magnitude (dB) versus frequency (GHz), 0 to 20 GHz. Panel (a): NRZ spectrum and MDB frequency response. Panel (b): hybrid spectrum (PAM2 at 2.5 Gb/s, QPSK at 5 Gb/s) and MDB frequency response.]
Figure 2.9: Proposed mixed NRZ/MT signaling scheme. (a) NRZ spectrum over a multi-drop
channel. (b) Hybrid NRZ/MT spectrum over a multi-drop channel.
Overall, as can be seen from the simulation and measurement results of the MT systems
presented in this section, the major advantages of MT signaling in serial link
systems are twofold. Firstly, the TX spectrum can be shaped according to the channel
frequency response; thus, the channel capacity can be better used for data transmission.
Secondly, from the circuit realization perspective, most of the system building blocks
(e.g., MUX/DEMUX, CDR, DFE, FFE) operate at only a fraction of the total data
rate; therefore, they can be realized with better power efficiency and less complexity.
However, the MT transceiver closely resembles an RF communication system, facing
similar sensitivity, linearity, and precision issues. Moreover, unlike typical RF systems,
the broadband nature of the serial link introduces additional harmonics and interference
effects that can severely impact the system performance. Therefore, in order to employ
multi-tone signaling in wireline communication systems, one should consider all these
requirements so that the MT link system building blocks can be designed efficiently and
the advantages made available by MT signaling are not cancelled out by
tight circuit specifications. In Chapter 3, a proposed hybrid NRZ/multi-tone system is
presented, which considers all these design requirements.
2.3 Preliminaries of Hybrid NRZ/Multi-Tone System-level Design
In memory interfaces, the channel frequency response contains notches due to impedance
discontinuities and multiple reflections, as will be discussed in Chapter 3. In this context,
shaping the TX spectrum according to the channel frequency response can
improve the system performance and reduce the circuit complexity. Based
on this understanding, Fig. 2.9 presents a mixed NRZ/MT spectrum, which can
increase the power efficiency by avoiding the channel frequency notches. Here, the
number of sub-bands is reduced to three; hence, the clock harmonics do not create ICI.
Likewise, QAM modulation is adopted for the pass-band so as to avoid multi-level
data transmission, which limits the horizontal eye-opening. Since the carrier frequencies
are integer multiples of the sub-band symbol rate, the ICI pattern does not change
from one symbol to another, and consequently, it can be canceled through an appropriate
time delay between the transmitted sub-streams on the TX side. The proposed system retains
the most essential characteristic of MT signaling (i.e., spectrum shaping), whereas
the impact of clock harmonics and ICI on the system performance can be alleviated by
proper frequency planning and time-shifting, respectively.
In the next chapter, we describe this architecture in detail and present the system- and
circuit-level analysis, design, and implementation of the proposed hybrid NRZ/MT
transceiver.
2.4 Conclusion
In this chapter, state-of-the-art serial data transceivers are briefly reviewed, and different
types of link system architecture are presented. Furthermore, the advantages and
disadvantages of multi-tone signaling in the wireline paradigm are reviewed, and based on
the wired communication requirements, a preliminary hybrid NRZ/multi-tone signaling
method is introduced. The proposed system has the spectrum shaping property, whereas
the adverse multi-tone effects (e.g., ICI and clock harmonics) can be reduced by proper
frequency planning.

3 Hybrid NRZ/Multi-Tone Signaling
In this chapter we propose a new signaling scheme, called hybrid NRZ/multi-tone (MT),
which shapes the transmitted spectrum of the transmitter (TX) and can be customized to
the characteristics of the channel, thus providing a power-efficient solution. The proposed
system consists of three wide-band sub-channels and employs a source-synchronous clocking
architecture that facilitates clock and data recovery on the receiver (RX) side.
We start this chapter with a short overview of today's popular memory interfaces, which are
widely employed in industry. Then, we describe the system-level design and study of the
proposed hybrid NRZ/MT architecture and compare its equalization complexity to that of an
equivalent baseband (BB) system for a sample memory channel. We continue the chapter
with the circuit design and implementation of a 7.5 Gb/s mixed NRZ/multi-tone transceiver
(TRX) for multi-drop bus (MDB) memory interfaces in 40 nm bulk CMOS technology.
By reducing the complexity of the equalization circuitry on the RX
side, the proposed architecture achieves 1 pJ/bit link efficiency for an MDB channel with
45 dB loss at 2.5 GHz. The transmitted spectrum is composed of baseband (BB) and
I/Q sub-bands, and the modulation frequency of the entire TRX can be matched
to the channel response over a ±25% range. A switched-capacitor-based
mixer/filter is developed to efficiently down-convert and equalize the I/Q sub-bands
in the RX. The core area is 85 × 60 μm² and 150 × 60 μm² for the TX and RX,
respectively.
We will show that the proposed hybrid NRZ/MT TRX can achieve superior performance
compared to conventional BB transceivers, especially for MDB interfaces, through
better allocation of the transmit power over MDB channels, which have frequency-selective
characteristics.
3.1 Multi-Drop Memory Interfaces: Overview
In this section, the multi-drop memory interfaces are studied and the challenge of
traditional NRZ signaling for BB transceivers over MDB channels is presented and
analyzed.
3.1.1 Multi-Drop Channel Characteristics
Advancements in CMOS technology have enabled exponential growth of computational
power over three decades. However, data processing efficiency also relies on sufficient
data communication bandwidth between different units of a computing system. For a
modern memory system, higher memory capacity, faster memory access, and price are the
major constraints.
From the memory interface design perspective, achieving these goals requires the design
of faster input/outputs (I/Os), which connect to a larger number of memory modules and
cost the same or less. To provide higher capacity, either the number of I/O pins should
be increased, or the data rate on each I/O pin should be improved. The former solution
involves more cost since the chip pin count increases and the routing congestion on the
printed circuit board (PCB) grows rapidly. The latter approach has been the industry
trend for many years. For example, double-data rate second generation (DDR2) provides
the data rates from 400 Mb/s/pin to 1066 Mb/s/pin, while for the third generation
DDR (DDR3), the data rates from 800 Mb/s/pin to 1866 Mb/s/pin are available [38, 39].
However, as the speed increases, the communication bandwidth of the PCB traces limits
the access speed on each pin. For example, advanced DDR3 memory interfaces can operate
at 1.866 Gb/s/pin while for modern graphics DDR (GDDR5) the ﬁgure is 10 Gb/s/pin,
since it employs a point-to-point memory interface that provides a smooth frequency
response [6]. Such a considerable difference reflects the major effect of the channel
frequency response on the maximum speed that a link can reach within a reasonable power
budget.
Memory systems typically use dual in-line memory modules (DIMMs), shown in
Fig. 3.1 (a), because of their high capacity and low cost. In such interfaces, there are
one or more notches in the transfer function of the channel that dissipate the major
part of the transmitted power and create reflections. Such frequency notches, mainly
induced by impedance discontinuities and reflections, give rise to the multi-drop bus (MDB)
characteristic between the controller and the DRAMs. The multi-drop nature of the channel causes
a longer delay, slower rise and fall times, and a long-tail pulse response for data rates
beyond the first notch [40]. Fig. 3.1(b) shows the simplified block diagram of the 3-DIMM
[Figure 3.1 graphic: memory controller connected to DRAM1, DRAM2, and DRAM3, with the path segments labeled L1 and L2.]
Figure 3.1: (a) Controller to DRAM MDB interface. (b) Simpliﬁed block diagram of interface
in Fig. 3.1 (a) showing the multi-path fading.
memory interface in Fig. 3.1 (a), with three possible paths for the controller-DRAM3
communication. While the memory controller is communicating with one DRAM module,
the other modules are idle and can be partially terminated or non-terminated, depending on
the standard [40]. Hence, in this situation they produce the reflection coefficients Γ1
and Γ2 due to their imperfect termination. The electrical length differences between these paths
and the phase shifts caused by Γi lead to multi-path reflections. These reflections can
cause destructive superposition at certain frequencies and produce notches in the channel
frequency response. Based on this explanation and the block diagram of Fig. 3.1 (b), the
first notch in the frequency response of the 3-DIMM memory interface of Fig. 3.1 (a) can
be calculated as
\[ F_{\mathrm{notch}} = \frac{c}{2\pi} \times \frac{\pi - \angle\Gamma_1 - \angle\Gamma_2}{\Delta L_1 + \Delta L_2} \tag{3.1} \]
where c is the eﬀective speed of light in PCB traces, Γ1 and Γ2 are the reﬂection coeﬃcients
of the idle DIMM1 and DIMM2, respectively, and ΔL1 and ΔL2 are the physical path
diﬀerence between direct and reﬂected paths to DIMM1 and DIMM2, respectively.
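As a rough numerical illustration of (3.1), the sketch below evaluates the first notch frequency for a set of assumed values. The effective propagation speed of 1.5 × 10^8 m/s (roughly what an effective dielectric constant of 4 would give), the zero reflection phases, and the 30 mm total path difference are illustrative assumptions and are not taken from the measured channel; the function name is likewise hypothetical.

```python
import math

def first_notch_hz(c_eff, angle_gamma1, angle_gamma2, delta_l1, delta_l2):
    """Evaluate (3.1): Fnotch = c/(2*pi) * (pi - angle(G1) - angle(G2)) / (dL1 + dL2).

    c_eff        : effective propagation speed in the PCB trace (m/s)
    angle_gamma* : phase of each reflection coefficient (rad)
    delta_l*     : path difference between direct and reflected paths (m)
    """
    phase_term = math.pi - angle_gamma1 - angle_gamma2
    return c_eff / (2.0 * math.pi) * phase_term / (delta_l1 + delta_l2)

# Assumed example values (not from the thesis measurements).
C_EFF = 1.5e8  # m/s, assumes an effective dielectric constant near 4
f_notch = first_notch_hz(C_EFF, 0.0, 0.0, 0.015, 0.015)

# Nominal notch locations at odd multiples of the first notch.
notches_ghz = [(2 * k - 1) * f_notch / 1e9 for k in (1, 2, 3)]
print(f_notch / 1e9, notches_ghz)
```

With these assumed numbers the first notch falls at about 2.5 GHz, and halving the total path difference would double it, which is the inverse proportionality that (3.1) expresses.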
A typical MDB memory channel has frequency notches at a nominal Fnotch and all its
odd multiples due to repeated reflections [41]. To study MDB channel properties, a
[Figure 3.2 graphic: (a) MDB channel with open stubs; (b) magnitude (dB) versus frequency (GHz), 0 to 12 GHz.]
Figure 3.2: (a) The fabricated MDB channel. (b) The frequency response of a sample MDB
showing ﬁrst notch at 2.5 GHz.
sample multi-drop channel is designed and fabricated on FR-4 material to mimic
the behavior of a real MDB interface with two DIMMs per channel. The DIMMs
are modeled by two open stubs that create the length difference between the
direct and the reflected signals, similar to what is shown in Fig. 3.1 (b). Fig. 3.2 (a) shows
the fabricated sample channel, which is implemented as a differential wireline interface. The
measured reference MDB channel exhibits frequency notches around the nominal
values of (2k − 1) × 2.5 GHz, k ∈ {1, 2, ...}, as shown in Fig. 3.2 (b). The resonance effect
creates nulls in the frequency response of the channel, causing substantial band limiting
beyond what is expected from simple RC loads. Since transmission lines have periodic
characteristics in the frequency domain, the nulls approximately repeat at odd harmonics
of their fundamental frequency. As (3.1) shows, the frequency of the first notch is inversely
proportional to the length of the DIMM stubs. Moreover, since the stub
lengths in the fabricated channel are adjustable, we are able to change the notch frequencies in our
sample MDB channel by around 40%, as shown in Fig. 3.3. Furthermore, from the
time-domain point of view, notches in the frequency response correspond to several reflections
and, thus, a long-tail pulse response.
Fig. 3.3 illustrates the measurement results of the channel frequency and pulse responses
for different stub configurations, showing the tunability of the first notch from 600 MHz
to 3.5 GHz. As Fig. 3.3 demonstrates, the pulse responses have long tails (2 ns to 4 ns),
which makes the equalization process very challenging for a conventional baseband TRX.
Overall, in the memory system paradigm, the multi-drop interface between memory units
and the controller renders the data equalization even more complicated. This necessitates
a comprehensive equalization in the receiver front-end. In addition to being power-hungry,
implementing such systems is challenging in the CMOS memory processes since the
DRAM technology normally oﬀers only low-cost and simple processes with slower devices.
[Figure 3.3 plots: panels (a), (c), (e), and (g) show magnitude (dB) versus frequency (GHz), 0 to 12 GHz; panels (b), (d), (f), and (h) show amplitude (mV) versus time (ns), with post-cursor tails marked as 3 ns in (b) and (d) and 2 ns in (f) and (h).]
Figure 3.3: The measured frequency response and the 3 × Fnotch b/s single-bit pulse response for
different stub lengths. (a) Fnotch = 660 MHz. (b) 2 Gb/s single-bit response. (c) Fnotch = 1.25 GHz.
(d) 3.75 Gb/s single-bit response. (e) Fnotch = 2.5 GHz. (f) 7.5 Gb/s single-bit response. (g)
Fnotch = 3.7 GHz. (h) 11 Gb/s single-bit response.
[Figure 3.4 graphic: (a) TX (RJTX = 0.7 ps), MDB channel, and RX with CTLE and an 18-tap DFE equalizer; (b) eye diagram; (c) BER versus clock phase (UI), with an 8% UI opening marked.]
Figure 3.4: (a) Conventional BB transceiver block diagram for communicating at 7.5 Gb/s
over MDB interface. System-level simulation results for the link shown in Fig. 3.2 (b). (b) Eye
diagram for 7.5 Gb/s data rate, (c) bathtub curve, both after optimizing the CTLE and DFE
response.
From the technology-trend point of view, to provide higher speed the maximum number
of drops is reduced from eight in DDR2 to four in DDR3. However, reducing the number
of drops results in fewer memory slots and reduces the memory capacity. In the next
section, we show the complexity of designing a traditional baseband TRX in more detail.
3.1.2 Baseband Signaling in Multi-Drop Interfaces
To illustrate the complexity of designing a traditional BB serial data transceiver for an
MDB memory interface, a conventional BB transceiver is modeled and simulated for
our reference MDB channel. Fig. 3.4 (a) shows the BB transceiver used in the system-level
simulation. Here, the transmitter employs NRZ signaling and is assumed to have
0.7 ps rms output jitter, i.e., about 0.5% UI jitter. The receiver has a continuous-time linear
equalizer (CTLE) and a decision feedback equalizer (DFE) for data equalization. Both the
CTLE and the DFE are considered ideal blocks without random noise. The RX employs
an adaptive least mean squares (LMS) algorithm to find the DFE tap weights, while the
CTLE provides around 8 dB of gain boosting at the Nyquist frequency. The transceiver should
communicate at 7.5 Gb/s over the reference MDB channel, which has the first notch
[Figure 3.5 plots: (a) and (c) eye diagrams; (b) and (d) BER versus clock phase (UI); a 31% UI opening is annotated.]
Figure 3.5: System-level simulation results for the MDB channel of Fig. 3.2 (b) having 0.5%
jitter on both TX and RX. (a) Eye diagram for 7.5 Gb/s data rate, (b) bathtub curve for 7.5 Gb/s
data rate, both after optimizing the CTLE and DFE response. System-level simulation results
for the same interface. (c) Eye diagram for 5 Gb/s data rate, (d) bathtub curve for 5 Gb/s data
rate, both after optimizing the CTLE and DFE response.
frequency at 2.5 GHz, as shown in Fig. 3.2 (b).
Statistical and bit-by-bit system-level simulations are performed to evaluate the bathtub curve
and the eye diagram, respectively, at the receiver output. The system-level simulations show
that, using the conventional receiver architecture of Fig. 3.4 (a), a DFE with about 18
taps is needed to equalize the effects of the notches and achieve an 8% UI eye-opening
(UI = 133 ps for a 7.5 Gb/s data rate) at a bit error rate (BER) of 10^−12. The equalized
eye diagram and the corresponding bathtub curve on the RX side are shown in Fig. 3.4 (b)
and (c), respectively. Using an 18-tap DFE will certainly lead to significant power
dissipation. Assuming 100 μW/Gb/s per tap, the consumption of the DFE block will
be about 1.8 mW/Gb/s or more [27]. Even with such a high level of consumption and circuit
complexity, the eye-opening would be only about 10 ps at BER = 10^−12 for a 7.5 Gb/s
data rate. Adding the same amount of random noise on the RX side (i.e., 0.7 ps rms jitter),
the horizontal eye-opening for the equalized data becomes almost closed (only 1% UI
eye-opening at BER = 10^−12), as shown in Fig. 3.5 (a) and (b).
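The figures quoted in this paragraph follow from simple arithmetic, which the short sketch below reproduces. It is only a bookkeeping aid: the 100 μW/Gb/s per-tap cost is the assumption stated in the text, the total DFE power at 7.5 Gb/s is derived here rather than quoted from the thesis, and all variable names are hypothetical.

```python
def unit_interval_ps(data_rate_gbps):
    """Unit interval in picoseconds for a given bit rate in Gb/s."""
    return 1000.0 / data_rate_gbps

DATA_RATE_GBPS = 7.5
ui_ps = unit_interval_ps(DATA_RATE_GBPS)      # about 133 ps

# 0.7 ps rms jitter expressed as a percentage of the UI (about 0.5%).
jitter_pct_ui = 100.0 * 0.7 / ui_ps

# An 8% UI horizontal opening expressed in picoseconds (about 10.7 ps).
eye_opening_ps = 0.08 * ui_ps

# DFE power estimate from the per-tap figure assumed in the text.
DFE_TAPS = 18
POWER_PER_TAP_UW_PER_GBPS = 100.0
dfe_mw_per_gbps = DFE_TAPS * POWER_PER_TAP_UW_PER_GBPS / 1000.0   # 1.8 mW/Gb/s
dfe_mw = dfe_mw_per_gbps * DATA_RATE_GBPS                         # derived total in mW

print(ui_ps, jitter_pct_ui, eye_opening_ps, dfe_mw_per_gbps, dfe_mw)
```

Under these assumptions the DFE alone would draw on the order of 13.5 mW at 7.5 Gb/s, for an eye that is only about 10 ps wide.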
[Figure 3.6 plots: magnitude (dB) versus frequency (GHz), 0 to 15 GHz, showing the TX output spectrum and the MDB frequency response; panel (b) marks the PAM2 and QPSK bands.]
Figure 3.6: Transmitted spectrum in an MDB channel. (a) Conventional NRZ signaling. (b)
Hybrid NRZ/MT signaling.
To demonstrate how important the ratio of the notch frequency to the Nyquist rate of
the data stream is, system-level simulations are performed for communicating at 5 Gb/s
over the same MDB interface of Fig. 3.2 (b). In this simulation, the TX and the RX each have
1 ps rms jitter (i.e., 0.5% UI jitter), and the simulation results for the equalized eye diagram
and the corresponding bathtub curve are presented in Fig. 3.5 (c) and (d), respectively. These
simulations show that when the Nyquist rate is well above the first notch, the equalization
is very challenging, and conventional baseband techniques cannot provide error-free
operation even with an ideal CTLE and many DFE taps. However, if the Nyquist rate is
around the first notch, the link can be well equalized with traditional baseband
techniques. We will study this behavior from the frequency-domain point of view in the
next section.
3.2 Hybrid NRZ/MT Signaling: System Design Overview
In this section, the proposed hybrid NRZ/MT transceiver for communicating over MDB
interfaces is described in detail, and the proposed architecture that facilitates a practical
circuit implementation is introduced.
An NRZ bit stream with a 1/Tb data rate carries around 90% of the transmitted bit energy
below ω = 2π/Tb, and 77% below the Nyquist rate, i.e., ω = π/Tb [42, 43].
Therefore, employing NRZ signaling for communicating over an MDB interface whose
first frequency notch is well below the Nyquist rate of the data stream leads to a
significant energy loss around the first notch. Fig. 3.6 (a) shows the output spectrum of an
NRZ transmitter over an MDB channel. The first notch wastes a significant part of the
spectrum energy since it is well below the desired Nyquist frequency.
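The two energy fractions quoted above can be checked numerically. The sketch below assumes an ideal random NRZ stream with rectangular pulses, whose power spectral density is proportional to sinc²(f·Tb), and integrates it with a plain trapezoidal rule; the function names and the step count are illustrative choices, not part of the original analysis.

```python
import math

def sinc_squared(x):
    """Normalized NRZ power spectral density shape, sinc^2(x) with x = f*Tb."""
    if x == 0.0:
        return 1.0
    return (math.sin(math.pi * x) / (math.pi * x)) ** 2

def nrz_energy_fraction(x_max, steps=20000):
    """Fraction of total NRZ energy below f = x_max / Tb (trapezoidal rule).

    The one-sided integral of sinc^2 over [0, inf) equals 1/2, which is used
    as the normalization.
    """
    h = x_max / steps
    total = 0.5 * (sinc_squared(0.0) + sinc_squared(x_max))
    for i in range(1, steps):
        total += sinc_squared(i * h)
    return total * h / 0.5

below_bit_rate = nrz_energy_fraction(1.0)   # f < 1/Tb, i.e., omega < 2*pi/Tb
below_nyquist = nrz_energy_fraction(0.5)    # f < 1/(2*Tb), i.e., omega < pi/Tb
print(round(below_bit_rate, 3), round(below_nyquist, 3))
```

Under this idealized model the fractions come out at roughly 0.90 and 0.77 of the total energy, in line with the values cited from [42, 43].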
Avoiding channel frequency notches by employing a proper modulation scheme can prevent
[Figure 3.7 graphic: three 1/Tref Gb/s streams with LPF and BPF blocks, mixers driven by CKI and CKQ, and per-stream TX time delays; inset: channel frequency response.]
Figure 3.7: Proposed mixed NRZ/MT system architecture.
bit-energy waste around the notches and reduce the equalization circuit complexity.
However, any modulation scheme for wireline communication should be employed carefully
to keep the system requirements feasible within a reasonable power budget.
Fig. 3.6 (b) illustrates the proposed NRZ/MT transmitted spectrum over an MDB channel.
Here, the lower-frequency band of the channel, from DC to the first notch (i.e., 0-2.5 GHz),
is used for transmitting up to 2.5 Gb/s NRZ data and constitutes the PAM2 part of the
spectrum (see footnote 1). Moreover, the upper-frequency band of the channel, between the
two notches (i.e., 2.5-7.5 GHz), is exploited for transmitting 5 Gb/s data in the quadrature
phase-shift keying (QPSK) format. This sub-band carries 2.5 Gb/s data on each of the in-phase (I)
and quadrature-phase (Q) components of the spectrum.
Furthermore, dividing the total bit stream into three sub-bands (i.e., baseband, I sub-band,
and Q sub-band) lets each sub-stream experience less channel loss. Hence, in-band
intersymbol interference (ISI) is reduced and, as a result, the equalization process can be
more power efficient.
By employing this modulation scheme, the NRZ/MT transceiver resembles an RF
system, and the linearity and clock harmonic issues should thus be considered
properly [13]. Moreover, in a multi-drop memory channel, although the channel frequency
response is well characterized before designing the frequency plan, the channel frequency
notches may shift to some extent after fabrication due to PCB-related non-idealities.
Therefore, the TRX should be able to tolerate these effects, and possibly adapt itself to
the channel frequency response.
1 Generally speaking, it can be any kind of PAM-N, ENRZ, or duobinary baseband signaling.
[Figure 3.8 plots: statistical eye diagrams, magnitude (V) versus clock phase (ps), and BER bathtub curves for panels (a), (b), and (c); the marked openings are 25% UIsub in (a) and 19% UIsub in (b) and (c).]
Figure 3.8: System-level statistical simulation for the proposed architecture in Fig. 2.9. (a)
Baseband, (b) I sub-band, and (c) Q sub-band statistical eye diagram and corresponding bathtub.
3.2.1 Hybrid NRZ/MT TRX Statistical System-level Modeling
Based on the proposed frequency planning and signaling scheme of Fig. 3.6 (b), Fig. 3.7
illustrates a mixed NRZ/MT serial data transceiver for communicating over a memory
channel whose frequency response has a multi-drop nature, as shown in the inset of Fig. 3.7.
Here, the number of sub-bands is reduced to three; hence, the clock harmonics do
not create ICI. Therefore, a square-wave clocking scheme is adopted for up and down
[Figure 3.9 plots: eye diagrams, magnitude (V) versus clock phase (ps), for panels (a) to (d).]
Figure 3.9: Q sub-band eye diagram for diﬀerent amount of clock phase mismatch. (a) 5°, (b)
10°, (c) 15°, and (d) 20° I/Q phase mismatch.
conversion, which can relax the mixer and clock generation unit requirements. Moreover,
unlike [13], the baseband part of the channel, which has moderate attenuation, is employed
to send an NRZ-based data stream so that the channel notch is located beyond the baseband
spectrum null. Likewise, QAM modulation is adopted for the pass-band so as to avoid
multi-level data transmission, which limits the horizontal eye-opening. The only equalizers
used in this architecture are the 2nd-order low-pass filters (LPFs; see footnote 2) and a 2nd-order
band-pass filter (BPF) on the RX side, which are used to select the frequency band of
interest. Since the carrier frequencies are integer multiples of the sub-band symbol rate,
the ICI pattern does not change from one symbol to another, and consequently, it can be
canceled through an appropriate time delay between the transmitted sub-streams on the TX side,
as shown in Fig. 3.7. The proposed system retains the most essential characteristic of
MT signaling (i.e., spectrum shaping), whereas the impact of clock harmonics and ICI on
the system performance can be alleviated by proper frequency planning and time-shifting,
2 Filters with sharp frequency-domain roll-off necessitate more in-band equalizer taps (i.e., in-band
DFE and FFE) but cause less ICI and hence require less cross-band DFE and cross-band FFE.
Table 3.1: The hybrid NRZ/MT performance presented in Fig. 3.7.
Transceiver
  Number of Sub-channels                    3
  Modulation                                PAM-2 (BB) / 4-QAM (PB)
  Baseband Data Rate                        2.5 Gb/s
  Clock RMS Jitter                          1% UIsub (4 ps)
Transmitter
  Equalization                              Optimum time-delay for each baseband data
  Pulse Shaping                             NA
  I/Q Mismatches                            Gain imbalance: NA*; phase imbalance < 10°
Receiver
  Horizontal eye-opening at BER = 10^−15    31%–37% UIsub
  Jitter                                    1% UIsub (4 ps)
  Equalization                              LPF/BPF
* Since full-swing square-wave clocks are used, gain imbalance does not exist.
respectively.
[Figure 3.10 plot: horizontal eye-opening (ps), 0 to 150, versus I/Q phase mismatch (degree).]
Figure 3.10: Horizontal eye-opening at BER = 10^−15 versus I/Q phase mismatch.
Statistical system-level simulation (see footnote 3) of the proposed architecture of Fig. 3.7 has been
performed in MATLAB to provide a preliminary performance analysis for very low bit
3 In Appendix A, a brief overview of statistical simulation for serial data links is presented, and its
advantages over time-domain simulation are reviewed.
error rates (BER). Fig. 3.8 presents the simulation results for the equalized eye diagrams
on the RX side for BERs as low as 10^−24. In this simulation, the RX has 2% × Tref rms
jitter, the aggregate data rate is chosen to be 7.5 Gb/s (i.e., Tref = 400 ps), and an ideal clock
with zero I/Q mismatch is employed. As Fig. 3.8 shows, even without any sophisticated
equalization scheme (e.g., DFE and XDFE), the horizontal eye-opening can be as large as
100 ps (i.e., 25% UIsub) and 78 ps (i.e., 19% UIsub) for the baseband and the passband,
respectively, at BER = 10^−20. This is a promising result since the system building blocks
can be realized with reasonable efficiency, and we can surmise that in a real design,
applying a simple equalization scheme (e.g., a CTLE at the RX front-end) can largely
improve the system performance. However, the I/Q mismatch effect should also be
carefully considered so as to prevent any impractical specification for the clock generation unit.
Fig. 3.9 presents the Q sub-band eye diagram for different amounts of I/Q phase mismatch. Although a
5° phase mismatch does not cause critical eye closure, this adverse effect becomes
significant for I/Q phase mismatch beyond 20°, i.e., 11.1 ps for a 5 GHz clock. The
horizontal eye-opening versus I/Q phase mismatch at BER = 10^−15 is shown in Fig. 3.10.
It can be seen that a 5°-10° mismatch (referred to a 5 GHz clock signal) still leaves enough
margin for the eye diagram to provide error-free operation. From the circuit perspective,
this amount of phase mismatch can be guaranteed by a power-efficient topology [24].
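The conversion between an I/Q phase mismatch in degrees and the equivalent timing skew is a one-line calculation, sketched below for the 5 GHz clock used as the reference above. The helper name and the dictionary of results are hypothetical conveniences for illustration only.

```python
def phase_mismatch_to_ps(phase_deg, clock_ghz):
    """Timing skew in picoseconds equivalent to a phase error at a given clock frequency."""
    period_ps = 1000.0 / clock_ghz
    return phase_deg / 360.0 * period_ps

CLOCK_GHZ = 5.0   # reference clock frequency quoted in the text
skew_ps = {deg: phase_mismatch_to_ps(deg, CLOCK_GHZ) for deg in (5, 10, 15, 20)}
for deg, skew in skew_ps.items():
    print(f"{deg:2d} deg -> {skew:.2f} ps")
```

At 5 GHz the clock period is 200 ps, so the 5° to 10° range corresponds to a skew of roughly 2.8 ps to 5.6 ps, and 20° to about 11.1 ps.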
Table 3.1 summarizes the TRX specifications used in these system-level simulations. As
can be seen from this table, the overall TRX requirements can be realized with
power-efficient circuit topologies, and the system can reach better performance compared
with a conventional BB link system. Moreover, with the carrier frequencies being integer
multiples of the sub-band symbol rate, the low-pass filters and the mixer circuits can be implemented
efficiently as an integrate-and-dump circuit (see footnote 4) [44]. In the next chapter these insights will
be applied in a memory system paradigm to realize a very power-efficient and high-speed
serial data transceiver for communicating over a multi-drop channel interface.
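To make the integrate-and-dump argument concrete, the following sketch checks that the baseband, I, and Q streams can be separated by mixing with square-wave clocks and integrating over one symbol when the carrier is an integer multiple (here 2×) of the symbol rate. It is a minimal sketch over an ideal, lossless channel with no filtering, jitter, or ISI; the sample count per symbol and all function names are assumptions made for illustration, and this is not the MATLAB model used in the thesis.

```python
import random

SAMPLES_PER_SYMBOL = 64                     # samples in one Tref (assumed)
CARRIER_PERIOD = SAMPLES_PER_SYMBOL // 2    # carrier at 2 x Fref
QUARTER = CARRIER_PERIOD // 4               # 90-degree shift in samples

def square_wave(n, shift=0):
    """+1/-1 square wave with 50% duty cycle; shift delays it in samples."""
    return 1 if ((n - shift) % CARRIER_PERIOD) < CARRIER_PERIOD // 2 else -1

LO_I = [square_wave(n) for n in range(SAMPLES_PER_SYMBOL)]
LO_Q = [square_wave(n, QUARTER) for n in range(SAMPLES_PER_SYMBOL)]

def transmit(bb, i_bit, q_bit):
    """Sum of baseband NRZ and the up-converted I and Q symbols (each +1/-1)."""
    return [bb + i_bit * LO_I[n] + q_bit * LO_Q[n]
            for n in range(SAMPLES_PER_SYMBOL)]

def integrate_and_dump(rx, lo=None):
    """Mix with lo (or leave unmixed for baseband) and average over one symbol."""
    mixed = rx if lo is None else [r * l for r, l in zip(rx, lo)]
    return sum(mixed) / SAMPLES_PER_SYMBOL

random.seed(0)
recovered_ok = True
for _ in range(100):
    sent = tuple(random.choice((-1, 1)) for _ in range(3))
    rx = transmit(*sent)
    got = (integrate_and_dump(rx),
           integrate_and_dump(rx, LO_I),
           integrate_and_dump(rx, LO_Q))
    recovered_ok = recovered_ok and got == sent
print(recovered_ok)
```

Because each clock contains a whole number of periods per symbol, the cross terms integrate to zero and every stream is recovered exactly in this idealized setting; over a real channel the residual ICI would have to be handled by the filtering and TX time shifts discussed above.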
3.2.2 Hybrid NRZ/MT TRX Detailed System Design
The proposed architecture for realizing the suggested modulation scheme of Fig. 3.6 (b)
is shown in Fig. 3.11. This architecture is developed to address the aforementioned
multi-tone challenges. The random data is generated by three independent embedded
PRBS15 generators, each operating at Fref, i.e., 2.5 GHz. Two mixers upconvert two sets
of random data into the I and Q sub-bands to create the QPSK band. The local oscillator frequency
is chosen to be 2 × Fref. The third stream is added to the I/Q sub-bands by
the output summer/driver circuit. In this architecture there is no data at the odd
multiples of Fref. Therefore, the local oscillator harmonics do not cause any data
4 Generally speaking, any low-pass filter that forms a perfect reconstruction set together with its
up-converted versions can replace the integrators.
[Figure 3.11 graphic: three Fref Gb/s streams; two are mixed with LOI and LOQ at 2×Fref and summed with the binary baseband data in the summer/driver; the RX uses a BPF, LOI/LOQ mixers, and LPFs; the spectrum sketch marks the NRZ and QPSK bands with Fref, 2×Fref, and 3×Fref.]
Figure 3.11: Proposed hybrid NRZ/MT transceiver.
corruption. From the linearity requirement point of view, QPSK modulation without
pulse shaping does not require a linear output driver [45]. Hence, a current-mode output
driver can appropriately add the BB, I, and Q sub-bands together and construct the hybrid
NRZ/MT output stream. Our system-level simulation shows that without baseband
pulse shaping, the eye diagram can provide a 55% horizontal opening at BER = 10^−12.
Adding a pulse-shaping filter (i.e., a raised-cosine filter) to the TX improves the eye-opening
by only 8%, i.e., to a 63% horizontal eye-opening. Hence, no pulse shaping is employed
due to the cost it adds to the TRX circuit complexity. A source-synchronous architecture
is employed for the clocking scheme. This architecture can relax the complexity of the
CDR circuit on the RX side and provide inherent tracking of correlated jitter for CDR
purposes [46].
In addition, the output spectrum of the proposed TRX has inherent notches at Fref and
3× Fref, as shown in Fig. 3.11. Therefore, by adjusting the reference clock frequency, the
TX spectrum shape can track the channel notches and customize the whole link to the
channel. In the NRZ/MT approach, in order to match the transmitted spectrum to that
of the channel, an initial calibration phase is applied, which can change the reference clock
frequency depending on the quality of the received signal. The transmitted spectrum at
the output of the TX can therefore be shaped with respect to the channel characteristic:
the aggregate TX bit rate equals 3× Fref b/s, while its spectrum has minimum energy
around the notches. This need for frequency adaptability requires the TRX
circuits to employ topologies that are tunable with Fref. In addition, as will
be shown in Section 3.4, experimental results show that a 10% deviation of
Fnotch from its nominal value results in an approximately 30% reduction in horizontal
eye-opening and still leaves enough margin for error-free operation.
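The location of the spectral notches can be checked numerically. The following is a minimal sketch, assuming ideal rectangular pulses and a purely sinusoidal carrier at 2×Fref (the harmonics of the square-wave LO used in the actual mixers are ignored). The function names `sinc2` and `hybrid_psd` are illustrative and not part of the design.

```python
import math

F_REF = 2.5e9  # reference clock and per-stream bit rate in Hz (from the text)

def sinc2(x):
    """Squared normalized sinc: the PSD shape of a rectangular NRZ pulse."""
    if x == 0.0:
        return 1.0
    return (math.sin(math.pi * x) / (math.pi * x)) ** 2

def hybrid_psd(f, f_ref=F_REF):
    """Relative PSD of one NRZ stream plus a QPSK band centered at 2*f_ref.

    Assumption: equal-power streams and an ideal sinusoidal carrier.
    """
    nrz = sinc2(f / f_ref)
    qpsk = sinc2((f - 2 * f_ref) / f_ref) + sinc2((f + 2 * f_ref) / f_ref)
    return nrz + qpsk

if __name__ == "__main__":
    for mult in (0.5, 1, 2, 3):
        print(f"{mult} x Fref: relative PSD = {hybrid_psd(mult * F_REF):.3e}")
    print("aggregate bit rate (Gb/s):", 3 * F_REF / 1e9)
```

Under these assumptions the relative PSD vanishes (to numerical precision) at Fref and 3×Fref and peaks at the carrier, so changing `F_REF` moves the notches together with the aggregate bit rate of 3×Fref.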
Figure 3.12: The architecture of the proposed mixed NRZ/MT transmitter.
3.3 Hybrid NRZ/MT signaling: Circuit Design
Based on the system-level requirements, the design of the RX, the TX, and the clock generation
unit is described in this section. The goal is to communicate at 7.5 Gb/s over the MDB
channel shown in Fig. 3.2 (b).
3.3.1 Transmitter
Fig. 3.12 shows the hybrid NRZ/MT transmitter architecture. Since an NRZ plus QPSK
frequency plan is employed, two mixers are required to upconvert baseband data and
construct the QPSK band. The third stream is then added on top of the QPSK
sub-band. Moreover, as described in Section 3.2, baseband pulse-shaping is not applied.
Concurrent transitions are also avoided at the inputs of the summer/driver circuit.
Different clock phases, {Φ1,Φ5}, {Φ2,Φ6}, and {Φ3,Φ7}, are used for generating the
baseband streams, and the proper LOI,Q are employed for upconversion, as shown in Fig. 3.12.
The eye diagram for X1, X2, and X3 (the summer/driver input signals) is plotted in
Fig. 3.13 (a). It shows that each transition has at least Tref/16 (i.e., 25 ps for Tref = 400 ps)
time difference from the nearest transition point. This can efficiently reduce the peak-to-average
ratio (PAR) and mitigate inter-channel interference (ICI). Fig. 3.13 (b) illustrates
the simulated output power spectrum of the proposed TX of Fig. 3.12, and it shows that the
spectrum notches of the transmitted stream match those of the MDB interface.

Figure 3.13: (a) Eye diagram of the summer/driver input signals. (b) Simulated TX output
spectrum.

Figure 3.14: (a) Upconversion passive mixer schematic. (b) Output driver schematic.
Device sizes: M1-4: 1.16/0.04; M5,7: 8.95/0.04; M6,8: 4.95/0.04.
Fig. 3.14 (a) shows the upconversion mixer topology. Two double-balanced passive
mixers are used to upconvert two sets of baseband data. The schematic of the current-mode
summer/driver block is shown in Fig. 3.14 (b). The LVDS summer/driver is designed
to add the sub-bands together and provide a 300 mV peak-to-peak swing at the output,
while the output termination can be tuned by programmable resistors. The driver output
power versus input power is shown in Fig. 3.15 for a 5 GHz sinusoidal signal. The output
1-dB compression point is simulated to be -13 dBm, which is sufficient for the proposed
NRZ/MT modulation scheme.

Figure 3.15: Output power of the LVDS driver circuit with respect to the 5 GHz input signal
power level (Pout versus Pin, both in dBm; OP1dB = -13 dBm).

Figure 3.16: The architecture of the proposed mixed NRZ/MT receiver.
Figure 3.17: Schematic of the BPF to pass QPSK sub-bands.
3.3.2 Receiver
The proposed NRZ/MT receiver is shown in Fig. 3.16. Each of the received sub-bands has
an Fref b/s data rate, which provides a 3×Fref b/s aggregate data rate. The received signal
is filtered by the channel frequency response and is more attenuated in the QPSK band.
The attenuation for the NRZ band is around 5 dB, while the QPSK band experiences a
12 dB tilt from Fref to 3×Fref. Therefore, to recover the NRZ sub-band, low-pass filtering
that rejects the QPSK band provides sufficient equalization, and DFE or CTLE is not required.
For the QPSK sub-band, after bandpass filtering and gain boosting, a direct-conversion
architecture is used for downconversion.
The simulation results over different process corners show that the received signal has at
least a total power of -19 dBm and -24 dBm for the NRZ and QPSK sub-bands, respectively.
QPSK modulation necessitates an SNR of about 18 dB for BER = 10^-14 [45]. The
required SNR for the NRZ band is calculated to be 13 dB at BER = 10^-14. Based on
these specifications, the noise figure (NF) of the input amplifier can be calculated as [45]:
NF = Pmin − 10 log(BW) − SNRmin + 174 (3.2)
where Pmin is the minimum received signal power in dBm, BW is the bandwidth of interest in Hz,
and SNRmin is the required SNR level in dB for detecting the received data. Substituting the
corresponding numbers into (3.2), the receiver NF for BER = 10^-14 is found to be 35 dB
and 45 dB for the QPSK and NRZ sub-bands, respectively. Hence, the bandpass filter
(BPF) and the input lowpass filter (LPF) for the NRZ sub-band should be designed such
that this NF is satisfied for each sub-band receiver. Moreover, based on the system-level
simulation, a ±5° phase mismatch between LOI and LOQ yields a 2 dB SNR penalty in the
vicinity of BER = 10^-12. This SNR penalty can be compensated
in the input amplifier by providing a 2 dB lower NF specification. Hence, the LOI/Q
phase mismatch should be kept lower than ±5° and the analog front-end (AFE) should
be designed such that the receiver NF remains below 32 dB. We describe the design of
these circuits in the following.
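The NF budget of (3.2) can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name `required_nf_db` is illustrative, and the 5 GHz bandwidth is an assumption, since the text does not state BW explicitly. It is chosen here because it reproduces the quoted 35 dB and 45 dB values.

```python
import math

def required_nf_db(p_min_dbm, bw_hz, snr_min_db):
    """Eq. (3.2): NF = Pmin - 10*log10(BW) - SNRmin + 174, with Pmin in dBm."""
    return p_min_dbm - 10.0 * math.log10(bw_hz) - snr_min_db + 174.0

# Assumption: BW is not given in the text; 5 GHz matches the quoted NF values.
BW_ASSUMED_HZ = 5e9

nf_qpsk = required_nf_db(-24.0, BW_ASSUMED_HZ, 18.0)  # QPSK: -24 dBm, 18 dB SNR
nf_nrz = required_nf_db(-19.0, BW_ASSUMED_HZ, 13.0)   # NRZ: -19 dBm, 13 dB SNR

print(f"QPSK sub-band NF budget: {nf_qpsk:.1f} dB")
print(f"NRZ sub-band NF budget:  {nf_nrz:.1f} dB")
```

With these inputs the budget evaluates to roughly 35 dB for the QPSK sub-band and 45 dB for the NRZ sub-band, so the QPSK path sets the tighter requirement on the front-end.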
3.3.2.1 AFE Circuit Design
BPF Design: The schematic of the input amplifier for the QPSK sub-band is shown
in Fig. 3.17. The QPSK sub-band RX front-end employs a bandpass amplifier with
+10 dB of peaking at 2 × Fref to help with channel loss equalization. It should also
suppress the baseband part of the received signal, i.e., from DC to Fref. To achieve this
target, a two-stage peaking amplifier with capacitive source degeneration is used.
The final stage of the amplifier consists of a source-follower buffer and an ac-coupling
capacitor, in order to drive the succeeding mixer stages with sufficient current and
provide a constant source impedance during downconversion. In this circuit, M5 and M6
are in strong inversion. The circuit thus consists of two peaking amplifiers and a
high-pass common-source circuit (i.e., M7,8 with capacitors C3,4). The final band-pass
transfer function is generated by cascading these three stages. Based on these explanations,
the BPF gain can be calculated as
\[
H(s) = \frac{V_{RF}}{D_{in}}(s)
     = A_{v01}\,\frac{1 + s/z_1}{1 + s/p_1}
       \cdot A_{v02}\,\frac{1 + s/z_2}{1 + s/p_2}
       \cdot A_{v03}\,\frac{s/p_3}{1 + s/p_3}
       \times \frac{1}{(1 + s/p_X)(1 + s/p_Y)}
\tag{3.3}
\]
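where the zeros and poles are associated with the nodes shown in Fig. 3.17, and the
coefficients can be calculated as
\[
A_{v01} = -\frac{g_{m1}}{1 + g_{m1}R_2 + g_{m1}/r_{o1}} \cdot \left\{ \left[ r_{o1} + R_2 (1 + g_{m1}r_{o1}) \right] \parallel R_1 \right\}
\tag{3.4a}
\]
\[
z_1 = -\frac{1}{2C_1R_2}; \qquad
p_1 = -\frac{R_1 + r_{o1} + R_2 (1 + g_{m1}r_{o1})}{2C_1 (r_{o1} + R_1) R_2}
\tag{3.4b}
\]
\[
A_{v02} = -\frac{g_{m3}}{1 + g_{m3}R_{ISS1} + g_{m3}/r_{o3}} \cdot \left\{ \left[ r_{o3} + R_{ISS1} (1 + g_{m3}r_{o3}) \right] \parallel r_{o5} \right\} \approx -\frac{r_{o5}}{R_{ISS1}}
\tag{3.4c}
\]
\[
z_2 = -\frac{1}{2C_2R_{ISS1}}; \qquad
p_2 = -\frac{r_{o5} + r_{o3} + R_{ISS1} (1 + g_{m3}r_{o3})}{2C_2 (r_{o3} + r_{o5}) R_{ISS1}} \approx -\frac{g_{m3}r_{o3}}{2C_2 (r_{o3} + r_{o5})}
\tag{3.4d}
\]
\[
A_{v03} = \frac{g_{m7}R_{ISS3}}{1 + g_{m7}R_{ISS3}}; \qquad
p_3 = -\frac{1}{C_3 Z_L}
\tag{3.4e}
\]
\[
p_X = -\frac{1}{C_X \left\{ \left[ r_{o1} + R_2 (1 + g_{m1}r_{o1}) \right] \parallel R_1 \right\}}
\tag{3.4f}
\]
\[
p_Y = -\frac{1}{C_Y \left\{ \left[ r_{o3} + R_{ISS1} (1 + g_{m3}r_{o3}) \right] \parallel r_{o5} \right\}}
\tag{3.4g}
\]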
where CX and CY are the parasitic capacitances at nodes X and Y, respectively, RISS1
and RISS3 are the output resistances of the current sources ISS1,2 and ISS3,4, respectively,
ro1, ro3, and ro5 are the output resistances of M1, M3, and M5, respectively, RL is the
load resistance seen by the amplifier, and gm1, gm3, and gm7 are the transconductances of
M1, M3, and M7, respectively. In deriving (3.4), it is assumed that the circuit operates
differentially and the parasitic capacitors at the drains of M1−4 are negligible.
To gain better insight into the operation of the BPF, the simplified gain expression of
this amplifier, which is in effect a bandpass CTLE, can be written as
\[
A_v(s) = \frac{V_{RF}(s)}{D_{in}(s)}
\approx A_{v0}\,\frac{sC_3R_L\,(1 + 2sC_1R_2)}{(1 + sC_3R_L)(1 + 2sC_1/g_{m1})}
\times \frac{1 + 2sC_2R_{ISS1}}{1 + 2\alpha sC_2/g_{m3}}
\tag{3.5}
\]
where
\[
A_{v0} = \frac{g_{m1}g_{m3}}{(1 + g_{m3}R_{ISS1})(1 + g_{m1}R_2)} \times \frac{g_{m7}\,r_{o5}R_1R_L}{1 + g_{m7}R_L},
\]
RISS1 is the output resistance of the current sources ISS1,2, α = ro3/(ro3 + ro5), where ro3 and ro5
are the output resistances of M3 and M5, RL is the load resistance seen by the amplifier,
and gm1, gm3, and gm7 are the transconductances of M1, M3, and M7, respectively. In
deriving (3.5), it is assumed that the circuit operates differentially, the parasitic capacitors
at the drains of M1−4 are negligible, and gmro ≫ 1.
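To see how (3.5) produces a bandpass shape, the expression can be evaluated numerically. The following is a minimal sketch in which every component value is a hypothetical placeholder (the thesis does not list the full set of values), Av0 is normalized to 1, and the function names are illustrative.

```python
import math

def bpf_gain(f_hz, av0=1.0, c1=25e-15, r2=2e3, gm1=2e-3,
             c2=25e-15, r_iss1=5e3, gm3=2e-3, alpha=0.5,
             c3=2e-12, r_l=1e3):
    """Complex gain of the simplified BPF expression, Eq. (3.5).

    All default component values are hypothetical placeholders.
    """
    s = 1j * 2 * math.pi * f_hz
    highpass = s * c3 * r_l / (1 + s * c3 * r_l)            # ac-coupled output stage
    stage1 = (1 + 2 * s * c1 * r2) / (1 + 2 * s * c1 / gm1)  # first peaking stage
    stage2 = (1 + 2 * s * c2 * r_iss1) / (1 + 2 * alpha * s * c2 / gm3)  # second peaking stage
    return av0 * highpass * stage1 * stage2

def gain_db(f_hz, **kwargs):
    """Gain magnitude in dB (f_hz must be non-zero)."""
    return 20 * math.log10(abs(bpf_gain(f_hz, **kwargs)))

if __name__ == "__main__":
    for f in (10e6, 100e6, 1e9, 5e9):
        print(f"{f / 1e9:6.2f} GHz: {gain_db(f):6.1f} dB")
```

The factor sC3RL forces zero gain at DC, while the two degeneration zeros at 1/(2C1R2) and 1/(2C2RISS1) lift the gain toward the QPSK band, which is the bandpass CTLE behavior described above. Changing `c1` or `c2` in this sketch shifts where the boost begins.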
The amplifier frequency response can be adjusted through different settings. Fig. 3.18 plots
the simulation of its transfer function for different capacitor values and in different PVT
corners, revealing a 10 dB high-frequency boost, while it suppresses the low-frequency
content of the spectrum. The peak frequency is tunable over the 3.5-7.5 GHz band and
the 3 dB bandwidth is approximately 6.5 GHz for the peak frequency of 5 GHz.

Figure 3.18: Transfer function of the BPF for different capacitor value, C2, settings. (a) TT
corner at 27 °C. (b) FF corner at -40 °C. (c) SS corner at 27 °C. (d) SS corner at 80 °C.
(Curves in the plots are labelled C1 = 6 fF and C1 = 25 fF.)

Figure 3.19: BPF transfer function for different gain settings (curves for decreasing ISS1).
The equalization amount can be changed by changing the second stage gain. This is
done by changing the bias current of M3,4 and adjusting gm3,4. The current source has
four diﬀerent settings. The gain simulation result for diﬀerent current settings is shown
in Fig. 3.19. It shows a 6 dB gain adjustment for the peak frequency.
Figure 3.20: Schematic of the LPF used to select NRZ sub-band.
NRZ Input LPF Design: Fig. 3.20 shows the schematic of the input LPF that is employed
for the NRZ sub-band. The Gm-C biquad section consists of transistors M3-M6 and the
capacitors C1 and C2 [47]. In this circuit, M7 and M8 convert the input differential
current into a voltage for the Gm-C biquad section, and they also improve the filter's
cut-off frequency. This circuit implements a low-pass transfer function, which helps to
further suppress the signal from unwanted sub-bands. The biquad section transfer function
(i.e., Hbq(s) = DBBout/VX) and the input stage transfer function (i.e., Hin(s) = VX/Din)
can be calculated as
\[
H_{bq}(s) = \frac{s^2 + (\omega_z/Q_z)\,s + \omega_z^2}{s^2 + (\omega_0/Q_0)\,s + \omega_0^2} \cdot \left( \frac{\omega_0^2}{\omega_z^2} \right)
\tag{3.6a}
\]
\[
H_{in}(s) = \frac{-g_{m1}}{\left[ 1 - H_{bq}(s) \right] (g_{m3} + sC_{gs3}) + \left[ 1 + H_{bq}(s) \right] (g_{m7} + sC_{gs7})}
\tag{3.6b}
\]
where ωz^2 = gm3gm5/[C1(Cgs3 − Cgs7)], ω0^2 = gm3gm5/(2C1C2), gmK represents the
transconductance of MK (K is the transistor number), CgsN is the gate-source capacitance of MN
(N is the transistor number), Qz = ωzC1(Cgs3 − Cgs7)/(gm3Cgs5), and Q0 = 2ω0C2/gm5.
In deriving (3.6b), it is assumed that the circuit is operating fully differentially, the
drain-source capacitors are negligible, and C1, C2 ≫ CgsN. The overall filter transfer
function can be expressed by
\[
H_{LPF}(s) = H_{bq}(s) \times H_{in}(s)
\tag{3.7}
\]
Figure 3.21: Transfer function of the LPF for different capacitor value, C2, settings. (a) FF
corner at -40 °C. (b) SS corner at 100 °C. (c) TT corner at 27 °C.
(Gain in dB versus frequency in GHz; curves in the plots are labelled C1 = 5 fF and C1 = 50 fF.)
This is a low-pass transfer function, which removes the QPSK sub-band
and amplifies the NRZ sub-band. The ratio Cgs3/Cgs7 is around 6 in this design
to improve the bandwidth of the filter and provide a better time-domain pulse response.
This circuit has a common-mode feedback (CMFB) loop that sets the output DC value. The
simulation of the filter transfer function for different capacitor settings and for different
PVT corners is plotted in Fig. 3.21, showing a tunable 3 dB bandwidth from 1.7 GHz to
3.3 GHz. Based on these simulations, the maximum interferer energy from the QPSK band
(i.e., QPSK-to-NRZ ICI) remains below -15 dBm, which is sufficient for our application.
By adjusting the current of the input differential pair (i.e., ISS in Fig. 3.20), gm1,2 is
adjusted and the DC gain can be tuned from -4.5 dB up to 3.9 dB. The current DACs are
placed at the drain and the source of the input differential pair, and the bias current of
the input stage is adjusted such that the bias current of the second stage stays fairly constant.
Therefore, the overall transfer function shape is not changed while the DC gain is tuned.
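The biquad section of (3.6a) can be explored numerically to see its two limiting gains. This is a minimal sketch: the corner frequencies and quality factors below are hypothetical placeholders rather than design values, and `h_bq` is an illustrative name.

```python
import math

def h_bq(f_hz, f0_hz=3e9, q0=0.7, fz_hz=9e9, qz=2.0):
    """Biquad transfer function of Eq. (3.6a).

    f0_hz, q0, fz_hz and qz are hypothetical values chosen for illustration.
    """
    s = 1j * 2 * math.pi * f_hz
    w0 = 2 * math.pi * f0_hz
    wz = 2 * math.pi * fz_hz
    num = s ** 2 + (wz / qz) * s + wz ** 2
    den = s ** 2 + (w0 / q0) * s + w0 ** 2
    return (num / den) * (w0 ** 2 / wz ** 2)

if __name__ == "__main__":
    for f in (0.0, 1e9, 2.5e9, 5e9, 1e13):
        print(f"{f:10.3g} Hz: |Hbq| = {abs(h_bq(f)):.4f}")
```

The scaling factor ω0^2/ωz^2 normalizes the DC gain of the biquad to unity, and at very high frequency the magnitude settles at ω0^2/ωz^2. From the definitions given with (3.6), this floor equals (Cgs3 − Cgs7)/(2C2), which is small when C2 ≫ CgsN.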
3.3.2.2 Downconverting Mixer / Filter Unit
To properly recover the data carried by the I/Q sub-bands, a switched-capacitor
mixer and filter (SCMF) circuit is developed. In addition to downconverting the received
signal, the proposed circuit suppresses the unwanted high-frequency components of the
signal and isolates the I and Q sub-bands from each other. Fig. 3.23 shows
the half-circuit implementation of the proposed SCMF and the baseband amplifier units.
The timing diagram for the I sub-band and the received data stream are shown in Fig. 3.24.

Figure 3.22: LPF transfer function for different gain settings in TT corner (gain in dB versus
frequency in GHz; curves for Iss = 50 μA and Iss = 140 μA).
The SCMF is designed such that the received data is multiplied by the orthogonal
basis functions that have been used in the transmitter, i.e., 90° phase-shifted square
waveforms. As shown in Fig. 3.23, M1 is on during the {Φ1,Φ2} and {Φ5,Φ6} time
intervals. Assuming that the sampling clock is properly aligned with the received data,
this circuit is able to detect the I sub-band and suppress the Q sub-band. The input signal
(i.e., $V_{RF}^{+}$), which is in the voltage domain, is sampled at the falling edges of $CK_I^{+}$ and
$CK_Q$, and is held on the capacitor C2 in the upper and lower paths, respectively. As
shown in the timing diagram of Fig. 3.24, the Q sub-band is held as a common-mode
signal on the sampling capacitors while the I sub-band remains differential on $V_{K1}^{+}$ and $V_{K1}^{-}$.
Therefore, subtracting these node voltages yields I sub-band recovery and Q sub-band
cancellation. At the end of this process, after the data is detected, the capacitors are
discharged by a reset clock phase, CKRI, and made ready for the next phase.
Figure 3.23: Half-circuit implementation of SCMF and baseband amplifier units.
Component values: C1: 11 fF; C2: 50 fF; M1-6: 1.16/0.04; M7-8: 2.40/0.04.

Based on this operation and the timing diagram shown in Fig. 3.24, the output of the SCMF
circuit at t = (n − 1/2)Tref (i.e., after being sampled by CKQ) can be approximated by
\[
V_{K1}^{+}(n - 1/2) = V_Q(n) + V_I(n)
\tag{3.8a}
\]
\[
V_{K1}^{-}(n - 1/2) = V_Q(n) - V_I(n)
\tag{3.8b}
\]
where VI(n) and VQ(n) represent the corresponding I and Q components of VRF, respectively.
Therefore, the output voltage of the amplifier (shown in Fig. 3.23) can be expressed as
\[
D_{Iout}(n - 1/2) = 2 \times g_{m7,8} R_1 \times V_I(n)
\tag{3.9}
\]
where gm7,8 is the transconductance of M7,8 in Fig. 3.23. As (3.9) shows, the
Q sub-band is canceled and the I sub-band is recovered.
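The orthogonality that the SCMF relies on can be illustrated with an idealized discrete-time model. This sketch is not the switched-capacitor circuit itself: it replaces the track-and-hold operation with an ideal multiply-and-average correlator, splits one Tref into eight slots (one per clock phase), and assumes a noiseless, distortion-free channel. All names are illustrative.

```python
SLOTS = 8  # one reference period Tref split into eight clock-phase slots

# Square-wave LO at 2*Fref: two LO periods per Tref, four slots per LO period.
LO_I = [1, 1, -1, -1, 1, 1, -1, -1]
LO_Q = LO_I[-1:] + LO_I[:-1]  # delayed by one slot, i.e. 90 degrees of the LO period

def transmit(bb, i, q):
    """Sum the baseband bit and the two up-converted bits (each +/-1) over one Tref."""
    return [bb + i * li + q * lq for li, lq in zip(LO_I, LO_Q)]

def correlate(rx, basis):
    """Multiply the received samples by a basis waveform and average over one Tref."""
    return sum(r * b for r, b in zip(rx, basis)) / SLOTS

def receive(rx):
    """Recover (bb, i, q) by correlating with the three orthogonal basis waveforms."""
    return (correlate(rx, [1] * SLOTS), correlate(rx, LO_I), correlate(rx, LO_Q))

if __name__ == "__main__":
    rx = transmit(bb=1, i=-1, q=1)
    print("received samples:", rx)
    print("recovered (bb, i, q):", receive(rx))
```

Because the constant waveform, `LO_I`, and `LO_Q` are mutually orthogonal over one Tref, correlating with `LO_I` removes both the baseband and the Q contributions, which mirrors the cancellation expressed by (3.8) and (3.9).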
3.3.3 Clock Generation Unit
Figure 3.24: The timing diagram of the proposed SCMF (R: reset; T&H: track and hold; H: hold).

Figure 3.25: Clock generation system block diagram.

A differential source-synchronous architecture is employed for clock and data recovery
(CDR) because it inherently tracks the correlated jitter while it reduces the
complexity of the CDR circuit [46]. The clock unit on both the transmitter and the receiver includes a
multiplying delay-locked loop (MDLL) block, which generates eight equally spaced phases
from the reference input clock and provides two orthogonal clock phases at twice the
reference clock frequency, nominally at 5 GHz. The received clock is forwarded from the TX side
and is a single-ended signal, which has a frequency in the range of 1.3-3.2 GHz and an
amplitude of approximately VDD/3.
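The relation between the eight MDLL phases and the quadrature 2×Fref clocks can be sketched as follows. The edge-combining rule used here (even-numbered phases toggle LOI, odd-numbered phases toggle LOQ) is an assumption made for illustration; the actual combiner is described in Chapter 4. Function names are illustrative.

```python
def mdll_phases(f_ref_hz, n_phases=8):
    """Edge times in seconds of n equally spaced phases within one reference period."""
    t_ref = 1.0 / f_ref_hz
    return [k * t_ref / n_phases for k in range(n_phases)]

def combine_edges(phases):
    """Hypothetical edge combiner: even phases toggle LO_I, odd phases toggle LO_Q."""
    return phases[0::2], phases[1::2]

def lo_frequency(edges, t_ref):
    """A clock toggling len(edges) times per t_ref completes len(edges)/2 cycles per t_ref."""
    return (len(edges) / 2) / t_ref

if __name__ == "__main__":
    f_ref = 2.5e9
    t_ref = 1.0 / f_ref
    lo_i_edges, lo_q_edges = combine_edges(mdll_phases(f_ref))
    lo_period = t_ref / 2
    print("phase spacing (ps):", t_ref / 8 * 1e12)
    print("LO frequency (GHz):", lo_frequency(lo_i_edges, t_ref) / 1e9)
    print("I/Q offset (deg):", (lo_q_edges[0] - lo_i_edges[0]) / lo_period * 360)
```

For Fref = 2.5 GHz the phase spacing is Tref/8 = 50 ps, which corresponds to a quarter of the 200 ps LO period, so adjacent phases naturally provide the 90° offset between LOI and LOQ.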
Fig. 3.25 shows the main building blocks of the proposed MDLL-based CDR unit.
In order to facilitate synchronous operation between the TX and RX, the clock unit has
a delay fine-tuning input, Vc in Fig. 3.25, for phase adjustment, which can delay
the input clock within a 100 ps interval and perform the phase-alignment function. This
delay is manually swept based on the recovered I/Q sub-band signal quality. The design
and implementation of the clock generation unit are given in Chapter 4 in more detail.

Figure 3.26: Chip die photo.

Figure 3.27: Micrograph and layout of the chip. The die size is 1.3 × 2.8 mm2. Two identical
TRXs are placed at the top and bottom of the chip.
3.4 Measurement Results
The TRX prototype is fabricated in a 40 nm GP 1-poly 10-metal bulk CMOS process,
with a core voltage of 0.9 V. The chip die photo is presented in Fig. 3.26; the chip
includes two independent TRXs and a test chip for the proposed MDLL. The die size
is 1.3 × 2.8 mm2 and all the RF pads have ESD protection circuits. A chip-on-board
package is used for testing the prototype. The chip micrograph and layout are shown in
Fig. 3.27, and include two independent RX and TX circuits occupying 150 × 60 μm2 and
85 × 60 μm2, respectively. Fig. 3.28 (a) shows the test setup for the proposed NRZ/MT
TRX system. The reference FR-4 channel, 30 cm in length, exhibits frequency notches as
shown in Fig. 3.28 (b). These notches are introduced by two open stubs and can mimic
the frequency response of an MDB channel. For test purposes, this structure allows us
to tune the frequency of the notches within a ±40% range. Fig. 3.28 (c) shows the
measured 7.5 Gb/s single-bit pulse response of this channel, illustrating an approximately
2-ns-long tail and severe post-cursors. The strong reflections highlight how challenging it
is to equalize such channels for conventional BB transceivers, as explained in Section 2 of
this chapter.

Figure 3.28: (a) Test setup with MDB channel. (b) Measured channel frequency response. (c)
Measured channel 7.5 Gb/s single-bit pulse response.

Figure 3.29: (a) Measured TX output spectrum at different Fref settings (Fref = 1.3, 2.5, and
3.2 GHz). (b) Measured spectrum at the input of RX.

Figure 3.30: Measured RX eye diagram at 7.5 Gb/s data rate. (a) Q sub-band. (b) I sub-band.
(c) BB sub-band. Corresponding bathtub curve for (d) Q sub-band, (e) I sub-band, (f)
BB sub-band, each operating at 2.5 Gb/s data rate. Bathtub annotations: (d) 41% UI,
Rj = 9.9 ps, Dj = 101 ps, Tj = 234 ps at BER 10^-12; (e) 43% UI, Rj = 9.3 ps, Dj = 99 ps,
Tj = 225 ps at BER 10^-12; (f) 55% UI, Rj = 8.2 ps, Dj = 66 ps, Tj = 179 ps at BER 10^-12.
The measured spectrum of the TX for the maximum, minimum, and nominal
reference frequencies is shown in Fig. 3.29 (a). Based on this measurement, the TX is able
to customize the frequency spectrum and the related sub-bands within a ±25% range
(i.e., the first null frequency in the TX frequency spectrum can be changed between 1.3
and 3.2 GHz). The measured spectrum at the input of the RX is shown in Fig. 3.29 (b) and
shows that the QPSK sub-band is more attenuated than the NRZ sub-band.
Based on this measurement, the received energies for the NRZ and the QPSK sub-bands
are -18.4 dBm and -23 dBm, respectively.

Figure 3.31: Measured RX eye diagram sensitivity to phase-error. (a) ±5°, and (b) ±10° phase
mismatch. (c), (d) Corresponding bathtub curves, annotated with a 22% UI opening, Rj = 15.4 ps,
Dj = 94 ps, and Tj = 309 ps at BER 10^-12 for ±5°, and with Rj = 24.2 ps, Dj = 68 ps, and
Tj = 404 ps at BER 10^-12 for ±10°.
Fig. 3.30 (a), (b), and (c) show the measured eye diagrams for the Q, I, and BB sub-bands,
respectively, each having a 2.5 Gb/s data rate. The bathtub curves for the Q, I, and BB
sub-bands are shown in Fig. 3.30 (d), (e), and (f), respectively. The BB, Q, and I sub-bands
have 220 ps, 164 ps, and 172 ps horizontal margin at BER = 10^-12, respectively. Each
of the received sub-bands thus has sufficient eye-opening to ensure a 40% unit-interval
horizontal eye-opening (referring to the 2.5 Gb/s sub-band data rate) at BER = 10^-12.
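The quoted margins in picoseconds and the percentages of the unit interval (UI) shown on the bathtub curves are related by the 400 ps UI of each 2.5 Gb/s sub-band. A minimal sketch of this conversion, with an illustrative function name:

```python
def margin_in_ui(margin_ps, bit_rate_gbps):
    """Horizontal eye margin expressed as a fraction of the unit interval."""
    ui_ps = 1e3 / bit_rate_gbps  # unit interval in ps for a rate given in Gb/s
    return margin_ps / ui_ps

if __name__ == "__main__":
    for name, margin_ps in (("BB", 220.0), ("Q", 164.0), ("I", 172.0)):
        print(f"{name}: {100 * margin_in_ui(margin_ps, 2.5):.0f}% UI")
```

The three measured margins map to 55%, 41%, and 43% of the sub-band UI, all above the 40% level stated in the text.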
Figure 3.32: (a) Power breakdown for the whole TRX. (b) TX power specification. (c) RX
power specification.

TX component          Power (mW)   Share of total
Clock unit            1.275        17%
LVDS summer/driver    2.4          32%
Mixer                 0.225        3%

RX component          Power (mW)   Share of total
Clock unit            1.575        21%
Amplifiers            0.3          4%
SCMF                  0.525        7%
AFE                   1.125        15%

Figure 3.33: I sub-band bathtub curve for different channel notches (Fnotch = 2.25, 2.50, and
2.75 GHz).

The optimum phase for the LO signals, which are applied to the SCMF unit, is adjusted by
the delay fine-tuning input. However, if there is a phase error from this optimum value,
the horizontal eye-opening is degraded. Fig. 3.31 (a) and (b) illustrate the measured eye
diagram for the I sub-band with ±5° (i.e., 5.5 ps) and ±10° (i.e., 11 ps) phase mismatch,
respectively. The corresponding bathtub curves are shown in Fig. 3.31 (c) and (d) and
demonstrate a 22% horizontal eye-opening at BER = 10^-12 for a ±5° mismatch, while the eye
starts to close at BER = 10^-12 for a ±10° phase mismatch.
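The phase mismatch values can be converted to time offsets with a one-line formula. This is a minimal sketch with an illustrative function name. The quoted 5.5 ps and 11 ps are consistent with expressing the angle relative to the 400 ps reference period, which is an interpretation assumed here rather than stated in the text.

```python
def phase_to_time_ps(phase_deg, period_ps):
    """Time offset corresponding to a phase angle of a clock with the given period."""
    return phase_deg / 360.0 * period_ps

if __name__ == "__main__":
    t_ref_ps = 400.0  # reference period for Fref = 2.5 GHz
    for phase in (5.0, 10.0):
        print(f"{phase:.0f} deg -> {phase_to_time_ps(phase, t_ref_ps):.2f} ps")
```

Under this assumption, 5° and 10° correspond to about 5.6 ps and 11.1 ps, in line with the rounded figures given above.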
Table 3.2: TRX Performance Comparison with State-of-the-art Memory Transceivers.

Reference | [48] | [49] | [12] | [50] | [51] | This Work
Technology | 180 nm CMOS | 90 nm CMOS | 65 nm CMOS | 130 nm CMOS | 28 nm LP CMOS | 40 nm GP CMOS
Area (mm2) | 0.068 ** | 0.225 | 0.14 ** | 0.17 ** | 0.28 * | 0.015 **
Channel | 4" FR-4 | 2" FR-4 | 4" FR-4 | MDB (8-drops), 5" Nelco | 3.5" FR4 + DIMM connectors | MDB (4-drops), 12" FR4
Data rate (Gb/s) | 5 | 8 | 8.4 (4.6/3.8) *** | 4.8 | 6.4 | 7.5
Link power efficiency (pJ/bit) | 20 | 4 | 2.5 (2.3/2.7) *** | 14 | 9.1 * | 1
BER / Horizontal eye-opening | 10^-12 / 63% UI (126 ps) | 10^-12 / NA | 10^-15 / NA | 10^-9 / 73% UI (152 ps) | 10^-10 / 40% UI (62.5 ps) | 10^-12 / 43% UI (172 ps)
Supply (V) | 1.8 | 1.25 | 1 | 1.2 | 1 | 0.9
Signaling / Architecture | NRZ / FIR+CTLE | NRZ / CTLE | BB+RF (23 GHz carrier) / BPF+LPF | NRZ / FIR+CTLE+DFE | NRZ / CTLE+1-tap DFE | BB+I/Q (5 GHz carrier) / LPF+LPF+SCMF
TX output swing (mVpp) | NA | 200 | 350/1000 *** | NA | 285 | 280

* Area and power are given for one controller unit.
** Core size area.
*** The numbers are reported for BB/RF bands, respectively.

The TRX power breakdown for the total data rate of 7.5 Gb/s is shown in Fig. 3.32 (a).
The whole chip consumes 7.5 mW from a 0.9 V power supply at this data rate, leading
to a 1 pJ/b link efficiency over the MDB channel interface. This includes the power
consumption of the TX (MDLL, mixers, and LVDS driver) and the RX (BPF, LPF, SCMF, and
MDLL). The consumption of the built-in PRBS15 generators and the I/O buffers that
drive the measurement equipment is excluded from this calculation. The TX and RX
consume 52% and 48% of the total power, respectively. The TX and RX circuit power
consumption is shown in Fig. 3.32 (b) and (c), respectively.
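The link efficiency follows directly from the per-block power numbers listed with Fig. 3.32. The sketch below repeats that arithmetic; the dictionary and function names are illustrative. Note that the itemized blocks add up to 7.425 mW, slightly below the rounded 7.5 mW quoted for the whole chip.

```python
# Per-block power in mW, taken from the TX and RX tables of Fig. 3.32.
TX_POWER_MW = {"clock unit": 1.275, "LVDS summer/driver": 2.4, "mixer": 0.225}
RX_POWER_MW = {"clock unit": 1.575, "amplifiers": 0.3, "SCMF": 0.525, "AFE": 1.125}

DATA_RATE_GBPS = 7.5

def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Power in mW divided by data rate in Gb/s gives energy per bit in pJ."""
    return power_mw / data_rate_gbps

tx_mw = sum(TX_POWER_MW.values())
rx_mw = sum(RX_POWER_MW.values())

if __name__ == "__main__":
    print(f"TX: {tx_mw:.3f} mW -> {energy_per_bit_pj(tx_mw, DATA_RATE_GBPS):.2f} pJ/b")
    print(f"RX: {rx_mw:.3f} mW -> {energy_per_bit_pj(rx_mw, DATA_RATE_GBPS):.2f} pJ/b")
    print(f"Sum of listed blocks: {tx_mw + rx_mw:.3f} mW")
    print(f"Quoted total: {energy_per_bit_pj(7.5, DATA_RATE_GBPS):.2f} pJ/b")
```

The TX and RX sums of 3.9 mW and 3.525 mW correspond to 0.52 pJ/b and 0.47 pJ/b at 7.5 Gb/s, which is consistent with the roughly 52%/48% split reported in the text.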
The TRX has been designed such that the output spectrum can be well adapted to
the channel frequency characteristic. It is also able to tolerate the mismatch in the
channel notch frequency, while the frequency plan is kept unchanged (i.e., the ﬁrst notch
is not at 2.5 GHz while Fref = 2.5 GHz). To measure the TRX tolerance, the total
data rate is maintained at 7.5 Gb/s (2.5 Gb/s at each sub-band), while the ﬁrst notch
frequency is changed from its nominal value at 2.5 GHz by changing the stub length.

Table 3.3: Silicon Performance Summary.

Technology | 40 nm CMOS GP 1P10M
Power supply | 0.9 V
Package | Wire bond (1.2-1.8 mm length), COB
Channel | MDB (45 dB @ 2.5 GHz, 13 dB @ 5 GHz)
Pad capacitance (including ESD) | 250 fF @ RF pad
Data rate | 5.5-8 Gb/s (2.75-4 Gb/s/pin)
RX/TX link power efficiency | 0.48/0.52 pJ/b @ 7.5 Gb/s
Horizontal eye-opening @ BER 10^-12 | 43% UI
Architecture | NRZ/multi-tone TRX (BB / I / Q sub-bands)
RMS jitter @ RX output | 9.2 ps
TX output swing | 280 mVpp (70 mVrms)
TX/RX core area | 85×60 μm2 / 150×60 μm2
CDR architecture | Source synchronous with fine delay adjustment on RX

Fig. 3.33 shows the I sub-band bathtub for different channel notches and shows that the
eye-opening at BER = 10^-12 is 27% UI (108 ps) and 30% UI (120 ps) for Fnotch located
at 2.25 GHz and 2.75 GHz, respectively. This measurement shows that the horizontal
eye-opening degradation is less for the NRZ sub-band than for the QPSK sub-band. The
NRZ sub-band demonstrates an approximately 6% UI degradation at BER = 10^-12 due
to the mismatch between the TX spectrum and the channel notches.
Table 3.2 compares the measured silicon performance with state-of-the-art memory
interface transceivers [48, 49, 12, 50, 51]. Compared to the other works, the proposed
TRX achieves a link power efficiency of 1 pJ/b while operating over a multi-drop memory
interface. The horizontal eye-opening is at least 164 ps for each sub-band at BER = 10^-12.
The total bit stream is divided into 3 sub-bands, each of which therefore operates at one
third of the total bit rate. The horizontal eye margin benefits from this fact and the
link becomes less sensitive to clock jitter and other non-idealities. The silicon performance
summary is given in Table 3.3.
3.5 Conclusion
It has been shown that hybrid NRZ/MT signaling can be used to implement low-power
and compact serial data transceivers for multi-drop channels. Using careful frequency
planning in addition to discrete-time signal conditioning, a 7.5 Gb/s transceiver with
1 mW/Gb/s energy eﬃciency has been designed and fabricated in a bulk 40 nm CMOS
technology. The measurement results conﬁrm that NRZ/MT serial data TRX can oﬀer
an energy-eﬃcient architecture over MDB interfaces. Choosing the proper sub-channels,
the linearity requirement of the TX is relaxed and the summer and the output driver can
be realized by an LVDS-type output driver. Moreover, by avoiding the channel frequency
notches, there is no need to use DFE on the RX side and the LPF, BPF, and SCMF can
satisfy the equalization requirements. Hence, signiﬁcant power-saving on the RX side is
achieved, leading to 1 pJ/b link eﬃciency in the MDB channel interface.
4 Multi-Phase Clock Generation for
Hybrid NRZ/MT Transceiver
In this chapter we present the design and implementation of a multiplying delay-locked
loop (MDLL), which can be used as a clock and data recovery (CDR) unit in
source-synchronous wireline communications. The MDLL doubles the reference frequency and
delivers differential in-phase (I) and quadrature (Q) clocks by generating 8 equally spaced
clock phases and combining these phases appropriately. The clock generation scheme
can be employed to perform the CDR function in the NRZ/MT transceiver
introduced in Chapter 3. To improve the system jitter performance, a technique for
reducing deterministic jitter (DJ) in the MDLL is proposed. The prototype of the proposed
clock generation unit is implemented, as an independent block, in a 40 nm bulk CMOS
process, and it is also used in the proposed NRZ/MT TRX. The prototype dissipates
1.1-1.8 mW over an output frequency range of 2.6-6.4 GHz, while the RMS jitter and
I/Q mismatch remain below 3 ps rms and ±5°, respectively, over the entire range. The
core occupies 60 × 40 μm2 of silicon area.
4.1 MDLL-Based Clock and Data Recovery: Overview
The continuous growth of wireline data communications has driven link speeds beyond
10 Gb/s. The data in multi-lane high-speed wireline communication systems are distorted
by both external and internal noise during transmission, which leads to attenuation, jitter,
and skew in the received data. In this context, many emerging I/O standards employ a
source-synchronous scheme because of its inherent jitter-tracking property [52, 53].
The DLL-based CDR topology is a preferable choice for source-synchronous high-speed
links since it does not have the jitter accumulation issue and it consumes less power
comparing to PLL-based CDR architecture [52]. The primary drawback of DLL-based
CDR is its limit to synthesize frequency. Hence, the clean-up PLL should generate the
Figure 4.1: Proposed MDLL-based CDR architecture.
full-rate clock and distribute it to all receiver banks, which leads to considerable power
dissipation. However, by distributing a half-rate clock from the central PLL and multiplying it
locally in each RX bank, significant power can be saved in the clock tree. For
this purpose, the multiplying-DLL (MDLL) based CDR of Fig. 4.1 is proposed. Compared
to the DLL-based CDR topology, the total dynamic power consumption in the clock tree is
halved for a given data rate. Moreover, MDLL-based clock generation shows better
jitter performance since the accumulation of thermal-noise-induced timing uncertainties
occurs only within one period of the crystal oscillator [54]. Given the same power budget
for both PLL- and MDLL-based CDRs, the overall jitter performance of the MDLL-based
CDR architecture can be improved.
In the remainder of this Chapter, the design and implementation of a low-power MDLL-based
CDR unit (highlighted in Fig. 4.1) is presented, which can be used in serial data
transceivers that employ a half-rate or multi-tone architecture. Moreover, the proposed
MDLL has a delay fine-tuning input, which can adjust the clock phase without
adding significant jitter. This property facilitates the use of the proposed MDLL for
CDR applications in NRZ/multi-tone receivers [24, 26]. This Chapter continues with
the analysis and design of the MDLL building blocks in Section 4.2. The measurement results
and the conclusion are presented in Sections 4.3 and 4.4, respectively.
Figure 4.2: Input clock buffer and single-to-differential stage schematic.
4.2 MDLL Circuit Design
Fig. 4.1 shows the main building blocks of the proposed MDLL-based CDR unit. The
input reference clock (i.e., CKref) is a single-ended signal that has a nominal frequency of
2.5 GHz and can vary within ±25% range, i.e., from 1.3 to 3.2 GHz. The forwarded clock
should be ampliﬁed, converted to diﬀerential clock, delayed, and multiplied appropriately.
Considering these criteria, the design of the required blocks is presented in this section.
4.2.1 Input Clock Buﬀer
A single-ended sinusoidal signal with an amplitude of around VDD/3 is applied to the MDLL clock
buffer; this amplitude can be further attenuated by a long clock tree. The clock
buffer should convert this reference clock to a differential square wave with sharp,
rail-to-rail edges. Fig. 4.2 shows the proposed circuit for this block, which operates
properly across process-voltage-temperature (PVT) corners. This circuit generates
a 50% duty-cycle square wave from the sinusoidal input across PVT corners and prevents additional
spur and jitter generation. Transistors M1 and M2 amplify the input clock and make
it a rail-to-rail signal. The bias point of this resistive-feedback amplifier is defined
by the feedback loop that consists of M1, M2, and R2. Across PVT corners the bias
point, Vb in Fig. 4.2, changes and adapts itself to the transition voltage accordingly.
Therefore, the succeeding stage receives a 50% duty-cycle clock. The second stage in
Fig. 4.2, marked with a dashed line, is the single-ended-to-differential converter. A
transmission gate, consisting of M3 and M4, is used to balance the skew between the inverted
and non-inverted clocks. The cross-coupled inverters couple the differential output signals
together and maintain the 50% duty cycle across process corners. This block has an input for delay
fine tuning, Vc in Fig. 4.2, which can delay the received forwarded clock within
a 100 ps interval. The transistors are sized to provide a 20%-80% rise and fall
time of 5 ps for the succeeding stages.
Figure 4.3: Schematic of the PFD circuit.
4.2.2 Phase-Frequency Detector (PFD)
In the MDLL block diagram, shown in Fig. 4.1, the DLL loop input (i.e., CKref) and
output frequency (i.e., Φ1−8) have the same value since the DLL loop just delays the input
reference and does not generate new frequencies. Therefore, in theory it is possible to use
a simple phase-detector instead of PFD, hence, use a simpler circuit. However, the phase
detector needs to distinguish not only the absolute phase diﬀerence, but also the phase
relationship (i.e., lag and lead phases) to remove the static phase error. Hence, in our
application the PFD is preferred since it can detect the phase relationship and removes
the ﬁnite static phase error more eﬀectively [54], [55]. Fig. 4.3 presents the schematic of
the PFD, which employs TSPC logic for high-speed operation. In this circuit, transistors
M1-M3 are used to initialize the UP/DN signals such that at start-up the DLL
loop has the highest speed (i.e., the minimum line delay); thus, it can track the input
phase by increasing the delay properly. The en signal in Fig. 4.3 is active high
and is low only at the beginning of the lock process. This initialization helps the overall MDLL
achieve a better lock range and a reasonable lock time in PVT corners [27].
In the PFD circuit, the static phase error between the PFD inputs, commonly referred to as the
dead zone, translates to spurious tones at Fref in the lock condition. To
alleviate this problem, a delay element can be added to the reset path of the PFD circuit
(shown in Fig. 4.4) to increase both UP/DN pulse widths by the same amount TD
and leave a sufficient time interval for the PFD to respond. However, due to imperfection
Figure 4.4: (a) CP conceptual schematic. (b) Effect of mismatch on the control voltage.
of the following charge pump (CP) circuit, the added delay should be kept as small as
possible to minimize the undesired effects of circuit mismatch and random noise. Further
discussion and an effective technique for reducing CP mismatch are given in the next
subsection.
4.2.3 Charge Pump (CP)
Fig. 4.4 (a) shows the conceptual schematic of a charge pump. The UP/DN signals
generated by the PFD are used to switch the current sources (i.e., Ip and In) into the
loop filter capacitor, i.e., C1. During the idle interval in the lock condition, which is
inevitable because of the PFD response time, the switches S1 and S2 are closed
simultaneously. As a consequence, due to transistor channel-length modulation, an
inevitable systematic mismatch between the charging and discharging currents appears in
the CP circuit. This non-ideality causes the loop capacitor to be charged (discharged)
during the idle interval and forces the overall DLL loop to compensate for the introduced
mismatch with a static phase error on the recovered clock, i.e., Φout in Fig. 4.3. The
static delay can be calculated as
$$T_{\mathrm{offset}} = \frac{\Delta I \cdot T_{\mathrm{idle}}}{(I_n + I_p)/2} \qquad (4.1)$$
where Toffset and Tidle are the static time offset and idle time, respectively, shown in
Fig. 4.4(b). In (4.1) the ΔI/(In+Ip) ratio is constant regardless of the absolute Ip,n values [55],
and it can be on the order of 10% for deep sub-micron technologies. Conventional cascoding
techniques for decreasing channel-length modulation are not applicable here due to the
limited core voltage headroom, i.e., 0.9 V in 40 nm technology. Likewise, due to the
variation of the control voltage (i.e., Vc) across PVT corners, the jitter performance
Figure 4.5: Proposed CP schematic with reduced mismatch.
of the DLL can be degraded.
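As a rough numerical illustration of (4.1), the sketch below evaluates the static offset for a current mismatch ratio ΔI/(In+Ip) of 10%. The current values and the 100 ps idle interval are hypothetical numbers chosen for illustration only; they are not taken from the design.

```python
def static_offset(i_p, i_n, t_idle):
    """Static timing offset from Eq. (4.1): dI * T_idle / ((I_n + I_p) / 2)."""
    delta_i = abs(i_p - i_n)
    return delta_i * t_idle / ((i_n + i_p) / 2.0)

# Hypothetical values: 110 uA and 90 uA give dI / (In + Ip) = 10 %.
I_P = 110e-6      # A, assumed charging current
I_N = 90e-6       # A, assumed discharging current
T_IDLE = 100e-12  # s, assumed idle interval

t_off = static_offset(I_P, I_N, T_IDLE)
print(f"T_offset = {t_off * 1e12:.1f} ps")
```

Under these assumptions the offset is one fifth of the idle interval, which shows why the idle interval (and hence the added reset delay TD) should be kept short.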
The voltage ripple on Vc causes spurious tones at Fref , which translates to deterministic
jitter (DJ) as [56]
$$\mathrm{DJ}_{out} = \frac{2}{\pi}\, T_{OUT} \times 10^{\mathrm{Spur(dBc)}/20} \qquad (4.2)$$
where DJout is the peak-to-peak induced DJ, TOUT is the period of the output clock, and
Spur(dBc) is the power of the spurious tone relative to the output clock.
To reduce this unfavorable effect in the low-voltage regime, the CP in Fig. 4.5 is proposed.
In this circuit, M3, M4, and the NAND gate perform the mismatch reduction function.
When only one of the UP and DN signals is high, M3 and M4 are off and the circuit
operates normally to charge or discharge C1. When both UP and DN signals are high
at the same time, which corresponds to the idle interval, the NAND gate turns on both M3 and
M4 and forces the node voltages X and Y to VDD. As a result, M8 and M10 turn off and
the output stage stops charging (discharging) the loop filter capacitor. This effectively
reduces the ripple caused by CP mismatch on the VCDL control voltage. The reduced
static delay can be calculated as
$$T_{\mathrm{offset,new}} = \frac{T_{\mathrm{comp}}}{T_{\mathrm{idle}}} \cdot T_{\mathrm{offset}} \qquad (4.3)$$
where Tidle is the idle interval, Tcomp is the compensating circuit response time, and
Toffset is the static offset before compensation. The ratio Tidle/Tcomp can be on the
order of 4 to 5, which reduces the static offset accordingly and thus improves the jitter
performance. Simulation results show that turning on the mismatch compensation
circuit (highlighted in Fig. 4.5) reduces the spurs at Fref from -21 dBc to -33 dBc,
which translates to a 7 ps pp jitter reduction for a 6 GHz output.
Figure 4.6: (a) Schematic of the VCDL. (b) Schematic of the D-cell.
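The quoted jitter reduction can be checked directly from (4.2). The following sketch evaluates the peak-to-peak DJ for the two simulated spur levels at a 6 GHz output, and also applies (4.3) to a hypothetical case in which the compensation response time is one quarter of the idle interval. The function names and the 20 ps starting offset are illustrative assumptions.

```python
import math

def dj_from_spur(spur_dbc, f_out_hz):
    """Peak-to-peak deterministic jitter in seconds, from Eq. (4.2)."""
    t_out = 1.0 / f_out_hz
    return (2.0 / math.pi) * t_out * 10.0 ** (spur_dbc / 20.0)

def compensated_offset(t_offset, t_comp, t_idle):
    """Residual static offset in seconds, from Eq. (4.3)."""
    return (t_comp / t_idle) * t_offset

F_OUT = 6e9  # Hz, output frequency quoted in the text

dj_before = dj_from_spur(-21.0, F_OUT)  # compensation off
dj_after = dj_from_spur(-33.0, F_OUT)   # compensation on
print(f"DJ before: {dj_before * 1e12:.2f} ps pp")
print(f"DJ after:  {dj_after * 1e12:.2f} ps pp")
print(f"Reduction: {(dj_before - dj_after) * 1e12:.2f} ps pp")

# Hypothetical: 20 ps uncompensated offset, T_comp = T_idle / 4.
print(compensated_offset(20e-12, 1.0, 4.0))
```

The difference between the two DJ values comes out at roughly 7 ps, in line with the simulated reduction stated above.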
4.2.4 Voltage-Controlled Delay Line (VCDL)
4.2.4.1 Delay cell (D-cell)
To double the input clock frequency and generate differential I/Q clocks at 2 × Fref,
8 different phases at Fref are required. This can be realized by 4 differential delay cells
(D-cells), as shown in Fig. 4.6 (a). In this circuit, the symmetry between the core D-cells is
the key to guaranteeing the desired performance of the MDLL in PVT corners and thus
preventing static phase error. Hence, the input clock, coming from the clock buffer shown
in Fig. 4.2, first goes through D1 to generate a differential reference that has the
same loading as the other outputs. In this circuit, D6 is also used to keep the symmetry
between the core D-cells. This ensures identical loading on all the outputs and improves
matching.
Figure 4.7: (a) Schematic of the proposed V-I converter. (b) D-cell tuning curve (delay in ps versus Vc in V, for the SS, TT, and FF corners) using the
proposed V-I converter.
The D-cells employ a differential current-starved push-pull topology, as shown in Fig. 4.6 (b).
In this circuit, the current of M1-M4 is adjusted by the current source transistors, M5-M8,
whereas the cross-coupled inverter pairs couple the quasi-differential output signals and act as
a symmetrizer, thereby improving the duty cycle.
4.2.4.2 V-to-I converter
In the D-cell circuit, shown in Fig. 4.6 (b), mismatch is dominated by the tail current
transistors, M5-8, since the mismatch of the inverter devices is degenerated by the output
resistance of the tail devices. The currents of M5,6 should also match those of M7,8
well to prevent systematic mismatch. In order to improve the matching and provide
sufficient current to facilitate fast lock at 2.5 GHz, the tail devices should operate in the strong
inversion regime. Therefore, a low-voltage cascode current mirror topology is proposed
in Fig. 4.7 (a) as the bias circuit to provide sufficient current matching. In this circuit, R1
and R2 are degeneration resistors, which reduce the corner variation and improve the
linearity of the input differential pair, i.e., M1 and M2. Moreover, C1 and C2 (65 fF
MOS capacitors at VGS = 0.5 V) filter the noise generated by the V-to-I converter and
reduce the jitter. Fig. 4.7 (b) illustrates the D-cell tuning curve in different PVT corners
using the V-I converter of Fig. 4.7 (a). With proper transistor sizing, the delay can be
adjusted within the range of 30-60 ps in all PVT corners; hence, the DLL locking condition
is guaranteed.
Figure 4.8: (a) Edge combiner circuit. (b) Symmetrical NAND gate.
4.2.4.3 Edge combiner
The edge combiner circuit, shown in Fig. 4.8 (a), is implemented by an exclusive-OR
(XOR) of consecutive phases. The edge combiner should present matched loading to
all its inputs to avoid additional jitter on the outputs. The NAND gates are designed to be
fully symmetrical to meet this requirement, as shown in Fig. 4.8 (b).
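The frequency-doubling action of the edge combiner can be illustrated with a purely behavioral sketch. It assumes ideal 50% square-wave phases spaced by Tref/8 and models each output as the XOR of two phases that are Tref/4 apart; the phase indexing and the two-input XOR are simplifying assumptions and do not reproduce the NAND-based gate-level circuit of Fig. 4.8.

```python
T_REF = 400.0  # ps; reference period for Fref = 2.5 GHz

def phase(k, t):
    """k-th of 8 equally spaced phases (k = 0..7) of an ideal 50 % square wave at Fref."""
    return 1 if (t - k * T_REF / 8.0) % T_REF < T_REF / 2.0 else 0

def lo_i(t):
    """In-phase output: XOR of two phases spaced by Tref/4."""
    return phase(0, t) ^ phase(2, t)

def lo_q(t):
    """Quadrature output: the same XOR applied to the interleaved phases."""
    return phase(1, t) ^ phase(3, t)

# Sample two reference periods on a grid that avoids the switching instants.
times = [0.125 + 0.25 * i for i in range(3200)]
duty_i = sum(lo_i(t) for t in times) / len(times)

# LO_I repeats every Tref/2 (twice the reference frequency) ...
doubles = all(lo_i(t) == lo_i(t + T_REF / 2.0) for t in times)
# ... and LO_Q is LO_I delayed by Tref/8, i.e. 90 degrees at 2 x Fref.
quadrature = all(lo_q(t) == lo_i(t - T_REF / 8.0) for t in times)
print(duty_i, doubles, quadrature)
```

In this idealized model any phase-spacing error in the delay line would appear directly as duty-cycle distortion or I/Q mismatch on the outputs, which is why matched loading on the combiner inputs matters.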
4.2.5 Loop Filter Design
The proposed MDLL (shown in Fig. 4.1) is a type-I DLL, and the loop filter consists
of a capacitor C1 that is connected to the CP output. Although the DLL loop can
be fairly modeled in the s-domain¹ (i.e., the continuous-time domain), the system should
be modeled in the discrete-time domain to analyze the loop behavior and stability
more precisely [57, 58]. This provides better insight into the DLL dynamics and hence
helps prevent an undesirably long settling time or unwanted oscillation, especially in PVT corners.
Fig. 4.9 (a) represents the closed-loop discrete-time model for analyzing the DLL dynamic
behavior [57]. Based on this model, the phase transfer function (i.e., Φout/Φin) can be
expressed as
$$\frac{\Phi_{out}}{\Phi_{in}} = \frac{K_{CP} \cdot K_{VCDL}}{1 - (1 - K_{CP} \cdot K_{VCDL})\, z^{-1}} \qquad (4.4)$$
where KCP = ICP/C1 is the CP gain assuming Ip = In = ICP, and KVCDL represents the
linearized VCDL gain around the VCDL operating control voltage.
¹ This approximation is only valid if the DLL loop bandwidth is much narrower than the input reference frequency.
Figure 4.9: (a) DLL discrete-time model. (b) Gate leakage current Ig (μA) versus VGS (V) for a 0.6 pF MOS capacitor in 40 nm, in the SS, TT, and FF corners.
The transfer function in (4.4) contains a pole at z = 1 − KCP·KVCDL; thus, the open-loop gain needs to meet the
following constraint in all PVT corners to guarantee loop stability [58]
$$0 \le K_{CP} \cdot K_{VCDL} \le 2 \qquad (4.5)$$
Likewise, the minimum loop gain (i.e., KCP·KVCDL) that ensures the DLL loop settles
within an n/Fref time interval with 99% accuracy can be calculated as [59]
$$K_{CP} \cdot K_{VCDL} = 1 - \exp\left(\frac{1}{n+1}\ln(0.01)\right) \qquad (4.6)$$
Replacing n by 20 in (4.6) to yield a maximum 8 ns lock time, the loop gain can be
estimated as
$$K_{CP} \cdot K_{VCDL} = 0.197 \qquad (4.7)$$
The estimated loop gain in (4.7) satisfies (4.5); therefore, the DLL remains stable in the
PVT corners. Given (4.7), the value of the loop filter capacitor (i.e., C1 in Fig. 4.1)
should be calculated with the worst-case design parameters over the PVT corners. In the FF
corner, ICP ≈ 400 μA and KVCDL ≈ 0.5 ns/V; hence, C1 ≈ 1 pF. This quite large
capacitor can be realized either by a MOS capacitor or by a metal-oxide-metal
(MOM) capacitor. Although the former is more area efficient, it is not used in our
design due to the gate leakage issue in the deep sub-micron regime. The MOS capacitor gate
leakage current in 40 nm CMOS technology is depicted in Fig. 4.9 (b) for a 60 μm/0.5 μm
low-voltage n-type device (600 fF at VGS = 0.6 V). This current can discharge the loop
capacitor and introduce undesired voltage ripple on the VCDL control voltage. Therefore,
a MOM capacitor is employed in the loop filter, which occupies a 20 × 14 μm² die area
using an inter-digitated MOM capacitor structure. By using all available metal layers (i.e.,
M1-8) in the MOM capacitor, the occupied die area is comparable to that of the MOS capacitor
counterpart.
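The loop-filter sizing steps above can be reproduced numerically. The sketch below evaluates (4.6) for n = 20, checks the result against the stability range of (4.5), iterates the first-order difference equation implied by (4.4) for a unit phase step, and solves KCP·KVCDL = (ICP/C1)·KVCDL for C1 using the FF-corner values quoted in the text. The function names are illustrative, and the step-response loop is only a behavioral check of the discrete-time model.

```python
import math

def min_loop_gain(n, accuracy=0.99):
    """Loop gain K = K_CP * K_VCDL from Eq. (4.6)."""
    return 1.0 - math.exp(math.log(1.0 - accuracy) / (n + 1))

def settle_error(k, steps):
    """Residual error of y[m] = (1 - k) * y[m-1] + k * x[m] for a unit step input."""
    y = 0.0
    for _ in range(steps):
        y = (1.0 - k) * y + k * 1.0
    return 1.0 - y

K = min_loop_gain(20)          # Eq. (4.7)
stable = 0.0 < K < 2.0         # within the range of Eq. (4.5)
err = settle_error(K, 21)      # error after n + 1 loop updates

I_CP = 400e-6    # A, FF-corner charge-pump current from the text
K_VCDL = 0.5e-9  # s/V, FF-corner VCDL gain from the text
C1 = I_CP * K_VCDL / K         # from K = (I_CP / C1) * K_VCDL

print(f"K = {K:.3f}, stable = {stable}, residual error = {err:.4f}")
print(f"C1 = {C1 * 1e12:.2f} pF")
```

With these inputs the loop gain evaluates to about 0.197 and C1 to about 1 pF, matching (4.7) and the capacitor value stated above.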
Figure 4.10: Micrograph and layout of the test chip.
Figure 4.11: (a) Measured I/Q clocks at 5 GHz output. (b) Measured long-term I/Q phase
mismatch at 5 GHz output (1.1° RMS, 9.2° peak-to-peak over 800 k hits).
4.3 Measurement results
The proposed MDLL is designed and fabricated in 40 nm bulk CMOS technology. Fig. 4.10
shows the die micrograph of this circuit. The core occupies 60 × 40 μm² of silicon area,
of which 25% is used for the loop-filter capacitor. The quadrature I/Q clocks at
5 GHz are shown in Fig. 4.11 (a). The measured phase mismatch over 800 k hits, shown
in Fig. 4.11 (b), is 9.2° peak-to-peak.
The jitter accumulation is investigated by measuring long-term jitter. Fig. 4.12 (a) shows
the measured jitter histogram at 6 GHz output frequency. The output jitter is about
1.3 ps rms and 13.3 ps pp over 800 k hits.
The measured peak-to-peak jitter degradation due to external supply noise, calculated
Figure 4.12: (a) Measured long-term jitter histogram at 6 GHz MDLL output (1.29 ps RMS, 13.25 ps peak-to-peak, 800 k hits). (b) Measured
long-term jitter histogram at 6 GHz MDLL output with 200 mVpp external supply noise at
250 MHz (worst case) noise frequency (3.08 ps RMS, 19.87 ps peak-to-peak, 800 k hits).
Figure 4.13: Measured peak-to-peak jitter degradation (ps pp) versus supply noise frequency and reference noise offset
frequency (MHz) for 6 GHz output. The supply has 200 mVpp sinusoidal noise. The reference
clock has a -28 dBc single-tone sideband.
by subtracting the jitter in the absence of supply noise, is shown in Fig. 4.13. A 200 mVpp
sinusoidal signal is added to the MDLL supply and the noise frequency is swept from 1 MHz to
700 MHz. The maximum jitter degradation is 6.6 ps at a 250 MHz noise frequency, resulting
in a supply noise sensitivity of 33 fs/mV. Fig. 4.12 (b) shows the jitter histogram at
6 GHz output at the worst-case noise frequency, i.e., 250 MHz.
By adding a single-tone spur to the phase of the reference clock (i.e., a single-tone sideband
with -28 dBc relative magnitude on the reference clock) and sweeping its frequency from
1 MHz to 700 MHz, the jitter transfer characteristic of the circuit is measured, as shown in Fig. 4.13.
The peaking of the jitter transfer function occurs at a 250 MHz offset frequency.
Fig. 4.14 (a) shows the measured phase noise at 6 GHz output frequency. The measured
Figure 4.14: (a) Measured phase noise (dBc/Hz) versus frequency offset (MHz) at 6 GHz output. (b) Measured MDLL reference and
spurs with 200 mVpp supply noise at jitter-peaking frequency.
phase noise at 1 MHz and 250 MHz offset is nearly -106 dBc/Hz and -105 dBc/Hz,
respectively. It is interesting to note that the phase noise also shows peaking at 250 MHz,
very similar to the results shown in Fig. 4.13. To measure the effect of jitter peaking
on the deterministic jitter (DJ) generation, a 250 MHz sinusoidal signal with 200 mVpp is
added to the MDLL supply and the measured spectrum is shown in Fig. 4.14(b). The reference
spur and the noise spur are -31 dBc and -35 dBc, which translate to 2.9 ps (DJ2) and 1.9 ps
(DJ1) of deterministic jitter, respectively, using (4.2). Since DJtotal < DJ1 + DJ2 [56], this
measurement shows that the maximum DJ remains under 5 ps, which is sufficient for
Table 4.1: Comparison of MDLL performance for frequency generation.
Ref.                  [56]           [60]            [61]         This Work
Technology            0.13 μm CMOS   90 nm CMOS      90 nm CMOS   40 nm CMOS
Supply (V)            1.1            1.0             1.2          0.9
Power (mW/GHz)        0.6            1.125           3.7          0.28 (b)
Fref (GHz)            0.375          2∼5             0.225∼0.9    1.3∼3.2
Fout (GHz)            1.5            40              0.45∼5.4     2.6∼6.4
Multiplying Factor    4              9               2, 3, 6      2
Jitter RMS/PP (ps)    0.9/9.2        0.87/7.56 (a)   NA           1.3/13.25 (b)
Supply Sen. (fs/mV)   20             NA              NA           33
Active area (mm²)     0.25           0.122           0.04         0.0024
(a) For 5 GHz input.
(b) For 6 GHz output.
source-synchronous serial link applications [24].
The power consumption of the core system is 1.1-1.8 mW for the output frequency range
of 2.6-6.4 GHz. All measurement results are in good agreement with simulation results
and in general satisfy the speciﬁcation requirement of the system. Table 4.1 compares
the measured performance of the fabricated DLL-based frequency multiplier with some
published works.
4.4 Conclusion
This chapter presents the design and implementation of a DLL-based frequency multiplier
circuit in 40 nm bulk CMOS. The proposed circuit reduces the complexity
and power dissipation of the CDR unit in wireline transceivers while satisfying the required
jitter and phase mismatch specifications. New strategies are proposed to satisfy these
specifications while fulfilling the stringent power consumption requirements of wireline
links. The RMS jitter remains below 3 ps over the entire input frequency range. A low-voltage
technique is proposed to reduce the effect of mismatch in the CP circuit and thereby reduce
the deterministic jitter. Based on the measurement results, the power consumption
does not exceed 1.8 mW over the entire input frequency range, which corresponds to an energy
consumption of 280 fJ per output clock cycle (0.28 mW/GHz) for 6 GHz output.
5 Hybrid NRZ/MT Signaling for
Controlling ISI and Crosstalk in
Dense Interconnects
This Chapter studies the properties of multi-tone signaling for controlling the eﬀect of
crosstalk in high-density and compact links constructed using low-cost material such
as FR-4. As the distance between the lanes has to diminish in order to allow for more
density, crosstalk turns out to have more and more severe eﬀects on performance of the
system. We will exploit the properties of multi-tone signaling, especially orthogonality
amongst diﬀerent sub-bands, to reduce the eﬀect of crosstalk. A four-channel transceiver
has been implemented in a standard CMOS 40 nm technology in order to demonstrate
the performance of NRZ/MT signaling in presence of high channel loss and strong
crosstalk [26].
To further study NRZ/multi-tone (NRZ/MT) signaling and improve the data rate in
dense memory interconnects, we have first implemented four closely spaced transceivers
on each die. In addition, the circuit design of some key building blocks has been modified
in order to improve the performance and provide a 20% higher data rate per pin. Moreover,
the crosstalk is studied and analyzed for NRZ/MT signaling, and the silicon results
demonstrate the efficiency of the proposed scheme in controlling crosstalk-induced noise.
Furthermore, the design flow and measurement results of a low-power 4-channel hybrid
NRZ/MT transceiver for multi-drop bus (MDB) memory interfaces in 40 nm CMOS
technology are presented in this Chapter. The proposed system achieves 1 pJ/bit
power efficiency while communicating over an MDB channel with 45 dB loss at 3 GHz.
The multi-tone (MT) nature of the proposed transceiver helps to control the intersymbol
interference (ISI) and reduce the far-end crosstalk (FEXT), which results in a very
energy-efficient implementation. The core area is 80 × 60 μm² and 130 × 60 μm² for
the TX and RX blocks (including the clock unit), respectively.
The remainder of this Chapter is organized as follows. Section 5.1 studies the performance
of NRZ/MT signaling in the presence of channel loss and far-end crosstalk (FEXT).
Sections 5.2 and 5.3 explain the system-level and circuit-level techniques, respectively, for realizing a
9 Gb/s/lane mixed NRZ/MT serial data transceiver. Section 5.4 presents
the experimental data for the proposed multi-channel transceiver operating at an aggregate
36 Gb/s data rate over a multi-drop bus (MDB) channel, and demonstrates the efficiency
of the proposed ISI/FEXT reduction schemes at low energy cost. Section 5.5 summarizes
this Chapter.
5.1 Analysis of ISI and FEXT for BB and NRZ/MT Signaling
The challenges in design of high data rate links for memory interfaces in presence of ISI
and FEXT are studied in this section. Meanwhile, it is shown how NRZ/MT signaling
can help to control and reduce the eﬀect of these types of imperfections.
5.1.1 Signal Integrity in Dense Interconnects: Introduction
Overcoming the limitations of data rate and power efficiency in wireline serial data
transceivers has emerged as one of the major challenges in
improving the speed and performance of modern computing systems. While industry continues
to demand higher data transfer speeds over low-cost, high-loss channels, more advanced
communication and circuit techniques are needed to satisfy the required specifications.
Moving toward higher data rates over denser communication channels exacerbates both
crosstalk and intersymbol interference (ISI). As a consequence, modern serial data
transceivers are equipped with very sophisticated equalizers and crosstalk cancellation
units [62, 63, 64, 65], which in turn increase system complexity and energy consumption.
Because of the power overhead that these techniques add to the transceiver,
they are not well suited to I/O systems where power efficiency is the main
concern [66]. To alleviate the design challenges and implement a low-power link, more
advanced signaling methods can be employed [12, 26, 11, 25, 67, 68, 69]. As shown in [24],
hybrid NRZ/multi-tone (MT) signaling can be used to relax the equalization requirements
and implement a very energy-efficient transceiver. In this signaling method, the spectrum
of the transmitted signal is shaped such that the energy loss due to frequency-domain
notches of the channel is minimized. Therefore, simple continuous-time linear equalization
(CTLE) is sufficient to reconstruct the received signal and achieve 1 pJ/b link energy
efficiency.
Figure 5.1: Overall block diagram of the 4-channel TRX.
5.1.2 Channel Conﬁguration and BB signaling
The block diagram of the 4-channel transceiver (TRX) arrangement to be studied in this
work is shown in Fig. 5.1, with four identical TX/RX pairs implemented side-by-side,
in close proximity. The source-synchronous architecture is employed for this wireline
interface.
Fig. 5.2 (a) shows a stylized view of the differential parallel multi-lane channel that is
used in our link design. Fabricated on an FR-4 substrate, this memory interface consists of
four differential lanes in total, each with 30 cm length and 3 mm width, and with a
channel spacing of W = 3 mm within and between the bundles. Moreover, a dual-in-line
memory module (DIMM) is employed for communication between the memory controller
and the DRAMs. Fig. 5.2 (b) shows the simplified block diagram of a DIMM interface where
the controller is writing to one of the DRAMs on Rank1. Here, while the controller
communicates with DRAM5, the electrical length differences between the controller and
Figure 5.2: (a) Stylized view of the 4 differential channels, side-by-side (1.6 mm FR-4 over a ground plane, W = 3 mm). (b) Simplified block
diagram of a DIMM interface showing the multi-path fading.
the DRAM paths (i.e., ΔL1), and the phase shift, caused by Γ1, lead to multi-path
reﬂections [41]. These can cause destructive superposition at certain frequencies and
create notches in the channel frequency response. Employing NRZ signaling, the received
power spectrum in the absence of aggressor transmitters can be expressed as [43]
$$S_{in}(\omega) = T_b\,|H(\omega)|^2 \left[\frac{\sin(\omega T_b/2)}{\omega T_b/2}\right]^2 \qquad (5.1)$$
where Tb is the bit period, and H(ω) is the channel frequency response. For a flat channel response, this equation
shows that 90% of the received power is located below ω = 2π/Tb and 77% is
below the Nyquist rate, i.e., ω = π/Tb. Hence, if H(ω) has a notch below the Nyquist
rate, a significant part of the transmitted energy will be wasted and equalization will
be challenging.
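The two power fractions quoted for (5.1) can be verified by integrating the squared-sinc term numerically. The sketch below assumes an ideal flat channel, |H(ω)| = 1, and uses the normalized frequency x = f·Tb, so that ω = 2π/Tb corresponds to x = 1 and the Nyquist rate to x = 0.5. The midpoint-rule integrator is an illustrative choice.

```python
import math

def sinc2(x):
    """NRZ spectral shape [sin(pi x) / (pi x)]^2 with x = f * Tb."""
    if x == 0.0:
        return 1.0
    return (math.sin(math.pi * x) / (math.pi * x)) ** 2

def power_fraction(x_max, steps=20000):
    """Fraction of total NRZ power within |f| <= x_max / Tb.

    Uses the midpoint rule; the two-sided integral of sinc2 over all x equals 1.
    """
    dx = x_max / steps
    one_sided = sum(sinc2((i + 0.5) * dx) for i in range(steps)) * dx
    return 2.0 * one_sided

frac_main_lobe = power_fraction(1.0)  # below omega = 2*pi/Tb
frac_nyquist = power_fraction(0.5)    # below omega = pi/Tb
print(f"{frac_main_lobe:.3f} {frac_nyquist:.3f}")
```

Both fractions are relative to the total power, so a channel notch that falls below the Nyquist rate removes energy from the band that carries most of the NRZ signal.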
When the aggressor NRZ signal is transmitted on closely spaced channels, there will be
strong FEXT induced on the adjacent lanes. Assuming a low-impedance microstrip line
of Fig. 5.2 (a), the inductive coupling is dominant and the crosstalk transfer function
Figure 5.3: (a) Measured channel frequency response (through and crosstalk magnitude in dB versus frequency in GHz). (b) Measured channel 9 Gb/s single-bit
pulse response (through and crosstalk, in mV versus time in ns).
is proportional to the derivative of the main channel transfer function [70]. Therefore,
the power spectrum of the FEXT coupled to an adjacent diﬀerential channel can be
calculated as
$$S_{FEXT}(\omega) = \beta\,\tau_f^2\,\omega^2 \times S_{in}(\omega) \qquad (5.2)$$
where τf is the forward coupling time constant, and the value of β, which depends on
the board type, is between 1/3 and 1/2 for differential channels [65]. Fig. 5.3 (a)
shows the measured channel frequency response. The through response shows
notches at 3 and 9 GHz, while the crosstalk frequency response is approximately proportional to
the derivative of the main channel response and also exhibits the multi-drop behavior. In
the time domain, the multi-drop frequency characteristic leads to a pulse response with a long tail.
Fig. 5.3 (b) presents the measured 9 Gb/s pulse response for the through
and crosstalk transfer functions. Both exhibit tails about 1 ns long.
Therefore, the equalization task can be quite challenging for a conventional NRZ-based
transceiver.
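Equation (5.2) implies that the FEXT power relative to the received signal power grows with the square of frequency, i.e., by 20 dB per decade. The sketch below evaluates this ratio at a few frequencies. The forward coupling time constant of 10 ps is a hypothetical value chosen only for illustration, and β = 1/2 is taken from the upper end of the range quoted above.

```python
import math

def fext_to_signal_db(freq_hz, tau_f, beta=0.5):
    """S_FEXT / S_in in dB from Eq. (5.2): beta * tau_f^2 * omega^2."""
    omega = 2.0 * math.pi * freq_hz
    return 10.0 * math.log10(beta * (tau_f * omega) ** 2)

TAU_F = 10e-12  # s, assumed forward coupling time constant (hypothetical)

for f_hz in (3e8, 3e9, 9e9):
    ratio_db = fext_to_signal_db(f_hz, TAU_F)
    print(f"{f_hz / 1e9:.1f} GHz: {ratio_db:.1f} dB")

slope_db_per_decade = fext_to_signal_db(3e9, TAU_F) - fext_to_signal_db(3e8, TAU_F)
```

Because the ratio rises with frequency, the high-frequency content of an aggressor contributes disproportionately to the crosstalk seen by a victim lane.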
Figure 5.4: (a) Conventional BB transceiver block diagram for communicating at 9 Gb/s over
MDB interface. System-level channel simulation eye diagrams for: (b) without, (c) with crosstalk,
both after optimizing the CTLE and DFE blocks.
To illustrate the complexity of designing a traditional BB serial data transceiver for an
MDB memory interface, a conventional BB transceiver is modeled and simulated for
communication at 9 Gb/s over the channel whose frequency response is shown in
Fig. 5.3 (a). The block diagram of the BB transceiver used in the system-level simulation
is presented in Fig. 5.4 (a). The transmitter employs NRZ signaling and is assumed to
have 700 fs rms output jitter. The receiver employs a CTLE and a decision-feedback equalizer
(DFE) for data equalization, and both are considered to be ideal blocks without random
noise. Fig. 5.4 (b) shows the equalized eye diagram on the RX side when there
is no aggressor present on the TX side. The statistical simulation shows that, using
the conventional receiver architecture of Fig. 5.4 (a), a DFE with about 15
taps is needed to equalize the effects of the notches and achieve an eye opening of 8% of the
unit interval (UI = 111 ps) at a bit error rate (BER) of 10⁻¹². However, adding one aggressor TX to
this simulation causes complete eye closure at BER = 10⁻¹², as shown in Fig. 5.4 (c).
Hence, to equalize this link at 9 Gb/s, either advanced techniques (e.g., XDFE [9], XCTLE [65])
should be employed to reduce crosstalk, or the PCB should be redesigned
with more spacing between the lanes so that it exhibits less crosstalk. Both approaches
impose higher cost and make the link design more challenging.
Figure 5.5: (a) Hybrid NRZ/MT transceiver architecture. (b) Output TX spectrum when
Fref = 3 GHz.
[Panel (b) axes: magnitude (dB) versus frequency (GHz), 0 to 20 GHz. Traces: TXout spectrum,
with its PAM2 and QPSK bands marked, and the MDB frequency response.]
5.1.3 Hybrid NRZ/MT Signaling for Controlling ISI and FEXT
A new hybrid NRZ/MT architecture suitable for high-speed multi-drop links has
recently been proposed in [71, 24]. Employing this modulation scheme has several advantages
for controlling ISI. Here we show that this signaling method can also be very useful
for reducing the effect of FEXT in high-density interconnects. Without loss of generality,
Fig. 5.5 (a) shows the simplified system block diagram, which consists of three
orthogonal sub-bands. Suppose the input data is constructed from three different data
streams, each operating at a bit rate of Fref. Each sub-stream is then modulated to its
respective carrier frequency and the combined signal is sent over the line. The output
spectrum of the TX along with an MDB interface frequency response is presented in
Fig. 5.5 (b). Here, the lower-frequency band of the channel, from DC to the first notch (i.e.,
Figure 5.6: (a) Hybrid NRZ/MT simpliﬁed block diagram. (b) The channel frequency response
used in C2C communications. (c) PB channel construction.
0-3 GHz), is used for transmitting up to 3 Gb/s of NRZ data and constitutes the PAM2¹
part of the spectrum. Moreover, the upper-frequency band of the channel, between the two
notches (i.e., 3-9 GHz), is exploited for transmitting 6 Gb/s of data in the quadrature
phase-shift keying (QPSK) format. This sub-band carries 3 Gb/s of data on each of the in-phase (I)
and quadrature-phase (Q) components of the spectrum.
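The sub-band allocation described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function and field names are hypothetical, and the carrier is assumed to sit at 2 × Fref, which is the value used later in the system design of this chapter.

```python
def band_plan(fref_hz):
    """Sub-band allocation for a three-stream hybrid NRZ/MT lane (illustrative)."""
    carrier_hz = 2.0 * fref_hz  # assumed LO at 2 x Fref, midway between the notches
    passband = (carrier_hz - fref_hz, carrier_hz + fref_hz)
    return {
        "BB": {"format": "NRZ", "band_hz": (0.0, fref_hz), "rate_bps": fref_hz},
        "I": {"format": "QPSK in-phase", "band_hz": passband, "rate_bps": fref_hz},
        "Q": {"format": "QPSK quadrature", "band_hz": passband, "rate_bps": fref_hz},
    }


def aggregate_rate_bps(plan):
    """Total lane rate is the sum of the three sub-band rates."""
    return sum(entry["rate_bps"] for entry in plan.values())


plan = band_plan(3e9)
for name, entry in plan.items():
    lo, hi = entry["band_hz"]
    print(f"{name}: {lo / 1e9:.0f}-{hi / 1e9:.0f} GHz, {entry['rate_bps'] / 1e9:.0f} Gb/s")
print(f"aggregate: {aggregate_rate_bps(plan) / 1e9:.0f} Gb/s")
```

For Fref = 3 GHz this reproduces the 0-3 GHz baseband, the 3-9 GHz passband, and the 9 Gb/s aggregate rate quoted in the text.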
5.1.3.1 ISI Controlling Analysis
Applying NRZ/MT signaling can efficiently reduce the ISI, since each sub-band
experiences less loss at its corresponding Nyquist frequency. For MDB interfaces,
avoiding the channel frequency notches by employing a proper modulation scheme prevents
bit-energy waste around the notches, as shown in Fig. 5.5 (b), and reduces the complexity
of the equalization circuit. However, the potential of the proposed signaling scheme of Fig. 5.5 (a)
is not limited to MDB channels; it can equally be employed for lossy backplane
and chip-to-chip (C2C) communications.
¹ Generally speaking, it can be any kind of PAM-N, ENRZ, or duobinary baseband signaling.
Figure 5.7: (a) 9 Gb/s single-bit pulse response for the C2C channel. (b), (c) BB and PB 3
Gb/s single-bit pulse responses. (d) The eye diagram for 9 Gb/s NRZ signaling. (e), (f) BB and
PB eye diagrams for an aggregate data rate of 9 Gb/s.
[Pulse-response axes: h(t) (mV) versus time (ns), 4 to 5 ns.]
Fig. 5.6 (a) shows a simplified system block diagram for in-band ISI analysis. The channel
is designed for C2C communication; it is 40 cm long and has the frequency response
shown in Fig. 5.6 (b). Here, the bit period for each of the sub-bands is Tb; therefore, for the
BB sub-band the channel loss at the Nyquist frequency is |H(ω = π/Tb)|. This is typically smaller
than the channel loss at ω = 3 × π/Tb; thus, by parallelizing the input stream into three
streams, the ISI for the BB sub-band is considerably mitigated. For the passband (PB)
sub-streams, the channel frequency response should be calculated after downconverting
the signal and removing the interfering signals with a low-pass filter on the RX side.
Then, the effective channel frequency response for the PB can be written as
\[
H_{PB}(\omega) = \frac{1}{4}\left[H(\omega - \omega_0) + H(\omega + \omega_0)\right] \tag{5.3}
\]
where ω0 is the carrier frequency. In (5.3) it is assumed that the LPF in Fig. 5.6 (a) can
Figure 5.8: FEXT generation mechanism in hybrid NRZ/MT signaling.
eﬃciently remove all the frequency content around 2× ω0. As (5.3) shows, the eﬀective
channel for PB is the superposition of shifted versions of the main channel frequency
response. Fig. 5.6 (c) illustrates the construction of HPB(ω) ﬁguratively. Although the
absolute channel loss is pronounced, the loss tilt from DC to ω = π/Tb is mitigated
thanks to the averaging of the shifted versions of the channel. Hence, we can surmise
that the ISI is consequently reduced.
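The tilt argument based on (5.3) can be checked numerically with a short sketch. The channel below is not the measured C2C channel: it is a hypothetical, magnitude-only loss model (loss in dB growing with the square root of frequency plus a linear term), the coefficients are arbitrary, and the phase of H(ω) is ignored, so the numbers are illustrative only.

```python
import math


def channel_mag(f_ghz, k_skin=2.0, k_diel=2.0):
    """Hypothetical smooth channel: loss_dB = k_skin*sqrt(|f|) + k_diel*|f| (f in GHz)."""
    f = abs(f_ghz)
    loss_db = k_skin * math.sqrt(f) + k_diel * f
    return 10.0 ** (-loss_db / 20.0)


def passband_mag(f_ghz, f0_ghz):
    """Effective passband response following Eq. (5.3), magnitude-only approximation."""
    return 0.25 * (channel_mag(f_ghz - f0_ghz) + channel_mag(f_ghz + f0_ghz))


def tilt_db(mag_fn, f_nyq_ghz):
    """Gain at the Nyquist frequency relative to DC, in dB (negative means loss tilt)."""
    return 20.0 * math.log10(mag_fn(f_nyq_ghz) / mag_fn(0.0))


fref_ghz = 3.0
tilt_nrz = tilt_db(channel_mag, 4.5)                                  # 9 Gb/s NRZ
tilt_bb = tilt_db(channel_mag, 1.5)                                   # 3 Gb/s BB sub-band
tilt_pb = tilt_db(lambda f: passband_mag(f, 2.0 * fref_ghz), 1.5)     # 3 Gb/s PB sub-band

print(f"NRZ tilt: {tilt_nrz:.1f} dB, BB tilt: {tilt_bb:.1f} dB, PB tilt: {tilt_pb:.1f} dB")
print(f"PB gain at DC: {20.0 * math.log10(passband_mag(0.0, 2.0 * fref_ghz)):.1f} dB")
```

Under these assumptions the passband sees a large absolute loss but an almost flat response from DC to its Nyquist frequency, which is the behavior described above.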
To show the ISI-controlling property in the time domain, the single-bit pulse response over the
reference channel, shown in Fig. 5.6 (b), is plotted for conventional NRZ signaling,
BB, and PB in Fig. 5.7 (a), (b), and (c), respectively, for ω0 = 4π/Tb. Comparing the pre-
and post-cursors of traditional NRZ signaling with those of hybrid NRZ/MT signaling, we can
conclude that the equalization complexity is reduced in hybrid NRZ/MT signaling, regardless of
the shape of the channel frequency response. To evaluate the eye diagram that is generated from
these pulse responses, a 9 Gb/s data stream is sent over this channel, using both conventional NRZ
and NRZ/MT signaling, and the received data eye diagram is presented in Fig. 5.7 (d),
(e), and (f) for conventional NRZ, BB, and PB, respectively. As expected, the horizontal and
vertical eye openings are better for hybrid NRZ/MT signaling.
5.1.3.2 FEXT Controlling Analysis
Similar to any other signaling scheme, transmitting a hybrid NRZ/MT signal induces a voltage
noise, VFEXT, into the neighboring lanes, as shown in Fig. 5.8. Assuming that
vTX2(t) = x1(t) + cos(ω0t)x2(t) + sin(ω0t)x3(t), the crosstalk noise for each of the sub-bands
in the hybrid NRZ/MT transceiver, shown in Fig. 5.6 (a), can be calculated as
\begin{align}
V_{FEXT,BB}^{2} &= \int_{0}^{\omega_b} S_{FEXT,X_1}(\omega)\,d\omega \tag{5.4a} \\
V_{FEXT,I}^{2} &= \int_{0}^{\omega_b} \left[\cos^{2}(\theta)\,S_{FEXT,X_2}(\omega) + \sin^{2}(\theta)\,S_{FEXT,X_3}(\omega)\right] d\omega \tag{5.4b} \\
V_{FEXT,Q}^{2} &= \int_{0}^{\omega_b} \left[\sin^{2}(\theta)\,S_{FEXT,X_2}(\omega) + \cos^{2}(\theta)\,S_{FEXT,X_3}(\omega)\right] d\omega \tag{5.4c}
\end{align}
where ωb = 2π/Tb is the sub-band bit rate, SFEXT,XK is the K-th sub-stream FEXT
power spectrum given in (5.2), and θ is the phase shift between HFEXT(ω) and H(ω), which
is roughly equal to −90° if the crosstalk is more inductive than capacitive.
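The role of θ in (5.4b) and (5.4c) can be illustrated with a small numerical sketch. It assumes that θ is constant over the band, so that the cos² and sin² weights can be taken out of the integrals, and it uses arbitrary placeholder values for the integrated FEXT energies of the X2 and X3 sub-streams; neither assumption comes from measured data.

```python
import math


def fext_energy_iq(theta_deg, e_x2, e_x3):
    """Evaluate (5.4b) and (5.4c) for a frequency-independent phase shift theta.

    e_x2 and e_x3 stand for the integrals of S_FEXT,X2 and S_FEXT,X3 from 0 to omega_b.
    """
    theta = math.radians(theta_deg)
    c2 = math.cos(theta) ** 2
    s2 = math.sin(theta) ** 2
    v2_i = c2 * e_x2 + s2 * e_x3
    v2_q = s2 * e_x2 + c2 * e_x3
    return v2_i, v2_q


# Placeholder energies (arbitrary units), chosen unequal to make the swap visible.
v2_i, v2_q = fext_energy_iq(-90.0, e_x2=1.0, e_x3=2.0)
print(f"theta = -90 deg: V2_I = {v2_i:.3f}, V2_Q = {v2_q:.3f}")
```

With θ = −90° the I receiver is disturbed only by the X3 sub-stream of the aggressor and the Q receiver only by X2, while the sum of the two energies does not depend on θ.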
As (5.4) shows, the FEXT voltage induced on the victim channel remains orthogonal,
since HFEXT(ω) does not change this property. Thus, in the victim receivers, the same
downconverting/integrating process that is performed for recovering the main TX signal
automatically cancels out the orthogonal aggressors. Hence, the inter-orthogonal
sub-band conversion (IOSC) is reduced in the victim sub-band equalizer: the effective
crosstalk energy is equal to that of only one of the aggressor sub-bands, and two thirds of
the crosstalk energy is canceled. For NRZ signaling, as long as hFEXT(t) has not settled to
zero, the transmitted bits convolve with the channel pulse response and
induce voltage noise on the victim channel. However, in NRZ/MT signaling only one
of the three sequences acts as the aggressor for the victim receiver. Therefore, the effective
VFEXT is reduced for each sub-band RX, and the long tail of hFEXT(t) does not cause crosstalk
as severe as in conventional NRZ signaling.
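The cancellation of the orthogonal sub-bands by the downconverting/integrating process can be sketched as follows. The example assumes ideal rectangular (unshaped) symbols, perfectly bit-aligned streams, a carrier at ω0 = 4π/Tb, and no channel filtering; the function name and the normalization by Tb are choices made for this illustration only.

```python
import itertools
import math


def correlate(bits, n_steps=400):
    """Integrate v(t) against {1, 2cos(w0 t), 2sin(w0 t)} over one bit period Tb = 1."""
    x1, x2, x3 = bits
    tb = 1.0
    w0 = 4.0 * math.pi / tb          # two carrier cycles per sub-band bit
    dt = tb / n_steps
    acc_bb = acc_i = acc_q = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * dt           # midpoint rule
        c, s = math.cos(w0 * t), math.sin(w0 * t)
        v = x1 + x2 * c + x3 * s     # v(t) = x1 + cos(w0 t) x2 + sin(w0 t) x3
        acc_bb += v * dt
        acc_i += 2.0 * v * c * dt
        acc_q += 2.0 * v * s * dt
    return acc_bb / tb, acc_i / tb, acc_q / tb


for bits in itertools.product((-1, 1), repeat=3):
    est = correlate(bits)
    print(bits, tuple(round(e, 6) for e in est))
```

Each correlator output depends only on its own sub-stream; the other two streams integrate to zero over one bit period, which is the mechanism that removes two of the three aggressor sub-bands at each victim receiver.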
5.2 System Design Overview
In this section, the system design of the proposed hybrid NRZ/MT transceiver for
communicating over a 4-lane differential MDB interface at an aggregate data rate of
36 Gb/s is presented. The proposed hybrid NRZ/MT architecture for communicating
over the MDB channel of Fig. 5.3 is presented in Fig. 5.5 (a). The random data is
generated by three independent embedded PRBS15 generators, each operating at Fref,
i.e., 3 GHz. Two mixers upconvert two sets of random data into the I and Q sub-bands to
create the QPSK band. The local oscillator (LO) frequency is chosen to be 2 × Fref. The third
stream is added to the I/Q sub-bands by the output summer/driver circuit. There is no
data around the third clock harmonic in this architecture; thus, we have avoided employing
pulse-shaping filters for the baseband data in order to relax the linearity requirements on the TX
output driver [71]. Although pulse shaping has not been employed, the inter-channel
interference (ICI) still leaves enough margin for error-free operation of the link. The
main reason is that only 4.7% of the spectrum energy is located between ω = 2π/Tb and
ω = 4π/Tb, as can be calculated by integrating the NRZ power spectrum, given in
(5.1), over the desired bandwidth. From the linearity requirement point of view, QPSK
modulation without pulse shaping does not require a linear output driver [45]. Hence,
a current-mode output driver can appropriately add the BB, I, and Q sub-bands together
and construct the hybrid NRZ/MT output stream. Our system-level simulation shows
that, without baseband pulse shaping, the eye diagram can provide up to 60% horizontal
opening at BER = 10^−14.
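The 4.7% figure can be reproduced with a short numerical integration. The sketch assumes that the NRZ power spectrum has the usual sinc² shape of an unshaped rectangular pulse train, normalized so that its two-sided integral over the normalized frequency x = f × Tb equals one; the helper names are hypothetical.

```python
import math


def sinc2(x):
    """Normalized NRZ power spectrum shape: (sin(pi x) / (pi x))^2 with x = f * Tb."""
    if x == 0.0:
        return 1.0
    y = math.sin(math.pi * x) / (math.pi * x)
    return y * y


def lobe_fraction(x_lo, x_hi, n=20000):
    """Fraction of the total NRZ power with x_lo <= |f * Tb| <= x_hi (midpoint rule)."""
    dx = (x_hi - x_lo) / n
    one_sided = sum(sinc2(x_lo + (k + 0.5) * dx) for k in range(n)) * dx
    return 2.0 * one_sided  # both positive and negative frequencies


main_lobe = lobe_fraction(0.0, 1.0)
second_lobe = lobe_fraction(1.0, 2.0)
print(f"main lobe: {100 * main_lobe:.1f}%, second lobe: {100 * second_lobe:.1f}%")
```

About 90% of the baseband power lies below 1/Tb and roughly 4.7% falls in the second lobe, between 1/Tb and 2/Tb, which is the part of the unshaped baseband spectrum that overlaps the passband.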
QPSK and NRZ signaling necessitate an SNR of about 18 dB and 13 dB, respectively, for
BER = 10^−14 [45]. Hence, the minimum received power versus the noise
figure (NF) of the RX can be calculated as [41]
\[
P_{min} - NF = 10\log(BW) + SNR_{min} - 174 \tag{5.5}
\]
where Pmin is the minimum received signal power, BW is the frequency band of interest,
and SNRmin is the required SNR level for equalizing the received data. Substituting the
minimum required SNR in (5.5), Pmin − NF should be -58 dB and -63 dB for the BB
and the PB at BER = 10^−14, respectively. This trade-off between Pmin and NF gives
valuable insight for optimizing the power efficiency. The TX power can be reduced up to
the point where the NF on the RX side becomes dominant, thereby forcing an increase of power
on the RX side.
Moreover, based on the system-level simulation, a ±5° phase mismatch between LOI and
LOQ yields a 3 dB SNR penalty in the vicinity of BER = 10^−14. This SNR penalty
can be compensated either on the TX side by increasing the TX power
by 3 dB, or on the RX side by providing a 3 dB lower NF specification. Hence, the
LOI/Q phase mismatch should be kept lower than ±5° and the analog front-end (AFE)
should be designed accordingly. A source-synchronous architecture is employed for the
clocking scheme, as shown in Fig. 5.1, which relaxes the complexity of the CDR circuit
and provides inherent tracking of correlated jitter [27, 46].
In addition, the output spectrum of the proposed TRX has inherent notches at f = 1/Tref
and f = 3/Tref, as shown in Fig. 5.5 (b). Therefore, for the MDB channel scenario, the TX
spectrum shape can track the channel notches by adjusting the reference clock frequency,
so that the TRX can be customized to the channel response. In the NRZ/MT approach, in
order to match the transmitted spectrum to that of the channel, an initial calibration
phase is applied, which can change the reference clock frequency depending on the quality
of the received signal. Therefore, the TX spectrum can be shaped with respect to the
channel characteristic, and no bit energy is wasted around the channel notches.
Thus, the proposed signaling scheme is applicable to any channel, regardless of the number
of notches and their frequencies, and even if there is no notch in the channel frequency
characteristic.
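The location of the inherent spectral nulls, and the way they move with the reference clock, can be checked with a simple model. The sketch assumes unshaped rectangular symbols at rate Fref, a QPSK carrier at 2 × Fref, and an arbitrary relative weight of 0.5 for each carrier image; it describes the envelope of the ideal spectrum only, not the measured one.

```python
import math


def sinc2(x):
    """(sin(pi x) / (pi x))^2, the power spectrum shape of a rectangular symbol."""
    if x == 0.0:
        return 1.0
    y = math.sin(math.pi * x) / (math.pi * x)
    return y * y


def tx_psd(f_hz, fref_hz):
    """Relative PSD: baseband NRZ plus QPSK images centered at +/- 2*Fref (assumed weights)."""
    x = f_hz / fref_hz
    return sinc2(x) + 0.5 * (sinc2(x - 2.0) + sinc2(x + 2.0))


for fref in (3.0e9, 3.5e9):
    nulls = [m * fref for m in (1, 3)]
    levels = [tx_psd(f, fref) for f in nulls]
    print(f"Fref = {fref / 1e9:.1f} GHz: nulls at "
          + ", ".join(f"{f / 1e9:.1f} GHz" for f in nulls)
          + f" (relative PSD {max(levels):.1e}), carrier level {tx_psd(2 * fref, fref):.2f}")
```

In this model the nulls at Fref and 3 × Fref scale directly with the reference clock, so changing Fref shifts both nulls together while the carrier at 2 × Fref stays centered between them.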
5.3 Circuit Design
Based on the system-level requirements, the design of the transceiver is described in the
following. The goal is to communicate at 4×9 Gb/s over four side-by-side differential MDB
channels of the type shown in Fig. 5.3.
5.3.1 Transmitter
Fig. 5.9 (a) shows the proposed hybrid NRZ/MT transmitter, which consists of four
differential TXs bundled together with a clock lane. The modulation scheme is based
on [24]; it applies an FDM scheme and combines three baseband data streams into a single
stream with three sub-bands (BB, I, and Q). Each of the independent baseband data streams,
generated by an embedded PRBS15 generator, has a data rate of Fref b/s, i.e., 3 Gb/s. Two passive
mixers upconvert two of the baseband streams into the PB sub-band using 2 × Fref
orthogonal clock phases. An LVDS-type output driver adds the PB to the BB sub-band
and generates a single 9 Gb/s stream. These three sub-bands are orthogonal within one sub-band
unit interval, UIsub = 333 ps, and can be recovered by downconverting/integrating within
one UIsub on the RX side. Moreover, four TXs are placed side by side to support an aggregate
36 Gb/s data rate and, as described in Section 5.2, baseband pulse shaping is not applied.
The TX output power is adjustable by using a 3-bit current DAC as the current source,
Figure 5.9: Proposed 4×9 Gb/s TX architecture.
i.e., ISS in Fig. 5.9 (a). Based on (5.5), the optimum value for Pmin is found to be -27 dBm
and -22 dBm for the PB and BB, respectively, which leaves sufficient margin for the NF to be
realized by a low-power circuit on the RX side. By applying appropriate timing to the
baseband data and the local clocks, as shown in Fig. 5.9 (b), each transition is separated by
at least Tref/16 from the nearest transition point. Therefore, the peak-to-average
ratio (PAR) and ICI are efficiently mitigated, and the peak-to-peak output swing of the
TX is set to be around 300 mV.
Figure 5.10: Proposed 4×9 Gb/s mixed NRZ/MT receiver.
5.3.2 Receiver
Fig. 5.10 shows the RX block diagram, which supports four differential I/Os side by side.
The CDR method for the link is based on the architecture proposed in Chapter 3, which
employs a forwarded-clock scheme to amplify, multiply, and delay the received clock on
the RX side. To reduce the power dissipation, each TRX has its own clean-up multiplier
DLL (MDLL) block, while only one single-ended reference clock is forwarded at the Fref
rate. At the input of the RX, the received signal has been filtered by the channel frequency
response and is more attenuated for the PB sub-band. The attenuation for the BB sub-band is
5 dB at the Nyquist frequency, while the PB sub-band experiences a 14 dB tilt from
Fref to 3 × Fref. Therefore, the BB sub-band can be equalized simply by removing the
I/Q content with a low-pass filter. The signal is then amplified and passed to an output buffer.
For the PB sub-band, after bandpass filtering and gain boosting, a direct-conversion architecture
is used for signal reconstruction. The PB sub-band first passes through a band-pass filter
(BPF), which suppresses the BB content. Then, a switched-capacitor mixer/filter (SCMF)
unit performs the downconverting/integrating task.
The BPF consists of a two-stage peaking ampliﬁer, which provides around 10 dB gain
boosting at passband and removes the baseband [71]. The input low-pass ﬁlter (LPF)
for BB sub-band, presented in [71], is a 2nd-order ﬁlter, which suﬃciently suppresses
Figure 5.11: Half-circuit implementation of SCMF and baseband ampliﬁer units.
the PB sub-band. The design and analysis of these building blocks have been presented
in Chapter 3. Compared with the work presented in Chapter 3, the SCMF unit
has a different design and an improved clocking scheme, which yields better circuit
performance. The design of this circuit is described in the following.
5.3.2.1 Downconverting Mixer / Filter Unit
The half-circuit implementation of the proposed SCMF, and the baseband ampliﬁer is
presented in Fig. 5.11, where the Gm stage models the preceding BPF block. In this
circuit, a current-driven mixer with a 25% duty cycle LO is used to downconvert the
received signal, while the non-overlapping clocking scheme helps to provide a better
Figure 5.12: The timing diagram of SCMF in Fig. 5.11
[Timing-diagram legend: one Tref is divided into eight phases Φ0 to Φ7, each of duration
ΔT = Tref/8. Φ0, Φ2, Φ4, Φ6: integration of the RF current, (1/C1)∫ IRF(t) dt over [t0, t0 + ΔT];
Φ1: reset; Φ3, Φ7: charge sharing; Φ5: hold. Waveforms shown: CKI, CKQ, CKR, IRF,I, and IRF,Q.
Marked time instants: t = (n − 3/8)Tref, t = nTref, and t = (n + 5/8)Tref.]
low-pass ﬁltering. The downconverted signal is then given to a sense ampliﬁer that
properly ampliﬁes the signal, hence, relaxes the noise and linearity requirements of the
BPF.
In addition to downconverting the received signal, the proposed circuit isolates the orthogonal
I and Q sub-bands from each other. The timing diagram for the Q sub-band is presented
in Fig. 5.12. The received data stream can be modeled in the current domain, and it is
composed of two orthogonal parts, as shown in Fig. 5.12: IRF(t) = IRF,I(t) + IRF,Q(t),
where IRF,I(t) = I(t)cos(ω0t) and IRF,Q(t) = Q(t)sin(ω0t). The SCMF is designed such
that the received current is integrated against the orthogonal basis functions, i.e., 90° phase-shifted
square waveforms. As shown in Fig. 5.11, M1 is on during the Φ2 and Φ6 time
intervals, while M3 conducts during the Φ0 and Φ4 time intervals. With the proposed timing
diagram of Fig. 5.12, the integral of IRF,I(t) is zero at the end of all Φ0, Φ2, Φ4, and
Φ6 time intervals. Therefore, assuming that the sampling clocks are properly aligned with
the received data, this circuit is able to suppress the I sub-band by appropriate current
integration. Meanwhile, the Q sub-band (which is in the current domain) is integrated over
Φ0, Φ2, Φ4, and Φ6 time intervals to amplify and downconvert the main signal. During
Φ3 and Φ7 time intervals, the charge-sharing between C1 and C2 capacitors implements a
low-pass IIR ﬁlter, which can further reduce the unwanted high-frequency signals. At the
end of this process, after the data is detected, the capacitors will be discharged by a reset
clock phase, CKR, and made ready for the next phase.
Based on this operation and the timing diagram shown in Fig. 5.12, the output of the SCMF
during Φ3-Φ6 can be expressed by
\[
V_{Y1}(n) = \alpha V_{Y1}(n-1) + A_0\left[\int_{t_1}^{t_1+\Delta T} I_{RF}(t)\,dt - \int_{t_0}^{t_0+\Delta T} I_{RF}(t)\,dt\right] \tag{5.6}
\]
where α = C1/(C1 + C2), A0 = 1/[C1(C1 + C2)], t0 = (n − 3/8)Tref, t1 = (n − 1/8)Tref,
ΔT = Tref/8, and Tref is the sub-band unit interval time.
As Fig. 5.12 shows,
\[
\int_{t_1}^{t_1+\Delta T} I_{RF}(t)\,dt = -\int_{t_0}^{t_0+\Delta T} I_{RF}(t)\,dt = \Delta T \times Q(n).
\]
Therefore, the output voltage of the baseband amplifier can be calculated as
\[
Q_{out}(z) = \frac{2\Delta T A_0 A_1}{1 - \alpha \cdot z^{-1}} \times Q(z) \tag{5.7}
\]
where z = e^{jωTref} and A1 = gm7,8 × (ro7,8 || ro9,10 − 1/gm9,10), where gm7−10 and ro7−10
denote the small-signal transconductance and output resistance of M7−M10, respectively.
As (5.7) shows, the Q sub-band is downconverted and low-pass filtered by the charge-sharing
mechanism. The ratio C2/C1 defines the filter bandwidth; it is selected to be
around 3 in order to provide sufficient filtering and gain characteristics.
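The discrete-time filtering expressed by (5.7) can be sketched as a first-order recursion. The example uses α = C1/(C1 + C2) as defined in the text with C2/C1 = 3, and lumps the factor 2ΔT × A0 × A1 into a single hypothetical `gain` parameter set to one, so only the relative frequency shaping is meaningful.

```python
def scmf_iir(samples, c2_over_c1=3.0, gain=1.0):
    """First-order IIR of Eq. (5.7): y[n] = alpha * y[n-1] + gain * q[n]."""
    alpha = 1.0 / (1.0 + c2_over_c1)  # alpha = C1 / (C1 + C2), as defined in the text
    y = 0.0
    out = []
    for q in samples:
        y = alpha * y + gain * q
        out.append(y)
    return out


n = 60
dc_out = scmf_iir([1.0] * n)                            # constant input (DC)
nyq_out = scmf_iir([(-1.0) ** k for k in range(n)])     # alternating input (sub-band Nyquist)

print(f"DC gain: {dc_out[-1]:.4f}, Nyquist gain: {abs(nyq_out[-1]):.4f}")
```

For C2/C1 = 3 the recursion settles to a gain of 1/(1 − α) = 4/3 at DC and 1/(1 + α) = 0.8 for an alternating pattern, i.e., a mild first-order low-pass characteristic under these assumptions.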
5.3.3 Clock and Data Recovery
The DLL-based CDR topology is a preferable choice for source-synchronous high-speed
links, since it does not suffer from jitter accumulation and consumes less power
than a PLL-based CDR architecture. The design and analysis of the multiplier
DLL (MDLL)-based CDR, which can be efficiently employed in NRZ/MT transceivers, is
presented in Chapter 4. Most of the CDR building blocks in the new 36 Gb/s transceiver have
the same topologies as those presented in Chapter 4. Compared with the MDLL-based CDR
in Chapter 4, we have improved the DLL performance to operate at a higher Fref (i.e.,
3 GHz nominal frequency), and we have added 25% duty-cycle clock generation logic to the edge
combiner circuit. Therefore, the required clock phases are generated with some minor
modifications of our previous design, while the core circuit remains intact and the power
consumption is not degraded.
Figure 5.13: (a) Test setup with MDB channel. (b) Chip die photo and layout in 40 nm
CMOS.
5.4 Measurement Results
Fig. 5.13 (a) shows the test setup for the proposed NRZ/MT TRX system. The reference
FR-4 channel, 30 cm in length, exhibits frequency notches as shown in Fig. 5.3. These
notches are introduced by two open stubs and mimic the frequency response of a
typical MDB channel. Using this structure, we are able to tune the frequency of the
notches within a ±30% range for test purposes. The prototype is fabricated in a 40 nm
GP 1-poly 10-metal bulk CMOS process and includes four independent RX and TX
circuits, occupying 130 × 60 μm² and 80 × 60 μm² each, respectively. The die micrograph
is shown in Fig. 5.13 (b).
The measured output spectrum of TX for diﬀerent Fref setting is shown in Fig. 5.14 (a).
Based on this measurement, the TX is able to adjust its spectrum to the frequency
Figure 5.14: (a) Measured TX output spectrum at different Fref settings. (b) Measured
spectrum at the input of RX.
[Axes in both panels: output power (dBm), −80 to −40, versus frequency (GHz), 0 to 15.
Panel (a) traces: Fref = 1.0 GHz, 3.0 GHz, and 3.5 GHz. Panel (b) traces: measured TXout
spectrum and measured RXin spectrum.]
response of the channel within a ±25% range, i.e., the first null frequency in the TX
frequency spectrum can be changed between 1 and 3.5 GHz. The measured spectrum
at the input of the RX is shown in Fig. 5.14 (b) and shows that the PB sub-band is more
attenuated than the NRZ sub-band. Based on this measurement, the received
power for the BB and the PB sub-bands is -21.5 dBm and -26 dBm, respectively.
Fig. 5.15 (a), (b), and (c) show the measured eye diagram for Q, I, and BB sub-bands,
respectively, each having 3 Gb/s data rate, when no aggressor transmitter is present. The
bathtub for each of the Q, I, and BB sub-bands is shown in Fig. 5.15 (d), (e), and (f),
respectively. The BB, Q, and I sub-bands have 182 ps, 169 ps, and 177 ps horizontal
margin at BER = 10−12, respectively. In this measurement, the conﬁguration bits for
oﬀset, bandwidth, and gain setting of ampliﬁers have been calibrated based on quality of
the sub-band eye diagrams, through a serial peripheral interface.
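The horizontal margins quoted above are consistent with subtracting the total jitter at BER = 10^−12 from the sub-band unit interval. The sketch below assumes that the three Tj values annotated on the bathtub curves of Fig. 5.15 (151 ps, 156 ps, and 164 ps) belong to the BB, I, and Q sub-bands, respectively; this assignment is inferred from the margins in the text and is not stated explicitly.

```python
UI_SUB_PS = 1e12 / 3e9  # sub-band unit interval at 3 Gb/s, about 333.3 ps

# Total jitter at BER = 1e-12; the mapping to sub-bands is an assumption (see lead-in).
TJ_PS = {"BB": 151.0, "I": 156.0, "Q": 164.0}


def horizontal_margin_ps(tj_ps, ui_ps=UI_SUB_PS):
    """Horizontal eye margin as the part of the UI not consumed by total jitter."""
    return ui_ps - tj_ps


margins = {name: horizontal_margin_ps(tj) for name, tj in TJ_PS.items()}
for name, margin in margins.items():
    print(f"{name}: {margin:.0f} ps ({100 * margin / UI_SUB_PS:.1f}% of UIsub)")
```

Under this assumption the computed margins round to the 182 ps, 177 ps, and 169 ps reported for the BB, I, and Q sub-bands.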
Having aggressor TX on Channel 2 and Channel 4, the measured FEXT on Channel 3
Figure 5.15: Measured RX eye diagram at 9 Gb/s data rate. (a) Q sub-band. (b) I sub-band.
(c) BB sub-band. Corresponding bathtub curves for: (d) Q sub-band, (e) I sub-band, (f)
BB sub-band, each operating at 3 Gb/s data rate.
[Eye-diagram scale markers: 83.3 ps and 30 mV. Bathtub axes: BER (10^0 to 10^−16) versus
clock phase (−0.5 to 0.5 UI), with a 33.3 ps scale marker. Bathtub annotations at
BER = 10^−12: 55% UI, Rj = 6.8 ps, Dj = 54.5 ps, Tj = 151 ps; 53% UI, Rj = 7.1 ps,
Dj = 54.5 ps, Tj = 156 ps; 50% UI, Rj = 7.3 ps, Dj = 57.3 ps, Tj = 164 ps.]
is shown in the frequency and time domains in Fig. 5.16 (a) and (b), respectively. Although
the peak-to-peak FEXT is about 60 mV, its spectrum has the NRZ/MT shape, as discussed
in Section 5.1. As explained in Section 5.1, the strong reflections and severe FEXT make
Figure 5.16: Measured FEXT on Channel 3 (a) in frequency, and (b) in time domain.
it impossible to equalize the link by employing conventional NRZ signaling at 9 Gb/s.
With hybrid NRZ/MT signaling, however, the link can be equalized; the measured eye diagrams
of the Q, BB, and I sub-bands for the different lanes are shown in Fig. 5.17. Here, the middle
lanes are labeled Channel 2 and Channel 3, while the edge lanes are labeled Channel 1 and
Channel 4, as shown in Fig. 5.1 (b). As expected, the middle lanes are more affected by
FEXT; hence, their eye opening is smaller. However, all of the received sub-bands
have sufficient eye opening to ensure a 40% unit-time-interval horizontal eye opening
(referring to the 3 Gb/s sub-band data rate) at BER = 10^−12. Each sub-band operates at a
3 Gb/s data rate and the total data rate is 36 Gb/s over four differential lanes. The
bathtub is shown for the PB receivers only, as the BB receivers demonstrate a better eye
opening at BER = 10^−12. This measurement shows that crosstalk degrades the horizontal
eye opening, shown in Fig. 5.17, by about 15% in the worst case.
Figure 5.17: Measured RX eye diagram at RX side. The aggregate data rate is 36 Gb/s while
each sub-band operates at 3 Gb/s data rate.
[Columns: BER bathtub, BBout, Qout, and Iout eye diagrams; rows: Channel 1 to Channel 4.
Eye-diagram scale markers: 83.3 ps and 40 mV. Bathtub axes: BER (10^0 to 10^−16) versus
clock phase (−0.5 to 0.5 UIsub), with a 33.3 ps scale marker. Annotated horizontal openings:
Channel 1, 45% UI; Channel 2, 35% UI; Channel 3, 35% UI; Channel 4, 40% UI.]
(a) Power breakdown: LVDS 32%, Clock (TX) 17%, Mixer 3%, AFE 14%, SCMF 7%, Amp. 4%,
Clock (RX) 22%.

(b) TX component          Power (mW)
    Clock Unit            1.53
    LVDS summer/driver    2.88
    Mixer                 0.27

(c) RX component          Power (mW)
    Clock Unit            1.95
    Amplifiers            0.4
    SCMF                  0.64
    AFE                   1.33

Figure 5.18: (a) Power breakdown for the whole TRX. (b) TX power specification. (c) RX
power specification.
Table 5.1: TRX Performance Comparison with State-of-the-art Memory Transceivers.

Reference                            | [65]                       | [48]                   | [69]                         | This work
Technology                           | 65nm                       | 45nm SOI               | 180nm                        | 40nm GP
RX/TX Area (mm²)                     | 0.036*/NA                  | 0.014*/NA              | 0.036*/0.018*                | 0.008*/0.005*
Channel                              | 6" FR4                     | 5" FR4                 | 4" FR4                       | MDB-12" FR4
Multi-lane # × Data rate (Gb/s)      | 4 × 12                     | 2 × 12.5               | 8 × 5                        | 4 × 9
RX/TX power efficiency (pJ/bit)      | 1.78**/NA                  | 0.52**/NA              | 6.2/11.6                     | 0.49/0.5
BER / Horizontal eye-opening         | 10^−8 / 38% UI (32ps)      | 10^−12 / 15% UI (12ps) | 10^−12 / 63% UI (126ps)      | 10^−14 / 35% UIsub (117ps)
Supply (V)                           | 1.1                        | 1.2                    | 1.8                          | 0.9
IO type / Architecture               | Single-ended / Analog-IIR  | Differential / SC-DFE  | Single-ended / Staggered bus | Differential / Multi-Tone

* Core size area.
** Power is calculated for the whole RX.
The TRX power breakdown for a total data rate of 9 Gb/s per lane is shown in Fig. 5.18 (a).
With all four lanes running at this data rate, the whole chip consumes 36 mW from a 0.9 V
power supply, leading to 1 pJ/b link efficiency over the MDB channel interface. This includes
the power consumption of the TX (MDLL, mixers, and LVDS driver) and the RX (BPF, LPF, SCMF,
and MDLL). The consumption of the built-in PRBS15 generators and the I/O buffers that
drive the measurement equipment is excluded from this calculation. The TX and RX
consume 51% and 49% of the total power, respectively. The TX and RX circuit power
consumption is shown in Fig. 5.18 (b) and (c), respectively.
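The link efficiency follows directly from the per-block power figures listed in Fig. 5.18 (b) and (c). The short sketch below sums them for one lane and converts the result to energy per bit; the dictionary keys and function name are chosen for illustration only.

```python
# Per-lane block power in mW, taken from Fig. 5.18 (b) and (c).
TX_MW = {"clock": 1.53, "lvds_driver": 2.88, "mixer": 0.27}
RX_MW = {"clock": 1.95, "amplifiers": 0.40, "scmf": 0.64, "afe": 1.33}

LANE_RATE_GBPS = 9.0
N_LANES = 4


def energy_per_bit_pj(power_mw, rate_gbps):
    """mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps


tx_mw = sum(TX_MW.values())
rx_mw = sum(RX_MW.values())
lane_mw = tx_mw + rx_mw

print(f"TX: {tx_mw:.2f} mW ({energy_per_bit_pj(tx_mw, LANE_RATE_GBPS):.2f} pJ/b)")
print(f"RX: {rx_mw:.2f} mW ({energy_per_bit_pj(rx_mw, LANE_RATE_GBPS):.2f} pJ/b)")
print(f"Lane: {lane_mw:.2f} mW, chip: {N_LANES * lane_mw:.1f} mW, "
      f"link: {energy_per_bit_pj(lane_mw, LANE_RATE_GBPS):.2f} pJ/b")
```

The block powers add up to about 9 mW per lane, i.e., about 36 mW for four lanes and 1 pJ/b at 9 Gb/s per lane, with the TX and RX each contributing roughly half.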
Table 5.1 summarizes the silicon performance comparison with state-of-the-art memory
interface transceivers. Compared with the other works, the proposed TRX achieves a link
power efficiency of 1 pJ/b while operating over a multi-drop memory interface. The
horizontal eye opening is at least 110 ps for each sub-band at BER = 10^−14. The total
bit stream is divided into three sub-bands; therefore, the sub-receivers operate at one
third of the total bit rate. The horizontal eye margin benefits from this fact, and
the link becomes less sensitive to clock jitter and other non-idealities.
5.5 Conclusion
A multi-lane 4×9 Gb/s transceiver with 1 pJ/b energy efficiency has been demonstrated
in 40 nm CMOS technology. For communicating over an MDB interface, hybrid NRZ/MT
signaling has been employed in order to relax the required equalization and hence reduce
energy consumption. Moreover, it has been shown in this chapter that NRZ/MT signaling
can efficiently reduce the influence of ISI as well as crosstalk. This property is especially
interesting for modern high-density links, where the distances between the lanes are
gradually reduced and the effect of crosstalk becomes more evident. Experimental results
for the proposed four-lane link show that the recovered data demonstrates error-free data
transmission while operating at 9 Gb/s/lane. A modified discrete-time mixer/filter has
been employed in the receiver in order to reject the side channels more efficiently and
hence achieve a better eye opening.

6 Conclusion
6.1 Achievements
In this thesis, a new signaling method for wireline applications has been investigated,
and an efficient system architecture and circuit topologies have been introduced in order
to implement the proposed signaling method on silicon. This is mainly motivated by
the fact that, in the IoT era, the ever-increasing demand for higher bandwidth necessitates a
tremendous improvement in I/O speed, whereas the energy efficiency should meet
stringent system requirements. As industry continues to demand higher data
transfer speeds over low-cost, high-loss channels, very complex digital and analog signal
processing is required to mitigate the channel impairments. Therefore, modern serial data
transceivers are equipped with very sophisticated equalizers and crosstalk cancellation
units, which in turn increase system complexity and energy consumption. Although
process scaling has fueled the drastic increase in processing power over the last decade,
the energy efficiency improvement has diminished due to supply voltage saturation in finer
technology nodes. As a consequence, if baseband signaling continues to be used in this
paradigm, the I/O power cannot scale proportionally with the data bandwidth. Hence,
bearing in mind that the solution should be found in system architecture, signaling scheme,
and circuit innovation, this thesis focused on the development of a hybrid NRZ/multi-tone
signaling scheme in order to implement high-speed and low-power serial data
transceivers, especially for the channel interfaces where conventional baseband signaling
transceivers (e.g., NRZ, PAM-4, ENRZ) cannot offer a power-efficient solution.
In Chapter 3, we proposed a new signaling scheme, called hybrid NRZ/multi-tone (MT),
which can shape the transmitted spectrum of the transmitter (TX) and be customized to
the characteristics of the channel, thus providing a power-efficient solution. Based on
the proposed signaling scheme, the circuit design and implementation of a 7.5 Gb/s
hybrid NRZ/MT transceiver for multi-drop bus (MDB) memory interfaces in 40 nm bulk
CMOS technology was presented. By reducing the complexity of the equalization
circuitry on the receiver (RX) side, the proposed architecture achieves 1 pJ/bit link
efficiency for an MDB channel with 45 dB loss at 2.5 GHz. The transmitted spectrum is
composed of baseband (BB) and I/Q sub-bands, with the ability to match the modulation
frequency of the entire TRX to the channel response over a ±25% range. A
switched-capacitor-based mixer/filter is developed to efficiently down-convert and
equalize the I/Q sub-bands in the RX. The core area is 85 × 60 μm² and 150 × 60 μm²
for the TX and RX, respectively.
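As a quick sanity check on the efficiency figure quoted above, link power is simply the product of data rate and energy per bit. The snippet below is a minimal illustration; `link_power_mw` is a hypothetical helper name introduced here, not something taken from the thesis.

```python
def link_power_mw(data_rate_gbps: float, energy_pj_per_bit: float) -> float:
    """Link power in mW: (Gb/s) x (pJ/bit) = 1e9 * 1e-12 W = 1e-3 W."""
    return data_rate_gbps * energy_pj_per_bit


# 7.5 Gb/s at 1 pJ/bit, the figures quoted for the Chapter 3 prototype
print(link_power_mw(7.5, 1.0))  # 7.5
```

The result, 7.5 mW, agrees with the power figure in the title of the corresponding ISSCC publication [24].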
In Chapter 4, the clock and data recovery unit for the proposed hybrid NRZ/MT
transceiver was explained in detail. The proposed circuit reduces the complexity and
power dissipation of the CDR unit in wireline transceivers, while satisfying the
required jitter and phase mismatch specifications. New strategies are proposed to satisfy
these specifications while fulfilling the stringent power consumption requirements of
wireline links. The RMS jitter remains below 3 ps over the entire input frequency range.
A low-voltage technique is proposed to reduce the mismatch effect in the charge pump
circuit and thereby the deterministic jitter. Based on the measurement results, the power
consumption does not exceed 1.8 mW over the entire input frequency range, which results
in 280 fJ energy consumption for a 6 GHz output.
In Chapter 5, we exploited the properties of multi-tone signaling, especially the
orthogonality among different sub-bands, to reduce the effect of crosstalk. A four-channel
transceiver was implemented in a standard 40 nm CMOS technology in order to
demonstrate the performance of NRZ/MT signaling in the presence of high channel loss
and strong crosstalk. Compared with the first prototype presented in Chapter 3, we
extended the proposed NRZ/MT serial data transceiver to support four closely
spaced differential channels, to study the performance of the system in the presence of
severe crosstalk. Furthermore, we improved the receiver equalization scheme to reach
9 Gb/s (4.5 Gb/s/pin), which is 20% faster than our earlier work, without degrading
energy efficiency. The multi-tone nature of the proposed transceiver helps to control the
ISI and reduce the far-end crosstalk (FEXT), which results in a very energy-efficient
implementation. The final prototype delivers an aggregate data rate of 36 Gb/s while
demonstrating error-free operation over our reference MDB channel. The core area
is 80 × 60 μm² and 130 × 60 μm² for the TX and RX blocks (including the clock unit),
respectively.
6.2 Future Works
Some aspects of the proposed method can still be improved in order to communicate at
higher speeds and lower power, and to deliver robust product-level hybrid NRZ/MT
I/O cells. The automatic adaptability of the proposed TRX is one of the important subjects
to be investigated, so that the whole TRX can be customized to the link characteristics
and operate at the optimum power efficiency. This should be done while delivering a fixed
data rate, with the modulation scheme adapting to the link characteristics through a
background calibration. Likewise, employing different modulation schemes (ENRZ,
duobinary, PAM-4, etc.) for different sub-bands should be investigated in the proposed
multi-tone method. Based on a preliminary study, this technology shows great potential
for communicating beyond 100 Gb/s/wire over lossy backplane channels, where the channel
impairments become so severe that a baseband TRX cannot offer any functional solution.
The CDR implementation, which is presented in Appendix B, needs careful circuit
implementation and precise system-level investigation so that the circuit realization
becomes robust and energy efficient. Moreover, employing the MT technique for
capacitively coupled 3D integration can be a subject of future research.

A System-Level Statistical BER Modeling for Hybrid NRZ/MT Link
Accurate modeling and analysis of a link is critical for evaluating the system timing and
voltage margins in the presence of link imperfections and environmental noise, e.g.,
deterministic and random jitter. In today's high-speed I/O paradigm, where error-free
system operation is the designers' goal, such a system-level simulation should evaluate
the link performance at a very low BER (i.e., $< 10^{-16}$), and should also provide a
time-efficient solution for system designers to optimize the link performance.
Traditional SPICE-based simulation techniques can precisely simulate various
deterministic jitter sources, such as ISI and crosstalk from passive channels. However,
including random jitter in SPICE simulations necessitates very long simulation times.
Likewise, the very low target BER requires the simulation of a tremendous number of bits,
so time-domain SPICE-based simulation becomes more and more time consuming in today's
high-speed serial link modeling. Innovative simulation techniques have recently been
introduced to model link performance at low BER both accurately and efficiently. The most
popular approaches are based on a statistical eye consisting of the ISI probability
distributions at different sampling phases [72, 73, 74, 75]. In this Appendix, we briefly
review the statistical eye analytical model and explain the algorithm that we used in our
MATLAB simulations to extract the eye diagram and bathtub curves for the hybrid NRZ/MT
system presented in Chapter 2.
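To give a feel for why brute-force time-domain simulation is impractical at such targets, the sketch below estimates how many error-free bits must be observed to claim a given BER. It relies on the standard confidence-level rule N = -ln(1 - CL) / BER, which is an assumption brought in here for illustration and not a formula from this thesis; the function names are hypothetical.

```python
import math


def bits_required(target_ber: float, confidence: float = 0.95) -> float:
    """Error-free bits needed to claim BER < target_ber at the given confidence."""
    return -math.log(1.0 - confidence) / target_ber


def seconds_at_line_rate(n_bits: float, data_rate_bps: float) -> float:
    """Time needed just to transmit n_bits at the given line rate."""
    return n_bits / data_rate_bps


n_bits = bits_required(1e-16)
days = seconds_at_line_rate(n_bits, 7.5e9) / 86400.0
print(f"{n_bits:.2e} bits, about {days:.0f} days of traffic at 7.5 Gb/s")
```

Even at the real line rate this is weeks of traffic, and a circuit simulator runs many orders of magnitude slower than real time, which is what motivates the statistical approach.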
The statistical eye methodology is based on the calculation of the probability distribution
function (PDF) for a received bit at a given sampling time. Assuming that the link is a
linear time-invariant (LTI) system, the PDF for a received bit is calculated by convolving
the PDFs of the ICI, ISI, and TX/RX-induced jitter, which are evaluated independently.
Then, the BER is estimated from the resulting PDF. To include the TX and RX jitter in the
model, the Dual Dirac model for jitter is used [73].

Figure A.1: Transmitter and receiver jitter models.

Transmitting binary data over all sub-bands, the received signal at the corresponding
sub-RX front-end (before sampling) can be written as
$$
y(t) = \sum_{k} (a_k - a_{k-1})\, s_{th}(t - \varepsilon^{TX}_k - kT_b)
     + \sum_{l} (b_l - b_{l-1})\, s_{ICI}(t - lT_b)
     + \sum_{p} (c_p - c_{p-1})\, s_{XTK}(t - pT_b) \qquad \text{(A.1)}
$$
where $s_{th}(t)$, $s_{ICI}(t)$, and $s_{XTK}(t)$ are the through, ICI, and crosstalk step
responses (including the mixer and filter effects), respectively, $T_b$ is the bit
period, $k$, $l$, and $p$ are the transmitted symbol indices, $a_k$ is the binary output
of the sub-TX, $b_l$ and $c_p$ represent the outputs of the aggressor sub-band TX and the
aggressor neighbor TX, respectively, and $\varepsilon^{TX}_k$ is the main sub-TX jitter.
At the corresponding sub-RX side and after sampling at $t = mT_b + \varepsilon^{RX}_m$,
where $\varepsilon^{RX}_m$ is the sub-RX jitter, the sampled signal can be written as
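$$
y_m = \sum_{k} (a_k - a_{k-1})\, s_{th}[\varepsilon^{RX}_m - \varepsilon^{TX}_k + (m-k)T_b]
    + \sum_{l} (b_l - b_{l-1})\, s_{ICI}[\varepsilon^{RX}_m + (m-l)T_b]
    + \sum_{p} (c_p - c_{p-1})\, s_{XTK}[\varepsilon^{RX}_m + (m-p)T_b]. \qquad \text{(A.2)}
$$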
The transmitter and receiver jitter are modeled as impulses at the transition edge times,
formally known as the Dual Dirac model, as shown in Fig. A.1 [76, 73]. Hence, the received
signal at the input of the sub-RX sampler, given by (A.2), can be approximated using a
first-order Taylor series expansion, $s(t + \varepsilon) \approx s(t) + \varepsilon\, h(t)$
with $h(t) = ds(t)/dt$ the corresponding impulse response, as follows:
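$$
\begin{aligned}
y_m \cong{} & \sum_{k} (a_k - a_{k-1})\, s_{th}[(m-k)T_b] + \sum_{l} (b_l - b_{l-1})\, s_{ICI}[(m-l)T_b] \\
 & + \sum_{p} (c_p - c_{p-1})\, s_{XTK}[(m-p)T_b] + \sum_{k} (a_{k-1} - a_k)\, \varepsilon^{TX}_k\, h^{th}_{m-k} \\
 & + \varepsilon^{RX}_m \Big[ \sum_{k} (a_k - a_{k-1})\, h^{th}_{m-k} + \sum_{l} (b_l - b_{l-1})\, h^{ICI}_{m-l} + \sum_{p} (c_p - c_{p-1})\, h^{XTK}_{m-p} \Big] \\
 ={} & \sum_{k} a_k\, p^{th}_{m-k} + \sum_{l} b_l\, p^{ICI}_{m-l} + \sum_{p} c_p\, p^{XTK}_{m-p} + n^{TX} + n^{RX} && \text{(A.3)} \\
 ={} & y_0 + ISI + ICI + Xtalk + n^{TX} + n^{RX} && \text{(A.4)}
\end{aligned}
$$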
where $h^{th}_m$, $h^{ICI}_l$, and $h^{XTK}_p$ are the data-rate sampled impulse responses
of the through, ICI, and crosstalk channels, respectively, $ISI$, $ICI$, and $Xtalk$ are
the amounts of ISI, ICI, and crosstalk at the sample time, respectively, $n^{TX}$ and
$n^{RX}$ represent the effective voltage noise due to TX and RX timing jitter,
respectively, $p^{th}_m$, $p^{ICI}_m$, and $p^{XTK}_m$ are the pulse responses of the
through, ICI, and crosstalk channels, respectively, and $y_0$ is the received signal
without ISI, ICI, and crosstalk.
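The step from the transition-weighted step responses to the bit-weighted pulse responses in (A.3) can be checked numerically for the jitter-free through term. The sketch below is illustrative only: the first-order step response, its time constant, and all function names are assumptions made here, and the bit before the first one is taken to be 0.

```python
import numpy as np


def sampled_step_response(n_samples: int, tau_bits: float = 1.5) -> np.ndarray:
    """Toy first-order step response sampled once per bit (assumed channel)."""
    n = np.arange(n_samples)
    return 1.0 - np.exp(-(n + 1) / tau_bits)


def rx_from_transitions(bits: np.ndarray, step: np.ndarray) -> np.ndarray:
    """Superpose step responses weighted by the transitions a_k - a_{k-1}."""
    transitions = np.diff(bits, prepend=0)  # assumes a_{-1} = 0
    return np.convolve(transitions, step)[: len(bits)]


def rx_from_pulses(bits: np.ndarray, step: np.ndarray) -> np.ndarray:
    """Superpose pulse responses p_n = s_n - s_{n-1} weighted by the bits a_k."""
    pulse = np.diff(step, prepend=0.0)  # step response is zero before t = 0
    return np.convolve(bits, pulse)[: len(bits)]


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=64)
step = sampled_step_response(len(bits))
print(np.allclose(rx_from_transitions(bits, step), rx_from_pulses(bits, step)))
```

Both forms give the same sampled waveform, so the statistical analysis can work directly with the pulse-response cursors.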
Each of the ISI, ICI, and crosstalk PDFs can be calculated as [72, 77]
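$$
\begin{aligned}
P_{ISI_k} &= P_0\,\delta(x) + P_1\,\delta(x - p^{th}_k) && \text{(A.5)} \\
P_{ICI_l} &= P_0\,\delta(x) + P_1\,\delta(x - p^{ICI}_l) && \text{(A.6)} \\
P_{Xtalk_p} &= P_0\,\delta(x) + P_1\,\delta(x - p^{XTK}_p) && \text{(A.7)} \\
P_{ISI} &= \cdots \otimes P_{ISI_{-2}} \otimes P_{ISI_{-1}} \otimes P_{ISI_{1}} \otimes P_{ISI_{2}} \otimes \cdots && \text{(A.8)} \\
P_{ICI} &= \cdots \otimes P_{ICI_{-2}} \otimes P_{ICI_{-1}} \otimes P_{ICI_{1}} \otimes P_{ICI_{2}} \otimes \cdots && \text{(A.9)} \\
P_{Xtalk} &= \cdots \otimes P_{Xtalk_{-2}} \otimes P_{Xtalk_{-1}} \otimes P_{Xtalk_{1}} \otimes P_{Xtalk_{2}} \otimes \cdots && \text{(A.10)}
\end{aligned}
$$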
where $P_{ISI_k}$, $P_{ICI_l}$, and $P_{Xtalk_p}$ represent the PDFs of the ISI, ICI, and
crosstalk from the k-th, l-th, and p-th bit, respectively, $P_{ISI}$, $P_{ICI}$, and
$P_{Xtalk}$ are the overall PDFs of ISI, ICI, and crosstalk, respectively, $\otimes$
denotes convolution, and $\delta(x)$ is the unit impulse function.
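The per-cursor PDFs of (A.5) and their convolution in (A.8) can be sketched on a discrete voltage grid as follows. This is a minimal illustration rather than the MATLAB code used in this work: the grid resolution `dv`, the rounding of cursors to grid bins, and the function names are assumptions introduced here.

```python
import numpy as np


def cursor_pmf(shift_bins: int, p1: float):
    """Two-point PMF of (A.5): mass 1 - p1 at 0 and mass p1 at shift_bins.

    Returns (pmf, offset), where pmf[i] is the mass at grid bin offset + i.
    """
    if shift_bins == 0:
        return np.array([1.0]), 0
    pmf = np.zeros(abs(shift_bins) + 1)
    if shift_bins > 0:
        pmf[0], pmf[-1] = 1.0 - p1, p1
        return pmf, 0
    pmf[0], pmf[-1] = p1, 1.0 - p1
    return pmf, shift_bins


def isi_pmf(cursors, dv: float, p1: float = 0.5):
    """Convolve the per-cursor PMFs as in (A.8); returns (volts, pmf)."""
    total, offset = np.array([1.0]), 0
    for cursor in cursors:
        pmf, off = cursor_pmf(int(np.rint(cursor / dv)), p1)
        total = np.convolve(total, pmf)
        offset += off
    volts = (offset + np.arange(total.size)) * dv
    return volts, total


# Two hypothetical cursors: one post-cursor of +0.10 and one pre-cursor of -0.05
volts, pmf = isi_pmf([0.10, -0.05], dv=0.05)
print(volts, pmf)
```

With equiprobable bits, the four possible ISI values (-0.05, 0, 0.05, 0.10) each carry a probability of 0.25. The same routine applies unchanged to the ICI and crosstalk cursors of (A.9) and (A.10).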
The variables in (A.4) are statistically correlated, since they are all functions of the
symbol pattern and of the different channel impulse responses. Although an exact BER
calculation should account for the correlation between these variables, they can
reasonably be assumed to be independent in order to simplify the computation [76, 72].
With this assumption, the overall jitter PDF can be calculated by convolving the PDFs of
ISI, ICI, crosstalk, and TX/RX jitter. The resulting PDF is then used to calculate the
BER as follows

$$
BER(v_{REF}) = P(J_{TOT} < v_{REF} - y_0 \mid 1)\,P_1 + P(J_{TOT} > v_{REF} - y_0 \mid 0)\,P_0 \qquad \text{(A.11)}
$$

where $J_{TOT} = ISI + ICI + Xtalk + n^{TX} + n^{RX}$ represents the total induced jitter
at the main sub-RX, $v_{REF}$ is the reference voltage, which is typically zero for
differential signaling I/Os, and $P_0$ and $P_1$ are the probabilities of the input bit
being 0 and 1, respectively.

Figure A.2: Flowchart of BER calculation using statistical eye.

The statistical eye diagram is calculated by sweeping (A.11) over the sampling phase and
the reference voltage. The horizontal and vertical bathtub curves can then be obtained
from horizontal and vertical slices of the statistical eye diagram, respectively.
Fig. A.2 illustrates the flowchart of the BER bathtub calculation presented in this
Appendix.
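A vertical slice of the statistical eye can be sketched by evaluating (A.11) over a range of reference voltages at one sampling phase. The code below is a hedged illustration: it assumes that the total-jitter PDF is already available as a discrete PMF on a voltage grid, and the names `ber`, `vertical_bathtub`, `y0_one`, and `y0_zero` (the ISI-free received levels for a 1 and a 0) are introduced here for the example only.

```python
import numpy as np


def ber(volts, pmf, vref, y0_one, y0_zero, p1=0.5):
    """Evaluate (A.11) for a discrete PMF of J_TOT defined on the grid volts."""
    err_one = pmf[volts < vref - y0_one].sum()    # a transmitted 1 falls below vref
    err_zero = pmf[volts > vref - y0_zero].sum()  # a transmitted 0 rises above vref
    return p1 * err_one + (1.0 - p1) * err_zero


def vertical_bathtub(volts, pmf, vrefs, y0_one, y0_zero):
    """BER versus reference voltage at a fixed sampling phase."""
    return np.array([ber(volts, pmf, v, y0_one, y0_zero) for v in vrefs])


# Toy three-point PMF of the total jitter (hypothetical numbers)
volts = np.array([-0.2, 0.0, 0.2])
pmf = np.array([0.1, 0.8, 0.1])
vrefs = np.linspace(-0.25, 0.25, 11)
print(vertical_bathtub(volts, pmf, vrefs, y0_one=0.3, y0_zero=-0.3))
```

Repeating the sweep for every sampling phase (with the cursors re-extracted at that phase) fills in the two-dimensional statistical eye, and a horizontal slice at the chosen reference voltage gives the timing bathtub.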
The flowchart in Fig. A.2, along with the mathematical analysis of the BER calculation in
this Appendix, has been implemented in MATLAB code to evaluate and optimize the
proposed hybrid NRZ/MT transceiver for BERs as low as $10^{-20}$. It provides an accurate
link estimation while keeping the simulation time reasonably short, e.g., less than
3 minutes for calculating the eye diagram down to BER = $10^{-24}$. The results of such
simulations have been presented in Chapter 2.
B CDR Techniques for Hybrid NRZ/MT Link System
An important part of a serial link transceiver is the clocking circuitry, which can consume
around 40% of the total power budget of the link. Therefore, the demand for
high-performance clock and data recovery (CDR) circuits fabricated in standard CMOS
processes, which can provide a power-efficient solution, continues to grow, with
accumulated jitter being paramount to CDR performance. In this paradigm, most of today's
high-performance I/O interfaces may be categorized as either mesochronous¹ or
plesiochronous², and both require clock and data resynchronization at the RX side.
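To illustrate why a plesiochronous link needs continuous resynchronization, the sketch below computes how many bit periods it takes for a small TX/RX frequency offset to accumulate one full unit interval (UI) of phase drift. The 100 ppm value is a hypothetical example, not a figure from this thesis, and `bits_per_ui_slip` is a name introduced here.

```python
def bits_per_ui_slip(offset_ppm: float) -> float:
    """Bit periods over which a frequency offset accumulates one UI of drift."""
    return 1.0 / (offset_ppm * 1e-6)


# Hypothetical 100 ppm offset between the TX and RX reference clocks
print(round(bits_per_ui_slip(100.0)))  # 10000
```

In a mesochronous (e.g., source-synchronous) link the offset is zero by construction, so only a static phase has to be found and tracked.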
Source-synchronous links, which are a subset of mesochronous systems, are widely adopted
in a variety of high-speed parallel I/O interfaces, such as GPU-to-memory, CPU-to-memory,
and CPU-to-bridge-chip links, where the cost and power overhead of the additional
forwarded-clock link is amortized across multiple TRX banks in the system. Likewise, owing
to the inherent tracking of the correlated jitter between data and clock in such a
clocking architecture, the jitter performance of the link system is improved [78, 79, 80].
The proposed hybrid NRZ/MT link system presented in Chapter 3 and Chapter 5 employs a
source-synchronous architecture, as explained in those chapters. However, the delay
fine-tuning, which is incorporated in the proposed MDLL and is used for adjusting the
clock phase at the RX, is manually tuned based on the received signal quality, as
discussed in Chapter 4. Nevertheless, it is possible to design a CDR unit that
automatically adjusts the RX clock phases and facilitates error-free operation of the
transceiver. In this Appendix, we explain the system design of a CDR unit that can be
used in the proposed hybrid NRZ/MT transceiver to find the optimum clock phase.
¹ Mesochronous interfaces adopt precisely identical clock frequencies at both ends of the
link; however, the TX and RX clocks have different phase shifts relative to the data.
² Plesiochronous interfaces operate at almost, but not precisely, the same frequencies at
the transmitter and receiver.
Figure B.1: Forwarded clock CDR architecture for proposed hybrid NRZ/MT link system.
Fig. B.1 shows the proposed CDR architecture for the hybrid NRZ/MT system presented
in Chapter 3. Here, the I/Q sub-bands share a CDR unit, while the NRZ sub-band has
its own CDR. The CDR for the NRZ sub-band (i.e., the CDR2 and ΔΦ blocks in Fig. B.1) can
have a conventional architecture, as employed in most NRZ links, e.g., the CDR topology
in [81, 82]. It should nominally operate at 2.5 GHz, detect the 1→0 and 0→1 transitions,
adjust the phase of the received forwarded clock (e.g., with a phase interpolator), and
have an operating frequency that is tunable within a 30% frequency range. The design of
such a block with these specifications is well studied in the literature; therefore, we
focus on the design of the CDR for the pass-bands in this Appendix.
CDR circuits incorporating bang-bang phase detectors (BBPDs) can be a good choice
for realizing the CDR unit in Fig. B.1, since they do not require a charge pump in
their architecture, thereby eliminating the need for amplification of very short
high-frequency pulses. Moreover, if a tristate PD such as the Alexander topology [83] is
employed, pattern-dependent jitter can be efficiently suppressed, and the deterministic
jitter (DJ) performance can therefore be improved [84, 85]. Furthermore, the proposed
MDLL circuit in Chapter 4, which incorporates delay fine-tuning in its topology, can be
used efficiently along with an Alexander PD; therefore, the CDR circuit can be implemented
with a set of minor changes to the proposed hybrid NRZ/MT transceiver³. The timing
diagram of the clock signals used in the SCMF unit is illustrated in Fig. B.2, along with
the recovered I/Q sub-bands (i.e., DIout and DQout in Fig. 3.23), which have been aligned
with the other waveforms in Fig. B.2. Φ1 and Φ2 are the DLL output phases shown in
Fig. 3.25.

³ The PD should be designed and implemented.

Figure B.2: The timing diagram of the clock signals for recovering the I sub-band, which
is presented in Fig. 3.23.

Based on the timing diagram in Fig. B.2, the Alexander PD principle, also known as
"early-late" phase detection, can be employed to sample the recovered I/Q sub-bands at
the middle point of the eye diagrams, as marked on DIout and DQout in Fig. B.2. The
circuit topology for realizing the CDR circuit for the pass-band, which can replace CDR1
in Fig. B.1, is illustrated in Fig. B.3. The output of this block is then wired to the MDLL
delay fine-tuning input, i.e., Vc in Fig. 4.2 and Fig. 4.3. In this circuit, the transistors
M1,2 and M3,4 are used to let the data be sampled only at the Fref frequency within the
timing windows, which are provided by the Φ2 and Φ1 clock phases, respectively. Then,
the sampled data is given to the XOR gates to create the "early" and "late" signals. The
XOR gates drive voltage-to-current (V/I) converters, the two outputs of the V/I
converters are summed in the current domain, and the result is applied to the integrating
capacitor, i.e., C1 in Fig. B.3. Since the I/Q sub-band PD outputs are summed together,
the final sampling phase settles to a point that is optimal for both of the I/Q
sub-bands. A preliminary study of this type of CDR indicates that the performance of the
proposed circuit is satisfactory, although the statistical delay caused by M1−4 can
increase the DJ and hence degrade the system jitter performance.

Figure B.3: The proposed CDR architecture for pass-band data.
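The early/late decision and the summing of the I and Q phase-detector outputs described above can be sketched behaviorally as follows. This is a simplified model under stated assumptions, not the circuit of Fig. B.3: the sign convention (+1 for early, -1 for late), the loop `gain`, and the function names are hypothetical, and the gating by M1-4 is not modeled.

```python
def alexander_pd(prev_data: int, edge: int, curr_data: int) -> int:
    """Tristate early/late decision from three consecutive samples.

    Returns +1 (clock early), -1 (clock late), or 0 (no transition: hold).
    """
    early = edge ^ curr_data  # edge sample still equals the previous bit
    late = prev_data ^ edge   # edge sample already equals the current bit
    return early - late


def update_control(vc: float, votes_i, votes_q, gain: float = 1e-3) -> float:
    """Sum the I and Q PD outputs and integrate the result (C1 behaves as an integrator)."""
    return vc + gain * (sum(votes_i) + sum(votes_q))


# (previous data, edge sample, current data) triplets for each sub-band
i_samples = [(0, 0, 1), (1, 1, 0), (1, 1, 1)]
q_samples = [(0, 1, 1), (0, 0, 1), (0, 0, 0)]
votes_i = [alexander_pd(*s) for s in i_samples]
votes_q = [alexander_pd(*s) for s in q_samples]
print(votes_i, votes_q, update_control(0.5, votes_i, votes_q))
```

Because the two sub-bands vote into a single integrator, the control voltage only stops moving when the early and late votes of I and Q balance jointly, which is the behavioral counterpart of the shared sampling phase discussed above.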
List of Acronyms
• 3D - Three-Dimensional
• ADC - Analog-to-Digital Converter
• AFE - Analog Front-End
• ASIC - Application-Speciﬁc Integrated Circuit
• BB - Base Band
• BER - Bit Error Rate
• BPF - Band Pass Filter
• CML - Current-Mode Logic
• CMOS - Complementary Metal-Oxide-Semiconductor
• CP - Charge Pump
• CPU - Central Processing Unit
• CTLE - Continuous-Time Linear Equalizer
• DAC - Digital-to-Analog Converter
• DDR - Double Data Rate
• DFE - Decision Feedback Equalizer
• DIMM - Dual In-line Memory Module
• DLL - Delay Locked Loop
• DRAM - Dynamic Random Access Memory
• FEXT - Far-End Cross Talk
• FFE - Feed-Forward Equalizer
• FIFO - First In First Out
• FIR - Finite Impulse Response
• FOM - Figure-of-Merit
• FPGA - Field-Programmable-Gate-Array
• GDDR - Graphics Double Data Rate
• HPC - High-Performance Computing
• I/O - Input/Output
• ICI - Inter-Channel Interference
• IIR - Inﬁnite Impulse Response
• IoT - Internet of Things
• ISI - Inter-Symbol Interference
• LF - Loop Filter
• LPF - Low Pass Filter
• LPDDR - Low Power Double Data Rate
• LTI - Linear Time Invariant
• LUT - Look-Up Table
• LVDS - Low-Voltage Diﬀerential Signaling
• MDB - Multi-Drop Bus
• MDLL - Multiplying Delay Locked Loop
• MT - Multi-Tone
• NRZ - Non-Return to Zero
• NRZ/MT - Non-Return to Zero/Multi-Tone
• PAM - Pulse-Amplitude Modulation
• PC - Personal Computer
• PCB - Printed Circuit Board
• PDF - Probability Density Function
• PLL - Phase-Locked Loop
• QAM - Quadrature Amplitude Modulation
• QPSK - Quadrature Phase Shift Keying
• RAM - Random Access Memory
• RX - Receiver
• RZ - Return to Zero
• SC - Switched Capacitor
• SCMF - Switched-Capacitor Mixer/Filter
• SoC - System-on-Chip
• SPI - Serial Peripheral Interface
• SRAM - Static Random Access Memory
• TX - Transmitter
• TRX - Transceiver
• V/I - Voltage-to-Current
• XDFE - Cross Decision Feedback Equalizer
• ZF - Zero Forcing

Bibliography
[1] Cisco VNI forecast widget @ONLINE, Feb. 2015. http://ciscovni.com/
forecast-widget/index.html.
[2] http://www.google.com/about/datacenters/gallery/#/tech.
[3] Gray Nichol. Chip-to-module interface requirements a system vendor’s perspective.
In OIF 400G workshop, Feb. 2014.
[4] Ken Chang, G. Zhang, and C. Borrelli. Evolution of wireline transceiver standards:
Various, most-used standards for the bandwidth demand. IEEE Solid-State Circuits
Mag., 7(4):47–52, Fall 2015.
[5] T. Anand, M. Talegaonkar, A. Elkholy, S. Saxena, A. Elshazly, and P.K. Hanumolu. A
7 Gb/s embedded clock transceiver for energy proportional links. IEEE J. Solid-State
Circuits, 50(12):3101–3119, Dec. 2015.
[6] S. Narendra, L. Fujino, and K. Smith. Through the looking glass - the 2015 edition:
trends in solid-state circuits from ISSCC. IEEE Solid-State Circuits Mag., 7(1):14–24,
Winter 2015.
[7] T. Beukema, M. Sorna, K. Selander, S. Zier, B.L. Ji, P. Murfet, J. Mason, Woogeun
Rhee, H. Ainspan, B. Parker, and M. Beakes. A 6.4-Gb/s CMOS SerDes core
with feed-forward and decision-feedback equalization. IEEE J. Solid-State Circuits,
40(12):2633–2645, Dec. 2005.
[8] T. Toiﬂ, M. Ruegg, R. Inti, C. Menolﬁ, M. Brandli, M. Kossel, P. Buchmann, P.A.
Francese, and T. Morf. A 3.1mW/Gbps 30Gbps quarter-rate triple-speculation
15-tap SC-DFE RX data path in 32nm CMOS. In IEEE VLSI Symp. Dig. Tech.
Papers, pages 102–103, June 2012.
[9] A. Cevrero, Cosimo Aprile, P. A. Francese, U. Bapst, C. Menolﬁ, et al. A 5.9mW/Gb/s
7Gb/s/pin 8-lane single-ended RX with crosstalk cancellation scheme using a XCTLE
and 56-tap XDFE in 32nm SOI CMOS. In IEEE VLSI Symp. Dig. Tech. Papers,
pages C228–C229, June 2015.
[10] A. Amirkhany, A. Abbasfar, V. Stojanovic, and M.A. Horowitz. Practical limits of
multi-tone signaling over high-speed backplane electrical links. In Communications,
2007. ICC ’07. IEEE International Conference on, pages 2693–2698, June 2007.
[11] A. Amirkhany, A. Abbasfar, J. Savoj, M. Jeeradit, B. Garlepp, R.T. Kollipara,
V. Stojanovic, and M. Horowitz. A 24 Gb/s software programmable analog multi-
tone transmitter. IEEE J. Solid-State Circuits, 43(4):999–1009, Apr. 2008.
[12] Gyung-Su Byun, Yanghyo Kim, Jongsun Kim, Sai-Wang Tam, and M.-C.F. Chang.
An energy-eﬃcient and high-speed mobile memory I/O interface using simultaneous
bi-directional dual (base+RF)-band signaling. IEEE J. Solid-State Circuits, 47(1):117–
130, Jan. 2012.
[13] S. Ibrahim and B. Razavi. Design requirements of 20-Gb/s serial links using multi-
tone signaling. In Signals, Circuits and Systems, ISSCS 2009 Symposium on, pages
1–4, July 2009.
[14] Bo Zhang, K. Khanoyan, H. Hatamkhani, H. Tong, K. Hu, S. Fallahi, M. Abdul-
Latif, K. Vakilian, I. Fujimori, and A. Brewster. A 28 Gb/s multistandard serial
link transceiver for backplane applications in 28 nm CMOS. IEEE J. Solid-State
Circuits, 50(12):3089–3100, Dec. 2015.
[15] OIF CEI-25G-LR OIF2008.161.12 . CEI-25G-LR Long Reach Interface.
[16] CEI-25G-LR and CEI-28G-VSR multi-vendor interoperability testing. CEI-28G-VSR
Very Short Reach Interface, 2012.
[17] CEI-28G-SR OIF2008.029.12. CEI-28G-SR Short Reach Interface.
[18] K. Gopalakrishnan, A. Ren, A. an, A. Farhood, et al. A 40/50/100Gb/s PAM-4
ethernet transceiver in 28nm CMOS. In IEEE Int. Solid-State Circuits Conf. (ISSCC)
Dig. Tech. Papers, Feb. 2016.
[19] H-Y. Joo, S-J. Bae, Y-S. Sohn, Y-S. Kim, et al. A 20nm 9Gb/s/pin 8Gb GDDR5 DRAM
with an NBTI monitor, jitter reduction techniques and improved power distribution.
In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2016.
[20] K. Chang, F. O’Mahony, E. Alon, Hyeon-Min Bae, N. Da Dalt, and E. Fluhr. F6:
I/O design at 25Gb/s and beyond: Enabling the future communication infrastructure
for big data. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
pages 1–2, Feb. 2015.
[21] F. O’Mahony, G. Balamurugan, J.E. Jaussi, J. Kennedy, M. Mansuri, S. Shekhar,
and B. Casper. The future of electrical I/O for microprocessors. In VLSI Design,
Automation and Test, 2009. VLSI-DAT ’09. International Symposium on, pages
31–34, April 2009.
[22] K. Fukuda, H. Yamashita, G. Ono, R. Nemoto, E. Suzuki, N. Masuda, T. Takemoto,
F. Yuki, and T. Saito. A 12.3-mW 12.5-Gb/s complete transceiver in 65-nm CMOS
process. IEEE J. Solid-State Circuits, 45(12):2838–2849, Dec. 2010.
[23] M. Mansuri, J.E. Jaussi, J.T. Kennedy, Tzu-Chien Hsueh, S. Shekhar, G. Balamuru-
gan, F. O’Mahony, C. Roberts, R. Mooney, and B. Casper. A scalable 0.128-1 Tb/s,
0.8-2.6 pJ/bit, 64-lane parallel I/O in 32-nm CMOS. IEEE J. Solid-State Circuits,
48(12):3229–3242, Dec. 2013.
[24] Kiarash Gharibdoust, Armin Tajalli, and Yusuf Leblebici. A 7.5mW 7.5Gb/s mixed
NRZ/multi-tone serial-data transceiver for multi-drop memory interfaces in 40nm
CMOS. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pages
180–181, Feb. 2015.
[25] K. Therdsteerasukdi, Gyung-Su Byun, J. Ir, G. Reinman, J. Cong, and M.-C.F.
Chang. Utilizing radio-frequency interconnect for a many-DIMM DRAM system.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2(2):210–227, June 2012.
[26] Kiarash Gharibdoust, Armin Tajalli, and Yusuf Leblebici. A 4x9 Gb/s 1 pJ/b
NRZ/multi-tone serial-data transceiver with crosstalk reduction architecture for
multi-drop memory interfaces in 40nm CMOS. In IEEE VLSI Symp. Dig. Tech.
Papers, pages C180–C181, June 2015.
[27] T. Toiﬂ, C. Menolﬁ, M. Ruegg, R. Reutemann, D. Dreps, et al. A 2.6 mW/Gbps 12.5
Gbps RX with 8-tap switched-capacitor DFE in 32 nm CMOS. IEEE J. Solid-State
Circuits, 47(4):897–910, Apr. 2012.
[28] V. Stojanovic, A. Ho, B.W. Garlepp, F. Chen, J. Wei, G. Tsang, E. Alon, R.T.
Kollipara, C.W. Werner, J.L. Zerbe, and M.A. Horowitz. Autonomous dual-mode
(PAM2/4) serial link transceiver with adaptive equalization and data recovery. IEEE
J. Solid-State Circuits, 40(4):1012–1026, Apr. 2005.
[29] A. Manian and B. Razavi. A 40-Gb/s 9.2-mW CMOS equalizer. In IEEE VLSI
Symp. Dig. Tech. Papers, pages C226–C227, June 2015.
[30] Jun Won Jung and B. Razavi. A 25-Gb/s 5-mW CMOS CDR/deserializer. In IEEE
VLSI Symp. Dig. Tech. Papers, pages 138–139, June 2012.
[31] J.F. Bulzacchelli. Equalization for electrical links: Current design techniques and
future directions. IEEE Solid-State Circuits Mag., 7(4):23–31, Fall 2015.
[32] S. Kasturia and J.H. Winters. Techniques for high-speed implementation of nonlinear
cancellation. IEEE J. Sel. Areas Commun., 9(5):711–717, Jun 1991.
[33] J.F. Bulzacchelli, A.V. Rylyakov, and D.J. Friedman. Power-eﬃcient decision-
feedback equalizers for multi-Gb/s CMOS serial links. In IEEE Radio Frequency
Integrated Circuits (RFIC) Symp., pages 507–510, June 2007.
[34] B. Saltzberg. Performance of an eﬃcient parallel data transmission system. Commu-
nication Technology, IEEE Transactions on, 15(6):805–811, Dec. 1967.
[35] S. Weinstein and P. Ebert. Data transmission by frequency-division multiplexing
using the discrete Fourier transform. Communication Technology, IEEE Transactions
on, 19(5):628–634, Oct. 1971.
[36] B. Hirosaki. An orthogonally multiplexed QAM system using the discrete Fourier
transform. Communications, IEEE Transactions on, 29(7):982–989, Jul. 1981.
[37] A. Hormati, A. Tajalli, C. Walter, K. Gharibdoust, and A. Shokrollahi. A versatile
spectrum shaping scheme for communicating beyond notches in multi-drop interfaces.
In DesignCon 2016, Santa Clara, CA, USA, Jan. 2016.
[38] Kyomin Sohn, Taesik Na, Indal Song, Yong Shim, Wonil Bae, et al. A 1.2 V 30
nm 3.2 Gb/s/pin 4 Gb DDR4 SDRAM with dual-error detection and PVT-tolerant
data-fetch scheme. IEEE J. Solid-State Circuits, 48(1):168–177, Jan. 2013.
[39] Addendum No. 1 to JESD79-3 - 1.35 V DDR3L-800, DDR3L-1066, DDR3L-1333,
DDR3L-1600, and DDR3L-1866. JEDEC Std., JESD79-3-1A.01, May 2013.
[40] B. Jacob, D. Wang, and S. Ng. Memory Systems: Cache, DRAM, Disk. Morgan
Kaufmann, 2008.
[41] F. Aryanfar and A. Amirkhany. A low-cost resonance mitigation technique for
multidrop memory interfaces. IEEE Trans. Circuits Syst. II, 57(5):339–342, May
2010.
[42] Kiarash Gharibdoust, Armin Tajalli, and Yusuf Leblebici. A 4x9 Gb/s 1 pJ/b hybrid
NRZ/multi-tone I/O with crosstalk and ISI reduction for dense interconnects. IEEE
J. Solid-State Circuits, in press, 2016.
[43] Behzad Razavi. Design of Integrated Circuits for Optical Communications.
McGraw-Hill, Avenue of the Americas, NY, USA, 1 edition, 2003.
[44] S. Sidiropoulos and M. Horowitz. A 700-Mb/s/pin CMOS signaling interface using
current integrating receivers. Solid-State Circuits, IEEE Journal of, 32(5):681–690,
May 1997.
[45] Behzad Razavi. RF Microelectronics. Prentice Hall Press, Upper Saddle River, NJ,
USA, 2 edition, 2012.
[46] R. Reutemann, M. Ruegg, F. Keyser, J. Bergkvist, D. Dreps, T. Toiﬂ, and M. Schmatz.
A 4.5mW/Gb/s 6.4Gb/s 22+1-lane source-synchronous link RX core with optional
cleanup PLL in 65nm CMOS. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig.
Tech. Papers, pages 160–161, Feb. 2010.
[47] A. Tajalli and S.M. Atarodi. A compact biquadratic gm-C ﬁlter structure for low-
voltage and high frequency applications. In Proc. IEEE Int. Symp. Circuits and
Systems (ISCAS), May 2003.
[48] Kwang-Il Oh, Lee-Sup Kim, Kwang-Il Park, Young-Hyun Jun, Joo Sun Choi, and
Kinam Kim. A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk
suppression scheme. IEEE J. Solid-State Circuits, 44(8):2222–2232, Aug. 2009.
[49] Young-Sik Kim, Seon-Kyoo Lee, Seung-Jun Bae, Young-Soo Sohn, Jung-Bae Lee,
Joo Sun Choi, Hong-June Park, and Jae-Yoon Sim. An 8Gb/s quad-skew-cancelling
parallel transceiver in 90nm CMOS for high-speed DRAM interface. In IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pages 136–138, Feb. 2012.
[50] Woo-Yeol Shin, Gi-Moon Hong, Hyongmin Lee, Jae-Duk Han, Sunkwon Kim, Kyu-
Sang Park, Dong-Hyuk Lim, Jung-Hoon Chun, Deog-Kyoon Jeong, and Suhwan Kim.
A 4.8Gb/s impedance-matched bidirectional multi-drop transceiver for high-capacity
memory interface. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech.
Papers, pages 494–496, Feb. 2011.
[51] K. Kaviani, M. Bucher, B. Su, B. Daly, B. Stonecypher, et al. A 6.4Gb/s near-ground
single-ended transceiver for dual-rank DIMM memory interface systems. In IEEE
Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pages 306–307, Feb. 2013.
[52] Mingta Hsieh and G. Sobelman. Architectures for multi-gigabit wire-linked clock
and data recovery. IEEE Circuits Syst. Mag., 8(4):45–57, Fourth Quarter 2008.
[53] P. Gui, F.E. Kiamilev, Xiaoqing Wang, M.J. MacFadden, Xingle Wang, Nick Waite,
M.W. Haney, and C. Kuznia. A source-synchronous double-data-rate parallel optical
transceiver IC. IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 13(7):833–842,
July 2005.
[54] G. Chien and P.R. Gray. A 900-MHz local oscillator using a DLL-based frequency
multiplier technique for PCS applications. IEEE J. Solid-State Circuits, 35(12):1996–
1999, Dec. 2000.
[55] B. Razavi. The role of PLLs in future wireline transmitters. IEEE Trans. Circuits
Syst. I, 56(8):1786–1793, Aug. 2009.
[56] A. Elshazly, R. Inti, B. Young, and P.K. Hanumolu. Clock multiplication techniques
using digital multiplying delay-locked loops. IEEE J. Solid-State Circuits, 48(6):1416–
1428, June 2013.
[57] M.-J.E. Lee, W.J. Dally, T. Greer, Hiok-Tiaq Ng, R. Farjad-rad, J. Poulton, and
R. Senthinathan. Jitter transfer characteristics of delay-locked loops - theories and
design techniques. IEEE J. Solid-State Circuits, 38(4):614–621, Apr. 2003.
[58] Keng-Jan Hsiao and Tai-Cheng Lee. The design and analysis of a fully integrated mul-
tiplying DLL with adaptive current tuning. IEEE J. Solid-State Circuits, 43(6):1427–
1435, June 2008.
[59] Tai-Cheng Lee and Keng-Jan Hsiao. The design and analysis of a DLL-based
frequency synthesizer for UWB application. IEEE J. Solid-State Circuits, 41(6):1245–
1252, June 2006.
[60] Chi-Nan Chuang and Shen-Iuan Liu. A 40 GHz DLL-based clock generator in 90nm
CMOS technology. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech.
Papers, pages 178–595, Feb. 2007.
[61] Fang-Ren Liao and Shey-Shi Lu. A programmable edge-combining DLL with a
current-splitting charge pump for spur suppression. IEEE Trans. Circuits Syst. II,
57(12):946–950, Dec. 2010.
[62] Hae-Kang Jung, Kyoungho Lee, Jong-Sam Kim, Jae-Jin Lee, Jae-Yoon Sim, and
Hong-June Park. A 4 Gb/s 3-bit parallel transmitter with the crosstalk-induced
jitter compensation using TX data timing control. IEEE J. Solid-State Circuits,
44(11):2891–2900, Nov. 2009.
[63] J.F. Bulzacchelli, C. Menolﬁ, T.J. Beukema, D.W. Storaska, J. Hertle, et al. A
28-Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32-nm SOI CMOS technology.
IEEE J. Solid-State Circuits, 47(12):3232–3248, Dec. 2012.
[64] T. Musah, J. Jaussi, G. Balamurugan, S. Hyvonen, T.-C. Hsueh, et al. A 4-32
Gb/s bidirectional link with 3-tap FFE/6-tap DFE and collaborative CDR in 22 nm
CMOS. IEEE J. Solid-State Circuits, 49(12):3079–3090, Dec. 2014.
[65] Taehyoun Oh and R. Harjani. A 12-Gb/s multichannel I/O using MIMO crosstalk
cancellation and signal reutilization in 65-nm CMOS. IEEE J. Solid-State Circuits,
48(6):1383–1397, June 2013.
[66] G. Balamurugan, J. Kennedy, G. Banerjee, J.E. Jaussi, M. Mansuri, F. O’Mahony,
B. Casper, and R. Mooney. A scalable 5-15 Gbps, 14-75 mW low-power I/O transceiver
in 65 nm CMOS. IEEE J. Solid-State Circuits, 43(4):1010–1019, Apr. 2008.
[67] W.T. Beyene and A. Amirkhany. Controlled intersymbol interference design tech-
niques of conventional interconnect systems for data rates beyond 20 Gbps. IEEE
Trans. Advanced Packaging, 31(4):731–740, Nov. 2008.
[68] W. Volkaerts, N. Van Thienen, and P. Reynaert. An FSK plastic waveguide commu-
nication link in 40nm CMOS. In IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig.
Tech. Papers, pages 178–179, Feb. 2015.
[69] M.H. Nazari and A. Emami-Neyestanak. A 15-Gb/s 0.5-mW/Gbps two-tap DFE
receiver with far-end crosstalk cancellation. IEEE J. Solid-State Circuits, 47(10):2420–
2432, Oct. 2012.
[70] Taehyoun Oh and R. Harjani. A 6-Gb/s MIMO crosstalk cancellation scheme for
high-speed I/Os. IEEE J. Solid-State Circuits, 46(8):1843–1856, 2011.
[71] Kiarash Gharibdoust, Armin Tajalli, and Yusuf Leblebici. Hybrid NRZ/multi-tone
serial data transceiver for multi-drop memory interfaces. IEEE J. Solid-State Circuits,
50(12), 2015.
[72] B.K. Casper, M. Haycock, and R. Mooney. An accurate and efficient analysis method
for multi-Gb/s chip-to-chip signaling schemes. In IEEE VLSI Symp. Dig. Tech.
Papers, pages 54–57, June 2002.
[73] V. Stojanovic and M. Horowitz. Modeling and analysis of high-speed links. In IEEE
Custom Integr. Circuits Conf., pages 589–594, Sept. 2003.
[74] B. Ahmad. Performance specification of interconnect. In DesignCon 2003, Santa
Clara, CA, USA, Jan. 2003.
[75] Kyung Suk Oh, F. Lambrecht, Sam Chang, Qi Lin, Jihong Ren, Chuck Yuan,
J. Zerbe, and V. Stojanovic. Accurate system voltage and timing margin simulation
in high-speed I/O system designs. IEEE Trans. Adv. Packag., 31(4):722–730, Nov.
2008.
[76] J. Caroselli and C. Liu. An analytic system model for high speed interconnects and
its application to the specification of signaling and equalization architectures for 10
Gbps backplane communication. In DesignCon 2004, Santa Clara, CA, USA, Jan.
2006.
[77] Kangmin Hu, L. Wu, and P.Y. Chiang. A comparative study of 20-Gb/s NRZ and
duobinary signaling using statistical analysis. IEEE Trans. VLSI Syst., 20(7):1336–
1341, July 2012.
[78] Kangmin Hu, Tao Jiang, Jingguang Wang, F. O’Mahony, and P.Y. Chiang. A 0.6
mW/Gb/s, 6.4-7.2 Gb/s serial link receiver using local injection-locked ring oscillators
in 90 nm CMOS. IEEE J. Solid-State Circuits, 45(4):899–908, Apr. 2010.
[79] M. Hossain and A.C. Carusone. 7.4 Gb/s 6.8 mW source synchronous receiver in 65
nm CMOS. IEEE J. Solid-State Circuits, 46(6):1337–1348, June 2011.
[80] Sang-Hye Chung, Lee-Sup Kim, Seung-Jun Bae, Kyoung-Soo Ha, Jung-Bae Lee, and
Joo Sun Choi. An 8Gb/s forwarded-clock I/O receiver with up to 1GHz constant
jitter tracking bandwidth using a weak injection-locked oscillator in 0.13 um CMOS.
In IEEE VLSI Symp. Dig. Tech. Papers, pages 84–85, June 2011.
[81] Seong-Jun Song, Sung Min Park, and Hoi-Jun Yoo. A 4-Gb/s CMOS clock and
data recovery circuit using 1/8-rate clock technique. IEEE J. Solid-State Circuits,
38(7):1213–1219, July 2003.
[82] J.L. Sonntag and J. Stonick. A digital clock and data recovery architecture for
multi-gigabit/s binary links. IEEE J. Solid-State Circuits, 41(8):1867–1875, Aug.
2006.
[83] J.D.H. Alexander. Clock recovery from random binary signals. Electronics Letters,
11(22):541–542, Oct. 1975.
[84] Jri Lee, K.S. Kundert, and B. Razavi. Analysis and modeling of bang-bang clock and
data recovery circuits. IEEE J. Solid-State Circuits, 39(9):1571–1580, Sept. 2004.
[85] Jri Lee and Ke-Chung Wu. A 20-Gb/s full-rate linear clock and data recovery circuit
with automatic frequency acquisition. IEEE J. Solid-State Circuits, 44(12):3590–3602,
Dec. 2009.
Kiarash Gharibdoust 
Chemin des Abbesses 18 
1027 Lonay, Switzerland 
+41(0)787006612 
Nationality: Iranian (Swiss B permit) 
Birth Year: 1984 
kiarash.gharibdoust@gmail.com 
https://www.linkedin.com/in/kiarashgharibdoust 
 
PROFILE 
• Proactive R&D and analog/mixed-signal circuit designer, strongly motivated to seek innovative 
solutions for cutting-edge IC products. 
• Expertise in high-speed wireline and wireless custom IC design in advanced CMOS technology nodes. 
• Remarkable presentation skills sharpened through experience at prestigious international conferences. 
 
WORK EXPERIENCE 
EPFL, Microelectronic Systems Laboratory (LSM) Lausanne, Switzerland 
Research Assistant February 2012 – February 2016 
• Hybrid NRZ/Multi-Tone Serial Data Transceiver for Memory Interfaces, mandated by Kandou Bus SA 
• High-Speed and Low-Power Multi-Phase Clock Generation for Wireline Applications 
• Crosstalk Reduction Techniques for Parallel I/O in Dense Interconnects 
Kandou Bus SA Lausanne, Switzerland 
Contractor February 2012 – July 2012 
 July 2015 – August 2015  
• Advanced Low-Power Techniques for High-Speed Wireline Transceivers 
• Multi-Tone Signaling for Communicating beyond 100Gb/s/pin over Backplane Channel 
Sharif University of Technology, Integrated System Design Laboratory (ISDL) Tehran, Iran 
Junior Analog Design Engineer March 2010 – December 2011 
• Design and Implementation of X-Band Phased-Array Transceiver in 180nm CMOS 
Sharif University of Technology, Electronics Research Institute (ERI) Tehran, Iran 
Developer Engineer December 2008 – December 2009 
• Design and Software Implementation of Audio Streaming Channel over ZigBee Network 
Sharif University of Technology, Integrated System Design Laboratory (ISDL) Tehran, Iran 
Research Assistant September 2006 – October 2008 
• Design and Implementation of Low-Noise Band-Pass Filter for FM Receivers  
EDUCATION 
Swiss Federal Institute of Technology in Lausanne (EPFL) Lausanne, Switzerland 
Doctor of Philosophy (PhD) in Microsystems and Microelectronics Feb. 2012- Feb. 2016 
Supervisor: Prof. Yusuf Leblebici 
Sharif University of Technology (SUT) Tehran, Iran 
Master in Electronic Circuits (Major of Microelectronic Circuits) Sept. 2006- Oct. 2008 
Supervisor: Prof. Mehrdad Sharif Bakhtiar  
Iran University of Science and Technology (IUST) Tehran, Iran 
Bachelor in Electrical Engineering (Major of Electronics)  Sept. 2002- Aug. 2006 
AWARDS /  HONORS 
• IEEE Solid-State Circuits Society Student Travel Grant Award (STGA) for ISSCC 2016. 
• ISSCC Certificate of Recognition for 2015 Demonstration Session. 
• Ranked 1st among Bachelor graduates, with highest honors, Iran University of Science and Technology. 
• Ranked 2nd (Silver Medal) in 2006 National Electrical Engineering Olympiad. 
• Ranked 2nd in Nationwide M.Sc. Entrance Exam of more than 14,000 participants.  
SKILLS /  INTERESTS 
• Languages: English (fluent, C1), French (upper intermediate, B2), Farsi (native speaker) 
• CAD Tools: Cadence Virtuoso, Calibre, Agilent ADS, Momentum ADS, ANSYS HFSS, HSPICE, 
ModelSim, Encounter Digital Implementation (EDI), Altium Designer 
• Programming Languages: Verilog, C, VHDL, Verilog-A, MATLAB 
• Text editing and OS: LaTeX, LyX, MS Office Suite, Adobe Illustrator, MS Windows, Linux, Mac OS X 
• Extracurricular Activities: Alpine sports, Cycling, Persian-Swiss fusion cuisine, travelling 
MAIN PUBLICATIONS 
• K. Gharibdoust, A. Tajalli, Y. Leblebici, “A 7.5mW 7.5Gb/s Mixed NRZ/Multi-Tone Serial Data Transceiver 
for Multi-Drop Memory Interfaces in 40nm CMOS,” ISSCC Dig. Tech. Papers, pp. 180-181, 2015. 
• K. Gharibdoust, A. Tajalli, Y. Leblebici, “A 4×9 Gb/s 1 pJ/b NRZ/Multi-Tone Serial-Data Transceiver with 
Xtalk Reduction Architecture for Multi-Drop Memory Interfaces in 40nm CMOS,” Symposium on VLSI 
Circuits (VLSIC), pp. C180-C181, June 2015. 
• K. Gharibdoust, A. Tajalli, Y. Leblebici, “Hybrid NRZ/Multi-Tone Serial Data Transceiver for Multi-Drop 
Memory Interfaces,” IEEE Journal of Solid-State Circuits (JSSC), vol. 50, num. 12, 2015. 
• K. Gharibdoust, A. Tajalli, Y. Leblebici, “A 4×9 Gb/s 1 pJ/b Hybrid NRZ/Multi-Tone I/O with Crosstalk and 
ISI Reduction for Dense Interconnects,” IEEE Journal of Solid-State Circuits (JSSC), vol. 51, num. 4, 2016. 
• K. Gharibdoust, N. Mousavi, M. Kalantari, M. Moezzi, A. Medi, “A Fully Integrated 0.18-um CMOS 
Transceiver Chip for X-Band Phased-Array Systems,” IEEE Transactions on Microwave Theory and 
Techniques (MTT), vol. 60, num. 7, pp. 2192-2202, 2012. 
• K. Gharibdoust, M. Sharif Bakhtiar, “A Method for Noise Reduction in Active-RC Circuits,” IEEE 
Transactions on Circuits and Systems-II, Express Briefs (TCAS-II), vol. 58, num. 12, pp. 906-910, 2011. 

