System-Aware Design of Energy-Efficient High-Speed I/O Links by Lakshmi Narasimha, Rajan
© 2011 Rajan Lakshmi Narasimha
SYSTEM-AWARE DESIGN OF ENERGY-EFFICIENT HIGH-SPEED I/O LINKS
BY
RAJAN LAKSHMI NARASIMHA
DISSERTATION
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2011
Urbana, Illinois
Doctoral Committee:
Professor Naresh Shanbhag, Chair
Professor Andrew Singer
Professor Emeritus Dilip Sarwate
Professor José Schutt-Ainé
ABSTRACT
Today’s high-speed I/O links operate under stringent specifications: data rates of a few tens
of Gb/s over 20 inches of copper trace, power efficiencies on the order of 10-to-30 mW/Gb/s,
and a bit error-rate (BER) target of 10^-12 or lower. State-of-the-art I/O links consist of a
transmit driver, PLL, equalizer, clock recovery unit and comparator, primarily implemented
with mixed-signal components, to achieve the desired BER without forward error correction
(FEC). More recently, analog-to-digital converter (ADC) based receivers have gained acceptance
owing to the advantages of digital-intensive designs, viz., their propensity to benefit
from technology scaling and Moore’s law. In this thesis, we propose a system-assisted
mixed-signal (SAMS) design approach, wherein mixed-signal components of a communication link
are designed to meet a system-level performance metric such as BER, in contrast to the
component-level design techniques prevalent today.
First, we propose FEC-assisted I/O link design, where the FEC coding-gain is leveraged to
relax the specifications of the mixed-signal components of such links, and achieve improved
energy-efficiency. In particular, we demonstrate that FEC coding gain can be leveraged to
achieve improvements in transmit driver swing, ADC precision, timing jitter and comparator
offset specifications necessary to meet the link BER specification. We demonstrate that
through a combination of coding and modulation, improvements in link energy-efficiency
can be achieved. We propose binary BCH codes, as these codes offer sufficient coding-
gain at moderate to high code-rates, and can be implemented at low power via techniques
presented in this thesis. Further, we propose an accurate statistical model to evaluate the
impact of FEC on DFE-based links, and employ this model to evaluate random and burst
error-correcting binary BCH codes. This is necessary to evaluate FEC performance at the
low error rates of interest, where simulations are not feasible.
Second, we propose the design of low precision analog-to-digital converters for high-speed
I/O links based on the BER metric. In such an ADC, referred to as a BER-optimal ADC,
the quantization levels and thresholds are optimized based on BER. We demonstrate the
benefits of BER-optimal ADCs for typical high-speed I/O links. Further, we propose an
adaptation algorithm called AMBER (approximate minimum BER), for quantization levels
and thresholds. An architecture to implement this algorithm is proposed and evaluated to
prove that this algorithm can be implemented in practice.
Finally, we address the issue of creating models for mixed-signal components that would
facilitate link optimization based on the SAMS approach. Such a model should capture
the system-level behavior of the component when it operates in an unconventional, low-
power, error-prone performance envelope. We study the example of a digital latch, which
finds widespread application in high-speed ADCs and PLLs. We develop an input-swing
dependent finite state machine (FSM) model of such a latch in order to capture performance-
power trade-offs.
To my adviser and my family
ACKNOWLEDGMENTS
This thesis would not have been possible without the support and encouragement of profes-
sors and colleagues whose inspiring presence made my years at the University of Illinois an
enriching, multi-faceted experience.
First and foremost, I would like to thank my adviser Prof. Naresh Shanbhag for his unstinting
support and encouragement through six years of graduate study. His role as a guide,
philosopher and friend has been integral to my professional and personal development over
these years. My interactions with him have significantly enriched my thinking, philosophy
and attitude towards life. He leads by example, and sets very high standards of discipline,
punctuality and hard work. I have been awed by his command over multiple disciplines such
as communication theory, signal processing and circuits. I would like to specifically express
my deep gratitude to him for recommending me to Texas Instruments, for a summer intern-
ship position in 2007. I am also grateful to him for his great help and support in reviewing
my thesis drafts.
I would like to thank Prof. Elyse Rosenbaum for the many engaging discussions and inputs
during weekly group meetings. Her guidance and presence greatly helped me perceive ideas
in communication theory and signal processing from the point of view of a circuit designer.
It also helped improve my skills while addressing an audience from a different technical
community.
I would also like to thank Prof. Andrew Singer for several insightful discussions during
the last two years of my graduate study. His intellect, approach to problem solving and
wonderful sense of humor have truly left me humbled.
My very sincere thanks to Prof. Dilip Sarwate and Prof. José Schutt-Ainé for serving on
my PhD committee. I am grateful for the many fruitful discussions which helped refine my
ideas and provided me with new problems. I greatly appreciate the patience and encourage-
ment of all the mentioned faculty members through the course of my PhD. Their comments
and suggestions have helped greatly improve this thesis.
I would like to acknowledge the significant role played by my colleagues at UIUC and
Texas Instruments for sharing their intuition during grad school and summer internships
respectively. I would like to make special mention of the following colleagues and office
mates: Nirmal Warke and Andy Joy (TI), Arshad Ahmed, Rami Abdallah, Girish Varatkar,
Jayanand Asok Kumar, Adam Faust, Ankit Srivastava, Chhay Kong, Karan Bhatia, Eric
Kim, Samer Ghanem, Yu Hung, Aolin Xu, Yingyan Lin, Peter Kairouz and Gong Zhang
(UIUC).
On the personal front, I would have had to plead insanity if not for the following bud-
dies from school and elsewhere: Anjan, Jayanand, JK, Dinesh, Prasad, Hemant, Chai-
tanya, Shankar Sivaramakrishnan, Shankar Sadasivam, Kunal, Anand, Gayatri, Sujana, Vi-
jay, Arvind, Sreeram, Jaykrishnan, Jagan, Anil and Rami (UIUC); Keya (Stanford), Smita
(Phoenix) and Ramya (North Carolina); Viswanath (CMU), Purushottam (UCLA) and
Venkat (Cornell).
Finally, I would like to express my deepest love and gratitude to my parents for always
having encouraged me to do what I love most. Their role and immense sacrifices in developing
and nurturing my interest in academics and engineering and enabling me to pursue a doctoral
degree have been exemplary.
The material in this thesis is based upon work supported by SRC under Tasks 1305 and
1836. I would like to thank SRC for supporting my research.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Application Specifications
1.3 Channel
1.4 Noise and Other Impairments
1.5 State of the Art
1.6 System-Assisted Mixed-Signal Design
CHAPTER 2 FORWARD ERROR CORRECTION IN HIGH-SPEED I/O LINKS
2.1 Forward Error-Control (FEC): Preliminaries
2.2 Statistical Link Model
2.3 Coding Gain vs. ISI Penalty
2.4 Coding Gain vs. Energy-Efficiency Trade-Offs
2.5 Low-Power FEC Architecture
2.6 iOpener: System Description
2.7 Summary
CHAPTER 3 IMPACT OF FEC ON DFE-BASED I/O LINKS
3.1 Modeling DFE Error Propagation
3.2 FEC Model
3.3 Latency vs. Performance
3.4 Summary
CHAPTER 4 BER-OPTIMAL ADC ARCHITECTURE
4.1 ADC Design Methods
4.2 Analysis and Simulation Results
4.3 Summary
CHAPTER 5 ADAPTATION ALGORITHMS AND EQUALIZER ARCHITECTURES FOR BER-OPTIMAL ADCS
5.1 Fixed Reference Level ADC, Fixed Coefficient Equalizer: Precision Requirements
5.2 ADC Reference Level Adaptation Algorithm
5.3 Adaptive Receiver Architecture
5.4 Summary
CHAPTER 6 CONCLUSION AND FUTURE WORK
6.1 Contributions
6.2 Modeling of Analog Mixed-Signal Components for SAMS-Based Design
6.3 Future Work
REFERENCES
LIST OF TABLES
2.1 FEC vs. ADC Power, BER = 10^-12, Vdd = 1.2 V
2.2 Exploiting Coding-Gain to Improve Jitter Tolerance: TX Jitter Permissible at BER = 10^-12
2.3 Transceiver Performance Summary
2.4 Transceiver Power and FOM Summary
3.1 Markov Model Validation
3.2 Validation of Error Pattern Statistic Computation
3.3 Codes Evaluated
4.1 3-bit ADC: SQNR vs. SNR (dB)
5.1 Power (µW) and Area (µm^2) Comparison
5.2 Finite-Precision BER Comparison
5.3 Complexity Comparison (Full-Adders (FAs))
LIST OF FIGURES
1.1 I/O link block diagram.
1.2 Back-plane system cross-section indicating different sections of the signaling path.
1.3 Frequency response of a 20” FR4 back-plane channel [1].
1.4 Frequency dependent attenuation and distortion results in a transmitted symbol spreading out over several symbol periods when it arrives at the receiver.
1.5 Analog peaking equalization at receiver using RC degeneration.
1.6 Mixed-signal transmit FIR filter. Currents I[0], I[1] and I[2] are set proportional to the tap-weights w[0], w[1] and w[2] using a digital-to-analog converter (DAC).
1.7 Mixed-signal design approaches: a) conventional vs. digitally-assisted analog (DAA), and b) system-assisted mixed-signal (SAMS).
2.1 An FEC based high-speed link.
2.2 Simplified ISI-channel and additive noise model.
2.3 Performance of Hamming codes.
2.4 Performance of a (327, 265, 7) BCH code.
2.5 Coding gain vs. transmit swing using a (255, 247, 1) code at low (0.01 unit interval (UI) rms) and high (0.03 UI rms) transmit jitter for 2-PAM and 4-PAM.
2.6 TX power vs. swing Vsw and FEC power vs. t, evaluated in 90 nm IBM CMOS.
2.7 Coding gain vs. ADC precision using a (255, 247, 1) code for 2-PAM and 4-PAM.
2.8 BER versus voltage offset for 2-PAM and 4-PAM.
2.9 FEC system architecture: a) parallel FEC, and b) performance benefits.
2.10 Proposed low-power BCH decoder architecture.
2.11 iOpener test chip architecture.
2.12 iOpener layout.
3.1 Markov chain state transitions for a 2-tap DFE. Only some transitions are illustrated for clarity.
3.2 Trellis paths of weights j.
3.3 Trellis paths of burst length j.
3.4 Error statistics in 50 bit block.
3.5 Effect of error propagation.
3.6 Performance evaluation of block codes.
3.7 Performance evaluation of block codes.
4.1 Role of an ADC in a communication link: a) block diagram of a communication link, b) functional diagram of an ADC, and c) eye diagram and PDF of the sampled received signal xc[nT].
4.2 Detection based ADC design example: (a) a communication link where the receiver ADC acts as detector, and (b) signal distribution at the ADC input.
4.3 Validating BER analysis through simulation and importance sampling.
4.4 Performance comparison between the BER-optimal and Lloyd-Max ADC for channel h = [0.1 0.7 0.4].
4.5 Performance for a low-ISI channel employing 2-PAM modulation and a LE: a) sampled impulse response of a backplane-like channel, and b) BER vs. SNR curves for a 3-bit uniform, 3-bit BER-optimal, 4-bit uniform, and infinite-precision ADC, respectively.
4.6 Performance for a high-ISI channel employing 2-PAM modulation and a LE: a) sampled impulse response of a backplane-like channel, and b) BER vs. SNR curves for a 3-bit uniform, 3-bit BER-optimal, 4-bit uniform, and an infinite-precision ADC, respectively.
4.7 Performance for a high-ISI channel employing 2-PAM modulation and a DFE: a) sampled impulse response of a backplane-like channel, and b) BER vs. SNR curves for a 3-bit uniform, Lloyd-Max (LM) and BER-optimal ADC, 4-bit BER-optimal ADC, and 5-bit uniform ADC, respectively.
4.8 Performance for a low-ISI channel employing 4-PAM modulation and a DFE: a) sampled impulse response of a backplane-like channel, and b) BER vs. SNR curves for a 4-bit uniform, Lloyd-Max (LM), BER-optimal and 5-bit uniform ADC, respectively.
4.9 ADC input signal distribution and reference level settings for uniform, LM and BER-optimal quantization: a) high-ISI channel, 2-PAM, LE (Case B), b) high-ISI channel, 2-PAM, DFE (Case C), and c) low-ISI channel, 4-PAM, DFE (Case D), respectively.
5.1 Two equalization techniques: a) LUT-based non-linear, and b) linear equalizer (LE).
5.2 ADC output encoder architecture.
5.3 Performance evaluation of AMBER algorithm for reference level updates.
5.4 Adaptive receiver architecture with a LE.
5.5 Flash ADC architecture consisting of a bank of pre-amps that amplify the difference between the input signal and the quantization threshold. This is followed by latches that quantize the pre-amp output and a transition detector and encoder that generate the Bx-bit ADC output.
5.6 RL-UD architecture.
5.7 Reference level rj update block.
6.1 The latch: a) generic block diagram, and b) a specific circuit schematic.
6.2 Latch voltage waveforms for a 010110 input sequence with input swing Vsw = 600 mVppd with two different initial output voltages (labeled as IC for initial condition): (a) Vout = 0.75 V, and (b) Vout = −0.7 V.
6.3 Latch voltage waveforms for a 010110 input sequence with input swing Vsw = 800 mVppd with two different initial output voltages: (a) Vout = 0.25 V, and (b) Vout = −0.2 V.
6.4 Markov model for Vsw = 600 mVppd input.
6.5 Markov model for Vsw = 800 mVppd input.
CHAPTER 1
INTRODUCTION
1.1 Background
The past decade has witnessed an explosive growth in the volume of data transported glob-
ally. The increased throughput requirements over twisted pair telephone lines (DSL) in
access networks and optical links in metro and ultra long-haul links have driven advances in
communication and signal processing techniques. These developments have had an impact
on the data rates supported at the lower levels of the system interconnect hierarchy. For
example, back-plane links present in internet routers have evolved from simple input/output
drivers to advanced high-speed link circuits performing modulation, equalization, clock re-
covery and decision making. Elsewhere, the rapid growth in processor operating frequencies
has shifted the bottleneck from high-speed computation in CPUs to high-speed data trans-
fer between processor and memory chips in computers. The desire to reduce pin count in a
chip’s package has resulted in parallel data being serialized prior to transmission. This has
accentuated the back-plane interconnect data-rate requirements.
At sub-Gb/s data-rates, I/O link design focused on addressing the limitations imposed
by the intrinsic gate speeds, and improvements in the design of the timing loops followed.
By the late 1990s, timing noise dominated system performance. Towards the mid-ten Gb/s
rates (2000-2005), the channel bandwidth limitation became significant, and transceivers
started employing some form of equalization. Multi-level signaling was employed to reduce
the data signal bandwidth.
[Figure 1.1: I/O link block diagram. Transmitter: parallel data in, serializer, pre-emphasis driver, PLL with reference clock. Channel. Receiver: amplifier, LE/DFE, clock data recovery, deserializer, parallel data out.]
Figure 1.1 shows a block diagram of a back-plane link. The serialized data at the transmitter
is clocked using a transmit phase locked loop (PLL) which performs clock multiplication.
It is then shaped via a pre-emphasis filter to boost the high-frequency components. A trans-
mit driver generates the appropriate voltage/current levels at the channel input. The path
connecting the transmitter to the receiver is typically a copper trace running over FR4 di-
electric. At the receiver, the received signal is processed by a receive amplifier, which serves
the dual purpose of bandlimiting the noise, and amplifying the signal to levels that can be
reliably processed by the receiver circuitry. The receive clock is recovered from the data
signal by a clock and data recovery unit (CDR), or generated in a source-synchronous man-
ner by a delay locked loop (DLL) that employs the transmit clock as a reference. A linear
equalizer (LE) or decision feedback equalizer (DFE) is present to further remove intersymbol
interference, and the equalized output is sliced to make a decision. The data is de-serialized
after detection.
1.2 Application Specifications
Figure 1.2 illustrates the cross-section of a typical back-plane link found in large internet
routers and racks of blade servers employing linecards. Such linecard modules can be installed
and upgraded as and when required. They accept data from an external network
such as SONET, and employ serializer-deserializer (SerDes) chips to communicate data to
the destination linecard that contains the desired output port of the router. The transceivers
for such links are required to operate in an environment comprising up to 1 m (39”) of printed
circuit board copper trace with two connectors. These links carry stringent specifications:
data rates of a few tens of Gb/s, power efficiencies in the 10-to-30 mW/Gb/s range, and a
bit error-rate (BER) target of 10^-12 (e.g., the 802.3ap standard specifies BER < 10^-12,
whereas some network solutions providers require BER < 10^-15). The 802.3ap standard
contained a proposal to implement a (2112, 2080) shortened binary cyclic code tailored for
burst correction [2]. This suggests that a latency of about 2000 bits is acceptable.
[Figure 1.2: Back-plane system cross-section indicating different sections of the signaling path. Labeled elements: transmit device, receive device, die package, bondwire, line card trace, backplane connectors, backplane vias, backplane trace.]
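The arithmetic behind these specifications can be sketched in a few lines of Python. This is only an illustration: the function names and the 10 Gb/s line rate are assumptions made here, not values taken from the standard or from this thesis. It shows the rate of the (2112, 2080) code, the time needed to receive one code block, and the power implied by a given energy efficiency.

```python
def code_rate(n: int, k: int) -> float:
    """Fraction of transmitted bits that carry data for an (n, k) block code."""
    return k / n


def block_latency_ns(n: int, line_rate_gbps: float) -> float:
    """Time to receive one n-bit block; 1 Gb/s corresponds to 1 bit per ns."""
    return n / line_rate_gbps


def link_power_mw(line_rate_gbps: float, efficiency_mw_per_gbps: float) -> float:
    """Link power implied by an energy efficiency quoted in mW/Gb/s."""
    return line_rate_gbps * efficiency_mw_per_gbps


# Hypothetical 10 Gb/s lane (assumed line rate, for illustration only).
rate = code_rate(2112, 2080)              # 65/66, i.e. about 1.5% overhead
latency = block_latency_ns(2112, 10.0)    # one block time in nanoseconds
power_low = link_power_mw(10.0, 10.0)     # lower end of the 10-to-30 mW/Gb/s range
power_high = link_power_mw(10.0, 30.0)    # upper end of the range
print(rate, latency, power_low, power_high)
```

Under these assumptions the decoder must buffer roughly one block before correcting it, which is where the latency figure of about 2000 bits comes from.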
1.3 Channel
The back-plane channel consists of the bondwires connecting the transmit and receive devices
to the respective packages, linecard traces, edge connectors, vias, and the back-plane trace
as illustrated in Fig. 1.2. At the high signaling rates of interest, the back-plane channel
imposes a bandwidth limitation, resulting in attenuation and distortion of the transmitted
signal. Figure 1.3 depicts the transfer function of a typical 20” FR4 channel. The frequency
dependent attenuation and distortion are primarily due to skin-effect and dielectric losses.
[Figure 1.3: Frequency response of a 20” FR4 back-plane channel [1]. Horizontal axis: frequency in GHz (0 to 10). Vertical axis: |H(jω)| in dB (−50 to 0).]
The channel loss for a 20” FR4 channel is typically about 40-45 dB at 10 GHz. Further,
impedance discontinuities resulting from vias, stubs, connectors and die packages cause
notches in the transfer function as shown in Fig. 1.3. At multi-Gb/s data rates, these
impairments cause intersymbol interference (ISI); i.e., a transmitted pulse of unit symbol
duration is spread out across multiple symbol periods. The channel output sample in a
specific symbol period is a function of the preceding and succeeding transmitted symbols,
resulting in post-cursor and pre-cursor ISI, respectively, as shown in Fig. 1.4. Typical back-plane
links implement equalization to compensate for ISI. However, complexity constraints
and channel estimation errors prevent the complete cancellation of ISI. The ISI remaining
at the slicer/comparator input, known as residual ISI, degrades the voltage margin at the
slicer.
[Figure 1.4: Frequency dependent attenuation and distortion results in a transmitted symbol spreading out over several symbol periods when it arrives at the receiver.]
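The effect of ISI on the slicer margin can be sketched numerically. The example below is a minimal illustration, not the link model used later in this thesis: it borrows the three-tap sampled response h = [0.1 0.7 0.4] that appears in the caption of Fig. 4.4 and assumes that the middle tap is the cursor, so the first tap is pre-cursor ISI and the last tap is post-cursor ISI. The function names are hypothetical.

```python
def channel_output(symbols, pulse):
    """Discrete convolution of a symbol sequence with a sampled pulse response."""
    out = [0.0] * (len(symbols) + len(pulse) - 1)
    for k, a in enumerate(symbols):
        for i, h in enumerate(pulse):
            out[k + i] += a * h
    return out


def worst_case_margin(pulse, cursor):
    """Worst-case 2-PAM margin: |cursor tap| minus the sum of |ISI taps|."""
    isi = sum(abs(h) for i, h in enumerate(pulse) if i != cursor)
    return abs(pulse[cursor]) - isi


pulse = [0.1, 0.7, 0.4]                      # pre-cursor, cursor, post-cursor (assumed)
isolated = channel_output([1], pulse)        # one +1 symbol spreads over 3 periods
margin = worst_case_margin(pulse, cursor=1)  # about 0.2 of an unequalized unit swing
print(isolated, margin)
```

A positive margin means the unequalized eye is still open in the worst case; equalization aims to cancel the pre- and post-cursor taps so that the margin approaches the cursor amplitude.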
In addition to ISI, back-plane links are also impaired by crosstalk arising out of inductive
and capacitive coupling from neighboring lines. Crosstalk can be classified as far-end (FEXT)
and near-end (NEXT) crosstalk. FEXT occurs due to coupling from a signal traveling in
the same direction on a neighboring line. NEXT occurs when the aggressor signal travels in
a direction opposite to the desired signal. This happens when the receiver for the desired
signal is co-located with a transmitter driving a signal on a neighboring line. The FEXT
signal travels a greater distance on the main line, and hence experiences greater attenuation
at high frequencies compared to the NEXT signal.
1.4 Noise and Other Impairments
Besides ISI and crosstalk, back-plane links suffer from impairments such as thermal noise,
transmit and receive jitter, quantization errors and comparator offsets which have a signifi-
cant impact on link performance at the low error rates of interest.
Thermal noise is mainly generated by the 50 Ω terminations at the receiver. The device
noise of receiver circuits adds to the noise level. The work in [3] estimates the total input
referred random noise for a 5 GHz receiver to have an rms value of 0.3 mV, which is 40 dB
down from the equalized signal at the receiver. Thus, thermal noise is not the dominant
cause of bit errors in back-plane links. The transmitter and receiver contain a phase locked
loop (PLL) or a delay locked loop (DLL). The clock signal from these components is impaired
by timing jitter. Supply noise and reference clock phase noise are the prime noise sources
in a PLL and can contribute to a few picoseconds of timing jitter at the clock output. High
frequency transmit jitter modulates the energy of the transmitted symbol sequence, since
the jitter on adjacent symbol pulses is correlated. Receiver jitter results in a deviation of the
sampling instant from the instant where the eye opening is maximum. In mixed-signal and
fully digital links, quantization errors result from coefficient and data quantization. These
errors could propagate along the receive chain, resulting in detection errors at the slicer.
The equalized data is detected by a comparator/slicer. Impairments such as static offset,
input referred supply noise, and metastability affect comparator performance.
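The claim that thermal noise alone is not the dominant cause of bit errors can be checked with a short calculation. The sketch below rests on assumptions made here for illustration: "40 dB down" is read as a voltage ratio (20 log10), the noise is Gaussian, the link uses 2-PAM, and the ratio of signal amplitude to rms noise is used directly as the argument of the Gaussian tail function Q(x).

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


noise_rms_mv = 0.3   # input-referred rms noise quoted from [3]
ratio_db = 40.0      # signal is 40 dB above the noise (assumed voltage ratio)

# A 40 dB voltage ratio is a factor of 100, so the signal is about 30 mV.
signal_mv = noise_rms_mv * 10 ** (ratio_db / 20.0)

# BER if thermal noise were the only impairment (underflows to zero in floats).
ber_thermal_only = q_function(signal_mv / noise_rms_mv)
print(signal_mv, ber_thermal_only)
```

With a margin of about 100 noise standard deviations, the thermal-noise-only error rate is far below any BER target discussed here, which is consistent with residual ISI, jitter and offsets being the limiting impairments.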
1.5 State of the Art
State-of-the-art high-speed (Gb/s) I/O links rely exclusively upon an equalization-based
transceiver to compensate for ISI and achieve BER < 10^-15 [4–6]. Current-day
links are ISI-dominated with high receive signal-to-noise ratio (SNR) (e.g., > 30 dB),
and hence consume more power than necessary. Most state-of-the-art transceivers employ
analog circuits to efficiently implement linear and decision-feedback equalizers.
[Figure 1.5: Analog peaking equalization at receiver using RC degeneration. Schematic: differential pair with 50 Ω loads to VDD, outputs Vo+ and Vo−, and source degeneration network Rs, Cs.]
High-speed continuous time peaking equalizers are typically implemented using differential
pairs (Fig. 1.5). A source-degenerated differential amplifier provides a single-zero boost to
compensate for the channel attenuation. The degeneration resistance and capacitance are
tuned to adapt to the channel characteristic.
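A single-zero boost can be illustrated with a first-order model. The sketch below is an idealization assumed here rather than a model of the circuit in Fig. 1.5: it uses H(jω) = (1 + jω/ωz) / (1 + jω/ωp) with unity DC gain, ignores the amplifier's output pole, and picks hypothetical zero and pole frequencies of 1 GHz and 5 GHz.

```python
import math


def peaking_gain_db(f_hz: float, f_zero_hz: float, f_pole_hz: float) -> float:
    """Magnitude in dB of H(jw) = (1 + jw/wz) / (1 + jw/wp), unity DC gain."""
    num = math.hypot(1.0, f_hz / f_zero_hz)
    den = math.hypot(1.0, f_hz / f_pole_hz)
    return 20.0 * math.log10(num / den)


F_ZERO = 1e9   # hypothetical zero frequency set by the degeneration network
F_POLE = 5e9   # hypothetical pole frequency

for f in (0.0, 1e9, 2.5e9, 5e9, 10e9):
    print(f / 1e9, "GHz:", round(peaking_gain_db(f, F_ZERO, F_POLE), 2), "dB")
```

In this model the gain rises from 0 dB at DC towards 20 log10(fp/fz) at high frequency, so tuning the degeneration network moves the zero and pole to match the slope of the channel loss.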
[Figure 1.6: Mixed-signal transmit FIR filter. Currents I[0], I[1] and I[2] are set proportional to the tap-weights w[0], w[1] and w[2] using a digital-to-analog converter (DAC).]
Fig. 1.6 illustrates the most commonly used topology to implement a mixed-signal FIR
filter at the transmitter. Each transmitter tap (w[i]) employs a DAC to adjust the source
current in a differential pair. The data bits are stored in a delay chain and used to drive
the differential pairs corresponding to the filter taps. The drain currents are summed at
the output node to achieve filtering operation. A decision-feedback equalizer employs an
identical topology, except that the decoded past bits act as inputs to the delay chain.
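Behaviorally, the summed drain currents compute y[k] = w[0] b[k] + w[1] b[k−1] + w[2] b[k−2]. The sketch below models that sum in Python. The tap weights are hypothetical values chosen for illustration, bits are mapped to ±1, and the delay chain is assumed to start empty (contributing nothing); none of these choices comes from the thesis.

```python
def tx_fir(bits, weights):
    """Transmit FIR output y[k] = sum_i w[i] * b[k-i], with bits 0/1 mapped to -1/+1.

    Delay-chain entries before the first bit are assumed to contribute zero.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    for k in range(len(symbols)):
        acc = 0.0
        for i, w in enumerate(weights):
            if k - i >= 0:
                acc += w * symbols[k - i]
        out.append(acc)
    return out


# Hypothetical 3-tap weights; the absolute values sum to 1 (peak-swing limit).
weights = [0.625, -0.25, 0.125]
levels = tx_fir([1, 1, 1, 0], weights)
print(levels)  # the final transition produces the largest output magnitude
```

The run of identical bits is transmitted at reduced amplitude while the transition receives more swing, which is the high-frequency boost that pre-emphasis is meant to provide. Feeding past decisions instead of transmit bits into the same structure gives the feedback sum of a DFE.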
For back-plane links employing analog and mixed-signal equalization, the transmit driver,
clock generation and recovery units and comparator (slicer) dominate the link power budget.
Assuming an equalization based transceiver, as is the state-of-the-art today, [3] predicts a
four-fold increase in power when the data-rate is increased from 5-to-12 Gb/s to 25 Gb/s
and higher, for a fixed process technology node (130 nm). This clearly implies a need to
explore alternative communication techniques to design power-optimal I/O links. Larger size
constellations such as 4-PAM help bring down the bandwidth requirement, but are limited
by the peak-SNR constraint imposed by a given technology [5], [7], [8].
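The trade-off for larger constellations can be quantified with two textbook relations, sketched below under assumptions stated here: M-PAM carries log2(M) bits per symbol, so the Nyquist frequency is the bit rate divided by 2 log2(M), and with a fixed peak-to-peak swing the spacing between adjacent levels shrinks by a factor of (M − 1) relative to 2-PAM. The function names and the 25 Gb/s example rate are illustrative.

```python
import math


def nyquist_ghz(bit_rate_gbps: float, levels: int) -> float:
    """Nyquist frequency of an M-PAM signal: half the symbol rate."""
    return bit_rate_gbps / (2.0 * math.log2(levels))


def peak_limited_penalty_db(levels: int) -> float:
    """Level-spacing loss versus 2-PAM when the peak-to-peak swing is fixed."""
    return 20.0 * math.log10(levels - 1)


for m in (2, 4):
    print(m, "PAM:", nyquist_ghz(25.0, m), "GHz,",
          round(peak_limited_penalty_db(m), 2), "dB penalty")
```

Under these assumptions 4-PAM halves the required bandwidth but gives up about 9.5 dB of level spacing, so it pays off only when the channel loss saved by moving to the lower Nyquist frequency exceeds that penalty.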
The need to improve energy efficiency at increasing data-rates has motivated considerable
research activity. Recent work such as [9] proposes simultaneous system and circuit design
space exploration to determine the optimal architecture and allocation of resources in a
given system. Circuit level techniques such as low-swing voltage mode drivers [10], inductive
clock distribution [11], and software-based CDR calibration [10] have been proposed. Passive
equalization through RL terminations is proposed in [12] to reduce equalizer complexity. The
transmitter and receive clock-deskew power are identified as dominant in parallel links with
forwarded global clock, where the clock distribution power is amortized across many lanes.
This work [12] also suggests strategies such as dual-supply and CML driver implementation
to enable optimization of link power.
1.6 System-Assisted Mixed-Signal Design
In the early and mid-2000s, back-plane links were predominantly equalizer based. Most
transmitters employed 3-4 taps of pre-emphasis and the receiver employed 4-5 taps of decision
feedback. These circuits, mostly analog, were being designed at high link rates to achieve
BER = 10^-15 directly at the comparator output. Research in the area was primarily driven
by analog circuit designers; several well known techniques from digital communication theory,
such as maximum likelihood sequence estimation (MLSE), Tomlinson-Harashima (TH) precoding,
forward error correction (FEC) and trellis coded modulation (TCM) were not yet
exploited.
This thesis proposes a system-assisted mixed-signal design (SAMS) approach, where mixed-signal
components are designed in the context of a larger system such as a communication
link. This is in contrast to traditional design of mixed-signal components, which carry stringent
specifications as they are designed as stand-alone components. This component-level
design principle also applies to the more recent digitally-assisted analog (DAA) techniques
which relax analog component design by complementing it with digital post-processing
[13,14]. DAA is motivated by the fact that digital circuits benefit more from Moore’s law than
analog circuits. Hence, employing a digital post-processor to enhance analog performance
would enable designers to reap the benefits of Moore’s law. The contrast between DAA
and SAMS is illustrated in Fig. 1.7. DAA (Fig. 1.7(a)) relaxes the analog-component level
specifications by digital post-processing. SAMS (Fig. 1.7(b)) extracts system-information at
the equalizer and detector output to tune the parameters of the analog mixed-signal components
and achieve best system performance. This thesis demonstrates the application of the
SAMS approach to the design of (a) FEC-based backplane links, and (b) BER-optimal ADCs.
[Figure 1.7: Mixed-signal design approaches: a) conventional vs. digitally-assisted analog (DAA), and b) system-assisted mixed-signal (SAMS). Panel (a): accurate analog (high power) versus crude analog (low power) followed by a digital post-processor. Panel (b): channel output r(t) with noise v(t), analog block, ADC, digital processor, slicer/detector producing the detected bits, CDR, and DSP calibration.]
First, we propose to push the high-speed I/O link channel into a noise-dominated scenario,
i.e., SNR-limited, and then use forward error-correction (FEC) jointly with equalization to
achieve a desired post-FEC BER of < 10^-15 in order to minimize power consumption of such
links. The coding-gain can be leveraged to relax the design specifications of the mixed-signal
components. Such a link would require the equalizer to achieve a BER no greater than 10^-3
to 10^-4, and rely on the FEC to bring the BER below 10^-15. Indeed, most communication
links today such as DSL, wireless, optical, disk drives, and others, employ some form of
FEC in order to reduce SNR requirements, and provide robustness to channel errors. The
state-of-the-art I/O channel today is in some sense primitive, in that it relies exclusively on
waveform shaping techniques such as transmit/receive equalization to achieve the requisite
BER. Though digital techniques consume more power than analog [3], scaling of feature
sizes tends to favor digital designs more than analog. FEC is extensively employed in optical
links, but the stringent power budget has delayed its advent in back-plane links. FEC is
being considered in some I/O standards [2], where a lightweight code with limited coding
gain is implemented in-band in the PHY layer specifically to handle error-bursts resulting
from a DFE.
Second, we propose system-assisted design of an ADC in the context of a high-speed back-
plane link. Here, the reference-levels and thresholds of the ADC comparators are set in order
to achieve the best BER. This is in contrast to conventional ADC design approaches that
attempt to meet a fidelity-specification such as SNDR (signal to noise and distortion ratio)
and SFDR (spurious-free dynamic range).
In Chapter 2, we show that a combination of coding and modulation can be employed to
improve energy-efficiency. We propose a low-power decoder architecture that enables the
implementation of random error correcting block codes within the power budgets of these
links. The link budgeting work presented in this chapter culminates in the design of a test-
chip that demonstrates some of the benefits of FEC.
The applicability of FEC in high-speed links hinges on the ability to implement codes at
acceptable power and latencies. Chapter 2 presents an approach to implement block codes
with low power consumption. The need to minimize latency necessitates the development of a
model to evaluate latency vs. performance trade-offs. Past work on statistical link modeling
has focused on uncoded I/O links [3], [15]. A rigorous statistical model is necessary to
evaluate coded link performance, since the very low target BER unique to this application
precludes the use of simulation. In particular, it is important to predict the impact of DFE
error-propagation on FEC performance at the low error rates of interest. Error bursts are
particularly undesirable and codes must be designed for their correction.
In Chapter 3, we develop an analytical model to quantify the effect of DFE error propaga-
tion on the link performance for FEC-based links. We use this model to study systematically
the latency vs. performance trade-off for random and burst correction codes.
As the complexity of equalizers increases to support higher data rates and longer channels,
exploiting the benefits of digital scaling by performing digital signal processing becomes an
interesting alternative. Indeed, recent years have witnessed the implementation of ADC-
based high-speed link transceivers [16–18]. For such links, the ADC is the most power
hungry component in the transceiver, consuming up to 40% of the transceiver power. This
very high ADC power consumption has steered recent research towards the development
of circuit-level techniques for designing energy-efficient high-speed ADCs. In this thesis,
we investigate the use of system-level information (e.g., BER) to adjust the parameters of
the ADC, thereby reducing the number of reference levels necessary, resulting in savings in
power.
In Chapter 4, we examine the application of SAMS for ADC design. We demonstrate
the benefits of BER-based ADC reference-level settings in the context of an equalizer-based
communication link.
In Chapter 5, we develop an algorithm to adaptively adjust the reference levels of the ADC
based on the BER-metric. We propose an architecture to implement the proposed algorithm
and demonstrate its feasibility through finite-precision simulations. The work in this chapter
illustrates how the benefits of SAMS-based ADC design can be practically realized.
Chapter 6 summarizes the contributions of this thesis. Some preliminary work on modeling
mixed-signal components to enable a SAMS-based design approach is presented. We also look
at possible future research directions.
CHAPTER 2
FORWARD ERROR CORRECTION IN
HIGH-SPEED I/O LINKS
As the first application of the SAMS approach, we consider the design of energy-efficient
high-speed back-plane links via forward error-correction (FEC). State-of-the-art links achieve
BER = 10^-15 without FEC. In this chapter, we demonstrate the SAMS approach by showing
that system-level techniques such as forward error correction (FEC) and modulation can be
exploited to simplify mixed-signal component design and improve energy-efficiency. FEC
is better suited for back-plane links compared to digital communication techniques such as
maximum likelihood sequence estimation (MLSE), as it does not necessitate the presence of
a high-speed ADC at the receiver. Such ADCs are known to consume significant power.
The coding gain offered by FEC is leveraged to relax transmit driver, PLL, ADC and
comparator specifications. We first present power models of the transmitter, ADC (for ADC-
based links) and FEC. We then study the FEC vs. TX power and FEC vs. ADC power
trade-offs to show that net power savings can be achieved. Introducing FEC reallocates
the power expended by the analog front-end to the FEC encoder and decoder, which scales
much better with process technology. The stringent power budget (e.g., 10-30 mW/Gb/s)
and bandwidth limitations of I/O channels require that the codes considered should (a) offer
significant coding-gain at moderate to high code rates, and (b) be realized at a fraction of
the link power budget. Hence, this thesis proposes binary BCH codes for back-plane links.
Such codes are represented by a 3-tuple parameter set (n, k, t), where n is the codeword
length, k is the dataword length, and t is the error-correcting capability.
Figure 2.1 depicts a FEC-based high-speed link. At the heart of the reliability problem is
the channel, which causes ISI and cross-talk. Complexity constraints imply that residual ISI
Figure 2.1: An FEC-based high-speed link.
and cross-talk are present, resulting in reduced eye opening at the detector/slicer. Timing
jitter, quantization noise (see footnote 1), resistor thermal noise, and comparator offsets
further compound the reliability problem.
We demonstrate the benefits of leveraging coding gain to design energy-efficient links
through a combination of system performance budgeting and low power FEC design. Section
2.2 describes the performance and power models used to evaluate system-level trade-offs. In
section 2.3, we show that a FEC-based link achieves performance superior to an uncoded
link when coding gain compensates for the ISI penalty. Section 2.4 presents a discussion of
how FEC coding gain can be traded off with the specifications of various analog components
to achieve net power savings. In Section 2.5, we propose a FEC-system architecture and a
low-power decoding scheme that is ideally suited for back-plane links. Section 2.6 describes
a student-designed test-chip that demonstrates the application of FEC to achieve transmit
driver swing reduction. Section 2.7 summarizes the key contributions in this chapter.
2.1 Forward Error-Control (FEC): Preliminaries
The block diagram of a FEC-based I/O link is illustrated in Fig. 2.1, where the inner
transceiver includes the shaping filter (e.g., pre-emphasis) g(t) at the transmitter, the phys-
ical channel h(t), the receive filter (e.g., equalizer or band-limiting low-pass filter) r(t),
Footnote 1: The ADC is absent in an analog-based link; the LE/DFE are implemented using
analog cells that perform weighted-current-summation.
followed by a baud-rate sampler and a detector (e.g., slicer). FEC is a well-known technique
where blocks of data/information bits of length k (dataword) are mapped to blocks of code
bits of length n (codeword) where n > k. Such a code is said to have a code-rate of r = k/n.
If R is the data-rate in bits/s then an FEC link (or coded link) will have a line-rate L = R/r,
which is greater than R. This is because a coded link needs to transmit redundant bits in
addition to the data bits. For uncoded links, L = R. As the line-rate is greater than the
data-rate, a coded link will suffer from more ISI than an uncoded link and hence incur an
ISI penalty.
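The relation between code rate and line rate can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this illustration; the 10 Gb/s data rate and the example codes are those used elsewhere in this chapter.

```python
from fractions import Fraction


def code_rate(n: int, k: int) -> Fraction:
    """Code rate r = k/n of an (n, k) block code."""
    if not 0 < k < n:
        raise ValueError("expected 0 < k < n")
    return Fraction(k, n)


def line_rate(data_rate: float, n: int, k: int) -> float:
    """Line rate L = R / r required to carry data rate R over a coded link."""
    return data_rate / float(code_rate(n, k))


if __name__ == "__main__":
    R = 10e9  # 10 Gb/s data rate, as used throughout this chapter
    for n, k in [(7, 4), (63, 57), (255, 247)]:
        print(f"({n},{k}): r = {float(code_rate(n, k)):.3f}, "
              f"L = {line_rate(R, n, k) / 1e9:.2f} Gb/s")
```

A lower code rate pushes the line rate further above the data rate, which is the origin of the ISI penalty described above.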
The mapping from dataword to codeword is chosen such that the minimum Hamming
distance (dmin) between any two codewords is maximized and the decoder complexity is
minimized. Both of these properties are satisfied by linear codes. The error-correction
capability of an (n, k, t) linear code is governed by dmin in that the maximum number
of correctable errors t = ⌊(dmin − 1)/2⌋. Thus, a larger dmin results in greater error-correction
capability and hence a greater coding gain, where the coding gain is the difference between
the channel SNR of a coded and an uncoded link achieving the same BER.
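As a minimal illustration of the relation between dmin and t, the following Python sketch evaluates t for the two values of dmin quoted in Section 2.3 (dmin = 3 for the Hamming codes and dmin = 15 for the BCH code). The function name is hypothetical.

```python
def correctable_errors(d_min: int) -> int:
    """Maximum number of correctable errors t = floor((d_min - 1) / 2)."""
    if d_min < 1:
        raise ValueError("d_min must be a positive integer")
    return (d_min - 1) // 2


# Values of d_min quoted in Section 2.3 for the Hamming and BCH codes.
for d_min in (3, 15):
    print(f"d_min = {d_min:2d} -> t = {correctable_errors(d_min)}")
```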
Another trade-off inherent in the design of coded links is between the minimum distance
(dmin) and code-rate (r). Reducing k for a specific n increases the maximum achievable
dmin, i.e., an improvement in the coding gain can be achieved at the expense of the ISI
penalty. This is because the code-rate r = k/n will decrease, thereby necessitating a higher
line rate. A way around this problem is to increase the block/code-length n. This however
will impact the latency of the design and the complexity of the encoder and decoder. Thus,
coded links offer an interesting variety of trade-offs between power consumption, BER, and
latency. For I/O links, we show that the coding gain from specific types of codes will offset
the ISI penalty with an acceptable latency and thereby result in a reduced power link.
2.2 Statistical Link Model
In this section, we describe the system model adopted to evaluate the performance trade-offs
discussed in the sections that follow.
2.2.1 Channel and Noise Model: Inner Transceiver
The shaping filter g(t) (achieved using a combination of discrete-time pre-emphasis (w)
and continuous-time shaping (g′(t))), channel h(t), and receiver front-end r(t) in Fig. 2.1
are abstracted into an equivalent baud-sampled discrete-time channel (h[m] = g(t) ∗ h(t) ∗
r(t)|_{t = mT}, where ∗ denotes convolution) with additive Gaussian noise ν[m], where ν[m]
represents the sum of voltage noise contributions from the receiver front-end input-referred
noise, clock jitter and quantization noise. Timing jitter (TX, RX) is mapped to voltage noise
(x_TX, x_RX) employing the approach illustrated in [3]. The input signal to the equalizer can be
written as
x[n] = x_ISI[n] + x_th[n] + x_TX[n] + x_RX[n] + x_QE[n]    (2.1)
where x_ISI[n], x_th[n], x_TX[n], x_RX[n] and x_QE[n] are the contributions of the signal, thermal
noise, transmit jitter, receive jitter and quantization noise, respectively. Defining
x = [x[n] ... x[n − L + 1]]^T, where L denotes the length of the linear equalizer (LE),
the minimum mean square error (MMSE) coefficients c are determined as
c = R_xx^{-1} p    (2.2)
where R_xx = E(x x^T) is the auto-correlation matrix, p = E(b[n − D] x) is the cross-correlation
vector, and D is the system delay. The auto-correlation matrix R_xx can be
written as
R_xx = R_xx^ISI + R_xx^th + R_xx^TX + R_xx^RX + R_xx^QE    (2.3)
where the RHS of (2.3) indicates the constituent auto-correlation terms. The signal, thermal
noise, jitter and quantization noise are assumed to be uncorrelated. The expressions for the
auto-correlation terms on the RHS of (2.3) can be found in the literature [3, 19, 20].
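The MMSE tap computation of (2.2) can be sketched in pure Python as follows. This is a simplified illustration and not the model used in this thesis: it assumes uncorrelated unit-variance data symbols and lumps the thermal, jitter and quantization terms of (2.3) into a single white-noise variance, so that only the diagonal of R_xx is affected by noise. The function names and the example channel are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def mmse_le_taps(h, num_taps, delay, noise_var):
    """MMSE linear-equalizer taps c = R_xx^{-1} p for channel h.

    Assumes (for illustration only) i.i.d. unit-variance symbols b[n] and
    white noise of variance noise_var standing in for all noise terms.
    """
    def h_at(i):
        return h[i] if 0 <= i < len(h) else 0.0

    # R_xx[i][j] = E(x[n-i] x[n-j]) = sum_a h[a] h[a+i-j] + noise_var * delta_ij
    R = [[sum(h_at(a) * h_at(a + i - j) for a in range(len(h)))
          + (noise_var if i == j else 0.0)
          for j in range(num_taps)] for i in range(num_taps)]
    # p[i] = E(b[n-D] x[n-i]) = h[D-i]
    p = [h_at(delay - i) for i in range(num_taps)]
    return solve_linear(R, p)


if __name__ == "__main__":
    # Hypothetical 2-tap channel with one post-cursor, 2-tap equalizer.
    print(mmse_le_taps([1.0, 0.5], num_taps=2, delay=0, noise_var=0.01))
```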
The next step involves obtaining the distribution of the signal at the slicer input (y[n]).
The MMSE equalizer taps are employed to compute the residual ISI taps and the resulting
ISI distribution at the slicer (P_{Y_ISI}(y)). The ADC quantization noise is modeled as a uniformly
distributed additive signal, and the distributions corresponding to each of the equalizer input
samples are convolved to obtain a quantization noise distribution at the slicer (P_{Y_QE}(y)).
The thermal noise and voltage noise due to transmit and receive jitter have Gaussian
distributions (P_{Y_th}(y), P_{Y_TX}(y) and P_{Y_RX}(y)). The individual signal and noise distributions
at the slicer are convolved to obtain the cumulative signal and noise distribution as
P_Y = P_{Y_ISI} ∗ P_{Y_th} ∗ P_{Y_TX} ∗ P_{Y_RX} ∗ P_{Y_QE}    (2.4)
The comparator sensitivity is modeled using a voltage offset V_offset, by shifting the signal
at the slicer input by V_offset. With the knowledge of detector (slicer) thresholds, BER can
be estimated once the distribution P_Y is determined.
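A reduced version of this procedure can be sketched in Python for 2-PAM. The sketch makes several simplifying assumptions that are not part of the model above: the residual ISI symbols are independent and equiprobable ±1, all Gaussian contributions are folded into one standard deviation sigma, quantization noise is ignored, and the slicer threshold is zero. Under these assumptions the convolution of the discrete ISI distribution with the Gaussian noise can be evaluated in closed form through the Q-function. All names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def isi_pmf(residual_taps):
    """PMF of the residual-ISI sum, built by repeated convolution of
    two-point distributions (+tap or -tap, each with probability 1/2)."""
    pmf = {0.0: 1.0}
    for tap in residual_taps:
        nxt = {}
        for value, prob in pmf.items():
            for sign in (+1.0, -1.0):
                v = round(value + sign * tap, 12)  # merge equal sums
                nxt[v] = nxt.get(v, 0.0) + 0.5 * prob
        pmf = nxt
    return pmf


def slicer_ber(main_tap, residual_taps, sigma, v_offset=0.0):
    """BER at a zero-threshold slicer for equiprobable +/-1 symbols."""
    ber = 0.0
    for isi, prob in isi_pmf(residual_taps).items():
        # b = +1 sent: error when main + isi + offset + noise < 0
        ber += 0.5 * prob * q_function((main_tap + isi + v_offset) / sigma)
        # b = -1 sent: error when -main + isi + offset + noise > 0
        ber += 0.5 * prob * q_function((main_tap - isi - v_offset) / sigma)
    return ber


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(slicer_ber(1.0, [0.1, 0.05], sigma=0.15))
    print(slicer_ber(1.0, [0.1, 0.05], sigma=0.15, v_offset=0.05))
```

Both residual ISI and a nonzero offset close the eye on one side more than they open it on the other, so each raises the BER relative to the ISI-free, offset-free case.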
2.2.2 Simplified Channel Model
Figure 2.2: Simplified ISI-channel and additive noise model.
The detailed statistical inner-transceiver model described in Section 2.2 can be employed
to get an accurate estimate of link performance even at low BER, especially when the
goal is to derive specifications for various link components. However, a simpler link model
(Fig. 2.2), comprising an ISI channel and additive noise, can be used when the goal is to
obtain a first order comparison between various communication techniques. In this model,
a composite discrete-time channel h is derived as described in Section 2.2. All the noise
sources are lumped into an equivalent noise source (ν). This can be assumed to have a
Gaussian distribution, or a distribution can be obtained through simulation. This channel
model is employed in Chapters 3, 4 and 5 to model the inner-transceiver.
2.2.3 Simplified FEC Model
The channel and noise model just described is employed to compute the preFEC-BER
(BERpre). It is assumed that the effect of correlated errors is managed via interleaving, as
proposed in Section 2.5. It is noted that the proposed technique achieves interleaving-gain
at no additional power overhead, but results in a decoding latency overhead. Based on this
assumption, BERpost is calculated from the BERpre based on the (n, k, t) parameters of the
code [19, 20]. In Chapter 3, we develop a rigorous statistical model to evaluate the effect of
DFE error propagation.
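The mapping from BERpre to BERpost is not reproduced in this section. As a hedged illustration only, the sketch below uses a common textbook approximation for a t-error-correcting bounded-distance decoder with independent bit errors: a received word with i > t errors is assumed to be left uncorrected, contributing i erroneous bits out of n. This may differ from the exact expressions in [19, 20]; the function name is hypothetical.

```python
import math


def post_fec_ber(ber_pre: float, n: int, t: int) -> float:
    """Approximate post-FEC BER for an (n, k, t) code.

    Assumption (textbook approximation, not necessarily the thesis model):
    BER_post ~= sum_{i=t+1}^{n} (i/n) * C(n, i) * p^i * (1 - p)^(n - i),
    with independent bit errors of probability p = ber_pre.
    """
    if not 0.0 <= ber_pre < 1.0:
        raise ValueError("ber_pre must lie in [0, 1)")
    if ber_pre == 0.0:
        return 0.0
    log_p = math.log(ber_pre)
    log_q = math.log1p(-ber_pre)
    total = 0.0
    for i in range(t + 1, n + 1):
        # Work in the log domain to avoid overflow of C(n, i).
        log_binom = (math.lgamma(n + 1) - math.lgamma(i + 1)
                     - math.lgamma(n - i + 1))
        total += (i / n) * math.exp(log_binom + i * log_p + (n - i) * log_q)
    return total


if __name__ == "__main__":
    for code in [(63, 57, 1), (255, 247, 1), (511, 493, 2)]:
        n, _, t = code
        print(code, post_fec_ber(1e-4, n, t))
```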
2.2.4 Component Power Models
This subsection describes the power models for the transmit driver, ADC and the FEC.
The transmit driver power is primarily governed by the transmit swing, the ADC power is
dictated by the effective number of bits (ENOB) and speed of operation, and the FEC power
is determined by parameters (n, k, t).
Transmitter
The works in [10,12] recognize the transmit driver power as a significant component in link
total power. The power consumed by a current mode driver is given by [10]
P_TX = V_dd (2 V_sw / Z_d)    (2.5)
where Vdd is the supply voltage, Vsw is the transmit driver output differential swing, and Zd
is the differential impedance of the transmission line.
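Equation (2.5) can be evaluated with the following Python sketch. The default differential impedance of 100 Ω is an assumption made for this illustration and is not stated in the text; the 1.8 V supply is the value annotated in Fig. 2.6. The function name is hypothetical.

```python
def tx_driver_power(v_dd: float, v_sw: float, z_d: float = 100.0) -> float:
    """Current-mode driver power P_TX = V_dd * (2 * V_sw / Z_d), in watts.

    v_dd: supply voltage (V); v_sw: differential output swing (Vppd);
    z_d: differential line impedance (ohms). The 100-ohm default is an
    assumption for this sketch.
    """
    return v_dd * (2.0 * v_sw / z_d)


if __name__ == "__main__":
    for v_sw in (1.0, 0.8, 0.6):
        p_mw = 1e3 * tx_driver_power(1.8, v_sw)
        print(f"V_sw = {v_sw:.2f} Vppd -> P_TX = {p_mw:.1f} mW")
```

Because P_TX is linear in V_sw, any reduction in the required swing translates into a proportional reduction in driver power.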
Analog-to-Digital Converter
The ADC samples the filtered received signal at baud-rate. The output samples are provided
to the equalizer. The ADC power is estimated as [21],
P_ADC = V_dd^2 L_min (f_sample + f_signal) / 10^(−0.1525 N_1 + 4.838)    (2.6)
where Vdd is the supply voltage, Lmin is the minimum channel length for a given CMOS
technology, fsignal and fsample are the signal and sampling frequencies, respectively, and N1
represents the ADC resolution.
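A direct evaluation of (2.6) is sketched below in Python. The units are an assumption of this sketch, since they are not stated with the equation: V_dd in volts, L_min in meters, frequencies in hertz, and the result in watts. The function name and the example operating point are hypothetical.

```python
def adc_power(v_dd, l_min, f_sample, f_signal, n_bits):
    """ADC power estimate following (2.6).

    Assumed units (not stated in the text): v_dd in V, l_min in m,
    f_sample and f_signal in Hz, result in W.
    """
    figure_of_merit = 10.0 ** (-0.1525 * n_bits + 4.838)
    return v_dd ** 2 * l_min * (f_sample + f_signal) / figure_of_merit


if __name__ == "__main__":
    # Hypothetical operating point: 1.2 V supply, 90 nm process.
    for bits in (4.5, 5.0, 6.0):
        p = adc_power(1.2, 90e-9, 10e9, 10e9, bits)
        print(f"N1 = {bits}: P_ADC = {1e3 * p:.1f} mW")
```

The exponential dependence on N_1 in the denominator is what makes each bit of relaxed ADC resolution valuable in the trade-offs that follow.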
FEC
The FEC power consumption for the architecture shown in Fig. 2.10 is given as
P_fec = P_enc + P_dec + P_buff
P_enc = c_enc t
P_dec = c_ed t + c_su α_su t + c_bmu α_bmu t + c_elu α_elu t
where P_enc, P_dec, and P_buff are the power consumption of the encoder, decoder and the data
buffer, respectively. The constants c_enc, c_ed, c_su, c_bmu and c_elu are the power consumed by the
encoder, the error-detector, the syndrome unit (SU), the Berlekamp-Massey unit (BMU) and the error
locator unit (ELU), respectively, when t = 1 and the activity factor is one. BER_pre and
codeword length n determine the activity factors α_su, α_bmu, and α_elu as follows:
α_su = α_bmu = α_elu = n BER_pre
With an appropriate choice of BERpre and n, we can lower the decoder power significantly.
Since shift registers have significant power dissipation, a muxed-shift-register implementation
is necessary for the data-buffer to reduce the number of shift operations per clock cycle.
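The FEC power model above can be written as a short Python function. The per-block constants used in the example are placeholders chosen only to exercise the formula; they are not measured values from this work. The function name is hypothetical.

```python
def fec_power(t, n, ber_pre, c_enc, c_ed, c_su, c_bmu, c_elu, p_buff):
    """Total FEC power P_fec = P_enc + P_dec + P_buff.

    The activity factor alpha = n * BER_pre is shared by the SU, BMU and
    ELU, which are gated by the error detector.
    """
    alpha = n * ber_pre
    p_enc = c_enc * t
    p_dec = c_ed * t + (c_su + c_bmu + c_elu) * alpha * t
    return p_enc + p_dec + p_buff


if __name__ == "__main__":
    # Placeholder constants (arbitrary units), for illustration only.
    consts = dict(c_enc=1.0, c_ed=1.0, c_su=10.0, c_bmu=20.0, c_elu=10.0,
                  p_buff=2.0)
    for ber_pre in (1e-3, 1e-4, 1e-5):
        print(ber_pre, fec_power(t=1, n=255, ber_pre=ber_pre, **consts))
```

With a small n·BER_pre, the gated GF(2^m) blocks contribute little, and the total is dominated by the always-on encoder, error detector and buffer.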
2.3 Coding Gain vs. ISI Penalty
Employing a simple additive white Gaussian noise (AWGN) ISI channel model, we show [22]
that BER improvements over state-of-the-art uncoded links can be achieved with well-known
binary block codes. For a 10 Gb/s data-rate, 20” FR-4 link employing 2-PAM modulation, a
matched-filter, a 6-tap linear equalizer (LE) and an 11-tap decision-feedback equalizer (DFE),
a 12 dB coding gain at BER = 10^-15 is achieved using a (327, 265, 7) code [23, 24] that is
obtained by shortening the (511, 448, 7) BCH code.
In order to capture the trade-offs inherent in FEC, Hamming codes with dmin = 3, t = 1
and a BCH code with dmin = 15, t = 7 are evaluated, with a discrete-time linear equal-
izer (LE)/decision feedback equalizer (DFE) based receivers. An I/O link comprising a 20”
FR4 channel supporting NRZ shaped 10 Gb/s data was considered. The following equaliza-
tion/coding scenarios were evaluated:
1. DFE and a (7, 4, 1) (r = 0.57) Hamming Code (code 1, HC1)
2. DFE and a (31, 26, 1) (r = 0.84) Hamming Code (code 2, HC2)
3. DFE and a (63, 57, 1) (r = 0.9) Hamming Code (code 3, HC3)
4. LE and a (327, 265, 7) (r = 0.81) BCH Code
5. DFE and a (327, 265, 7) (r = 0.81) BCH Code
Figure 2.3: Performance of Hamming codes.
The first three designs with a Hamming code are compared in Fig. 2.3, from where we
make the following observations:
• The BER with Hamming codes improves with the code-rate in going from a low-rate
code (HC1) to a high-rate code (HC2). This clearly illustrates the ISI penalty vs.
coding gain trade-off. In fact, the crossover point between an uncoded DFE-based link
and a Hamming coded link occurs for HC2 in the SNR range of interest.
• The ISI penalty for HC3 is indicated by its BERpre being worse than that of the
uncoded DFE. However, in this case, the coding gain is sufficient to more than offset
its ISI penalty as shown by its BERpost curve.
• Analysis and simulation show good correlation. The small difference between the analysis
and simulation results for the (63, 57, 1) code is due to the impact of residual-ISI-induced
correlated errors. Simulations for BERs lower than 10^-7 were not performed
because of extremely long simulation times. For such low BERs, we relied on analytical
estimates.
Figure 2.4: Performance of a (327, 265, 7) BCH code.
[Plot: BER vs. SNR (Eb/No) in dB for LE, DFE, Coding + LE and Coding + DFE; a 12 dB coding gain is annotated.]
Fig. 2.4 shows the performance of a BCH code. Also shown in the figure are the reference
curves for uncoded LE and DFE designs. We make the following observations:
• BER improvements of six and ten orders of magnitude are achieved with an LE and a
DFE, respectively.
• The crossover point between coded and uncoded links occurs around 25 dB to 30 dB.
• At very low SNRs, the uncoded links perform better because the ISI penalty starts to
dominate over the coding gain for coded links.
2.4 Coding Gain vs. Energy-Efficiency Trade-Offs
The fundamental impairments in a high-speed link are residual ISI (and cross-talk), timing
noise such as transmit and receive jitter and circuit noise such as thermal noise due to
the termination resistor, front-end amplifier and comparator circuits. Coding gain can be
employed to reduce link power consumption by: (a) reducing transmit swing requirements
(Sec. 2.4.1), (b) enabling higher PAM constellations (Sec. 2.4.1), (c) reducing ADC sampling
rate and precision required (Sec. 2.4.2), (d) improving jitter-tolerance (Sec. 2.4.3), and (e)
reducing receiver amplification requirements and improving comparator offset-tolerance (Sec.
2.4.4).
All discussions in this chapter pertain to 10 Gb/s data transmission across a 20” FR4 trace
(25 dB attenuation at Nyquist). The noise PSD No/2 = 4 mV^2/GHz. Receive equalization is
carried out using 4 LE taps and 6 DFE taps. The parameter values chosen are representative
of state-of-the-art links [2].
2.4.1 Reduced Transmitter Swing Requirement
Figure 2.5: Coding gain vs. transmit swing using a (255, 247, 1) code at low (0.01 unit
interval (UI) rms) and high (0.03 UI rms) transmit jitter for 2-PAM and 4-PAM.
[Plot: BER vs. TX swing Vsw (Vppd) for coded and uncoded 2-PAM and 4-PAM; annotations: 1.54X swing reduction, coding enabling 4-PAM at 1 Vppd.]
High speed links are peak-power constrained; this arises from the supply voltage Vdd of
the process node. The peak power constraint is a bottleneck in implementing higher signal
constellations, as the minimum symbol distance decreases with constellation size for a fixed
peak power. Timing jitter is a significant impairment, and the voltage noise induced by
timing jitter increases with transmit swing Vsw. This limits the extent to which BER can
be improved by increasing Vsw. Fig. 2.5 illustrates the BER sensitivity to Vsw for two
different values of transmit jitter, 0.01 unit interval (UI) rms and 0.03 UI rms. The effects
of quantization are not considered in this analysis, as this is applicable to non-ADC based
links as well. This figure illustrates that, with increase in jitter, the BER does not reduce
as fast with an increase in Vsw. We also note that for a given jitter value (0.01 UI rms, say),
2-PAM BER is more responsive to Vsw increment than 4-PAM because, for the same Vsw,
the jitter induced voltage noise relative to minimum distance is higher for 4-PAM. From
Fig. 2.5, we infer that
• FEC relaxes the transmit swing requirements. Equation (2.5) implies a directly pro-
portional savings in driver power.
• The power savings are higher for a link dominated by jitter, owing to the lower sensitivity
of BER to transmit swing. Compared to the uncoded link, the FEC reduces swing
requirement by 0.35 Vppd for the 0.01 UI rms jitter case as compared to 0.48 Vppd for
the 0.03 UI rms jitter case.
• The relaxed swing enables the use of larger constellations such as 4-PAM. For
example, at 1 Vppd, 2-PAM achieves BER = 10^-12. In contrast, 4-PAM achieves
BER = 10^-10, not meeting the performance target. Coding thus enables 4-PAM to
achieve the target BER at the same transmit swing level.
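The effect of the peak-power constraint on constellation size can be made concrete with a small Python sketch. It assumes equally spaced M-PAM levels that span the full peak-to-peak swing Vsw, which is a simplification introduced here; the function names are hypothetical.

```python
import math


def pam_min_distance(m: int, v_sw: float) -> float:
    """Spacing between adjacent levels of M-PAM for a fixed peak-to-peak
    swing v_sw, assuming equally spaced levels spanning the full swing."""
    if m < 2:
        raise ValueError("M-PAM requires at least two levels")
    return v_sw / (m - 1)


def symbol_rate(bit_rate: float, m: int) -> float:
    """Symbol rate needed to carry bit_rate with log2(M) bits per symbol."""
    return bit_rate / math.log2(m)


if __name__ == "__main__":
    for m in (2, 4):
        print(f"{m}-PAM: d_min = {pam_min_distance(m, 1.0):.3f} V at 1 Vppd, "
              f"symbol rate = {symbol_rate(10e9, m) / 1e9:.1f} GBd")
```

At the same swing, 4-PAM has one third of the 2-PAM level spacing, so a given jitter-induced voltage noise is larger relative to the minimum distance, while the symbol rate is halved.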
Fig. 2.6 compares transmitter power as a function of swing and FEC power as a function
of error correction capability. The 1.54X savings in TX swing highlighted in Fig. 2.5 (0.01
UI rms jitter case) maps to a 14 mW reduction in TX driver power, with a 7.5 mW FEC
overhead. The transmit power saving is 10 mW for the 0.03 UI rms jitter scenario, leading
to a 1 mW/Gb/s improvement in energy efficiency. Such savings are expected in links where
transmitter power is dominant [25].
Figure 2.6: TX power vs. swing Vsw and FEC power vs t, evaluated in 90 nm IBM CMOS.
[Plot: P_TX (mW) vs. TX swing Vsw (Vppd) at Vdd = 1.8 V, and P_FEC (mW) vs. t (number of errors corrected) for n = 255 and n = 63; annotations: 1.54X swing reduction, FEC power (7.5 mW in 90 nm) reduces with process node, TX power does not scale well with process.]
2.4.2 Reduced ADC Precision Requirement
Recently, several works on high-speed ADCs [17, 18, 26] have appeared, with the last of these
achieving 3.8 bits at 7.5 GS/s and 52 mW power dissipation. Even though DSP based
links offer flexibility and robustness, they have not gained widespread acceptance owing to
the high power overhead of the ADC. Fig. 2.7 shows that the precision requirement for an
ADC-based link can be relaxed by employing FEC. The performance of an FEC based link
using a (255, 247, 1) BCH code and a 5 bit ADC is superior to an uncoded link using a 6b
ADC. A similar observation is made for 4-PAM. In order to evaluate the power trade-offs
involved systematically, BCH codes with n ≤ 511 were considered, and the minimum ADC
precision required to achieve BER < 10^-12, for a channel operating at Vsw = 1 Vppd, was
determined. Table 2.1 presents these results. As noted earlier, for coded 2-PAM modulation,
a (255,247,1) code brings down the precision requirement from > 6 bits to 5 bits. Using
the stronger (511,493,2) code at the same code rate reduces the required ENOB to 4.5 bits.
Codes with rates lower than 0.95 are significantly affected by the increased ISI penalty and
do not meet the performance target with 2-PAM. With 4-PAM, a larger subset of codes
Figure 2.7: Coding gain vs. ADC precision using a (255,247,1) code for 2-PAM and 4-PAM.
Table 2.1: FEC vs. ADC Power, BER = 10^-12, Vdd = 1.2 V
M-PAM  (n,k,t)      CR    ENOB    P_FEC  P_ADC  P_total
       (bits)             (bits)  (mW)   (mW)   (mW)
2      Uncoded      1     >6      0      307.4  >307.4
2      (255,247,1)  0.97  5       7.7    223.6  231.3
2      (511,502,1)  0.98  5       11.7   220.5  232.2
2      (511,493,2)  0.97  4.5     16.6   188.5  205.0
4      Uncoded
4      (255,247,1)  0.97  5.5     7.7    133.2  140.9
4      (511,502,1)  0.98  5.5     11.7   131.3  143.0
4      (511,493,2)  0.97  4.8     16.7   104.7  121.3
4      (511,484,3)  0.95  5.5     23.8   136.2  160.0
(down to the (511,438,8) code with CR = 0.86) meet the target (Table 2.1 only lists down
to CR = 0.95, for brevity). However, an optimum in terms of ENOB reduction is achieved
by the (511,493,2) code, resulting in a power savings of greater than 180 mW. This example
illustrates how FEC enables a higher PAM implementation at reduced precision, leading to
power savings over conventional 2-PAM. Moving to a stronger code at this block length only
increases line rate without any significant savings in precision, and hence is not justified.
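The comparison drawn from Table 2.1 can be reproduced with a few lines of Python. The rows below are copied from the coded entries of the table; the variable and function names are hypothetical. The uncoded 2-PAM ADC power is listed in the table as a lower bound (ENOB > 6), so the savings computed here is likewise a lower bound.

```python
# Coded rows of Table 2.1:
# (M-PAM, code, CR, ENOB in bits, P_FEC in mW, P_ADC in mW)
TABLE_2_1 = [
    (2, "(255,247,1)", 0.97, 5.0, 7.7, 223.6),
    (2, "(511,502,1)", 0.98, 5.0, 11.7, 220.5),
    (2, "(511,493,2)", 0.97, 4.5, 16.6, 188.5),
    (4, "(255,247,1)", 0.97, 5.5, 7.7, 133.2),
    (4, "(511,502,1)", 0.98, 5.5, 11.7, 131.3),
    (4, "(511,493,2)", 0.97, 4.8, 16.7, 104.7),
    (4, "(511,484,3)", 0.95, 5.5, 23.8, 136.2),
]
UNCODED_2PAM_MW = 307.4  # Table 2.1 lists this as a lower bound (>307.4)


def total_power(row):
    """P_total = P_FEC + P_ADC for one table row, in mW."""
    return row[4] + row[5]


best = min(TABLE_2_1, key=total_power)
savings = UNCODED_2PAM_MW - total_power(best)

if __name__ == "__main__":
    print(f"lowest total power: {best[0]}-PAM with {best[1]}, "
          f"{total_power(best):.1f} mW")
    print(f"savings vs. uncoded 2-PAM: more than {savings:.1f} mW")
```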
2.4.3 Improved Jitter Tolerance
Residual ISI and jitter are known to be the two most significant impairments in high-speed
links. A high speed link transmitter consists of a PLL that generates a clock to synchronize
the data stream. Jitter in this clock arises from device noise, reference clock and power
supply noise. It has been shown in [3] that high frequency transmitter jitter modulates the
energy of the transmitted signal, increasing the effective jitter induced voltage noise at the
comparator. The jitter induced noise increases proportionally to signal amplitude, hence
this cannot be handled by increasing Vsw. FEC is a particularly effective strategy to handle
this impairment. In order to evaluate the improvement in the transmitter jitter specification
systematically, the admissible jitter is tabulated as a function of codeword length in Table 2.2,
for BERpost < 10^-12, codes with n ≤ 511, and Vsw = 1 Vppd.
The base reference performance for the uncoded link at 0.01 UI jitter is inferred from
Fig. 2.5 for 2-PAM and 4-PAM respectively. 2-PAM achieves the BER target at 1Vppd swing
but 4-PAM falls short. As seen from Table 2.2, the jitter tolerance increases by 2X to 6X
depending on the code used. Binary BCH codes are suitable for correcting errors caused by
random jitter. The results presented motivate the need for circuit designers to investigate
low-power clocking circuits that leverage the relaxed jitter specifications.
Table 2.2: Exploiting Coding-Gain to Improve Jitter Tolerance: TX Jitter Permissible at
BER = 10^-12
M-PAM  (n,k,t)      TX jitter
       (bits)       (UI, rms)
2      Uncoded      0.01
2      (255,247,1)  0.04
2      (511,502,1)  0.04
2      (511,493,2)  0.06
4      Uncoded      <0.01
4      (127,113,2)  0.02
4      (255,247,1)  0.02
4      (255,223,4)  0.04
4      (511,502,1)  0.02
4      (511,448,7)  0.05
2.4.4 Improving Comparator Offset Tolerance
In conventional design, comparator impairments (circuit impairments, in general) such as
static offset, input-referred supply noise and metastability are modeled by adding an offset
(Voffset, Fig. 2.1) to the comparator input signal.
Figure 2.8: BER versus voltage offset for 2-PAM and 4-PAM.
Figure 2.8 shows BER at the slicer (preFEC-BER) as a function of voltage offset at the
slicer for 2-PAM and 4-PAM. The input swing to the slicer (Vsl) is normalized to Vsl = Vdd
(in Vppd) to obtain the normalized input swing (NIS). The higher the NIS, the higher the
amplification required and the higher the receiver power. The work in [25] highlighted the
need to optimize the transmit driver and receiver amplifier power. The results of this section
show that requirements on the receiver amplification can be significantly reduced in an FEC
based link. At lower NIS, more circuit error events occur (e.g., metastability) leading to
worse BER. Thus, comparator power (a function of Vsl) can be traded off with error rate.
The preFEC-BER for the link is also characterized as a function of code rate (CR), keeping
the equalizer complexity fixed. Points A, B and C in Fig. 2.8 correspond to BER = 10^-12
for 2-PAM, and A” and B” for 4-PAM. A (A”) represents an uncoded link. B (B”) and C
represent a coded link with CR = 0.96 (CR = 0.8), for 2-PAM (4-PAM). A higher code rate
for 2-PAM is chosen to reduce the ISI penalty. We see that an FEC assisted link, operating
at CR = 0.96 ((127,120,1) and (255,239,2) codes) offers 30-to-40 mV additional tolerance to
voltage offset (points B and C) at one third the NIS of the uncoded link (compare with point
A) for 2-PAM modulation. Similarly, for 4-PAM, and a (127,120,1) code, a 25 mV excess
offset tolerance at one half the NIS is illustrated (comparing points A” and B” in Fig. 2.8).
2.5 Low-Power FEC Architecture
Given the limitations of technology, it is usually necessary to parallelize the FEC implemen-
tation so that the encoder and decoder run at achievable speeds. Fig. 2.9(a) shows how the
FEC encoder and decoder can be integrated with the serializer and deserializer present in
high-speed serial links. If each parallel channel is encoded with a t_rand random-error-correcting
code, and there are M subchannels, the proposed architecture achieves burst error
correction capability t_burst = t_rand M. Burst error-correction is important in a receiver where
the DFE produces error-bursts spanning the DFE length. If encoding and decoding are
implemented in stage-i0 of the serializer and deserializer respectively, then M = 2^i0. Hence,
there is an inherent trade-off between FEC speed of operation, burst error correction (hence
Figure 2.9: FEC system architecture: a) parallel FEC, and b) performance benefits.
[Panel (b): BER vs. SNR (dB) for M = 1 with the DFE using TX bits, M = 1 with the DFE using detected bits, and M = 10 with the DFE using detected bits.]
performance), latency and power. Fig. 2.9(b) illustrates the benefits of intrinsic interleaving.
A 10 Gb/s data rate link with a (63,57,1) code was used for this evaluation. The BER curve
with transmitted bits used for feedback in the DFE gives a lower bound on the performance.
The curve with M = 1, and detected bits used for feedback illustrates the performance if
the specified code were implemented in a serial (non-interleaved) manner. A 3.5 dB loss due
to error propagation at BER = $10^{-4}$ is observed. When M = 10, and detected bits are
used for feedback, a significant part of the degradation in performance due to burst errors
is recovered. This performance is within 1 dB of the estimated bound.
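The burst-correction relation stated above (burst capability equal to the per-subchannel random-error capability times M) can be checked with a minimal sketch. It assumes, purely for illustration, that consecutive line bits are distributed round-robin over the M subchannels; the function names and the round-robin mapping are assumptions and are not taken from the thesis.

```python
def worst_subchannel_hits(burst_start, burst_len, num_subchannels):
    """Largest number of burst errors that land in any one subchannel.

    Assumption: serial bit b is carried by subchannel b % num_subchannels
    (round-robin interleaving by the serializer).
    """
    hits = [0] * num_subchannels
    for bit in range(burst_start, burst_start + burst_len):
        hits[bit % num_subchannels] += 1
    return max(hits)


def burst_is_correctable(burst_start, burst_len, num_subchannels, t_rand):
    """True if no subchannel (hence no codeword) sees more than t_rand errors."""
    return worst_subchannel_hits(burst_start, burst_len, num_subchannels) <= t_rand


if __name__ == "__main__":
    # A burst of length t_rand * M is always correctable; one bit longer is not.
    print(burst_is_correctable(5, 16, 16, 1))
    print(burst_is_correctable(5, 17, 16, 1))
```

A codeword boundary inside a subchannel can only split the hits between two codewords, so the per-subchannel count is a worst-case bound on the errors per codeword.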
The BCH encoder is implemented using standard techniques that involve bit shift and
XOR operations. Most of the FEC power is consumed at the decoder. We consider the
RiBM [27] version of the Berlekamp-Massey algorithm for decoding BCH codes. The syndrome
unit (SU), BMU and ELU perform additions and multiplications in $GF(2^m)$, where
$n = 2^m - 1$ is the codeword length of a primitive BCH code. The low-power decoder
architecture (Fig. 2.10) includes an error-detector similar to the encoder, i.e., the
error-detector operates in $GF(2)$. The error-detector operates continuously, and its output
is used to gate the power-hungry SU, BMU and ELU blocks performing $GF(2^m)$ computations.
Further, the encoder, error-detector and data-buffers can be operated at voltages lower than
the supply. This two-pronged approach results in the ability to implement a wide range of
BCH codes at very low power.

[Figure 2.10 block diagram: received bits, error detector, buffer, clock and gated clock
(AND), syndrome unit, BMU and ELU in $GF(2^m)$, corrected bits.]
Figure 2.10: Proposed low-power BCH decoder architecture.
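The gating idea can be illustrated with a minimal sketch of a GF(2) error detector built from the same shift-and-XOR division as the encoder. The (15,11) code with generator $x^4 + x + 1$ is used only because it is small; it is not one of the codes on the test chip, and all function names here are hypothetical. The activity estimate assumes independent bit errors.

```python
GEN = 0b10011  # g(x) = x^4 + x + 1, generator of the (15,11) single-error-correcting code
PARITY_BITS = GEN.bit_length() - 1


def gf2_remainder(value, gen=GEN):
    """Remainder of value(x) divided by gen(x) over GF(2), using shifts and XORs."""
    while value.bit_length() >= gen.bit_length():
        value ^= gen << (value.bit_length() - gen.bit_length())
    return value


def encode(message):
    """Systematic encoding: append the remainder of message(x) * x^4."""
    shifted = message << PARITY_BITS
    return shifted | gf2_remainder(shifted)


def decoder_clock_enabled(received):
    """Error detector: clock the GF(2^m) blocks only when the remainder is nonzero."""
    return gf2_remainder(received) != 0


def decoder_activity(ber_pre, n):
    """Fraction of n-bit blocks with at least one error (independent errors assumed)."""
    return 1.0 - (1.0 - ber_pre) ** n


if __name__ == "__main__":
    codeword = encode(0b10110011101)
    print(decoder_clock_enabled(codeword))             # clean block: decoder stays gated
    print(decoder_clock_enabled(codeword ^ (1 << 7)))  # one bit error: decoder is clocked
    print(decoder_activity(1e-4, 63))
```

Under these assumptions the expensive blocks would be clocked for well under 1% of the codewords at a pre-FEC BER of $10^{-4}$ and n = 63, which is the intuition behind gating them.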
2.6 iOpener: System Description
Motivated by the advantages of FEC demonstrated in this chapter, a test chip (iOpener) in a
90-nm process implementing BCH block codes and supported by simple analog equalization
was designed and is described in this section. The goal of this test chip was to demonstrate:
• Reduction in transmit swing (peak power constraint) requirement enabled by FEC.
• Favorable trade-off between transmit driver and FEC power.
• Feasibility of providing random and burst correction using binary BCH codes with
interleaving within the power budget.
The iOpener test chip architecture is illustrated in Fig. 2.11. The data stream in the
transmitter is generated from an on-chip 16-channel random data generator (PRBS). The
data is then encoded with low-speed BCH encoders with one of the following $(n, k, d_{min})$
specifications: (63, 36, 11), (63, 39, 9), (63, 45, 7), (63, 51, 5), or (63, 57, 3), where $n$ is the
codeword length, $k$ is the number of data bits per codeword, and $d_{min}$ is the minimum
distance between two codewords. The error correction capability is
$t = \lfloor (d_{min} - 1)/2 \rfloor$. The transmit driver, which also performs the role of
pre-emphasis filter, was designed to deliver a variable output signal swing, so as to provide
verification that the same BER can be achieved at a lower SNR when FEC is enabled. At the
receiver, the receive amplifier and the clock recovery unit (CRU) are designed to achieve the
required pre-FEC BER ($BER_{pre}$). The reformulated inversionless Berlekamp-Massey
architecture [27] is used for decoding.

[Figure 2.11 block diagram. Transmitter: PRBS, ENCODER, MUX, TX-DRIVER, frequency divider,
external clock, TX/FEC configuration. FR-4 trace. Receiver: RX-AMP, CDR, DMUX, DECODER,
frequency divider, BERC, error rate output. Data buses are 16 bits wide.]
Figure 2.11: iOpener test chip architecture.
The analog section of the receiver was designed to provide a BER in the range of $10^{-4}$ to
$10^{-6}$ to the digital section, where the decoder would further reduce the BER. Due to the
inclusion of pre-emphasis at the transmitter, the receiver front-end only needed to contain
a variable gain amplifier to provide the proper amplitude signal to the slicer. Since the
pre-emphasis equalization combats channel ISI and results in a somewhat open received eye
diagram, a full-rate bang-bang CDR was designed to recover the clock and the data which
are then passed to the deserializer.
A 2 kV HBM ESD protection level was targeted in the design of the on-chip ESD protection
circuits. A dual-diode protection scheme was used.
2.6.1 Simulation Results - Performance, Area and Power
The transceiver code rate (CR), data rate (DR), transmit swing ($V_{sw}$), burst error
correction capability ($t_{burst}$), decoding latency and $BER_{pre}$ values needed to achieve
$BER_{post} = 10^{-15}$ are listed in Table 2.3 for each of the implemented codes. The transmit
swings for the three highest code rates were chosen to achieve the required $BER_{pre}$. The two
lowest code rates can be used to achieve the target $BER_{post}$ when the receive SNR is lower,
e.g., when the receiver eye-opening is smaller for channels longer than the 20” FR-4 channel,
resulting in $BER_{pre} \approx 10^{-4}$ to $10^{-5}$.

Table 2.3: Transceiver Performance Summary
Code (n,k,d)   CR     Vsw (Vppd)   DR (Gb/s)   tburst (bits)   Latency L (bits)   Required BERpre
(63,57,3)      0.90   1            5.65        16              2048               $10^{-8}$
(63,51,5)      0.81   0.5          5.05        32              2080               $10^{-7}$
(63,45,7)      0.71   0.25         4.46        48              2112               $10^{-6}$
(63,39,9)      0.62   0.25         3.86        64              2144               $10^{-5}$
(63,36,11)     0.57   0.25         3.57        80              2176               $10^{-4}$
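The code rate and correction capability columns of Table 2.3 follow directly from the code parameters. The sketch below recomputes them; the interleaving depth of 16 is an assumption inferred from the 16-channel data path and the tabulated burst capabilities, and the function name is hypothetical.

```python
CODES = [(63, 57, 3), (63, 51, 5), (63, 45, 7), (63, 39, 9), (63, 36, 11)]
NUM_SUBCHANNELS = 16  # assumption: one codeword stream per parallel channel


def code_summary(n, k, d_min, num_subchannels=NUM_SUBCHANNELS):
    """Code rate, random-error capability t and burst capability t * M."""
    t = (d_min - 1) // 2  # t = floor((d_min - 1) / 2)
    return {"code_rate": k / n, "t": t, "t_burst": t * num_subchannels}


if __name__ == "__main__":
    for code in CODES:
        print(code, code_summary(*code))
```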
Table 2.4: Transceiver Power and FOM Summary
CR     Ser-Des (mW)   TX (mW)   RX Amp (mW)   CDR (mW)   FEC (mW)   Total (mW)   FOM (mW/Gb/s)
0.90   11             47        36            17         7          118          20.9
0.81   11             26        36            17         9          99           19.6
0.71   11             15        36            17         11         90           20.2
Post-extraction simulations, including packaging parasitics, were used to determine the
power of all the custom circuit blocks and a power analysis was performed on the standard
cell circuit blocks. The power breakdown for the complete transceiver as a function of the
code rate is shown in Table 2.4. The simulation results suggest that significant savings in
transmit power are possible through the use of FEC. In going from a code rate of 0.9 to
0.71, the combined power consumption of the transmit driver and the FEC block decreases
from 7.76 mW/Gb/s to 5.8 mW/Gb/s. This is achieved by expending 32 mW less transmit
power at the expense of 4 mW codec power, and accepting a decreased data rate.
The total power consumption for the transceiver ranges from 118 mW for the maximum
transmit swing and a 0.9 code rate to 90 mW for the minimum transmit swing and a 0.71
code rate. Taking into account the change in effective data rate, the transceiver achieves
a minimum power consumption of 19.6 mW/Gb/s with a code rate of 0.81. As presented
in Table 2.4, the power efficiency is relatively constant over the implemented code rates
in 90 nm technology. The power efficiency is 2-3.5 times better than designs reported in
the literature ([28], [29] and [30]) that operate at similar data rates but employ
high-swing transmitters (600 mVppd to 1200 mVppd) and several taps of equalization to open
the eye. On the other hand, [10] reports a power efficiency of 2.2 mW/Gb/s, which is
achieved using strategies such as PLL sharing and resonant clock distribution, all of which
are complementary to the system level techniques presented here.
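The totals and figures of merit in Table 2.4 can be reproduced from the per-block powers and the data rates of Table 2.3. This is a small bookkeeping sketch; the dictionary keys and function names are illustrative only.

```python
# (code rate, data rate in Gb/s, per-block power in mW), copied from Tables 2.3 and 2.4
ROWS = [
    (0.90, 5.65, {"ser_des": 11, "tx": 47, "rx_amp": 36, "cdr": 17, "fec": 7}),
    (0.81, 5.05, {"ser_des": 11, "tx": 26, "rx_amp": 36, "cdr": 17, "fec": 9}),
    (0.71, 4.46, {"ser_des": 11, "tx": 15, "rx_amp": 36, "cdr": 17, "fec": 11}),
]


def total_power_mw(blocks):
    """Sum of all block powers in mW."""
    return sum(blocks.values())


def fom_mw_per_gbps(blocks, data_rate_gbps):
    """Energy-efficiency figure of merit: total power divided by data rate."""
    return total_power_mw(blocks) / data_rate_gbps


if __name__ == "__main__":
    for code_rate, data_rate, blocks in ROWS:
        print(code_rate, total_power_mw(blocks), round(fom_mw_per_gbps(blocks, data_rate), 1))
```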
The transceiver was designed in a standard 90 nm CMOS technology, and utilizes a 1.2
V digital supply and a 1.8 V analog supply. The transceiver was designed to be placed on a
test board using a chip on board (COB) process. Daughter cards will be used to implement
various link lengths between the transmitter and receiver. The testchip was designed to
achieve a post-decoding BER of $10^{-15}$ when operating over a standard 20” FR-4 channel.
The layout of the 4 mm × 2 mm test chip is shown in Fig. 2.12 with the component blocks
labeled.

[Figure 2.12 layout, 4 mm × 2 mm: transmitter (TX driver, serializer), digital core, receiver
(RX amp, CDR, deserializer), eye monitor, DLL, clock output, data output, loop-back buffers
and decoupling capacitors (DCAP).]
Figure 2.12: iOpener layout.

The size of the test chip was a function of the perimeter required for the large number
of IO ports required to test the individual components of the transceiver. The transmitter,
consisting of the serializer, pre-driver, and differential CML driver, is located on the lower
left-hand side of the layout and occupies an area of 0.2 mm^2. The receiver, consisting of
the receive amplifier, clock recovery circuitry, and deserializer, is located on the lower
right-hand side of the layout and occupies a total area of 0.6 mm^2. The transmitter and receiver
are separated by the synthesized digital logic which contains the encoder, decoder, and test
logic. The digital logic was synthesized as a single block to facilitate loopback testing of
the transceiver. The synthesized digital logic occupies an area of 1.3 mm^2. The test chip
also incorporates several analog test blocks to facilitate the testing and measurement of the
receiver. The remaining die area was utilized by the IO pad ring and additional decoupling
capacitance to further reduce the supply noise and to fulfill density requirements.
2.7 Summary
In this chapter, we showed that a combination of coding and modulation reduces power
consumption in high-speed links. Specifically, we employed FEC to relax specifications of
the transmit driver, PLL, ADC and comparator/slicer. A low-power FEC architecture that
achieves random and burst correction within the power budget of back-plane links was pro-
posed. Thus, system-level ideas such as coding and modulation were leveraged to simplify
mixed-signal design and reallocate power to circuits employing digital communication tech-
niques. These circuits benefit from Moore’s law and hence the link-architecture proposed is
expected to scale better than conventional analog-based architectures.
CHAPTER 3
IMPACT OF FEC ON DFE-BASED I/O LINKS
In Chapter 2, we proposed and studied the application of forward error-correction (FEC) for
multi-Gb/s links to reduce power and improve BER, and we evaluated the pre- vs. post-FEC
BER improvements and the power trade-offs involved for binary BCH codes. In evaluating
FEC performance, we assumed that the slicer errors are uncorrelated. This assumption is not
accurate for DFE-based links. However, we proceeded with the understanding that sufficient
interleaving can be provided at the transmitter (Section 2.5) to spread error bursts across
codewords and cause them to appear as random errors. Interleaving results in a latency
penalty at the decoder. Hence, it is desired that interleaving depth be minimized. In order
to evaluate the impact of DFE induced burst-errors and the impact of interleaving on FEC
performance at the low BER of interest, it is necessary to develop an accurate statistical
model that captures these effects.
Past work has focused on statistical link modeling for uncoded I/O links [3], [15]. Not
much work has been done in analyzing the performance of FEC in the presence of correlated
errors generated in a DFE-based I/O link. In this chapter, we focus specifically on evaluating
the performance of binary block codes in a DFE based link. Binary BCH codes offer good
error correction at moderate to high code rates, making them an excellent candidate for
designing FEC-based low-power I/O links. However, the DFE produces correlated errors
because of error propagation; i.e., a decision error leads to bursts of errors. These error
bursts become severe when the magnitude of a DFE tap is more than half the main tap
(cursor). The impact of DFE errors is even more significant in an FEC-based system. The
main contribution of this chapter is to develop an accurate model for evaluating FEC for
I/O links, and to employ this model to evaluate the performance of random and burst error
correcting codes for a real I/O link. A rigorous model is necessary, given the very low target
BER unique to this application. The model consists of: 1) an accurate statistical estimate
of the link-specific noise sources such as residual ISI and timing jitter, 2) a Markov chain
based DFE model to account for error correlation, and 3) its extension based on the dynamic
programming principle to compute random and burst error probabilities in codeword blocks.

This chapter is based on work done in collaboration with Dr. Nirmal Warke from Texas Instruments,
Inc.
In the past [31], the effect of DFE error propagation in uncoded links has been modeled
using a Markov chain based approach. This is reviewed in Section 3.1. This Markov chain
model is employed to compute two types of error statistics for FEC-based links: random and
burst. Based on these statistics, the performance of a set of random error correcting codes
(RECC) and burst error correcting codes (BECC) is evaluated in Section 3.3.
3.1 Modeling DFE Error Propagation
A DFE cancels out ISI from past bits using past decisions. If a past decision is in error, it
propagates in the feedback section for a number of symbol periods equal to the DFE length
($L_{DFE}$). This phenomenon can be modeled using a Markov chain with memory $L_{DFE}$. The
signal at the input to the slicer, $y_k$, is given by

$$y_k = b_k + n_k^{\mathrm{random}} + n_k^{\mathrm{dfe-ep}} \qquad (3.1)$$

$$n_k^{\mathrm{dfe-ep}} = \sum_{m} (b_{k-m} - \hat{b}_{k-m}) \, h_m \qquad (3.2)$$

where $b_k$ and $\hat{b}_k$ are the transmitted and detected bits, respectively,
$n_k^{\mathrm{random}}$ is the random noise component, $n_k^{\mathrm{dfe-ep}}$ is the error
propagation component, and $h_m$ are the channel coefficients.
It is assumed that DFE errors are the main source of error correlation; i.e., all other noise
sources such as residual ISI outside the DFE window, cross-talk and timing jitter are lumped
into one equivalent noise source. The distribution for $n_k^{\mathrm{random}}$ can be computed
by convolving the individual noise distributions. An error pattern specifies the DFE error
corresponding to the past $L_{DFE}$ decisions. For example, in a link employing 2-PAM
modulation, i.e., symbol alphabet [1, −1] and 2 DFE taps, the error patterns are (−2, 0),
(−2, −2), (−2, 2), (2, 0), (2, −2), (2, 2), (0, 2), (0, −2), (0, 0), leading to a total of
$N_{states} = (2M - 1)^{L_{DFE}} = 9$ error-states, where $M = 2$ is the number of PAM levels.
The notation $E_k^i$ represents the DFE being at the $i$-th error-state
($i = 1, \ldots, N_{states}$) at time $k$. Fig. 3.1 depicts some of the transitions in the
state transition diagram for a 2-tap DFE.
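The error-state count can be confirmed by enumeration. The sketch below builds the set of possible decision errors for an M-level PAM alphabet and forms all error patterns over the DFE length; the function name and the symmetric alphabet {±1, ±3, ...} are assumptions made for illustration.

```python
from itertools import product


def error_states(num_levels, dfe_taps):
    """All DFE error patterns (most recent decision first) for M-PAM.

    Assumes the symmetric alphabet {-(M-1), ..., -1, 1, ..., M-1}.
    """
    symbols = [2 * i - (num_levels - 1) for i in range(num_levels)]
    # Possible values of (detected - transmitted); there are 2M - 1 of them.
    errors = sorted({detected - sent for detected in symbols for sent in symbols})
    return list(product(errors, repeat=dfe_taps))


if __name__ == "__main__":
    states = error_states(num_levels=2, dfe_taps=2)
    print(len(states), states)
```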
[State diagram with error-states (−2,2), (−2,−2), (0,0), (−2,0), (2,2), (0,2), (0,−2),
(2,−2) and (2,0).]
Figure 3.1: Markov chain state transitions for a 2-tap DFE. Only some transitions are
illustrated for clarity.
In Fig. 3.1, for example, a transition from error-state (0, 0) to (2, 0) occurs when the
present error-state is (0, 0), i.e., the present and previous decisions are not in error, and in
the next symbol-period, a $\hat{b}_k = 1$ decision is made when actually $b_k = -1$ was
transmitted, resulting in error magnitude $1 - (-1) = 2$. Note that in our notation for the
error-state, the left value is the most recent. The state transition probabilities
$\Pr(E_k^i \mid E_{k-1}^j)$ and steady-state probabilities $\Pr(E_k^i)$ are given as

$$\Pr(E_k^i \mid E_{k-1}^j) = \sum_{b_k} \Pr(E_k^i \mid E_{k-1}^j, b_k) \Pr(b_k) \qquad (3.3)$$

$$\Pr(E_k^i) = \sum_{j} \Pr(E_{k-1}^j) \Pr(E_k^i \mid E_{k-1}^j) \qquad (3.4)$$

where $\Pr(E_k^i \mid E_{k-1}^j, b_k)$ can be obtained once the distribution of
$n_k^{\mathrm{random}}$ is known. The Markov chain model described in this section is validated
by comparing the error pattern probabilities predicted by theory with those obtained by
simulation. This comparison is shown in Table 3.1 for a synthetic channel with taps
[1 0.5 0.3 0.2 0.1], a 4-tap DFE, transmit symbols [1 -1] and noise variance 0.1. The two sets
of probabilities agree closely.
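A minimal sketch of the Markov chain computation in (3.3) and (3.4) is given here for 2-PAM. It assumes a unit main cursor, Gaussian random noise, equiprobable bits, and a DFE that cancels every post-cursor tap when its past decisions are correct; these modeling choices and the function names are assumptions, so the numbers it produces are not claimed to match Table 3.1.

```python
import itertools
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def dfe_steady_state(post_cursor_taps, sigma, iterations=200):
    """Steady-state error-state probabilities by iterating (3.4).

    A state lists (detected - transmitted) for the past decisions, most recent first.
    """
    length = len(post_cursor_taps)
    states = list(itertools.product((-2, 0, 2), repeat=length))
    prob = dict.fromkeys(states, 0.0)
    prob[(0,) * length] = 1.0
    for _ in range(iterations):
        nxt = dict.fromkeys(states, 0.0)
        for state, p in prob.items():
            if p == 0.0:
                continue
            # Error-propagation term of (3.2): sum of h_m * (b - b_hat).
            ep = -sum(h * e for h, e in zip(post_cursor_taps, state))
            p_neg = 0.5 * q_function((1.0 + ep) / sigma)  # b_k = +1 detected as -1
            p_pos = 0.5 * q_function((1.0 - ep) / sigma)  # b_k = -1 detected as +1
            older = state[:-1]
            nxt[(-2,) + older] += p * p_neg
            nxt[(2,) + older] += p * p_pos
            nxt[(0,) + older] += p * (1.0 - p_neg - p_pos)
        prob = nxt
    return prob


if __name__ == "__main__":
    probs = dfe_steady_state([0.5, 0.3, 0.2, 0.1], math.sqrt(0.1))
    print(math.log10(probs[(0, 0, 0, 0)]))
```

Power iteration is used instead of solving the linear system because the chain returns to the error-free state quickly, so a few hundred steps are ample for this example.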
Table 3.1: Markov Model Validation
Error pattern   log(Pr), theory   log(Pr), simulated
0(0000) -0.0015 -0.0015
1(0001) -3.1066 -3.1061
2(0010) -3.2265 -3.2249
3(0011) -3.7037 -3.7033
4(0100) -3.2257 -3.2242
5(0101) -4.5233 -4.5171
6(0110) -3.7088 -3.7089
7(0111) -5.5332 -5.4949
8(1000) -3.1066 -3.1061
9(1001) -5.0387 -4.9747
10(1010) -4.508 -4.5031
11(1011) -6.2165 -6.1549
12(1100) -3.7060 -3.7055
13(1101) -5.7747 -5.7696
14(1110) -5.5332 -5.4949
15(1111) -6.3904 -6.3010
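The simulated column of such a validation can be approximated with a short Monte Carlo run. The sketch below is an illustrative assumption of how the simulation might be set up (2-PAM, ideal DFE taps equal to the channel post-cursors, Gaussian noise); it is not the simulator used to produce Table 3.1, and `simulate_dfe` is a hypothetical name.

```python
import random
from collections import Counter


def simulate_dfe(taps, sigma, num_symbols, seed=1):
    """Simulate a 2-PAM link with a DFE fed by its own decisions.

    Returns the bit error rate and a count of error patterns (1 = error) over
    the last len(taps) - 1 decisions, most recent first.
    """
    rng = random.Random(seed)
    memory = len(taps) - 1
    sent = [1] * memory      # past transmitted bits, most recent first
    decided = [1] * memory   # past decisions, most recent first
    pattern = [0] * memory
    counts = Counter()
    errors = 0
    for _ in range(num_symbols):
        bit = rng.choice((-1, 1))
        received = taps[0] * bit + sum(h * b for h, b in zip(taps[1:], sent))
        received += rng.gauss(0.0, sigma)
        # The DFE subtracts the ISI reconstructed from past decisions.
        slicer_in = received - sum(h * d for h, d in zip(taps[1:], decided))
        decision = 1 if slicer_in >= 0 else -1
        is_error = int(decision != bit)
        errors += is_error
        sent = [bit] + sent[:-1]
        decided = [decision] + decided[:-1]
        pattern = [is_error] + pattern[:-1]
        counts[tuple(pattern)] += 1
    return errors / num_symbols, counts


if __name__ == "__main__":
    ber, counts = simulate_dfe([1, 0.5, 0.3, 0.2, 0.1], 0.1 ** 0.5, 200_000)
    print(ber, counts.most_common(3))
```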
3.2 FEC Model
The Markov chain model described in the previous section is employed to determine error
statistics spanning a codeword block. Two kinds of statistics are of interest: random and
burst. The former are of interest when evaluating the performance of random error correcting
codes. The broad class of binary cyclic codes falls under this category. Another class of
codes is designed to correct burst or correlated errors. A subclass of cyclic codes called
Fire codes and codes designed in higher Galois fields (e.g., Reed-Solomon codes) belong
to this category. As it is not feasible to perform simulations in the low BER region ($10^{-15}$)
of interest, it is necessary to accurately model the error distribution in a codeword to get a
good estimate of FEC performance. In this section, we describe a method to compute error
statistics employing a trellis-based approach based on the dynamic programming principle.
3.2.1 Random Error Correcting Code (RECC)
The discussions that follow apply to 2-PAM modulation but can be extended easily to larger
constellation sizes. For the specific case of 2-PAM and $L_{DFE} = 2$, the state machine depicted
in Fig. 3.1 can be reduced by defining a composite state $(i, j)$, where $i, j = 1$ and $i, j = 0$
imply the presence or absence of an error, respectively. The following equations describe the
steps involved in this state-space reduction:

$$\Pr(1, 1) = \Pr(-2, 2) + \Pr(-2, -2) + \Pr(2, -2) + \Pr(2, 2) \qquad (3.5)$$

$$\Pr(1, 1 \mid 1, 0) = \Pr(-2, 2 \mid 1, 0) + \Pr(-2, -2 \mid 1, 0) + \Pr(2, -2 \mid 1, 0) + \Pr(2, 2 \mid 1, 0) \qquad (3.6)$$

$$\Pr(-2, 2 \mid 1, 0) = \Pr(-2, 2 \mid 2, 0) \cdot \Pr(2, 0 \mid 1, 0) \qquad (3.7)$$

where $\Pr(1, 1)$ is the probability of two successive bits being in error. Equation (3.6) is
employed to reduce the number of states to $2^{L_{DFE}}$ from $3^{L_{DFE}}$ by discarding
information regarding the exact error values in the state definitions and retaining
information about whether or not a bit was in error, and (3.7) illustrates how each term on
the RHS of (3.6) can be computed.
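The reduction in (3.5) amounts to summing the probabilities of all signed error states that share the same error/no-error pattern. A minimal sketch, with made-up probabilities used only to exercise the function:

```python
from collections import defaultdict


def collapse_to_composite(signed_state_probs):
    """Merge signed error states (-2, 0, 2) into composite states (1 = error, 0 = none)."""
    composite = defaultdict(float)
    for state, prob in signed_state_probs.items():
        composite[tuple(int(e != 0) for e in state)] += prob
    return dict(composite)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the thesis.
    toy = {(-2, 2): 0.01, (-2, -2): 0.02, (2, -2): 0.03, (2, 2): 0.04, (0, 0): 0.90}
    print(collapse_to_composite(toy))
```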
Figures 3.2(a) and 3.2(b) illustrate one section of the trellis employed to perform
recursive computation of the statistics involved, when $L_{DFE} = 4$. Each composite state in
the trellis represents a certain sequence of errors in the past $L_{DFE}$ decisions. In the
following, we will use the term state to refer to the composite error-state.

[Figure 3.2 panels (a) and (b): one trellis section from source nodes (previous stage k) to
sink nodes (current stage k+1), showing weight-j and weight-(j−1) paths.]
Figure 3.2: Trellis paths of weights j.

The random error weight probabilities are updated at each trellis stage as follows. If the
error bit at stage $k + 1$ is 0 (Fig. 3.2(b)),

$$\mathrm{Pr}_j^{k+1}(i) = \mathrm{Pr}_j^{k}(\mathrm{to}(i, 1)) + \mathrm{Pr}_j^{k}(\mathrm{to}(i, 2)) \qquad (3.8)$$

If the error bit at stage $k + 1$ is 1 (Fig. 3.2(a)),

$$\mathrm{Pr}_j^{k+1}(i) = \mathrm{Pr}_{j-1}^{k}(\mathrm{to}(i, 1)) + \mathrm{Pr}_{j-1}^{k}(\mathrm{to}(i, 2)) \qquad (3.9)$$

where $\mathrm{to}(i, 1)$ and $\mathrm{to}(i, 2)$ denote the two states leading to state $i$.
$\mathrm{Pr}_j^{k}(i)$ denotes the probability of a $k$-bit-long path of weight $j$ that passes
through state $i$.
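The weight recursion of (3.8) and (3.9) can be sketched for the simplest case of a one-bit error memory, where the state is just whether the previous bit was in error. The branch transition probabilities, which the equations leave implicit, are written out explicitly here; that choice, the two-state model, and the function name are assumptions made for illustration.

```python
from math import comb


def weight_distribution(n, p_err_after, start=(1.0, 0.0)):
    """Probability of each error weight 0..n in an n-bit block.

    p_err_after[s] is Pr(error on the next bit | previous error bit s).
    paths[s][j] is the probability of a path of weight j ending in state s.
    """
    paths = [[0.0] * (n + 1) for _ in range(2)]
    paths[0][0], paths[1][0] = start
    for _ in range(n):
        new = [[0.0] * (n + 1) for _ in range(2)]
        for prev in (0, 1):
            p_err = p_err_after[prev]
            for j in range(n + 1):
                p = paths[prev][j]
                if p == 0.0:
                    continue
                new[0][j] += p * (1.0 - p_err)   # error bit 0: weight unchanged, as in (3.8)
                if j < n:
                    new[1][j + 1] += p * p_err   # error bit 1: weight j-1 -> j, as in (3.9)
        paths = new
    return [paths[0][j] + paths[1][j] for j in range(n + 1)]


if __name__ == "__main__":
    bursty = weight_distribution(50, (1e-3, 0.3))
    print(bursty[:5])
```

With equal error probabilities in both states the recursion reduces to the binomial distribution, which gives a convenient sanity check.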
3.2.2 Burst Error Correcting Code (BECC)
A burst error is defined by the difference in position between the first error and the last in
a codeword block. For example, a burst of length $j$ has its first error at position $k$ and the
last error at position $k + j - 1$. To compute the burst error statistics, we define
$B_j^k(i)$ as the event that a $k$-bit path ends in state $i$ and has burst length $j$.
$\mathrm{Pr}_j^k(i)$ denotes the probability of that event. In order to compute burst pattern
probabilities, we also keep track of the event that an error burst beginning at stage $m$ in
the trellis passes through state $i$ at stage $k$ (denoted as $B_{beg-m}^k(i)$).
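The burst-length definition can be stated in a few lines of code, which also makes the contrast with error weight concrete. The function name is hypothetical.

```python
def burst_length(error_bits):
    """Span from the first to the last error in a block, inclusive (0 if error-free)."""
    positions = [i for i, e in enumerate(error_bits) if e]
    if not positions:
        return 0
    return positions[-1] - positions[0] + 1


if __name__ == "__main__":
    block = [0, 1, 0, 0, 1, 0]
    # Two errors (weight 2) spread over four positions (burst length 4).
    print(sum(block), burst_length(block))
```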
[Trellis section from source nodes (previous stage k) to sink nodes (current stage k+1),
showing paths that begin at stage m.]
Figure 3.3: Trellis paths of burst length j.
The probabilities for error events beginning at a stage $m$ in the trellis are updated as

$$\mathrm{Pr}_{beg-m}^{k+1}(i) = \mathrm{Pr}_{beg-m}^{k}(\mathrm{to}(i, 1)) \Pr(\mathrm{to}(i, 1) \to i) + \mathrm{Pr}_{beg-m}^{k}(\mathrm{to}(i, 2)) \Pr(\mathrm{to}(i, 2) \to i) \qquad (3.10)$$

where the terms $\Pr(\mathrm{to}(i, 1/2) \to i)$ are the transition probabilities leading to
state $i$, and $\mathrm{Pr}_{beg-m}^{k}(i)$ is the probability of the event $B_{beg-m}^{k}(i)$.
At each trellis stage, the error-burst probabilities are updated based on events that have
their last error in a codeword in that stage. Equation (3.11) governs this update. If the new
error bit is 1 (Fig. 3.3),

$$\mathrm{Pr}_j^{k+1}(i) = \mathrm{Pr}_{beg-(k+2-j)}^{k+1}(i) \Pr(i, 0, 0, 0 \ldots 0) + \mathrm{Pr}_j^{k}(i) \qquad (3.11)$$

where $\Pr(i, 0, 0, 0 \ldots 0)$ is the probability of starting at state $i$ at stage $k$ and
observing 0s for the remaining part of the codeword. Equations (3.8), (3.9) and (3.11) are
employed to estimate the bit error rate (BER) according to (3.12).

$$\mathrm{BER} = \sum_{j=t+1}^{n} \Pr(j) \, \frac{j}{n} \qquad (3.12)$$
We note here that by using the basic Markov model for the DFE and developing a recursive
equation connecting error statistics at stage k+M to that at stage k, we can easily analyze
the implementation trade-offs mentioned in Section 2.5 (recall that M here refers to the
interleaving depth). Clearly, in (3.8), (3.9) and (3.11), M = 1.
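Equation (3.12) can be evaluated directly once the block error-weight probabilities are available. The sketch below uses a binomial (independent-error) weight distribution purely as a stand-in for the trellis-derived statistics; the helper names are hypothetical.

```python
from math import comb


def binomial_weights(n, p):
    """Weight distribution of an n-bit block with independent bit errors (stand-in input)."""
    return [comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(n + 1)]


def post_fec_ber(weight_probs, t):
    """BER per (3.12): blocks with more than t errors contribute j/n each."""
    n = len(weight_probs) - 1
    return sum(prob * j / n for j, prob in enumerate(weight_probs) if j > t)


if __name__ == "__main__":
    weights = binomial_weights(63, 1e-4)
    print(post_fec_ber(weights, 0), post_fec_ber(weights, 1))
```

With t = 0 the expression returns the raw bit error probability, which is a quick consistency check on the formula.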
3.2.3 Error Statistics for an AWGN Channel
The random and burst error evaluation model has been verified by comparing random error
weight and burst length probabilities through analysis and simulation up to an error
probability of $10^{-5}$ for the synthetic channel considered in Sec. 3.1. A strong agreement
between analysis and simulation is inferred from Table 3.2.

In this section, we evaluate error statistics spanning a codeword block via analysis and
simulation, and compare the two. To do this, we consider a 7.5 Gb/s (fixed) channel rate
transmission across a channel measured to have a 15 dB loss at the Nyquist frequency. A 5-tap
DFE is assumed. White Gaussian noise (WGN) is added at the channel output and
error probabilities for a 50-bit codeword block are evaluated. Both random and burst error
statistics are computed and listed in Table 3.2. The computed statistics are employed to
estimate FEC performance for various random (errwt > w, w = (3, 5, 7, 9)) and burst
(burst > l, l = (3, 5, 7, 9)) error correction capabilities and illustrated in Fig. 3.4.

Table 3.2: Validation of Error Pattern Statistic Computation
Errwt/   log(Pr), theory   log(Pr), simulated   log(Pr), theory   log(Pr), simulated
Burst    (random)          (random)             (burst)           (burst)
1        -1.5695           -1.5659              -1.5739           -1.57
2        -1.9578           -1.9502              -2.043            -2.0495
3        -3.2211           -3.1692              -2.8531           -2.8511
4        -3.9936           -4.0114              -3.3968           -3.3951
5        -5.0903           -5.0706              -3.8486           -3.8083
6        -6.0251           N/A                  -4.0844           -4.0122
7        -7.0635           N/A                  -4.3835           -4.3444
8        -8.0602           N/A                  -4.5253           -4.5465
9        -9.0894           N/A                  -4.5782           -4.5613
10       -10.1130          N/A                  -4.603            -4.622
11       -11.1492          N/A                  -4.6178           -4.6176
12       -12.1888          N/A                  -4.63             -4.6258
13       -13.2356          N/A                  -4.6413           -4.7224
14       -14.2876          N/A                  -4.6526           -4.7368
15       -15.3456          N/A                  -4.6641           -4.6511
Figure 3.4 indicates that weight 3 errors are predominantly burst errors, particularly at
high SNR. This is inferred from the curves for errwt > 3 and burst > 3 which are close
to each other. The burst length distribution is relatively uniform beyond this length. This
explains why the burst error rate drops slowly as l is increased. For random errors, there is
a significant distribution of events for the values of w considered in Fig. 3.4. This is reflected
in the rapidly diminishing error rates in this case.
Fig. 3.5 illustrates the effect of DFE error propagation. The two plots in dashed lines
are based on error statistics in a codeword of length 50. The two plots in continuous lines
are under the assumption that the transmitted bits are employed at the DFE; i.e., there is
no error propagation. At BER = $10^{-15}$, the figure illustrates an SNR loss of 6 dB due to
error propagation. This clearly illustrates the potential benefits of applying interleaving and
transmitter pre-coding techniques.

[Plot, block length = 50: BER (log scale) vs. SNR (dB) for preFEC, burst > 3, 5, 7, 9 and
errwt > 3, 5, 7, 9.]
Figure 3.4: Error statistics in 50 bit block.

[Plot, block length = 50: BER (log scale) vs. SNR (dB) for errwt > 3 and errwt > 5, with and
without DFE error propagation (DFE-EP); a 6 dB gap is marked.]
Figure 3.5: Effect of error propagation.
3.3 Latency vs. Performance
In this section, we present the results of evaluating the performance of a set of random and
burst correcting codes. This evaluation accounts for the ISI penalty vs. coding gain trade-off
by fixing the data rate to 10.3125 Gb/s and evaluating FEC performance at the corresponding
channel rate. The channel is measured to have a 19 dB loss at the Nyquist frequency. The
codes evaluated are listed in Table 3.3. The transmit swing is fixed at 1200 mV p-p. The
distributions of the link noise sources such as residual ISI and timing jitter are convolved to
obtain an effective noise distribution. The TX introduces 1 ps rms random jitter and 4 ps
duty-cycle distortion (DCD), and the RX adds 1.4 ps random jitter and 4 ps DCD. The RX
bandwidth was set at 6 to 7 GHz. These numbers were obtained from SPICE characterization
of the circuits involved in 65 nm CMOS. A 5-tap DFE resulting in an uncoded link BER
of $10^{-8}$ is employed in the analysis. The RECC chosen are binary BCH codes ($n = 2^m - 1$)
and the BECC are Fire codes [32] or interleaved Fire codes.
Table 3.3: Codes Evaluated
Code Rate   Random Error Codes                   Burst Error Codes
0.96        255,2   511,2    750,3               279,5   558,10   1116,20
0.88        127,2   255,4    511,7    750,10     105,4   210,8    315,12
0.80        127,4   255,6    511,11   750,15     35,3    70,6     105,9
0.64        127,6   255,12   511,21   750,28
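The burst-code lengths in Table 3.3 are consistent with the standard Fire code construction, in which the generator is $(x^{2l-1} + 1) p(x)$ and the length is the least common multiple of $2l - 1$ and the period of $p(x)$. The sketch below assumes a primitive $p(x)$ of degree $m = l$ and plain depth-d interleaving; these choices reproduce the tabulated (length, burst) pairs but are assumptions, not statements about how the codes were actually selected.

```python
from math import gcd


def fire_code_parameters(burst_len, degree=None):
    """Length n and dimension k of an l-burst-correcting Fire code.

    Assumes p(x) is primitive of the given degree (default: degree = burst_len).
    """
    m = burst_len if degree is None else degree
    if m < burst_len:
        raise ValueError("degree of p(x) must be at least the burst length")
    period = 2 ** m - 1
    span = 2 * burst_len - 1
    if span % period == 0:
        raise ValueError("2l - 1 must not be divisible by the period of p(x)")
    n = span * period // gcd(span, period)  # lcm(2l - 1, period)
    k = n - (span + m)                      # deg g(x) = (2l - 1) + m parity bits
    return n, k


def interleaved_fire_code(burst_len, depth):
    """Depth-d interleaving scales both the length and the correctable burst by d."""
    n, k = fire_code_parameters(burst_len)
    return n * depth, k * depth, burst_len * depth


if __name__ == "__main__":
    for l in (3, 4, 5):
        print(l, fire_code_parameters(l))
    print(interleaved_fire_code(5, 2), interleaved_fire_code(5, 4))
```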
The performance evaluation results for different codes are partitioned into the two plots in
Fig. 3.6 and Fig. 3.7. The BER is plotted as a function of the block length, with code rate
(r) as a parameter. Fig. 3.6 zooms in on the region where the post-FEC BER is worse
than the BER of the uncoded link. In this region, the performance degradation due to ISI
penalty exceeds the coding gain. The r = 0.88 and r = 0.64 curves show worse performance
than r = 0.8. Of the three code-rates, r = 0.8 is optimal. Two distinct trends emerge for
RECC and BECC performance. For RECC, the increased codeword length implies higher t
and higher code-rate r for a given t. The increase in t outweighs the increase in the number
of error events of a given weight, resulting in improved performance with higher n. For
example, as we go from RECC (127, 4) to (750, 15) performance improves monotonically.
For BECC, however, there is an optimal block length at which the burst error correction
capability is matched to the channel burst characteristics. For higher n, the number of
burst error events increases faster than the error correction capability t. For example, as
we move from a (35, 3) through (70, 6) to a (105, 9), an optimum is reached at the (70, 6)
code. From Fig. 3.7, it is evident that a codeword length of 750 or more is necessary to
meet BER = $10^{-15}$ for this channel. Simple burst correction codes are not sufficient to meet
the performance requirements. The codes considered so far have been exclusively RECC
or BECC. However, interleaving RECC enables us to strike a balance between random and
burst error correction. The methodology described in this chapter can be easily extended to
analyze this systematically.

[Plot: BER (log scale, −8 to −5) vs. block length (50 to 400), with each point labeled by its
code, for R = 0.8, 0.88 and 0.96 (RECC and BECC) and R = 0.64 (RECC).]
Figure 3.6: Performance evaluation of block codes.

[Plot: BER (log scale, −14 to −7) vs. block length (300 to 1100), with each point labeled by
its code, for R = 0.8, 0.88 and 0.96 (RECC and BECC) and R = 0.64 (RECC).]
Figure 3.7: Performance evaluation of block codes.
3.4 Summary
In this chapter, we studied the latency vs. performance trade-off when burst and random error
correction codes are employed in high-speed backplane links. In order to evaluate this
trade-off, a method to accurately model the effects of DFE error propagation on FEC performance
in high-speed I/O links was developed. The model was then employed to analyze a typical
DFE-based I/O channel and characterize RECC and BECC performance. For the channel studied,
a code rate of 0.8 was determined to be optimal. A codeword length of 750 or higher
is necessary to meet the BER target. Two distinct trends were observed for RECC and
BECC. The performance of the former improved monotonically with n, whereas the BECC
showed best performance at a certain optimum codeword length. The methodology proposed
in this chapter can be easily extended to analyze interleaved binary BCH codes, which can
be employed to achieve a good balance between random and burst error correction.
CHAPTER 4
BER-OPTIMAL ADC ARCHITECTURE
Chapters 2 and 3 pertained to the first application of the SAMS approach, i.e., application
of forward error-correction (FEC) to simplify mixed-signal design in high-speed I/O links.
This chapter and Chapter 5 pertain to the second application of the SAMS approach that
is the focus of this thesis. In this chapter, we propose an ADC for high data rate commu-
nication links in which the quantization levels and the quantization thresholds are set to
minimize the BER. We term such an ADC a BER-optimal or BER-aware ADC because it
employs a detection criterion and, instead of SQNR, maximizes the probability of detecting
a transmitted bit correctly.
Traditional ADC design is based on a fidelity criterion, where it is assumed that one desires
the ability to reconstruct the input to the ADC from its samples subject to constraints such
as circuit power and process technology. The metric to be optimized, the error between
input and output, is captured by signal-to-quantization-noise-ratio (SQNR) and signal-to-
noise-plus-distortion-ratio (SNDR). Most ADCs today employ uniform quantization; that
is, the levels and thresholds are placed uniformly within the signal dynamic range. As
the SQNR depends strongly on the number of bits $B_X$ of the ADC, system design leads
one to determine the $B_X$ required to meet a specific SQNR or other performance specification.
Unfortunately, large values of $B_X$ lead to high power consumption, large area, and increased
input capacitance. In high-speed systems (e.g., in excess of 10 Gsps), low-power ADCs
are particularly difficult to design, and the effective number of bits (ENOB) usually does
not exceed 6 [16–18]. As uniform quantization does not take into account statistics of
the input signal other than the amplitude range, it does not maximize SQNR or minimize
BER.

[Figure 4.1 panels: (a) link block diagram with b[n], driver, channel h(t), additive noise
v(t), ADC, digital equalizer, slicer and CDR; (b) ADC drawn as a sampler at rate 1/T followed
by a quantizer Q; (c) eye diagram with the PDF of xc[nT].]
Figure 4.1: Role of an ADC in a communication link: a) block diagram of a communication
link, b) functional diagram of an ADC, and c) eye diagram and PDF of the sampled received
signal $x_c[nT]$.

In the context of an ADC-based communication link in Fig. 4.1(a), we show the
eye diagram (Fig. 4.1(c)) of the received signal xc(nT ) prior to quantization (Fig. 4.1(b))
along with its probability density function (PDF) (Fig. 4.1(c)). Signal statistics can be
exploited to assign thresholds and levels in the ADC to improve system performance. The
problem of determining the SQNR optimal set of quantization levels and thresholds was
solved in [33] and [34]. The Lloyd-Max algorithm was proposed to iteratively determine
the optimal levels r and thresholds t of such a quantizer. Recent work [35] has studied the
issue of adaptively computing reference levels for a level-crossing ADC. We show in this
chapter that the Lloyd-Max algorithm improves SQNR in communication links but does
not necessarily reduce BER. The need for optimizing ADCs, especially in multi-gigabit
links, has resulted in significant recent research activity, such as a study of communication
limits under low-precision ADC [36,37] and the application of the mutual-information metric
to design such ADCs [38]. Related work on the application of low-precision ADCs also
includes the use of dither for signal reconstruction [39], the use of a 1-bit ADC for frequency
estimation [40] and ADC threshold optimization for signal amplitude estimation [41]. The
idea of BER-optimal analog/mixed-signal components has been proposed in the context
of BER-optimal equalizers [42, 43] and sampling phase adjustment [43, 44]. BER-optimal
ADCs differ from various digitally-assisted ADCs [13, 14] as the latter maximize SQNR.
The rest of this chapter is organized as follows. Section 4.1 presents a numerical gradient-
descent approach for computing BER-optimal levels and thresholds. Section 4.2 compares
the performance of the BER-optimal and traditional ADCs via simulations for different
channels, modulation and equalization techniques.
4.1 ADC Design Methods
Figure 4.1(a) illustrates a typical digital communication link where the ADC at the receiver
is followed by some digital processing prior to detection. Assuming 2-PAM modulation,
the transmitter sends a random sequence of bits b[n] ∈ {±1} through the channel. At
the receiver, the ADC quantizes the signal, and the outputs are subsequently processed to
account for ISI from the channel. A slicer following the digital processor makes a hard
decision on which bit has been transmitted. As shown in Fig. 4.1(b), the ADC consists of a
baud-spaced sampler followed by a quantizer. At a given sampling time index n, the input
to the quantizer is given by
\[ x_c(nT) = \sum_{i=0}^{M-1} h[i]\, b[n-i] + v[n] \tag{4.1} \]
where b[n] is the transmitted bit, h[i] the baud-sampled impulse response of the channel
with memory M, and v[n] is modeled as additive white Gaussian noise with variance $\sigma^2$.
The noise-free channel output is given by $z(nT) = \sum_{i=0}^{M-1} h[i]\, b[n-i]$.
The ADC output space $R = \{r_k,\ k = 1, \ldots, N\}$ has N levels $r_k$ and N − 1 thresholds
$t_k$, k = 1, . . . , N − 1, where N is equal to $2^{B_x}$. The mapping between xc(nT) and the
quantized signal x[n] is
\[
x[n] =
\begin{cases}
r_1 & \text{if } x_c(nT) \in (-\infty, t_1] \\
r_k & \text{if } x_c(nT) \in (t_{k-1}, t_k], \quad k = 2, \ldots, N-1 \\
r_N & \text{if } x_c(nT) \in (t_{N-1}, \infty)
\end{cases}
\tag{4.2}
\]
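As an illustration of the mapping in (4.2), the following minimal Python sketch quantizes a sample with arbitrary (not necessarily uniform) thresholds and levels. The function name `quantize` and the example threshold and level values are assumptions chosen for illustration; they do not come from the thesis.

```python
from bisect import bisect_left

def quantize(xc, thresholds, levels):
    """Map a sample xc to a reference level following (4.2).

    thresholds: sorted list t_1 < ... < t_{N-1}
    levels:     list r_1, ..., r_N (one more entry than thresholds)
    """
    if len(levels) != len(thresholds) + 1:
        raise ValueError("need N levels and N-1 thresholds")
    # bisect_left counts the thresholds strictly below xc, so a sample that
    # sits exactly on t_k falls in the lower cell (t_{k-1}, t_k], as in (4.2).
    return levels[bisect_left(thresholds, xc)]

# Hypothetical 2-bit quantizer (N = 4) with non-uniform levels.
t = [-0.6, 0.0, 0.6]
r = [-0.9, -0.2, 0.2, 0.9]
samples = [-1.3, -0.6, -0.1, 0.0, 0.45, 2.0]
quantized = [quantize(x, t, r) for x in samples]
print(quantized)
```

Samples that land exactly on a threshold (here −0.6 and 0.0) are assigned to the lower cell, which is the convention implied by the half-open intervals in (4.2).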
The ADC output is then digitally processed with one of several techniques present in the
communications literature to estimate the transmitted bit sequence b[n]. We look at the
two most commonly employed ADC reference level design criteria before presenting
BER-optimal ADCs.
4.1.1 Uniform ADC
In a uniform ADC, the quantization levels are spread evenly within the signal dynamic range.
The minimum and maximum input amplitudes expected by this ADC are expressed as −Vmax
and Vmax, respectively. The quantizer step-size is
$\Delta = \frac{2V_{max}}{N} = \frac{2V_{max}}{2^{B_x}}$. For sufficiently small
quantization error, $q[n] = x_c(nT) - x[n]$ is assumed to be a uniformly distributed random
variable, bounded between $-\Delta/2$ and $+\Delta/2$ and independent of the input. Quantization
noise power $\sigma_q^2$ is given by $E[q^2[n]] = \Delta^2/12$. For uniform quantization, SQNR
(in dB) can be calculated from $6.02 B_x + 4.8 - 20\log_{10}(V_{max}/\sigma_x)$, where
$\sigma_x^2$ is the average signal energy in the ADC input. Each additional bit increases
SQNR by about 6 dB.
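The SQNR expression above can be checked numerically. The sketch below compares the ratio $\sigma_x^2 / (\Delta^2/12)$, expressed in dB, with the 6.02 Bx + 4.8 rule. The function names and the values of Vmax and σx are assumptions used only for illustration.

```python
import math

def uniform_step(v_max, bits):
    """Step size Delta = 2*Vmax / 2**bits of a uniform quantizer."""
    return 2.0 * v_max / (2 ** bits)

def sqnr_db_from_step(v_max, sigma_x, bits):
    """SQNR = sigma_x^2 / (Delta^2 / 12), expressed in dB."""
    delta = uniform_step(v_max, bits)
    return 10.0 * math.log10(sigma_x ** 2 / (delta ** 2 / 12.0))

def sqnr_db_rule(v_max, sigma_x, bits):
    """Closed-form rule: 6.02*B + 4.8 - 20*log10(Vmax / sigma_x)."""
    return 6.02 * bits + 4.8 - 20.0 * math.log10(v_max / sigma_x)

v_max, sigma_x = 1.0, 0.5   # illustrative values, not from the thesis
for bits in (3, 4, 5, 6):
    print(bits,
          round(sqnr_db_from_step(v_max, sigma_x, bits), 2),
          round(sqnr_db_rule(v_max, sigma_x, bits), 2))
```

The two columns agree to within a few hundredths of a dB, and consecutive rows differ by about 6 dB, which is the per-bit gain stated in the text. Both figures rely on the uniform, input-independent quantization-noise model.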
4.1.2 Non-Uniform ADC: Lloyd-Max Quantizer
A Lloyd-Max quantizer [33,34] minimizes the distortion measure known as the mean-squared
error (MSE) $E(q^2[n])$, given by
\[
E(q^2) = E[(x_c - r_k)^2]
       = \sum_{k=1}^{N} \int_{t_{k-1}}^{t_k} (x_c - r_k)^2 f_{X_c}(x_c)\, dx_c \tag{4.3}
\]
where $X_c$ is the random variable representing input xc(nT), $f_{X_c}(x_c)$ is its assumed
probability density function (PDF), $r_k$, k = 1, . . . , N are the reference levels and
$t_k$, k = 0, . . . , N are the thresholds. Here, $t_0 = -\infty$ and $t_N = \infty$.
Stationary points of the MSE in terms of r and t can be found by differentiation with
respect to r and t [33]:
\[
r_{k,opt} = \frac{\int_{t_{k-1,opt}}^{t_{k,opt}} x_c\, f_{X_c}(x_c)\, dx_c}
                 {\int_{t_{k-1,opt}}^{t_{k,opt}} f_{X_c}(x_c)\, dx_c} \tag{4.4}
\]
\[
t_{k,opt} = \frac{r_{k,opt} + r_{k+1,opt}}{2} \tag{4.5}
\]
These equations are often difficult to solve, so the Lloyd-Max algorithm iteratively deter-
mines r and t. Although this algorithm improves SQNR, we find that it is not the same as
minimizing BER. In this section, we first illustrate the need to jointly design the ADC and
the detector through a motivational example. This is followed by a discussion of ADCs in
the context of equalization-based detectors, which are the subject of this chapter.
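The alternation between (4.5) and (4.4) can be sketched as follows. This is a sample-based variant in which the PDF is replaced by an empirical set of draws, which is an assumption made to keep the example short; the function names, the noise level and the four-mode test input (the noise-free outputs of a [0.25 0.75] channel) are likewise illustrative.

```python
import random

def cell_index(x, t):
    """Index of the cell (t_{k-1}, t_k] that contains x."""
    return sum(1 for th in t if x > th)

def midpoints(r):
    """Thresholds as midpoints of adjacent levels, as in (4.5)."""
    return [(a + b) / 2.0 for a, b in zip(r, r[1:])]

def lloyd_max(samples, levels, iterations=50):
    """Alternate (4.5) and (4.4) on an empirical input distribution."""
    r = sorted(levels)
    for _ in range(iterations):
        t = midpoints(r)
        sums = [0.0] * len(r)
        counts = [0] * len(r)
        for x in samples:
            k = cell_index(x, t)
            sums[k] += x
            counts[k] += 1
        # (4.4): each level moves to the centroid of its cell
        # (an empty cell keeps its previous level).
        r = [s / c if c else old for s, c, old in zip(sums, counts, r)]
    return r, midpoints(r)

def mse(samples, r, t):
    """Empirical mean-squared quantization error, cf. (4.3)."""
    return sum((x - r[cell_index(x, t)]) ** 2 for x in samples) / len(samples)

rng = random.Random(1)
modes = [-1.0, -0.5, 0.5, 1.0]
samples = [rng.choice(modes) + rng.gauss(0.0, 0.1) for _ in range(4000)]

r0 = [-0.75, -0.25, 0.25, 0.75]          # uniform 2-bit starting point
t0 = midpoints(r0)
r_lm, t_lm = lloyd_max(samples, r0)
print([round(v, 3) for v in r_lm], mse(samples, r0, t0), mse(samples, r_lm, t_lm))
```

For this input the levels settle close to the four modes, which minimizes MSE. Whether that placement also minimizes BER is a separate question and depends on how the detector uses the levels.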
4.1.3 BER-Optimal ADC: A Motivational Example
Figure 4.2: Detection based ADC design example: (a) a communication link with channel
h[n] = {0.25, 0.75} where the receiver ADC acts as detector, and (b) signal distribution
(PDF of Xc) at the ADC input, with modes at −1, −0.5, 0.5 and 1.
Consider a communication link (Fig. 4.2), where the transmitter employs 2-PAM modulation
and the channel is represented by the discrete-time impulse response [0.25 0.75]. The
receiver consists of a decision device immediately following the ADC; i.e., a decision on the
transmitted symbols is made based on the ADC output. We apply Bayesian hypothesis
testing with the ADC inputs being treated as observations of the random variable Xc[n],
and the two hypotheses on the transmitted symbol being b[n] = 1 and b[n] = −1:
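\[
\tilde{b}[n] =
\begin{cases}
1 & \text{if } \Pr\{b[n] = 1 \mid X_c[n] = x_c\} > \Pr\{b[n] = -1 \mid X_c[n] = x_c\} \\
-1 & \text{else}
\end{cases}
\tag{4.6}
\]
Here, $x_c$ denotes a particular value realized by the random variable $X_c[n]$. Now,
\begin{align}
\Pr\{b[n] = 1 \mid X_c[n] = x_c\}
  &= \frac{\Pr\{b[n] = 1,\ X_c[n] = x_c\}}{\Pr\{X_c[n] = x_c\}} \tag{4.7} \\
  &= \frac{\Pr\{X_c[n] = x_c \mid b[n] = 1\}\,\Pr\{b[n] = 1\}}{\Pr\{X_c[n] = x_c\}} \tag{4.8}
\end{align}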
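Assuming that the transmitted symbols are independent and identically distributed and the
additive noise has variance $\sigma^2$,
\[
\begin{aligned}
\Pr\{X_c[n] = x_c \mid b[n] = 1\}
  &= \frac{1}{2} \sum_{b[n-1] \in \{-1, 1\}} \frac{1}{\sqrt{2\pi\sigma^2}}\,
     e^{-\frac{(x_c - 0.25 - 0.75\, b[n-1])^2}{2\sigma^2}} \\
  &= \frac{1}{2\sqrt{2\pi\sigma^2}}
     \left( e^{-\frac{(x_c - 1)^2}{2\sigma^2}} + e^{-\frac{(x_c + 0.5)^2}{2\sigma^2}} \right)
\end{aligned}
\tag{4.9}
\]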
Similarly,
\[
\Pr\{b[n] = -1 \mid X_c[n] = x_c\}
  = \frac{\Pr\{X_c[n] = x_c \mid b[n] = -1\}\,\Pr\{b[n] = -1\}}{\Pr\{X_c[n] = x_c\}} \tag{4.10}
\]
and
\[
\Pr\{X_c[n] = x_c \mid b[n] = -1\}
  = \frac{1}{2\sqrt{2\pi\sigma^2}}
    \left( e^{-\frac{(x_c + 1)^2}{2\sigma^2}} + e^{-\frac{(x_c - 0.5)^2}{2\sigma^2}} \right) \tag{4.11}
\]
From (4.6)-(4.11), the optimal detection rule is given as
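\begin{align}
\tilde{b}[n] &= 1 \ \text{ if } \
  \frac{1}{2\sqrt{2\pi\sigma^2}}
  \left( e^{-\frac{(x_c - 1)^2}{2\sigma^2}} + e^{-\frac{(x_c + 0.5)^2}{2\sigma^2}} \right)
  >
  \frac{1}{2\sqrt{2\pi\sigma^2}}
  \left( e^{-\frac{(x_c + 1)^2}{2\sigma^2}} + e^{-\frac{(x_c - 0.5)^2}{2\sigma^2}} \right) \tag{4.12} \\
 &= -1 \ \text{ else} \tag{4.13}
\end{align}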
Solving the inequality (4.12) for the high SNR case, the observation xc(nT) can be divided
into four regions with three thresholds, and the decision rule is given by
\begin{align}
\tilde{b}[n] &= 1 \ \text{ if } \ x_c \in [-0.75, 0) \cup [0.75, \infty) \tag{4.14} \\
 &= -1 \ \text{ else.} \tag{4.15}
\end{align}
This can be understood from the fact that the noise-free channel output z[n] takes four
possible values from the set {−1, −0.5, 0.5, 1}. The ADC input, therefore, has a multi-modal
distribution consisting of a mixture of Gaussian modes centered at each of these values. The
modes centered at −0.5 and 1 correspond to a transmitted 1 and those centered at −1 and
0.5 correspond to a transmitted −1. The ADC thresholds are designed such that the modes
are mapped to the corresponding hypothesis on the transmitted symbol. The BER-optimal
ADC in this case is clearly not SQNR optimal, as mapping the mode centered at −0.5 to a
1 and vice versa would incur a heavy SQNR cost.
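The decision regions of this example can be checked numerically. The sketch below evaluates the two mixture likelihoods for the [0.25 0.75] channel and compares the resulting likelihood-ratio decision with the three-threshold rule. The noise level σ = 0.1 and the evaluation grid are assumptions; the grid deliberately avoids the points −0.75, 0 and 0.75, where the two likelihoods tie.

```python
import math

SIGMA = 0.1                     # illustrative high-SNR noise level (assumed)
MEANS_PLUS = (1.0, -0.5)        # noise-free ADC inputs when b[n] = +1
MEANS_MINUS = (-1.0, 0.5)       # noise-free ADC inputs when b[n] = -1

def likelihood(xc, means, sigma=SIGMA):
    """Equal-weight two-mode Gaussian mixture, as in (4.9) and (4.11)."""
    norm = 1.0 / (2.0 * math.sqrt(2.0 * math.pi * sigma ** 2))
    return norm * sum(math.exp(-(xc - m) ** 2 / (2.0 * sigma ** 2)) for m in means)

def map_decision(xc):
    """Likelihood comparison of (4.12)-(4.13) for equiprobable bits."""
    return 1 if likelihood(xc, MEANS_PLUS) > likelihood(xc, MEANS_MINUS) else -1

def threshold_decision(xc):
    """High-SNR rule of (4.14)-(4.15): thresholds at -0.75, 0 and 0.75."""
    return 1 if (-0.75 <= xc < 0.0) or xc >= 0.75 else -1

# Grid offset by half a step so that no point falls on a threshold.
grid = [(i + 0.5) / 100.0 for i in range(-150, 150)]
mismatches = [x for x in grid if map_decision(x) != threshold_decision(x)]
print(len(mismatches))
```

At this noise level the two rules agree at every grid point. The mode at −0.5 is decoded as +1 and the mode at 0.5 as −1, so a quantizer whose levels were meant to approximate the input amplitude would decode both of these modes incorrectly.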
In this chapter, we assume that the ADC output is processed by a linear equalizer (LE)
or a decision-feedback equalizer (DFE) whose feed-forward and feedback coefficients are
denoted by vectors c and d, respectively. The output of an L-tap linear equalizer (LE) will
be the convolution of the ADC outputs and the equalizer coefficients c. The estimate of
the transmitted symbol b[n−D] is $\tilde{b}[n-D] = \mathrm{sgn}(y[n])$, where y[n] is the
slicer input (Fig. 4.1(a)). In the case of a DFE,
\[
\tilde{b}[n-D] = \mathrm{sgn}\left( \sum_{j=0}^{L-1} c[j]\, x[n-j]
                 - \sum_{l=1}^{L_2} d[l]\, \tilde{b}[n-D-l] \right).
\]
Here D is introduced to account for delay in the channel and equalizer; it must be chosen
carefully to achieve good BER.
4.1.4 BER-Optimal ADC: Design
We propose quantization based on the detection criterion, by setting the levels r and thresh-
olds t non-uniformly using the BER metric. In the system presented in Fig. 4.1(a), an
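error is made when $\tilde{b}[n] \neq b[n]$ (assuming D = 0), so BER is computed by averaging
over all possible values of y[n] and hence all vectors
$\mathbf{x}_n = [x[n], x[n-1], \ldots, x[n-L+1]]$ such that
$\tilde{b}[n] = \mathrm{sgn}(y[n]) = \mathrm{sgn}(\mathbf{c}^T \mathbf{x}_n)$ (assuming
LE-based receiver) produces an error at the slicer, i.e.,
\[
\begin{aligned}
\mathrm{BER} &= P\{b[n] \neq \tilde{b}[n]\} \\
 &= \sum_{y[n]} \left[ P\{y[n]\} \left( \frac{1 - b[n]\tilde{b}[n]}{2} \right) \right] \\
 &= \sum_{\mathbf{x}_n} \left[ \left( \prod_{j=0}^{L-1} P\{x[n-j] = r_k\} \right)
    \left( \frac{1 - b[n]\tilde{b}[n]}{2} \right) \right]
\end{aligned}
\tag{4.16}
\]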
where $P\{x[n-j] = r_k\}$ is given by
\[
Q\left( \frac{t_{k-1} - z[n-j]}{\sigma} \right) - Q\left( \frac{t_k - z[n-j]}{\sigma} \right) \tag{4.17}
\]
P{•} signifies the probability of an event, and Q(•) is the Gaussian Q function. The equalizer
output $\tilde{b}[n]$ is given by
\[
\tilde{b}[n] = \mathrm{sgn}\left( \sum_{j=0}^{L-1} c[j]\, x[n-j] \right) \tag{4.18}
\]
A BER-optimal ADC is one where r and t are chosen to minimize (4.16).
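To make (4.16) to (4.18) concrete, the following sketch evaluates the BER of a 2-PAM link with an L-tap linear equalizer by brute-force enumeration. Averaging over equiprobable bit patterns, treating sgn(0) as +1, and all function names and numerical values are assumptions made for illustration; this is not the evaluation code used in the thesis.

```python
import math
from itertools import product

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def level_probs(z, thresholds, sigma):
    """P{x = r_k} for every level given noise-free input z, as in (4.17)."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    return [q_func((lo - z) / sigma) - q_func((hi - z) / sigma)
            for lo, hi in zip(edges, edges[1:])]

def analytic_ber(h, c, levels, thresholds, sigma, delay=0):
    """Evaluate (4.16) for 2-PAM and a linear equalizer c."""
    M, L = len(h), len(c)
    span = M + L - 1                       # bits[m] stands for b[n-m]
    ber = 0.0
    for bits in product((-1, 1), repeat=span):
        # Noise-free ADC inputs z[n-j] for j = 0..L-1.
        z = [sum(h[i] * bits[j + i] for i in range(M)) for j in range(L)]
        probs = [level_probs(zj, thresholds, sigma) for zj in z]
        for idx in product(range(len(levels)), repeat=L):
            y = sum(c[j] * levels[k] for j, k in enumerate(idx))
            decision = 1 if y >= 0 else -1  # (4.18); sgn(0) := +1 (assumed)
            if decision != bits[delay]:
                p = 1.0
                for j, k in enumerate(idx):
                    p *= probs[j][k]        # white noise: samples independent
                ber += p / (2 ** span)
    return ber

# Two-tap example channel with a one-tap "equalizer" (the ADC acts as detector).
h, c, sigma = [0.25, 0.75], [1.0], 0.05
t = [-0.75, 0.0, 0.75]
ber_detector = analytic_ber(h, c, [-1.0, 1.0, -1.0, 1.0], t, sigma)
ber_monotone = analytic_ber(h, c, [-0.875, -0.25, 0.25, 0.875], t, sigma)
print(ber_detector, ber_monotone)
```

With identical thresholds, the level assignment that follows the detection regions gives a very small BER, whereas the monotone (amplitude-preserving) assignment gives a BER near 0.5 for this channel. The enumeration grows as N^L, so this form is practical only for short equalizers and low-resolution ADCs.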
For fixed equalizer coefficients and reference level settings, we consider the L-tuple ADC
output space R. The equalizer c uniquely partitions R into hypotheses R0 and R1 (R0,
R1, R2 and R3) corresponding to a detected 0 and 1, respectively, for 2-PAM (4-PAM). The
noise-free L-tuple channel output space T can be classified into T0 and T1 (T0, T1, T2 and T3)
corresponding to a transmitted 0 and 1 (0, 1, 2 and 3), respectively, for 2-PAM (4-PAM).
The ADC must map T0 and T1 to R0 and R1, respectively, i.e., we should observe no errors
when noise is absent. The presence of noise results in T0 events being mapped to R1 and
vice versa, leading to errors at the detector. At high SNR, the minimum pairwise distance
between L-tuples from R0 and T1 (T0 and R1) determines the most likely error event. This
distance can be viewed as an effective eye opening for the non-uniformly spaced ADC-based
receiver.
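A closed-form expression for the BER-optimal parameters of the ADC, r and t, is difficult
to obtain due to the highly non-linear objective function. Therefore, we employ the gradient
descent algorithm to determine the parameters. The following update equations are used to
compute r iteratively. For the ith iteration of the algorithm, we have
\[
\mathrm{BER} = f(\mathbf{h}, \mathbf{r}, \mathbf{t}, \mathbf{c}, \sigma)
\]
\[
\mathbf{r}_i = \mathbf{r}_{i-1} - \mu \left( \frac{\partial\, \mathrm{BER}}{\partial \mathbf{r}} \right)
               \Big|_{\mathbf{r} = \mathbf{r}_{i-1}}
             \approx \mathbf{r}_{i-1} - \mu \left( \frac{\Delta \mathrm{BER}}{\Delta \mathbf{r}} \right) \tag{4.19}
\]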
The placement of t remains the same as given by (4.5). To avoid differentiating the sign
function, the gradient is computed by finite differences: each entry in the gradient vector
is obtained by perturbing the reference levels and computing the change in BER due to
this perturbation [45]. The BER cost function can also be optimized with respect to the
reference levels using techniques such as Nelder-Mead, which are suitable for non-linear cost
functions.
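The finite-difference descent described above can be sketched generically as follows. The cost passed in is a stand-in: in the setting of this chapter it would be the BER of (4.16) as a function of the reference levels, but a simple quadratic bowl is used here so that the example stays short and its minimum is known. The function names, step size, perturbation size and iteration count are all assumptions.

```python
def finite_difference_gradient(cost, r, delta=1e-3):
    """Estimate d(cost)/d(r_k) by perturbing one reference level at a time."""
    base = cost(r)
    grad = []
    for k in range(len(r)):
        perturbed = list(r)
        perturbed[k] += delta
        grad.append((cost(perturbed) - base) / delta)
    return grad

def midpoints(r):
    """Thresholds placed halfway between adjacent levels, as in (4.5)."""
    return [(a + b) / 2.0 for a, b in zip(r, r[1:])]

def descend(cost, r0, mu=0.1, delta=1e-3, iterations=200):
    """Step the levels against the finite-difference gradient estimate."""
    r = list(r0)
    for _ in range(iterations):
        g = finite_difference_gradient(cost, r, delta)
        r = [rk - mu * gk for rk, gk in zip(r, g)]
    return r, midpoints(r)

# Stand-in cost (NOT the BER of (4.16)): a bowl with a known minimum.
TARGET = [-0.9, -0.3, 0.4, 1.0]

def toy_cost(r):
    return sum((rk - tk) ** 2 for rk, tk in zip(r, TARGET))

r_opt, t_opt = descend(toy_cost, [-0.75, -0.25, 0.25, 0.75])
print([round(v, 3) for v in r_opt], [round(v, 3) for v in t_opt])
```

Because the BER surface is piecewise smooth, in practice the perturbation size must be large enough to change the cost measurably and the step size small enough to avoid overshooting. The values used here suit only the toy cost.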
This algorithm can readily be extended to decision-feedback equalizers by replacing the
right-hand side of (4.18) with
$\mathrm{sgn}\left( \sum_{j=0}^{L-1} c[j]\, x[n-j] - \sum_{l=1}^{L_2} d[l]\, \tilde{b}[n-D-l] \right)$.
In this section, we first summarized the commonly employed ADC reference level design
techniques. Further, we motivated the need to employ optimal ADC reference level place-
ment through an example which highlighted the difference between the fidelity and detection
criteria. The BER cost function for a communication link with a non-uniform reference level
ADC and an equalizer-based receiver was presented. The remainder of the chapter examines
the benefits of BER-optimal ADCs for various modulation and equalization scenarios. We
demonstrate through analysis and simulations that the BER-optimal ADC outperforms the
uniform and Lloyd-Max quantization approaches for several backplane-like channels with
different levels of ISI.
4.2 Analysis and Simulation Results
In this section, we first present the analysis and simulation methodology employed in this
chapter. This is followed by simulation results demonstrating the effectiveness of the pro-
posed ADC design approach for typical high-speed communication links.
4.2.1 Simulation Methodology
First, given a channel impulse response, a minimum mean squared error (MMSE) linear
equalizer is obtained assuming a uniform ADC. Next, (4.19) is used to iteratively approxi-
mate the minimum BER thresholds and representation levels for the ADC. Equation (4.16)
is then used to compute the BER analytically. We verified our expressions through a com-
bination of Monte Carlo simulations and importance sampling (IS, Section 4.2.1). In order
to isolate the effect of nonuniform quantization, the equalizers in all setups are MMSE
equalizers. In addition, only equalizer inputs are quantized; the equalizer itself has infinite
precision. Signal-to-noise ratio (SNR) was computed by
$\mathrm{SNR} = \frac{\sum_{i=0}^{M-1} h[i]^2}{\sigma^2}$.
We define the ADC shaping gain as
SG(BER) = SNRold(BER)− SNRnew(BER) (4.20)
to quantify the reduction in SNR achieved via the BER-optimal techniques.
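One way to read a shaping gain off two BER-vs-SNR curves is sketched below. The SNR definition follows the expression above, and the gain follows (4.20). The log-linear interpolation between curve points and the curve values themselves are assumptions (the numbers are made up and are not data from the thesis).

```python
import math

def snr_db(h, sigma):
    """SNR = sum_i h[i]^2 / sigma^2, expressed in dB."""
    return 10.0 * math.log10(sum(x * x for x in h) / sigma ** 2)

def snr_at_ber(snrs_db, bers, target_ber):
    """SNR (dB) at which a decreasing BER curve crosses target_ber.

    log10(BER) is interpolated linearly between adjacent curve points.
    """
    lt = math.log10(target_ber)
    pts = list(zip(snrs_db, bers))
    for (s0, b0), (s1, b1) in zip(pts, pts[1:]):
        l0, l1 = math.log10(b0), math.log10(b1)
        if l1 <= lt <= l0 and l1 != l0:
            return s0 + (lt - l0) * (s1 - s0) / (l1 - l0)
    raise ValueError("target BER is outside the range of the curve")

def shaping_gain_db(snrs_db, ber_old, ber_new, target_ber):
    """SG(BER) = SNR_old(BER) - SNR_new(BER), as in (4.20)."""
    return (snr_at_ber(snrs_db, ber_old, target_ber)
            - snr_at_ber(snrs_db, ber_new, target_ber))

# Made-up curves for illustration only.
snrs = [10.0, 14.0, 18.0, 22.0]
ber_uniform = [1e-2, 1e-4, 1e-6, 1e-8]
ber_optimal = [1e-3, 1e-5, 1e-7, 1e-9]
sg = shaping_gain_db(snrs, ber_uniform, ber_optimal, 1e-6)
print(sg)
```

A positive value means the new design reaches the target BER at a lower SNR than the old one.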
Importance Sampling Review
Importance sampling (IS) is a well-known statistical tool to estimate the probabilities of
rare events. For such events, it is often not computationally feasible to run a Monte Carlo
simulation to estimate the desired probability. Importance sampling relies on a combination
of simulation and analysis to determine the estimate. The probability distribution that
governs the occurrence of the rare event is skewed in order to observe it more frequently in
simulations. The IS-estimator applies a correction factor to the conventional Monte Carlo
estimator in order to evaluate the probability of the event, based on the knowledge of the
original distribution and the skewed distribution. In this chapter, where it is desired to
estimate BER < 10−6, we alter the noise distribution by increasing its variance. This leads
to more frequent error events. A correction factor equal to the ratio of the original and
altered distribution is applied to the MC-estimator. The above discussion is summarized in
the following analysis.
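Consider an event E defined in the space of a random variable V, which is a function of
m random variables $X_1, X_2, \ldots, X_m$ with known distribution $f_X(\mathbf{x})$:
\[
V = g(X_1, X_2, \ldots, X_m) = g(\mathbf{X})
\]
The probability of event E is given as
\[
\Pr(E) = \int_{-\infty}^{\infty} h_E(v)\, f_V(v)\, dv
       = \int_{-\infty}^{\infty} h_E(g(\mathbf{x}))\, f_X(\mathbf{x})\, d\mathbf{x}
\]
where $h_E(v)$ is the indicator function for the event E. The Monte Carlo estimate of the
event probability after N trials is given as
\[
\hat{\Pr}(E) = \frac{1}{N} \sum_{i=1}^{N} h_E(g(\mathbf{x}_i)) \tag{4.21}
\]
where $\mathbf{x}_i$ is the sample drawn in the ith trial.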
If the event E is so rare that it is computationally infeasible to observe it with sufficient
frequency, we skew the distribution $f_X(\mathbf{x})$ to obtain $f_X^*(\mathbf{x})$. The modified
estimator, known as the IS-estimator, is given by
\[
\hat{\Pr}^*(E) = \frac{1}{N} \sum_{i=1}^{N} h_E^*(g(\mathbf{x}))\, w(\mathbf{x}), \quad \text{where} \tag{4.22}
\]
\[
w(\mathbf{x}) = \prod_{j=1}^{m} \frac{f_X(x_j)}{f_X^*(x_j)} \tag{4.23}
\]
In order to evaluate the performance of a non-uniform ADC at high SNR, we treat the L-
tuple ADC input as the random vector X, the slicer input as V and occurrence of detection
error as the event E of interest. The Gaussian noise distribution was perturbed from fN(n)
to $f_N^*(n)$ by increasing its variance in order to observe errors more frequently. It can be
shown that the correction term in the IS estimator is given by
\[
w(\mathbf{x}) = \prod_{j=1}^{L} \frac{f_N(n_j)}{f_N^*(n_j)} \tag{4.24}
\]
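The variance-scaling form of importance sampling can be illustrated on a scalar stand-in problem: estimating the probability that a single Gaussian noise sample exceeds a threshold. The sketch draws noise from a density with inflated standard deviation and weights each observed event by the density ratio, in the spirit of (4.22) to (4.24) with one noise sample. The threshold, the scaling factor, the trial count and the function names are assumptions.

```python
import math
import random

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def is_tail_estimate(threshold, sigma, sigma_star, trials, rng):
    """Estimate Pr{n > threshold} for n ~ N(0, sigma^2) by importance sampling.

    Samples come from the skewed density N(0, sigma_star^2); each observed
    event is weighted by f_N(n) / f*_N(n).
    """
    total = 0.0
    for _ in range(trials):
        n = rng.gauss(0.0, sigma_star)
        if n > threshold:                       # indicator of the rare event
            total += gaussian_pdf(n, sigma) / gaussian_pdf(n, sigma_star)
    return total / trials

rng = random.Random(0)
estimate = is_tail_estimate(4.0, 1.0, 3.0, 200_000, rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))   # Q(4), roughly 3.2e-5
print(estimate, exact)
```

A plain Monte Carlo run of the same length would be expected to observe only a handful of such events, whereas the skewed density produces them in roughly nine percent of the trials. Inflating the variance too far makes the weights very uneven and degrades the estimate, so the scaling factor has to be chosen with some care.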
Figure 4.3: Validating BER analysis through simulation and importance sampling.
Fig. 4.3 compares the results obtained through BER computation (analysis) and importance
sampling. The BER vs. SNR curves for 3-bit non-uniform and 4-bit uniform ADCs obtained
through analysis agree well with those obtained through simulation and importance sampling,
thereby validating the results from analysis at low BER.
4.2.2 BER-Optimal ADC vs. Lloyd-Max ADC
The BER-optimal ADC is based on the detection criterion, whereas the uniform and Lloyd-
Max ADCs are both based on the fidelity criterion. Although a Lloyd-Max ADC can improve
SQNR, Fig. 4.4 shows that a 2-bit Lloyd-Max ADC followed by an MMSE linear equalizer
results in little improvement in BER compared to a 2-bit uniform ADC followed by an
MMSE LE. This observation indicates that SQNR is not the best metric when the goal
is to reduce BER. In contrast, a receiver based on the detection criterion (a 2-bit
BER-optimal ADC followed by a min-BER linear equalizer, where the equalizer coefficients are
computed in a similar manner as in (4.19) using the gradient descent algorithm) results in
significant improvement, surpassing even a 3-bit uniform ADC for SNR > 16 dB. This
clearly demonstrates that the detection criterion is a more effective metric than the fidelity
criterion in communication links.
Figure 4.4: Performance comparison between the BER-optimal and Lloyd-Max ADC for
channel h = [0.1 0.7 0.4]. BER vs. SNR (dB) curves are shown for a 2-bit uniform ADC with
MMSE equalizer, a 3-bit uniform ADC with MMSE equalizer, a 2-bit Lloyd-Max quantizer with
MMSE equalizer, and a 2-bit min-BER ADC with min-BER equalizer.
4.2.3 BER-Optimal ADC vs. Uniform ADC
We now present results from the analysis of four different scenarios representing different
modulation, channel and equalization types. The channel models correspond to 20-inch FR-4
backplane channels carrying 10 Gb/s data. First, we study the simplest practical case of a
low-ISI channel that employs 2-PAM modulation and a linear equalizer. In order to compare
the shaping gains for low-ISI and high-ISI channels, we then investigate a high-ISI channel
with the same modulation (2-PAM) and equalization (LE). The third case study deals with
the most commonly occurring scenario: a high-ISI channel employing 2-PAM and a DFE.
Finally, for the same channel model, we vary the modulation technique (4-PAM) and keep
the DFE-based receiver intact.
The four modulation/channel/equalization scenarios are as follows:
1. 2-PAM, low-ISI, 3-tap LE (Case A)
2. 2-PAM, high-ISI, 3-tap LE (Case B)
3. 2-PAM, high-ISI, 3 feed-forward tap, 2 feedback tap DFE (Case C)
4. 4-PAM, low-ISI, 2 feed-forward tap, 2 feedback tap DFE (Case D)
1) Case A (Fig. 4.5): Fig. 4.5(b) shows that a 3-bit BER-optimal ADC performs better
than a 3-bit uniform ADC. Furthermore, a 3-bit BER-optimal ADC is at least as effective
as a 4-bit uniform ADC. The BER curve for an infinite precision ADC, infinite precision
equalizer is also displayed for comparison purposes. In both the low and high SNR regimes
(BER = $10^{-4}$ and $10^{-15}$, respectively), the shaping gain SG achieved by the BER-optimal
ADC is 2.5 dB.
2) Case B (Fig. 4.6): When channels with high levels of ISI are employed for testing, the
3-bit BER-optimal ADC is significantly better than the 3-bit uniform ADC as shown in Fig.
4.6(b). In this case, performance of the 3-bit uniform ADC does not improve with increasing
SNR due to severe quantization noise. Compared to a 3-bit uniform ADC, ADC shaping gain
SG is too large to be quantified; compared to a 4-bit uniform ADC, SG(BER = $10^{-15}$) = 3
dB.
Figure 4.5: Performance for a low-ISI channel employing 2-PAM modulation and an LE: a)
sampled impulse response h[n] of a backplane-like channel vs. time index n, and b) BER vs.
SNR (dB) curves for a 3-bit uniform, 3-bit BER-optimal, 4-bit uniform, and infinite-precision
ADC, respectively. The plot is annotated with a 2.5 dB ADC shaping gain.
Figure 4.6: Performance for a high-ISI channel employing 2-PAM modulation and an LE: a)
sampled impulse response h[n] of a backplane-like channel vs. time index n, and b) BER vs.
SNR (dB) curves for a 3-bit uniform, 3-bit BER-optimal, 4-bit uniform, and an
infinite-precision ADC, respectively. The plot is annotated with a 3 dB ADC shaping gain over
the 4-bit uniform ADC.
Figure 4.7: Performance for a high-ISI channel employing 2-PAM modulation and a DFE: a)
sampled impulse response of a backplane-like channel, and b) BER vs. SNR curves for a 3-bit
uniform, Lloyd-Max (LM) and BER-optimal ADC, 4-bit BER-optimal ADC, and 5-bit
uniform ADC, respectively. The plot is annotated with shaping gains of 1 dB and 2.5 dB.
3) Case C (Fig. 4.7): In this study, we compare the uniform, Lloyd-Max and BER-
optimal quantization techniques, for a backplane-like link that employs 2-PAM modulation
and decision-feedback equalization. The 3-bit uniform and Lloyd-Max ADCs achieve similar
BER across the SNR range considered, and quantization noise dominates the BER per-
formance. The 3-bit BER-optimal ADC offers a shaping gain SG = 1 dB over the 5-bit
uniform ADC. The 4-bit BER-optimal ADC offers a shaping gain SG = 2.5 dB over the
5-bit uniform ADC.
Table 4.1: 3-bit ADC: SQNR (dB) vs. SNR (dB)
SNR (dB)       19.5    25.5    31.5    35.5
Lloyd-Max      15      16.3    16.6    16.7
BER-optimal    12.7    12.8    13.1    13.1
Table 4.1 summarizes the SQNRs achieved by the 3-bit Lloyd-Max and BER-optimal
ADCs at the ADC output. From Fig. 4.7, it is clear that the 3-bit BER-optimal ADC
achieves lower BER than a 3-bit LM ADC. Table 4.1 indicates that the 3-bit BER-optimal
ADC in fact achieves lower SQNR. This confirms that higher SQNR does not imply better
BER.
Figure 4.8: Performance for a low-ISI channel employing 4-PAM modulation and a DFE: a)
sampled impulse response of a backplane-like channel, and b) BER vs. SNR curves for a
4-bit uniform, Lloyd-Max (LM), BER-optimal and 5-bit uniform ADC, respectively. The plot
is annotated with a 2.5 dB shaping gain.
4) Case D (Fig. 4.8): In this study, we compare the uniform, Lloyd-Max and BER-
optimal quantization techniques, for a backplane-like link that employs 4-PAM modulation
and decision-feedback equalization. The 4-bit LM ADC offers improvement over the 4-bit
uniform ADC and achieves the same BER as the 4-bit BER-optimal ADC up to SNR = 38
dB (Fig. 4.8(b)). However, the 4-bit BER-optimal ADC has superior asymptotic efficiency
to the LM ADC as observed from the steeper drop in BER at high SNR.
Figure 4.9 illustrates the multi-modal ADC input signal distribution for the channels
considered in this section. The figure also depicts the positions of the reference levels as
obtained by the two conventional techniques (uniform and Lloyd-Max) and the proposed
BER-optimal technique. It is clear from this figure that the BER-optimal levels can be
quite different from the LM and uniform levels. In contrast to the LM-ADC which allocates
some levels to capture the outer regions of the ADC input signal, the BER-optimal ADC
is aware of the fact that these signal occurrences are likely due to a strong 1 or a strong 0
and can tolerate more quantization errors.
Figure 4.9: ADC input signal distribution (probability density function) and reference level
settings for uniform, LM and BER-optimal quantization: a) high-ISI channel, 2-PAM, LE
(Case B), b) high-ISI channel, 2-PAM, DFE (Case C), and c) low-ISI channel, 4-PAM, DFE
(Case D), respectively.
In this section, we first discussed the analysis and simulation methodology employed in
this chapter. We then considered four representative channel-modulation-equalizer scenarios
and evaluated the shaping gains offered by the BER-optimal ADC over the conventional
design techniques. We observed that the technique offered the best performance improvement
for 2-PAM modulation over a high-ISI channel with a DFE-based equalizer.
4.3 Summary
In this chapter, we proposed the idea of configuring the reference levels and thresholds of
an ADC based on the BER metric, which is the system-performance metric of interest in
a communication link. We first showed through an example that the ADC must be
co-designed with the detector for optimality. We then considered equalizer-based links, for
which the technique offers greater shaping gains with a 2-PAM based high-ISI channel than
with a low-ISI channel. The BER-optimal ADC results in a greater than 1-bit
improvement for a 2-PAM high-ISI channel with LE. The shaping gain is even better in a
DFE-based link, where a 2-bit reduction in ADC precision was demonstrated at the same
link BER. The technique is less effective when 4-PAM modulation is employed, as the closer
grouping of constellation points in 4-PAM (for a given peak swing) diminishes the potential
of improving signal margins at the slicer by adjusting reference levels. In Chapter 5, we
consider architectural issues in implementing a receiver that employs the BER-optimal ADC
followed by equalization.
CHAPTER 5
ADAPTATION ALGORITHMS AND EQUALIZER
ARCHITECTURES FOR BER-OPTIMAL ADCS
In Chapter 4, we proposed the co-design of the ADC and equalizer in high-speed links,
by optimizing the reference levels of the ADC based on the overall link BER metric [46].
We defined the ADC shaping gain as SG(BER) = SNRold(BER) − SNRnew(BER), to
quantify the reduction in SNR to achieve a given BER via BER-optimal techniques. It was
shown (see Fig. 4.6(b)) that a 3-bit BER-optimal ADC offers significant shaping gains, e.g.,
SG(BER = $10^{-15}$) > 30 dB compared to a 3-bit uniform ADC. In addition,
SG(BER = $10^{-15}$) = 3 dB when compared to a 4-bit uniform ADC for the example considered.
The ADC reference levels were computed using gradient descent on the BER cost function.
This approach is infeasible for implementation, as (a) it assumes knowledge of channel and
equalizer coefficients, and (b) it is too complex to implement in many high-speed links with
tight power and size constraints. An additional issue is the potential increase in the equalizer
complexity. This can occur if the BER-optimal ADC output is to be re-mapped/encoded
in order to operate using subsequent linear arithmetic operators.
The contributions of this chapter are as follows: (a) we demonstrate that there is no
significant increase in equalizer complexity required to achieve the ADC shaping gain esti-
mated in Chapter 4, (b) we develop an approximate minimum BER algorithm (AMBER)
to adapt the reference levels and thresholds, and (c) we propose a VLSI architecture to im-
plement the algorithm, and demonstrate its feasibility through finite precision simulations.
In Section 5.1, we determine the precision required to represent the ADC reference levels
when a linear equalizer (LE) is employed at the back-end. Further, we show that the
shaping gain can be utilized to reduce the coefficient precision necessary at the equalizer.
An adaptive algorithm for the reference levels is developed in Section 5.2, and an architecture
to implement this algorithm is proposed and evaluated in Section 5.3. The algorithm
and corresponding architecture illustrate that the shaping gains predicted in Chapter 4 are
achievable in practice.
This chapter is based on preliminary work done by Neerav Mehta (M.S., ECE, University of
Illinois at Urbana-Champaign).
5.1 Fixed Reference Level ADC, Fixed Coefficient Equalizer:
Precision Requirements
Figure 5.1: Two equalization techniques: a) LUT-based non-linear, in which the Bx-bit ADC
output xenc[n] drives a look-up table, and b) linear equalizer (LE), in which an encoder (ENC)
maps xenc[n] to a Br-bit value x[n] that feeds an FIR filter with Bc-bit coefficients c.
The representation of the ADC output values is tied to the implementation of the equal-
izer. Since the quantization levels are no longer equidistant, a change in digital output will
correspond to different changes in analog input. The equalizer employing linear operators
cannot operate directly on such digital outputs. To address this issue, two equalizer archi-
tectures are possible (Fig. 5.1): (1) look-up table (LUT)-based [46], and (2) a tapped-delay
line LE. The LUT-based equalizer operates directly on the L ADC samples, xenc[k], each
represented with Bx bits, in order to make the decision b˜[k]. The tapped-delay line LE first
maps xenc[k] to x[k] with Br bits (Br ≥ Bx), so that the mapping x(t) ↦ x[k] is closer to
linear. Thus, a BER-optimal ADC may need an equalizer with higher complexity than that
needed by the conventional ADC. In this section, we show that this is not the case.
Figure 5.2 shows a realization of the encoder. In this technique, the ADC output code (Bx
bits) is used to select one of N = $2^{B_x}$ reference levels. The reference levels are stored
in Br-bit registers. The BER estimates in [46] assumed a floating point representation of
the reference levels. To compare the complexity of the conventional equalizer (referred to
as EQconv(4b)) with that of the encoder and equalizer following the BER-optimal ADC
(EQopt(3b)), we look at the precision requirements for the ADC reference levels (Br bits)
and coefficients of the LE (Bc bits).
Figure 5.2: ADC output encoder architecture. The Bx-bit code xenc[n] selects one of the
registers r0 to r7 to produce the Br-bit output x[n].
The ISI channel shown in Fig. 4.6 is used to evaluate the impact of reference level precision
Br on output BER performance. Note that Br also directly impacts the complexity of
the equalizer and hence, is a critical parameter. To determine Br for the 3-bit BER-
optimal reference levels, we first consider the floating point equalizer case and consider two
representative SNR points: low SNR (24 dB) and high SNR (32 dB). We note that while
these SNR ranges are low and high for high-speed serial links, they may be viewed as high
SNR for, say, wireless links. An estimate of Br can be obtained through the following
inequalities:
\[ B_r \geq B_x \tag{5.1} \]
\[ B_r \geq \log_2 \left( \frac{\max_i \left( |r_i| \right)}{\min_j \left( r_{j+1} - r_j \right)} \right) \tag{5.2} \]
Constraint (5.1) is necessary to ensure that the BER-optimal ADC captures at least as
much information as a Bx bit uniform ADC. Constraint (5.2) is necessary to guarantee that
the minimum difference in reference levels can be represented properly. At low and high
SNR, Br = 4 based on (5.1) and (5.2) for the example considered. The threshold precision is
determined as Bt = Br+1, as the thresholds are placed equidistant from the reference levels.
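The two constraints can be turned into a small helper that returns the smallest integer Br satisfying both (5.1) and (5.2). Rounding the logarithm up to an integer is an assumption, since the text states only the inequalities, and the level values below are hypothetical rather than the levels obtained in the thesis.

```python
import math

def reference_level_bits(levels, bx):
    """Smallest integer B_r that satisfies both (5.1) and (5.2)."""
    r = sorted(levels)
    peak = max(abs(v) for v in r)                      # max_i |r_i|
    min_gap = min(b - a for a, b in zip(r, r[1:]))     # min_j (r_{j+1} - r_j)
    return max(bx, math.ceil(math.log2(peak / min_gap)))

# Hypothetical 3-bit (N = 8) non-uniform level set.
levels = [-1.0, -0.55, -0.3, -0.05, 0.05, 0.3, 0.55, 1.0]
br = reference_level_bits(levels, bx=3)
bt = br + 1        # thresholds lie midway between levels: one extra bit
print(br, bt)
```

For this level set the ratio of peak level to minimum spacing is 10, so (5.2) is the binding constraint and yields Br = 4 and Bt = 5.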
In the following discussion, BERx (SNRx) refers to the BER (SNR) when parameter x
has a finite precision representation and all other parameters are floating point.
The finite precision reference levels, rfx, and the corresponding thresholds tfx were used
to analytically estimate BERr,t from (4.16) and via Monte Carlo simulations. When Br = 4,
the BER with fixed-point reference levels (BERr) was found to be better than the BER
of the 4-bit uniform ADC based conventional receiver, at low and high SNR. This implies
that the data precision is the same for the linear equalizers employed with the 4-bit uniform
ADC and 3-bit BER-optimal ADC; i.e., the coefficient precisions for the two equalizers would
determine which equalizer is more complex. We now determine the coefficient precision Bc
for the conventional equalizer.
From the floating-point BER estimates in Fig. 4.6(b), the SNR requirement at the equalizer
output is determined as SNR = $Q^{-1}$(BER) for the low and high SNR cases. We
determine Bc such that the fixed-point SNR is within 0.1 dB of the floating-point SNR,
resulting in Bc = 5 and Bc = 6 bits, for the low SNR and high SNR cases, respectively, for
the 4-bit uniform ADC (Bx = Br = 4). Next, for the 3-bit BER-optimal ADC (Bx = 3, Br = 4),
we reduce Bc from 6 bits and 5 bits for the high and low SNR cases, respectively, quantize
the equalizer coefficients, and estimate $\mathrm{BER}_{r,t,c}$ (BER with finite-precision
coefficients and reference levels) through analysis (4.16) and simulation. It was determined
that when Bc = 5 and Bc = 4 bits for the high and low SNR cases, respectively,
$\mathrm{BER}^{opt}_{r,t,c} \approx \mathrm{BER}^{unif}_{r,t,c}$. At this
precision setting, the shaping gain offered by the BER-optimal ADC is leveraged to lower
coefficient precision, and hence overall equalizer complexity.
The post-ADC digital processing units for EQconv(4b) and EQopt(3b) were synthesized
using Nangate’s 45 nm cell library [47], at an operating frequency of 400 MHz. The power and
area estimates are summarized in Table 5.1, where the area estimates are shown in
parentheses. It is observed that at both low and high SNRs, the power and area requirements of
EQconv(4b) and EQopt(3b) are comparable. This implies that the ADC shaping gain and
associated ADC power reduction can be realized without expending significant additional
power in the equalizer.
Table 5.1: Power (µW) and Area (µm², in parentheses) Comparison
               Low SNR (24 dB)    High SNR (32 dB)
EQconv(4b)     75 (303)           90 (376)
EQopt(3b)      82 (360)           101 (425)
5.2 ADC Reference Level Adaptation Algorithm
In this section, we develop the AMBER algorithm for adapting ADC reference levels, analogous
to the development of AMBER for equalizer coefficients [42]. The coefficient update
equation in the AMBER algorithm is derived by starting from the LMS algorithm and modifying
it to perform updates only in the event of a decision error; it was developed in [42] as an
approximation to minimum-BER adaptation. The AMBER coefficient update equation is given as [42]
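$$\mathbf{c}[k+1] = \mathbf{c}[k] + \mu_c\, I[k]\, \mathrm{sgn}(e[k])\, \mathbf{x}[k] \qquad (5.3)$$
where $\mathbf{c}[k] = \{c_0[k], c_1[k], \ldots, c_{L-1}[k]\}$ is the equalizer coefficient vector, and
$$I[k] = \begin{cases} 1, & \text{if } \tilde{b}[k] \neq b[k] \\ 0, & \text{else} \end{cases}$$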
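The update (5.3) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: it assumes 2-PAM symbols b[k] in {-1, +1}, a slicer with its decision threshold at zero, and a known (training) symbol b[k]; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sign(v: float) -> int:
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def amber_coeff_update(c: Sequence[float], x: Sequence[float],
                       b: int, mu_c: float) -> Tuple[List[float], bool]:
    """One AMBER step for the equalizer coefficients, following (5.3).

    c    : current coefficients c[0..L-1]
    x    : equalizer input vector [x[k], x[k-1], ..., x[k-L+1]]
    b    : transmitted symbol (+1 or -1), assumed known during training
    mu_c : coefficient step size
    Returns the new coefficients and whether a decision error occurred.
    """
    y = sum(cj * xj for cj, xj in zip(c, x))   # slicer input y[k]
    e = b - y                                  # slicer error e[k]
    b_hat = 1 if y >= 0 else -1                # slicer decision (assumed zero threshold)
    indicator = 1 if b_hat != b else 0         # I[k]: update only on a decision error
    new_c = [cj + mu_c * indicator * sign(e) * xj for cj, xj in zip(c, x)]
    return new_c, bool(indicator)


if __name__ == "__main__":
    c, err = amber_coeff_update([1.0, 0.0], [-0.5, 0.2], b=1, mu_c=0.01)
    print(c, err)
```

Setting the indicator to 1 unconditionally reduces this step to the signed LMS update, which makes the relationship between the two algorithms easy to see.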
The probability of error, BER, given by (4.16), is a non-smooth function of c and r. In
order to develop an adaptation algorithm for reference levels, we assume that the equalizer
coefficients are fixed at c. The slicer input y[k], and the slicer error e[k] are given by
$$y[k] = \mathbf{c}^T \mathbf{x}[k]$$
$$e[k] = b[k] - \sum_{j=0}^{L-1} c[j]\, x[k-j] \qquad (5.4)$$
Therefore,
$$E(e^2[k]) = E\left\{ \left( b[k] - \sum_{j=0}^{L-1} c[j]\, x[k-j] \right)^2 \right\} \qquad (5.5)$$
The gradient of the square error taken with respect to the reference levels can be written as
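$$\frac{\partial E(e^2[k])}{\partial r_n} = -2E\left[ \left( b[k] - \sum_{j=0}^{L-1} c[j]\, x[k-j] \right) \left( \sum_{j \in J_n[k]} c[j] \right) \right] = -2E\left[ e[k] \sum_{j \in J_n[k]} c[j] \right] \qquad (5.6)$$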
where Jn[k] = {j : x[k− j] = rn}. Approximating the gradient of the MSE from (5.6) by its
stochastic counterpart and taking the sign of the error, e[k], we obtain
$$r_n[k+1] = r_n[k] + \mu_r\, \mathrm{sgn}(e[k]) \sum_{j \in J_n[k]} c[j] \qquad (5.7)$$
where µr is the step-size for the reference level update. The thresholds are then updated as
$$t_i = \frac{r_i + r_{i+1}}{2}, \quad \text{for } i = 1, \ldots, N-1 \qquad (5.8)$$
Analogous to (5.3), the signed LMS algorithm (5.7) for ADC reference levels can be
modified as follows to obtain the AMBER algorithm for reference levels:
$$r_n[k+1] = r_n[k] + \mu_r\, I[k]\, \mathrm{sgn}(e[k]) \sum_{j \in J_n[k]} c[j]. \qquad (5.9)$$
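The reference level update (5.9), together with the midpoint rule for the thresholds, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the simulation code behind the results in this chapter: 2-PAM signaling with a zero slicer threshold, a known training symbol, 0-based level indices, and hypothetical function names.

```python
from typing import List, Sequence


def sign(v: float) -> int:
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def rl_amber_step(r: Sequence[float], c: Sequence[float],
                  idx_hist: Sequence[int], b: int, mu_r: float) -> List[float]:
    """One AMBER update of all ADC reference levels, following (5.9).

    r        : reference levels (0-based list standing in for r_1..r_N)
    c        : equalizer coefficients c[0..L-1], held fixed
    idx_hist : ADC output indices for x[k], x[k-1], ..., x[k-L+1];
               idx_hist[j] == n means x[k-j] equals r[n], i.e. j is in J_n[k]
    b        : transmitted symbol (+1 or -1), assumed known during training
    mu_r     : reference level step size
    """
    x = [r[n] for n in idx_hist]                 # ADC outputs in the delay line
    y = sum(cj * xj for cj, xj in zip(c, x))     # slicer input
    e = b - y                                    # slicer error
    b_hat = 1 if y >= 0 else -1                  # decision (assumed zero threshold)
    new_r = list(r)
    if b_hat == b:                               # I[k] = 0: no update
        return new_r
    s = sign(e)
    for j, n in enumerate(idx_hist):             # accumulates the sum over j in J_n[k]
        new_r[n] += mu_r * s * c[j]
    return new_r


def thresholds(r: Sequence[float]) -> List[float]:
    """Midpoint thresholds between adjacent reference levels."""
    return [(r[i] + r[i + 1]) / 2 for i in range(len(r) - 1)]


if __name__ == "__main__":
    r0 = [-0.75, -0.25, 0.25, 0.75]
    r1 = rl_amber_step(r0, c=[1.0, -0.5], idx_hist=[1, 3], b=1, mu_r=0.01)
    print(r1, thresholds(r1))
```

Only the levels that actually appear in the delay line during an error event are moved; all other levels are left untouched in that cycle.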
To evaluate the algorithm, the equalizer coefficients are initialized to c = copt. In practice,
c can be initialized to [0 ... 1 ... 0]^T, and the coefficients can be updated using either the
LMS or the AMBER algorithm with the reference levels uniformly spaced during this period. Once
the equalizer coefficients converge, the coefficient adaptation can be turned off. The reference
levels are then updated by choosing a very small step-size µr owing to the high sensitivity
of the BER to reference levels. To evaluate AMBER, a few sets of initial conditions (each on
a uniformly spaced grid, but differing from the others in the full-scale value 2Vmax) were
experimented with, and the best performance gains quantified. For the ISI channel (see
Fig. 4.6), Fig. 5.3 shows a comparison between AMBER and the BER calculated using
optimal reference level settings (Section 4.1) for a 3-bit BER-optimal ADC. The reference
levels were initially set to uniformly spaced levels spanning half the full dynamic range, i.e.,
−Vmax/2 to Vmax/2. The BER was evaluated based on the threshold values at the end of 4 × 10^5
bit periods.

Figure 5.3: Performance evaluation of AMBER algorithm for reference level updates.
[Plot: BER (10^-6 to 10^-1, log scale) versus SNR (12 to 30 dB); curves: 3-bit BER-optimal, 3-bit AMBER.]
Figure 5.3 demonstrates that AMBER achieves the shaping gains predicted in [46]. Simulations
were carried out only up to an SNR of 28 dB in order to keep the simulations feasible.
5.2.1 Remarks on the Convergence of AMBER
We now consider the slicer mean square error (MSE) cost function and examine convergence
of the corresponding gradient descent algorithm. A gradient descent algorithm based on the
BER cost function is not analytically tractable, so we focus on the MSE at the slicer, which
is a function of the ADC output levels and comparator thresholds. The gradient descent
algorithm for MSE with respect to ADC output levels r, and thresholds t, is different from
that for equalizer coefficients c and d, in that it consists of the following two iterative steps:
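$$r_n[k+1] = r_n[k] - \mu_r \frac{\partial E(e^2[k])}{\partial r_n} \qquad (5.10)$$
$$\phantom{r_n[k+1]} = r_n[k] + 2\mu_r E\left[ e[k] \sum_{j \in J_n[k]} c[j] \right], \quad \text{for } n = 1, \ldots, N \qquad (5.11)$$
$$t_i = \frac{r_i + r_{i+1}}{2}, \quad \text{for } i = 1, \ldots, N-1 \qquad (5.12)$$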
Hence, such an iterative algorithm is distinct from conventional gradient descent for a convex
cost function (e.g., MSE as a function of equalizer coefficients (c)). In order to prove that the
above iterations result in convergence of MSE, we first identify that the iterative equations
(5.11) and (5.12) mirror the iterations in the Lloyd-Max algorithm. The arguments pre-
sented below are similar in nature to those employed to prove convergence of the Lloyd-Max
algorithm when the signal distributions are known. At high SNR,
$b[k] \approx \sum_{j=0}^{L-1} c[j]\, z[k-j]$, and the MSE from (5.5) can be written as
$$E(e^2[k]) \approx E\left\{ \left( \sum_{j=0}^{L-1} c[j]\, z[k-j] - \sum_{j=0}^{L-1} c[j]\, x[k-j] \right)^2 \right\} \approx E\left\{ \sum_{j=0}^{L-1} c[j]\, (z[k-j] - x[k-j])^2 \right\} \qquad (5.13)$$
where z[n] is the noise-free channel output. We first note from (5.13) that the MSE is implic-
itly dependent on the ADC output levels r and thresholds t, through the terms containing
x[k − j], the ADC output. When t is kept fixed, MSE is a convex function of the ADC
output levels r (expanding the expectation results in terms with quadratic and linear depen-
dence in r). Hence, the gradient descent iteration step (5.11) for the convex MSE function
((5.13), t fixed) results in decreasing values of the MSE. Now, for fixed r, the choice of t
that minimizes MSE is given by (5.12). This is because, for a given channel output level
z[k − j], MSE is minimized if it is mapped to the nearest ADC output level in r, thereby
minimizing the term (z[k − j]− x[k − j])2 in (5.13).
Hence, the MSE values produced by the iterations (5.11) and (5.12) form a monotonically
non-increasing sequence. In addition, MSE ≥ 0, i.e., it has a lower bound. A monotonically
non-increasing sequence that is bounded below converges. Therefore, under the above
assumptions, MSE convergence is guaranteed.
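The two-step argument above can be checked numerically on a simplified case. The sketch below is an assumption-laden illustration, not the analysis itself: it drops the equalizer (L = 1, c[0] = 1), replaces the expectation by an average over samples of z, and uses hypothetical function names. It alternates a gradient step on the levels r (thresholds fixed) with the midpoint rule for t, and records the MSE after each pair of steps.

```python
import random
from typing import List, Sequence, Tuple


def cell_index(z: float, t: Sequence[float]) -> int:
    """Index n of the quantization cell (t[n-1], t[n]] that contains z."""
    n = 0
    while n < len(t) and z > t[n]:
        n += 1
    return n


def mse(samples: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """Sample-average squared error between z and its quantized value."""
    return sum((z - r[cell_index(z, t)]) ** 2 for z in samples) / len(samples)


def midpoints(r: Sequence[float]) -> List[float]:
    """Thresholds as midpoints of adjacent levels, as in (5.12)."""
    return [(r[i] + r[i + 1]) / 2 for i in range(len(r) - 1)]


def alternate(samples: Sequence[float], r0: Sequence[float], mu: float,
              iters: int) -> Tuple[List[float], List[float], List[float]]:
    """Alternate a gradient step on r (t fixed) with the midpoint rule for t."""
    r = list(r0)
    t = midpoints(r)
    history = [mse(samples, r, t)]
    for _ in range(iters):
        # Sample estimate of E[(z - r_n) * 1{z in cell n}] for every level n
        grad = [0.0] * len(r)
        for z in samples:
            n = cell_index(z, t)
            grad[n] += (z - r[n]) / len(samples)
        r = [rn + 2 * mu * g for rn, g in zip(r, grad)]   # step as in (5.11)
        t = midpoints(r)                                  # step as in (5.12)
        history.append(mse(samples, r, t))
    return r, t, history


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.choice((-1, 1)) + 0.5 * rng.choice((-1, 1)) + rng.gauss(0, 0.1)
         for _ in range(2000)]
    r, t, hist = alternate(z, [-1.2, -0.4, 0.4, 1.2], mu=0.5, iters=30)
    print(r, hist[0], hist[-1])
```

With a step size satisfying 2µ ≤ 1, each level moves part of the way toward the mean of its own cell, so the levels stay ordered and the recorded MSE values do not increase from one iteration to the next.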
5.3 Adaptive Receiver Architecture
Figure 5.4: Adaptive receiver architecture with a LE.
[Block diagram; blocks: ADC, DAC, ENC, F-Block, WUD-Block, RL-UD; signals: x(t), CLK, VREF, xenc[n], x[n], y[n], b[n], e[n], r, t, c; precisions: Bx, Br, Bt, Bc.]
In this section, we present an architecture (Fig. 5.4) to realize the algorithm developed
in Section 5.2. The ADC reference levels are set by a digital-to-analog converter (DAC).
The ADC consists of pre-amps and metastability latches (Fig. 5.5) that compare the input
signal with the thresholds and generate a thermometer code. A transition detector generates
a transition code 1 corresponding to the transition index. This code is then converted to a
Bx-bit index xenc[n]. The index is mapped at the encoder (ENC) to a Br-bit digital
representation of the reference levels as illustrated in Fig. 5.2. The index also feeds into the
reference level update unit (RL-UD). The F-block and WUD blocks realize the conventional filtering
(FIR) and coefficient weight-adaptation blocks, respectively.

Figure 5.5: Flash ADC architecture consisting of a bank of pre-amps that amplify the
difference between the input signal and the quantization threshold. This is followed by
latches that quantize the pre-amp output and a transition detector and encoder that generate
the Bx-bit ADC output.
[Schematic; labels: x(t), thresholds t1, t2, ..., CLK, VREF, pre-amp and metastability latches, transition detector, ENC, xenc[n] (Bx bits).]
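A behavioral sketch of the ADC data path described above (comparator bank, transition detector, index, and ENC lookup) is given below. It is illustrative only: ideal comparators are assumed, the Br-bit representation is taken to be a signed integer code scaled by an assumed full-scale value Vmax, and all function names are hypothetical.

```python
from typing import List, Sequence


def thermometer_code(x: float, t: Sequence[float]) -> List[int]:
    """Comparator bank: one bit per threshold, 1 where x exceeds the threshold."""
    return [1 if x > ti else 0 for ti in t]


def transition_index(thermo: Sequence[int]) -> int:
    """Transition detector: locate the 1 -> 0 boundary in the thermometer code."""
    padded = [1] + list(thermo) + [0]
    one_hot = [padded[i] & (1 - padded[i + 1]) for i in range(len(padded) - 1)]
    return one_hot.index(1)          # stands in for the Bx-bit index xenc[n]


def enc_lookup(index: int, r: Sequence[float], v_max: float, b_r: int) -> int:
    """ENC: map the index to a signed b_r-bit code of the reference level."""
    scale = 2 ** (b_r - 1)
    code = round(r[index] / v_max * scale)
    return max(-scale, min(scale - 1, code))   # saturate to the b_r-bit range


if __name__ == "__main__":
    levels = [-0.75, -0.25, 0.25, 0.75]
    thresholds = [-0.5, 0.0, 0.5]
    idx = transition_index(thermometer_code(0.2, thresholds))
    print(idx, enc_lookup(idx, levels, v_max=1.0, b_r=4))
```

Because the ENC is a lookup from index to level, non-uniform (BER-optimal) levels cost no more in this step than uniform ones; only the table contents change.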
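Figure 5.6: RL-UD architecture.
[Block diagram; a delay line (D elements) on xenc[n] feeds units RL-UD1 ... RL-UDN, which output r1 ... rN; thresholds t1 ... tN−1 are formed from adjacent levels with a divide-by-2; inputs c and e[n]; precisions Br,rl−ud, Bt, Bc, Be, Bx.]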
The RL-UD block (Fig. 5.6) consists of an L-register delay line that buffers the ADC
output. This delay line is shared by the update units corresponding to the individual
reference levels (RL-UD1 ... RL-UDN). The error signal e[n] and coefficients c are inputs to
this block. To implement (5.9), e[n] can be quantized to 1 bit by taking its sign, i.e., Be = 1.
The units RL-UD1 ... RL-UDN generate the reference levels r1 to rN; adjacent levels are added
and right-shifted by 1 bit to generate the ADC thresholds (5.8). The reference levels and
thresholds are represented using Br,rl−ud and Bt bits, respectively, where Bt = Br,rl−ud + 1.
The Br most significant bits (MSBs) of the Br,rl−ud-bit reference levels are fed into the encoder.
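Figure 5.7: Reference level rj update block.
[Schematic; a tapped delay line (D elements) on xenc[n] (Bx bits), comparators against index j, muxes selecting between 0 and the coefficients c0, c1, c2, ..., cL−1 (Bc bits), the error e[n] (Be bits), step-size µr, and a register holding rj.]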
Figure 5.7 shows a direct-mapped architecture for updating reference level j as described
in (5.9). The index j is compared against the indices in this tapped delay line to generate the
control signals to the muxes. The equalizer coefficients (c, Bc bits) are inputs to the muxes.
Each mux outputs a 0 when its control signal is 0, and the corresponding coefficient when the
control signal is a 1. The equalizer coefficients corresponding to the selected indices (Jn[k])
are thus summed, multiplied by the step-size (µr) and the sign of the error to generate the
update term. This is then added to the reference level from the previous cycle to generate the
updated reference level. A sorting network might be necessary to guarantee monotonicity in
the levels.
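The data path of Fig. 5.7 can be mimicked with integer arithmetic as in the sketch below. It is a behavioral illustration under assumptions that are not taken from this work: the reference level word is assumed to be scaled so that one coefficient LSB multiplied by µr equals one reference level LSB (so the step-size multiplication becomes an implicit binary-point shift), the sorting network is modeled by a plain sort, and the function names are hypothetical.

```python
from typing import List, Sequence


def rl_ud_unit(j: int, r_j: int, idx_line: Sequence[int], c_fx: Sequence[int],
               err_sign: int, err_flag: int) -> int:
    """One clock cycle of the update unit for reference level j.

    j        : index of the level handled by this unit
    r_j      : current level as a signed integer
    idx_line : ADC output indices held in the tapped delay line
    c_fx     : equalizer coefficients as signed integers
    err_sign : +1 or -1, the 1-bit quantized error
    err_flag : 1 if a decision error occurred, else 0
    """
    ctrl = [1 if idx == j else 0 for idx in idx_line]         # comparators
    mux_out = [c if sel else 0 for sel, c in zip(ctrl, c_fx)]  # muxes: 0 or coefficient
    acc = sum(mux_out)                                         # adder tree
    return r_j + err_flag * err_sign * acc                     # accumulate into register


def rl_ud_bank(levels: Sequence[int], idx_line: Sequence[int], c_fx: Sequence[int],
               err_sign: int, err_flag: int) -> List[int]:
    """All N units sharing one delay line, followed by a sort for monotonicity."""
    updated = [rl_ud_unit(j, r, idx_line, c_fx, err_sign, err_flag)
               for j, r in enumerate(levels)]
    return sorted(updated)


if __name__ == "__main__":
    print(rl_ud_bank([-200, -60, 100, 220], [2, 0, 2, 1], [8, -3, 2, 1], -1, 1))
```

Gating `err_flag` to 0 leaves every register unchanged, which is the behavior one would expect once adaptation is switched off.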
We now demonstrate through fixed-point simulations that the architecture proposed in
this section is implementable. For this purpose, we assume Br = 4, and Bc = 5 as determined
for the high-SNR case in Section 5.1. The precision Br,rl−ud is determined by the stopping
criterion as follows:
$$B_{r,rl\text{-}ud} = \log_2 \left( \frac{V_{max}}{\min \left| \mu_r \sum_{j \in J_n[k]} c[j] \right|} \right) \qquad (5.14)$$
Based on the parameters (Vmax, µr and c) employed to simulate AMBER in Section 5.2,
and applying (5.14), we determined that Br,rl−ud = 9, Bt = 10. With these precision values,
AMBER was implemented in fixed point and the results are shown in Table 5.2.
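As a worked illustration of (5.14), the sketch below evaluates the precision for a set of purely hypothetical parameters (Vmax = 1, µr = 2^-6, three coefficients); these are not the values used in the simulations. It assumes that any non-empty subset of taps can occur as Jn[k], skips subsets whose coefficient sum is zero, and rounds the result up to an integer number of bits.

```python
import math
from itertools import combinations
from typing import Sequence


def smallest_update(c: Sequence[float], mu_r: float) -> float:
    """Smallest non-zero |mu_r * sum of c[j] over a tap subset|."""
    best = math.inf
    for size in range(1, len(c) + 1):
        for subset in combinations(c, size):
            magnitude = abs(mu_r * sum(subset))
            if 0.0 < magnitude < best:
                best = magnitude
    return best


def rl_ud_precision(v_max: float, mu_r: float, c: Sequence[float]) -> int:
    """Bits needed so that the smallest update is still representable."""
    return math.ceil(math.log2(v_max / smallest_update(c, mu_r)))


if __name__ == "__main__":
    # Hypothetical values chosen only to make the arithmetic easy to follow.
    print(rl_ud_precision(v_max=1.0, mu_r=2 ** -6, c=[1.0, -0.5, 0.25]))
```

In this example the smallest update is 2^-6 × 0.25 = 2^-8, so Vmax divided by it is 256 and the formula gives 8 bits.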
Table 5.2: Finite-Precision BER Comparison

SNR (dB)       24           28             32
               Bc = 4       Bc = 5         Bc = 5
3-b BER-opt    1 × 10^-3    6.7 × 10^-5    2 × 10^-6
3-b AMBER      1 × 10^-3    2.9 × 10^-4    2 × 10^-5
4-b uniform    1 × 10^-3    2.6 × 10^-4    2 × 10^-5
It can be inferred from the finite-precision BERs (fixed-point coefficient and data) in Table
5.2, that a 3-bit AMBER and a 4-bit uniform ADC achieve similar BERs. This implies that
the adaptation algorithm proposed in this chapter can be implemented in practice and the
shaping gains predicted in [46] are realizable.
In Table 5.3, we compare the complexities of the conventional LMS algorithm for coefficient
adaptation and the AMBER algorithm for reference level adaptation, in terms of full-adders
(FAs). The parameters L, N, Bc, Bx and Br,rl−ud refer to the equalizer length, number of
ADC reference levels, coefficient precision, ADC output precision and reference level
precision in the RL-UD, respectively. The multiplication with the adaptation step size is
assumed to be implemented using shift operations, and the WUD accumulator precision is assumed
to be twice the coefficient precision in the F-block.

Table 5.3: Complexity Comparison (Full-Adders (FAs))

         Adders                                  Multipliers              Total
LMS      2BcL                                    (BcBx + L − 1)BxL        2BcL + ((BcBx + L − 1)Bx)L
AMBER    (LBx + Br,rl−ud)N + ((L − 1)Bc)N        -                        (LBx + Br,rl−ud)N + ((L − 1)Bc)N

For the precision values Bc, Bx and Br,rl−ud determined in this section, Table 5.3 implies that
AMBER complexity is 76% more than the conventional equalizer. However, since the RL-UD block
is clock gated after convergence, this does not present a power overhead. Therefore, for
high-speed links employing flash ADC architecture, the proposed AMBER receiver represents a
practical technique to implement BER-optimal ADCs.
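The full-adder expressions of Table 5.3 are easy to evaluate programmatically. The sketch below simply transcribes the two "Total" entries; the function names are hypothetical, and the values of L and N in the example are placeholders chosen for illustration (the equalizer length and number of levels are not restated in this section), so the printed numbers are not expected to reproduce the 76% figure quoted above.

```python
def lms_full_adders(L: int, Bc: int, Bx: int) -> int:
    """Total FA count for LMS coefficient adaptation (Table 5.3, LMS row)."""
    adders = 2 * Bc * L
    multipliers = (Bc * Bx + L - 1) * Bx * L
    return adders + multipliers


def amber_full_adders(L: int, N: int, Bc: int, Bx: int, Br_rlud: int) -> int:
    """Total FA count for AMBER reference level adaptation (Table 5.3, AMBER row)."""
    return (L * Bx + Br_rlud) * N + ((L - 1) * Bc) * N


if __name__ == "__main__":
    # Bc, Bx and Br_rlud follow the high-SNR case above; L and N are placeholders.
    L, N, Bc, Bx, Br_rlud = 4, 8, 5, 3, 9
    print("LMS FAs:  ", lms_full_adders(L, Bc, Bx))
    print("AMBER FAs:", amber_full_adders(L, N, Bc, Bx, Br_rlud))
```

Sweeping L and N with these two functions gives a quick view of how the RL-UD cost scales with the number of reference levels relative to the equalizer length.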
5.4 Summary
In this chapter, we proposed an adaptive algorithm (AMBER) for designing BER-optimal
ADCs with a linear equalizer. We proposed and evaluated an architecture to implement
AMBER. This architecture resulted in a 76% complexity overhead with no power overhead,
as the adaptation is turned off after convergence. This work proves that the ADC shaping
gains estimated in Chapter 4 can be achieved in practice, yielding performance gain and
dramatic ADC power savings.
CHAPTER 6
CONCLUSION AND FUTURE WORK
The design of energy-efficient high-speed backplane links via the application of communi-
cation IC design techniques and communications-inspired IC design techniques is rich with
opportunities in the signal processing, communications and VLSI domains. This thesis has
illustrated that a SAMS-based approach simplifies the design of analog components of a com-
munication link. In this chapter, we summarize the contributions of the thesis and speculate
on future research directions.
6.1 Contributions
In this thesis, we presented a SAMS approach to mixed-signal design and demonstrated its
application to (a) FEC-based high-speed back-plane links and (b) BER-optimal ADCs.
1. We demonstrated for a 20” FR4 link carrying 10 Gb/s data, (a) an 18 mW/Gb/s
savings in the ADC, (b) a 1 mW/Gb/s reduction in transmit driver power, (c) up to
6X improvement in transmit jitter-tolerance, and (d) a 25-to-40 mV improvement in
comparator offset tolerance with 3X smaller swing.
2. We developed an accurate statistical model to evaluate the impact of FEC on DFE-
based links. Based on this model, we concluded that in the absence of interleaving, a
20” FR4 link carrying 10 Gb/s data and employing binary BCH codes would require
codes over 500 bits in length to achieve a BER target of 10−15.
3. Adjusting the ADC reference levels to optimize for BER results in a 1-bit and a 2-bit ADC
precision reduction for LE-based and DFE-based links, respectively, thereby leading to
approximately 50%-75% power reduction for a flash ADC. Moreover, the BER-optimal
ADC achieves performance superior to a Lloyd-Max ADC when 2-PAM modulation is
employed. For 4-PAM, the performance achieved by BER-optimal ADC is comparable
to that of the Lloyd-Max ADC.
4. We proposed the AMBER algorithm for adaptation of ADC reference levels. Finite-
precision analysis of AMBER indicates that reference levels represented with 9-bit
precision are sufficient for a 3-bit BER-optimal ADC to achieve BER equal to that of
a 4-bit conventional ADC. We presented an architecture to implement the algorithm
and realize the ADC power savings. This architecture necessitates 76% area overhead
with no significant power overhead.
6.2 Modeling of Analog Mixed-Signal Components for
SAMS-Based Design
In Chapter 2, we proved that FEC coding gain can be exploited to relax the specifications
of the mixed-signal components of the system such as the ADC, PLL, transmit driver and
comparator. Further, in Chapter 4, we demonstrated that a BER-optimal ADC provides
shaping gain. The FEC coding gain and the ADC shaping gain provide opportunities to
further simplify the design of mixed-signal components such that they are no longer required
to operate in the conventional high performance/power envelope. In order to design such
a system, it is necessary to obtain system-level models of various components that capture
performance and power trade-offs when operating in a relaxed low performance mode. The
model should also interface with the preceding and succeeding blocks in the system. In
fact, [9] attempts such a system-level optimization, under conventional performance/power
envelopes. The latch is an example of a mixed-signal component that finds widespread
application, e.g., in ADCs and PLLs. In a conventional design, the latch operates in a
memoryless manner, i.e., the output in a specific clock cycle is a function of the input
in that clock cycle. This is guaranteed by providing sufficient signal swing at the latch
input and designing the latch to have sufficient bandwidth. However, a latch employed in
a SAMS-based link can be permitted to operate with a relaxed input swing and bandwidth
requirement. This motivates modeling the latch circuit when it is operating in such an
unconventional error-prone mode.
In the rest of the section, we demonstrate the challenges in characterizing the error-rate
of a latch, and propose a Markov chain to model a latch subject to bandwidth limitations.
6.2.1 The Latch
Figure 6.1: The latch: a) generic block diagram, and b) a specific circuit schematic.
[Panel (a): Vin, Sampler, Vs on capacitor Cs, Regenerator, Vout. Panel (b): Vin, Vclk, Vs, Vout.]
All latches include a sampler front-end and a regenerator as shown in Fig. 6.1(a), and
employ a two-phase clock to sample the input signal Vin on a capacitor Cs on one phase,
and regenerate in the other phase to obtain the final output Vout. For example, a high-
speed static reset based digital latch shown in Fig. 6.1(b) is often used to convert the analog
output of a comparator to generate a full-swing digital signal in an ADC (Vout = Vs). The
latch samples the input signal when Vclk is HIGH, while the regenerator (cross-coupled
inverter pair) is held in a reset mode. When Vclk goes LOW, the cross-coupled inverter pair
regenerates Vout. Usually, multiple regeneration stages are needed to reach full swing and to
avoid metastability.
6.2.2 Error Types
Error-free latch operation occurs when the sampled voltage at the end of the sampling phase,
i.e., the seed voltage, is large enough to develop into a rail-to-rail swing during the regenerate
phase. The behavior of the latch is determined mainly by the following factors:
1. The input signal swing Vsw and the sampler bandwidth BW determine the input signal
levels at the gates of the input transistors. This directly impacts the seed voltage.
2. The open-loop gain of the regenerate phase determines the speed with which the seed
voltage is regenerated to full swing.
3. The bandwidth at node Vs (see Fig. 6.1(b)) determines how fast it can be charged and
discharged. A larger bandwidth implies that a smaller swing is required at the input
for error-free operation.
4. The data rate determines how much time is available for the latch to resolve an input
sample to a clean 1 or a 0.
The input signal swing Vsw is tied directly to power consumption. We wish to characterize
the latch in terms of its power consumption and error-rate. By developing system models
for evaluating such power-performance trade-offs, the latch specification problem is absorbed
into the system design process. This is a marked departure from the conventional design flow,
where system design is carried out independent of circuit design, the latter being directed
toward designing high performing circuit blocks leading to high power consumption. In this
report, we focus on obtaining a simple model of a latch exhibiting errors.
To compute the latch error-rate, we first need to identify how errors occur. Latch errors
can be classified as: (a) metastability errors, and (b) memory errors. Metastability errors
are defined as the errors that occur when the seed voltage is not large enough to regenerate
into a full-swing signal at the end of the regenerate phase even though the seed voltage
has the correct polarity. Memory errors occur when the input signal swing Vsw or the
sampler bandwidth BW is insufficient to obtain a seed voltage with the correct polarity.
The latch metastability issue is well-known and has been modeled [48] and compensated
for by employing a cascade of regenerators. It is easy therefore to develop a model of a
latch exhibiting metastability errors as a function of power. However, the model in [48]
assumes that the seed voltage has the correct polarity. In this work, we consider memory
induced latch errors. In contrast to metastability errors, memory errors cannot be corrected
by cascading additional regenerators.
6.2.3 Latch Memory Errors: Circuit Simulation
The input-output characteristics of the latch shown in Fig. 6.1(b) are analyzed using
SPECTRE in a 1.2 V, 65 nm CMOS process assuming a data rate of 10 Gb/s. The bandwidth
limitation of the sampler is modeled by the resistor Rin and capacitor Cin. An input sampler
bandwidth of BW = 5.3 GHz was considered, with input swings Vsw = 600 mVppd and
Vsw = 800 mVppd. For BW ≥ 6 GHz, the latch is memory error-free, i.e., the seed voltage has
the correct polarity. For lower values of BW, memory errors begin to appear as the sampler is
unable to track the input signal.
Figure 6.2(a) and Fig. 6.2(b) depict the signal transients for an input bit sequence of
010110, with Vsw = 600 mVppd, and initial states Vout = 0.75 V and Vout = −0.7 V,
respectively. Fig. 6.2(a) highlights the reset phase and regenerate phase in each period. We
also observe that it takes at least two consecutive identical digits (CIDs) for the latch to
transition from a strong 1 to a 0, and vice-versa. When the input bits alternate, the output
retains polarity. Three latch errors occur, as shown in Fig. 6.2(a).
Fig. 6.3(a) and Fig. 6.3(b) depict the signal transients for Vsw = 800 mVppd. In contrast
to the previous case, we notice sensitivity to the initial condition. Even though the initial
condition on Vout seems favorable for the first 0 input in Fig. 6.3(b), the fact that the input
itself is switching from 1 results in a weaker 0 at the input transistor gate. This leads to a
weaker 0 at the output at the end of the first cycle. Consequently, the next input 1 is able
to cause an output transition to 1.

Figure 6.2: Latch voltage waveforms for a 010110 input sequence with input swing Vsw =
600 mVppd with two different initial output voltages (labeled as IC for initial condition): (a)
Vout = 0.75 V, and (b) Vout = −0.7 V.
[Waveform plots; annotations: clock period, reset and regenerate phases, IC, and the resolved bit in each period.]

Figure 6.3: Latch voltage waveforms for a 010110 input sequence with input swing Vsw =
800 mVppd with two different initial output voltages: (a) Vout = 0.25 V, and (b) Vout =
−0.2 V.
[Waveform plots; annotations: IC and the resolved bit in each period.]
6.2.4 Markov Model
As memory errors are a function of past inputs and the present latch output, a Markov
model is proposed to capture the behavior of a latch exhibiting memory errors. In addition,
as the error statistic is a function of the input swing Vsw, we develop a Vsw-dependent
Markov model. The Markov chain transitions corresponding to the two input swings (from
Section 6.2.3) are shown in Fig. 6.4 and Fig. 6.5. The states are shown in circles, and the
arrows are labeled using the input/output pair corresponding to each transition. The model
is verified by comparing the latch output bits as obtained by SPECTRE simulations with
the predicted output bits obtained from the proposed Markov model. This was done for a
1000 bit long PRBS sequence which was verified to have every n-bit pattern up to n = 10.
The models shown in Fig. 6.4 and Fig. 6.5 predict 326 and 245 errors respectively, which
matches the SPECTRE results in both number and position of errors. The PRBS sequence
also guarantees that all the states and transitions are realized frequently.
Figure 6.4: Markov model for Vsw = 600 mVppd input.
[State diagram; states: 1S, 0S, W; transition labels (input/output): 0/1, 0/0, 1/0, 1/1, 1/1, 0/0.]
The model corresponding to a differential input swing of 600 mVppd is shown in Fig. 6.4.
This model consists of three states, 1S, 0S and W . The 1S state occurs when the current
input and the previous output are 1, and the 0S state occurs when the current input and
the previous output are 0. When the current input and previous output are of opposite
sign, a W state results, where W denotes a weak state. Comparing this model with the
transients shown in Fig. 6.2(a) and Fig. 6.2(b), we can identify the state-transition sequence
as: W (IC) → 0S → W → 0S → W → 1S → W . The outputs can be predicted as
0 → 0 → 0 → 0 → 1 → 1. Note that the model has just one intermediate state, a fact
reaffirmed by the insensitivity of the outputs to the weak initial condition; i.e., both lead to
the same output sequence.
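The three-state model lends itself to a small table-driven simulation. The transition table below is our reading of Fig. 6.4, reconstructed from the state and output sequences quoted above for the 010110 input; the table layout and function names are hypothetical, and the sketch is meant only to show how such a model can be exercised against a bit sequence.

```python
from typing import Dict, List, Sequence, Tuple

# (state, input bit) -> (next state, output bit), for Vsw = 600 mVppd
LATCH_600MV: Dict[Tuple[str, int], Tuple[str, int]] = {
    ("1S", 1): ("1S", 1),   # strong 1 stays strong
    ("1S", 0): ("W", 1),    # a single 0 cannot flip a strong 1
    ("0S", 0): ("0S", 0),   # strong 0 stays strong
    ("0S", 1): ("W", 0),    # a single 1 cannot flip a strong 0
    ("W", 1): ("1S", 1),    # from the weak state the input wins
    ("W", 0): ("0S", 0),
}


def run_latch(bits: Sequence[int], state: str = "W",
              table: Dict[Tuple[str, int], Tuple[str, int]] = LATCH_600MV
              ) -> Tuple[List[int], List[str]]:
    """Return the output bits and the visited states (initial state first)."""
    outputs: List[int] = []
    states: List[str] = [state]
    for bit in bits:
        state, out = table[(state, bit)]
        outputs.append(out)
        states.append(state)
    return outputs, states


def count_errors(bits: Sequence[int], outputs: Sequence[int]) -> int:
    """Number of positions where the latch output differs from the input."""
    return sum(1 for b, o in zip(bits, outputs) if b != o)


if __name__ == "__main__":
    tx = [0, 1, 0, 1, 1, 0]
    rx, visited = run_latch(tx)
    print(rx, visited, count_errors(tx, rx))
```

For the 010110 input starting from the weak state, the table yields the outputs 0, 0, 0, 0, 1, 1 and three errors, in agreement with the walk-through above; the same loop can be driven by a longer PRBS sequence for comparison against circuit simulation.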
Figure 6.5: Markov model for Vsw = 800 mVppd input.
[State diagram; states: 1S, 0S, 1, 0; transition labels (input/output): 1/1, 0/0, 1/1, 0/0, 1/1, 0/1, 0/0, 1/0.]
The model corresponding to a differential input swing of 800 mVppd is shown in Fig. 6.5.
This model has four states. The 1S and 0S states occur when two consecutive input 1 or 0
occur, respectively. There are two weak states in this case, labeled as 1 and 0. The 1 state
corresponds to a present input of 1 and a previous input of 0. The 0 state corresponds to
a present input of 0 and a previous input of 1. Comparing this model with the transients
shown in Fig. 6.3(a), we can identify the state-transition sequence as 1(IC) → 0S → 1 →
0→ 1→ 1S → 0. The outputs can be predicted as 0→ 0→ 0→ 1→ 1→ 1.
Similarly, the state-transition sequence for Fig. 6.3(b) is identified as 0(IC)→ 0S → 1→
0→ 1→ 1S → 0. The outputs can be predicted as 0→ 0→ 0→ 1→ 1→ 1.
Thus, we find that an input-swing dependent Markov model can be employed to obtain the
system-level behavior of a latch exhibiting memory errors. By correlating Vsw with power,
and by incorporating the metastability error model [48], we plan on obtaining a complete
power vs. error-rate model for a latch. These and other models of non-ideal circuit behavior
need to be developed in the future.
6.3 Future Work
This thesis demonstrated that FEC coding gain can be exploited to improve the energy-
efficiency of high-speed back-plane links. However, the problem of optimally exploiting the
coding gain to design such energy-efficient links remains an open problem. Recent work such
as [9] proposes simultaneous system and circuit design space exploration to determine the
optimal architecture and allocation of resources in a given system in the absence of FEC.
This framework should be extended to include FEC-based link designs. The SAMS modeling
work presented in the previous section is a first step in that direction. The design of efficient
adaptation algorithms to tune mixed-signal component parameters based on system level
information is yet another direction that can be explored.
Researchers in communication theory have traditionally developed transmission and recep-
tion algorithms and their performance bounds assuming the availability of infinite processing
complexity and ideal circuit components. The presence of finite-complexity non-ideal (e.g.,
noisy) transmitter and receiver components can be incorporated in channel capacity calcu-
lations in order to determine the true bounds on achievable throughput. The SAMS models
of circuit components can enable such calculations.
REFERENCES
[1] G. Balamurugan, “Channel model,” 2005, private communication.
[2] A. Szczepanek, I. Ganga, C. Liu, and M. Valliappan, “10GBASE-KR FEC tutorial,”
Website, http://www.ieee802.org.
[3] V. Stojanovic, “Channel-limited high-speed links: modeling, analysis and design,”
Ph.D. dissertation, Stanford University, U.S.A, 2004.
[4] N. Krishnapura and M. Barazande-Pour, “A 5 Gb/s NRZ transceiver with adaptive
equalization for backplane transmission,” in International Solid-State Circuits Confer-
ence, 2005.
[5] J. Zerbe et al., “Equalization and clock recovery for a 2.5-10 Gbps 2-PAM/4-PAM
backplane transceiver cell,” in International Solid-State Circuits Conference, 2003.
[6] J. E. Jaussi, G. Balamurugan, D. Johnson, B. Casper, A. Martin, J. Kennedy,
N. Shanbhag, and R. Mooney, “8 Gb/s source-synchronous I/O link with adaptive
receiver equalization, offset cancellation and clock de-skew,” IEEE Journal of Solid
State Circuits, vol. 40, no. 1, pp. 80–88, 2005.
[7] R. Farjad-Rad, C.-K. K. Yang, M. Horowitz, and T. Lee, “A 0.3 micron CMOS 8 Gbps
4PAM serial link transceiver,” IEEE Journal of Solid State Circuits, vol. 35, no. 5, pp.
757–764, 2000.
[8] J. T. Stonick, G.-Y. Wei, J. L. Sonntag, and D. K. Weinlader, “An adaptive PAM-
4 5 Gbps backplane transceiver in 0.25 micron CMOS,” IEEE Journal of Solid State
Circuits, vol. 38, no. 3, pp. 436–443, 2003.
[9] R. Sredojevic and V. Stojanovic, “Optimization-based framework for simultaneous
circuit-and-system design-space exploration: a high-speed link example,” in International
Conference on Computer-Aided Design, 2008.
[10] R. Palmer et al., “A 14mW 6.25Gb/s transceiver in 90nm CMOS for serial chip-to-chip
communications,” in IEEE International Solid-State Circuits Conference, 2007.
[11] E. Prete et al., “A 100 mW 9.6 Gb/s transceiver in 90nm CMOS for next-generation
memory interfaces,” in International Solid-State Circuits Dig. Tech. Papers, 2006, pp.
253–262.
[12] G. Balamurugan et al., “A scalable 5-15 Gb/s 14-75 mW low-power I/O transceiver in
65 nm CMOS,” IEEE Journal of Solid State Circuits, vol. 43, no. 4, pp. 1010–1018,
2008.
[13] A. Meruva and B. Farahani, “A 14-b 32MS/s pipelined ADC with novel fast-convergence
comprehensive background calibration,” in IEEE International Symposium on Circuits
and Systems, 2009, pp. 956–959.
[14] P. Nikaeen and B. Murmann, “Digital compensation of dynamic acquisition errors at
the front-end of high-performance A/D converters,” IEEE Journal of Selected Topics
in Signal Processing, vol. 3, no. 3, pp. 499–508, 2009.
[15] V. Stojanovic and M. Horowitz, “Modeling and analysis of high-speed links,” in Custom
Integrated Circuits Conference, 2003, pp. 589–594.
[16] H.-M. Bae, J. Ashbrook, J. Park, N. Shanbhag, A. Singer, and S. Chopra, “MLSE
receiver for electronic dispersion compensation of OC-192 fiber links,” IEEE Journal of
Solid State Circuits, vol. 41, no. 11, pp. 2541–2554, 2006.
[17] M. Harwood et al., “A 12.5 Gb/s SerDes in 65nm CMOS using a baud-rate ADC with
digital RX equalization and clock recovery,” in IEEE International Solid-State Circuits
Conference, 2007.
[18] P. Schvan et al., “A 24GS/s 6b ADC in 90nm CMOS,” in IEEE International Solid-State
Circuits Conference, 2008.
[19] E. A. Lee and D. G. Messerschmitt, Digital Communication. Kluwer, 1994.
[20] J. G. Proakis, Digital Communications. NY: McGraw-Hill, 2001.
[21] Y. Li, B. Bakkaloglu, and C. Chakrabarti, “A system level energy model and energy-
quality evaluation for integrated transceiver front-ends,” IEEE Transactions on VLSI
Systems, vol. 15, no. 1, pp. 90–103, 2007.
[22] R. Narasimha and N. R. Shanbhag, “Forward error correction for high-speed I/O,” in
Asilomar Conference on Signals, Systems and Computers, 2008, pp. 1513–1517.
[23] H. Helgert and R. Stinaff, “Shortened BCH codes,” IEEE Transactions on Information
Theory, vol. 19, no. 6, pp. 818–820, 1973.
[24] S. Litsyn, “Table of nonlinear binary codes,” Website, www.eng.tau.ac.il/∼litsyn/
tableand/index.html.
[25] G. Balamurugan et al., “Modeling and analysis of high-speed I/O links,” IEEE Trans-
actions on Advanced Packaging, vol. 32, no. 2, pp. 237–247, 2009.
[26] H. Chung et al., “A 7.5-GS/s 3.8-ENOB 52-mW flash ADC with clock duty cycle control
in 65nm CMOS,” in IEEE Symp. on VLSI Circuits, 2009, pp. 268–269.
[27] D. Sarwate and N. Shanbhag, “High-speed architectures for Reed-Solomon decoders,”
IEEE Transactions on VLSI Systems, vol. 9, no. 5, pp. 641–655, 2001.
[28] V. Balan et al., “A 4.8-6.4-Gb/s serial link for backplane applications using decision
feedback equalization,” IEEE Journal of Solid State Circuits, vol. 40, no. 9, pp. 1957–
1967, 2005.
[29] T. Beukema, M. Sorna, K. Selander, S. Zier, B. L. Ji, P. Murfet, J. Mason, Senior,
W. Rhee, H. Ainspan, B. Parker, and M. Beakes, “A 6.4-Gb/s CMOS SerDes core with
feed-forward and decision-feedback equalization,” IEEE Journal of Solid State Circuits,
vol. 40, no. 12, pp. 2633–2645, 2005.
[30] R. Payne et al., “A 6.25-Gb/s binary transceiver in 0.13-um CMOS for serial data
transmission across high loss legacy backplane channels,” IEEE Journal of Solid State
Circuits, vol. 40, no. 9, pp. 1957–1967, 2005.
[31] J. Ashley, B. M. M. Blaum, and C. Melas, “Performance and error propagation of two
DFE channels,” IEEE Transactions on Magnetics, vol. 33, no. 5, pp. 2773–2775, 1997.
[32] R. E. Blahut, Algebraic codes for data transmission. Cambridge University Press, 2003.
[33] S. Lloyd, “Least Squares Quantization in PCM,” IEEE Transactions on Information
Theory, vol. IT-28, no. 2, pp. 129–137, 1982.
[34] J. Max, “Quantizing for Minimum Distortion,” IEEE Transactions on Information
Theory, vol. IT-6, pp. 7–12, 1960.
[35] K. Guan, S. Kozat, and A. Singer, “Adaptive reference levels in a level-crossing analog-
to-digital converter,” EURASIP Journal on Advances in Signal Processing, 2008.
[36] F. Behnamfar, F. Alajaji, and T. Linder, “Channel-optimized quantization with soft-
decision demodulation for space-time orthogonal block-coded channels,” IEEE Trans-
actions on Signal Processing, vol. 54, no. 10, pp. 3935–3946, 2006.
[37] J. Singh, O. Dabeer, and U. Madhow, “On the limits of communication with low-
precision analog-to-digital conversion at the receiver,” IEEE Trans. Commun., vol. 57,
no. 12, pp. 3629–3639, 2009.
[38] G. Zeitler, “Low-precision analog-to-digital conversion and mutual information in chan-
nels with memory,” in Proceedings of the 48th Annual Allerton Conference on Commu-
nication, Control, 2010.
[39] E. Masry, “The reconstruction of analog signals from the sign of their noisy samples,”
IEEE Transactions on Information Theory, vol. 27, pp. 735–745, 1981.
[40] A. Host-Madsen and P. Handel, “Effects of sampling and quantization on single-tone
frequency estimation,” IEEE Transactions on Signal Processing, vol. 48, pp. 650–662,
2000.
[41] D. Rousseau, G. V. Anand, and F. Chapeau-Blondeau, “Nonlinear estimation from
quantized Signals: Quantizer optimization and stochastic resonance,” in Third Intl.
Symp. on Physics in Signal and Image Proc., 2003.
[42] C.-C. Yeh and J. R. Barry, “Adaptive minimum bit-error rate equalization for binary
signaling,” IEEE Transactions on Communication, vol. 48, no. 7, pp. 1226–1235, 2000.
[43] E. H. Chen et al., “Near-optimal equalizer and timing adaptation for I/O links using a
BER-based metric,” IEEE Journal of Solid-State Circuits, vol. 43, no. 9, pp. 2144–2156,
2008.
[44] A. Singer, A. Bean, and J. W. Choi, “Mutual information and time-interleaved analog-
to-digital conversion,” in Proc. ITA, 2010.
[45] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical
Recipes: The Art of Scientific Computing. NY: Cambridge University Press, 2007.
[46] M. Lu, A. Singer, and N. Shanbhag, “BER-Optimal analog-to-digital converters for com-
munication links,” in International Symposium on Circuits and Systems, 2010.
[47] Nangate, “45 nm cell library,” Website, www.nangate.com/openlibrary/.
[48] H. J. M. Veendrick, “The behavior of flip-flops used as synchronizers and prediction of
their failure rate,” IEEE Journal of Solid State Circuits, vol. SC-15, no. 2, pp. 169–176,
1980.
