Performance analysis of pre-equalized multilevel partial response modulation for high-speed electrical interconnects by Guenach, Mamoun et al.
Performance Analysis of Pre-Equalized Multilevel
Partial Response Modulation for High-Speed
Electrical Interconnects
M. Guenach∗+, L. Jacobs+, B. Kozicki∗, and M. Moeneclaey+
∗Nokia Bell Labs, +Ghent University, Belgium
Abstract
In this paper, we first review the baseband modulation techniques intended for use in short-reach, high-speed electrical
interconnects. Then we briefly introduce the high-level design concepts of the investigated electrical interconnect, indicating the
main limitations and outlining the transceiver complexity related to the advanced modulation designs. We further investigate
finite-complexity linear pre-equalization under an average transmit power constraint of both full-response and precoded partial
response signaling with pulse amplitude modulation (L-PAM) mapping. For a representative electrical interconnect, we argue that
the constellation size, the type of modulation, the detection method as well as the number of pre-equalizer taps should be carefully
selected in order to achieve target error performance at data rates between 100 Gbit/s and 200 Gbit/s. We show that for many
combinations of the above-mentioned parameters, precoded duobinary 4-PAM yields the best error performance for a fixed average
transmit power.
Index Terms
High-Speed Electrical Interconnects, Modulation, Partial Response, Equalization, Error Probability Bounds.
I. INTRODUCTION
Wireline interconnects, both optical and electrical, are a group of communication systems with the highest serial data
rates in modern telecommunications equipment. Traditionally, optical interconnects have been taking advantage of the large
bandwidth of optical media, such as optical fiber or polymer waveguide, to deliver high-capacity transmission links. Optical
transmission opens the possibility for modulation formats with complex constellations, such as quadrature amplitude modulation
(QAM), as well as polarization division multiplexing. While these techniques have also found their way into the interconnects in
high-performance computers [1], the majority of short-reach applications apply only basic modulation techniques due to the
limitations in power budget for signal recovery and processing [2]. Moreover, optical interconnects carry a price premium due
to the increased transceiver complexity and the difficulty of integrating an optical data path on silicon chips or printed circuit
boards. Therefore, there remains a significant interest in electrical links for short-reach interconnects. Compared to the optical
© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
links, the interconnects using a metallic medium are constrained by the narrowband characteristic of the channel resulting from
frequency-dependent insertion loss.
As a result, electrical interconnects can typically be found in chip-to-chip and chip-to-module interconnects, or in connections
between network elements located on different line cards of a device. In these applications, the serial rate limitation of the
interconnect can be overcome by parallelization of data paths. The variety and the large number of electrical interconnects
drive the need for efficient implementations that transport bits at lower transceiver cost and lower power consumption.
Over the last decade the requirements for electrical interconnect speed have evolved from hundreds of Mbit/s to multiple tens of
Gbit/s [3]. The increase in speed has been enabled by advances in semiconductor chip manufacturing leading to growing speed
of transceivers, and in material science, addressing the high frequency-dependent losses of the transmission medium. Regardless
of the advancements in channel design, signals experience degradation during transmission due to channel imperfections. In
particular, the channel has low-pass characteristics due to conductive losses as well as dielectric losses resulting in increased rise
time and fall time of the received signal compared to the transmitted pulses. In addition, a typical interconnect physical channel
consists of a number of elements, including chip escape pins, vias and connectors which introduce impedance discontinuities,
noise coupling, cross-talk and mode conversion [4].
Collectively, these impairments result in inter-symbol interference (ISI) and increased noise in the received signal. Reduced
signal integrity directly affects the achievable error rate of the link and, consequently, impacts the prime performance parameter
of the interconnects. The modulation and equalization are two main design parameters in this small-scale context. At the
modulation level, a vast majority of standard electrical interconnect systems operate with non-return to zero (NRZ) or pulse
amplitude modulation with 4 levels (4-PAM) [3][5]. Despite its simplicity, the NRZ-modulated signal occupies a large bandwidth
and requires a significant amount of equalization. This motivates the use of partial response (PR) duobinary modulation as
proposed in [6]. By design, controlled ISI is introduced in PR signaling [7] to spectrally shape the signal, such that the signal
power is more concentrated at the lower frequencies. Low-complexity signal detection in the case of PR involves a modulo
operation at the receiver; alternatively, at the expense of higher complexity, the inherent redundancy in the signal [8] can be
exploited by performing maximum likelihood sequence detection (SeqDet) by means of the Viterbi algorithm [9].
Equalization is typically employed to combat the effects of degradation introduced by the dispersive channel [10]. When
equalization is performed at the receiver, it consists of a forward filtering operating on the received signal, and can be
augmented by a feedback filter operating on past symbol decisions forming a decision feedback equalizer (DFE) [11]. Although
this technique is effective and employed in many modern transceiver implementations, its use in ultra-high speed
interconnects incurs a penalty in the form of power consumption, as illustrated by the implementation of an 80 Gbit/s DFE
consuming 4 W [12]. Instead, pre-equalization at the transmitter consists of applying a linear pre-distortion filter to the data
symbols in order to compensate for channel distortion. Pre-equalization is preferred over equalization at the receiver from an
implementation point of view [10] because, among other reasons, the required resolution of the analog-to-digital converter is lower
in the former case.
Several authors (e.g., [6][13]-[16]) have investigated pre-equalization in the context of high-speed electrical interconnects.
The authors in [15] consider duobinary signaling, and use a frequency-domain fitting to determine the coefficients of a finite
impulse response (FIR) linear pre-equalizer. In [16], the combination of a programmable 2-tap pre-equalizer at the transmitter
and an adaptive 4-tap DFE at the receiver is investigated for NRZ signaling. In [6], a 2-tap pre-equalizer with fractional delay
is optimized numerically to minimize a semi-analytically computed bit error rate (BER). In [14], the coefficients of a 6-tap
equalizer are represented by 4 bits, and their values are optimized by means of a numerical search to minimize data-dependent
jitter. In [13], a combination of an FIR pre-equalizer and a one-tap DFE is considered for PR signaling; a minimum mean-
square error (MMSE) criterion is used to determine the filter taps. Most of these papers consider the eye opening (simulated
or measured) or the measured BER as a performance measure.
In this article, we focus on linear MMSE pre-equalization with limited complexity, for generic multilevel mapping and
full-response or precoded partial-response signaling. In section II we review some of the state-of-the-art baseband modulation
schemes with two-level mapping (L = 2) including particular line coding. We show how the power spectral density (PSD)
of any baseband signaling is typically derived. We then discuss the rationale behind the selection of line coding with desired
spectral properties. A number of line coding schemes are used to exemplify these properties. In section III we introduce the
high-level design concepts of the investigated electrical interconnect indicating the main limitations and outlining the transceiver
complexity related to multilevel and PR formats. Section IV considers the optimization of filter tap values at the transmitter and
the scaling factor applied to the signal at the input of the detector. We find that this approach yields a smaller mean-square error
(MSE) compared to the case where (as in [13]) only the filter taps are optimized. Unlike other contributions on pre-equalization,
the analytical derivation of the optimum filter taps and scaling factor is performed by transforming the MSE into an equivalent
but simpler expression that allows a geometrical interpretation. The error performance of the detector is investigated in section
V. Accurate upper and lower bounds on the symbol error probability, that take into account the presence of noise and residual
ISI, are presented in the case of symbol-by-symbol detection (SymDet); these bounds are computationally less complex than
the semi-analytically computed error rate from [6]. We point out that in the case of PR signaling, the error performance can
be improved by using SeqDet. Numerical results targeting future interconnect systems operating at 100 Gbit/s and 200 Gbit/s
are provided in section VI, comparing 2-PAM and 4-PAM mapping, full response (FR) and PR (with polynomials $1 + D$ and
$1 + 2D + D^2$) signaling, and FIR pre-equalizers with 5 or 11 taps. Based on a simulated frequency response of a representative
electrical interconnect, the different configurations are compared in terms of the MMSE, the eye diagram and the symbol error
rate for SymDet and SeqDet. Conclusions are formulated in section VII.
II. REVIEW OF THE BASEBAND LINE SIGNAL CODING
A. Fundamentals
In baseband (BB) transmission, data pulses are not modulated onto carrier waveforms, as opposed to pass-band transmission
where data pulses carry negligible DC power. The data sequence $\{a_n\}$ is first encoded to a sequence $\{w_n\}$ which is applied
to a pulse shaping filter $p(t)$, resulting in a signal $s(t) = \sum_n w_n p(t - t_n)$. Note that this expression is only valid for BB
transmission that uses one shaping filter $p(t)$. Other, non-linear BB systems can in the general case use a shaping filter that
depends on the symbols $w_n$, as in the case of Miller coding. All information is carried in the symbols $\{a_n\}$ which, for instance
when belonging to an alphabet of size $L$, result in an information rate of $R = \log_2(L)/T$ bit/s. Hence, each symbol $w_n$ carries
$\log_2(L)$ bits. Signal coding and pulse shaping can be used to shape the spectral content of the signal $s(t)$. The overall transfer
function of the link including the transmit filter, the channel and the receive filter, governs the extent of ISI or the amount of
energy that leaks from one symbol to another.
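As a small worked example of the relation $R = \log_2(L)/T$, the sketch below computes the symbol rate $1/T$ needed for a given bit rate and constellation size. The function name `symbol_rate` is hypothetical and the sketch ignores any coding overhead.

```python
import math

def symbol_rate(bit_rate, L):
    """Symbol rate 1/T (in symbols/s) needed to carry bit_rate (bit/s)
    with an L-ary alphabet, from R = log2(L) / T."""
    return bit_rate / math.log2(L)

# Symbol rates for the two target bit rates with 2-PAM and 4-PAM mapping.
for rate in (100e9, 200e9):
    for L in (2, 4):
        print(f"{rate / 1e9:.0f} Gbit/s, {L}-PAM: {symbol_rate(rate, L) / 1e9:.0f} GBaud")
```

Doubling the number of bits per symbol halves the symbol rate, which is the basic motivation for multilevel mapping on a bandwidth-limited channel.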
B. Line signal coding
Line signal coding translates the binary data stream into pulses for transmission. The different line coding schemes are
chosen to achieve one or more general objectives, namely:
• Desirable spectral characteristics - narrow transmission bandwidth,
• Multiple level transitions or significant power near the harmonics f = k × (1/2T ) for reliable timing recovery, and
• High noise immunity leading to low BER that can be achieved by including some redundancy or correlation in the
transmitted symbols.
When the signal PSD has nonzero spectral content at DC, baseline wandering may occur. A receiver will evaluate the average
power of the received signal (called the baseline) and use it as a reference to determine the value of the incoming data elements
or alternatively subtract the bias such that the detector always sees an unbiased signal. If the incoming signal does not vary
over a long period of time due to a long sequence of identical symbols, the baseline will drift and thus cause errors in the
detection of incoming data elements. Long consecutive sequences of symbols can be avoided by scrambling data and proper
DC balancing may be ensured with 8B/10B or 64B/66B coding that respectively adds 25% and 3.125% overhead. Also, most
realizations of high-speed electrical interconnects do not allow low frequencies to pass or in other words they are AC coupled
[17].
Figure 1. Examples of basic line signal coding.
[Plots for Figure 2. Left panel: PSD / Pmax versus F = f*T. Right panel: Power / Pmax versus F = f*T. Curves: Polar NRZ, AMI (Bi-polar NRZ), Manchester, Miller.]
Figure 2. Normalized power spectral density (left-hand side) and the corresponding cumulative power spectral density (right-hand side) of line codes using
rectangular pulse shapes p(t).
In the general case, when a stream of mutually correlated data $\{w_n\}$ is assumed, the PSD of the resulting signal
$s(t) = \sum_n w_n p(t - nT)$, denoted $S_s(f)$, is proportional to the product of the PSD of the correlated sequence $\{w_n\}$ (denoted $S_w(f)$)
and the squared magnitude of the transmit pulse shaping ($|P(f)|^2$), i.e., $S_s(f) = \frac{1}{T} S_w(f) |P(f)|^2$, where $T$ is the symbol
period. In the following, particular cases of line coding are explained. The reader should refer to Fig. 1 for respective example
waveforms generated by these codes, using rectangular pulse shapes $p(t)$.
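The product form $S_s(f) = \frac{1}{T} S_w(f) |P(f)|^2$ can be evaluated numerically. The following is a minimal sketch under stated assumptions: a unit-amplitude rectangular pulse spanning the full symbol interval, i.i.d. equiprobable bits, and the textbook sequence spectra $S_w(f) = 1$ for polar NRZ and $S_w(f) = \sin^2(\pi f T)$ for AMI. The function names are hypothetical.

```python
import numpy as np

def rect_pulse_spectrum_sq(f, T):
    """|P(f)|^2 for a unit-amplitude rectangular pulse of duration T.
    np.sinc(x) is sin(pi x) / (pi x)."""
    return (T * np.sinc(f * T)) ** 2

def psd_polar_nrz(f, T):
    """i.i.d. symbols in {-1, +1}: S_w(f) = 1."""
    return rect_pulse_spectrum_sq(f, T) / T

def psd_ami(f, T):
    """AMI symbols in {-1, 0, +1} with autocorrelation R_w(0) = 1/2,
    R_w(+-1) = -1/4, zero elsewhere, so S_w(f) = sin^2(pi f T)."""
    return np.sin(np.pi * f * T) ** 2 * rect_pulse_spectrum_sq(f, T) / T

F = np.linspace(0.0, 3.0, 3001)          # normalized frequency F = f*T
nrz = psd_polar_nrz(F, 1.0)
ami = psd_ami(F, 1.0)
print("polar NRZ PSD at DC:", nrz[0])
print("AMI PSD at DC:", ami[0])
print("AMI PSD peaks near F =", F[np.argmax(ami)])
```

Under these assumptions the AMI spectrum has a null at DC and its maximum lies below half the symbol rate, while the polar NRZ spectrum is largest at DC.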
1) Polar NRZ (binary antipodal) signaling: In the binary antipodal signaling scheme, the ’0’ and ’1’ bits in the data stream
are independently coded into positive and negative pulses, respectively. The PSD of rectangular pulses has the following
undesirable characteristics:
• Non-zero DC content which may be a problem in AC-coupled interconnects.
• No harmonics around k/T, which makes the timing recovery¹ more difficult in the case of long strings of marks or spaces.
Schemes like differential NRZ ensure that a transition occurs for each new ’1’.
• The PSD contains high frequency components which will be attenuated in the bandwidth-limited interconnects, therefore
leading to distortion of the pulse waveform.
Note that the spectrum has been computed assuming i.i.d. data symbols. When in practice the symbols are not i.i.d., scrambling
can modify the symbol sequence into an i.i.d. sequence.
2) Bipolar (or alternate mark inversion) signaling: In the bipolar signaling scheme, also called alternate mark inversion
(AMI), ’1’ codes into a pulse of the opposite polarity from the last ’1’ while ’0’ codes into a zero line signal yielding 50% duty
cycle. Hence this scheme has no DC content, as opposed to polar NRZ. However, long sequences of consecutive ’0’ symbols provide
no transitions from which clock information could be derived. In AMI line coding, correlation is intentionally added in the
polarities of the ’1’ pulses, which will impact the PSD of the line signal as will be shown next.
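The AMI mapping rule described above can be sketched in a few lines. This is an illustrative encoder only; the function name `ami_encode` and the choice of the initial polarity are assumptions.

```python
def ami_encode(bits, last_polarity=-1):
    """Map bits to AMI levels: '0' -> 0, '1' -> pulse of opposite polarity
    to the previous '1'. last_polarity is the (assumed) polarity of the
    last mark before the sequence starts."""
    levels = []
    polarity = last_polarity
    for bit in bits:
        if bit:
            polarity = -polarity      # alternate the mark polarity
            levels.append(polarity)
        else:
            levels.append(0)          # spaces carry no pulse
    return levels

print(ami_encode([1, 0, 1, 1, 0, 0, 1]))
```

Because successive marks alternate in sign, the running sum of the levels never leaves the set {0, +1} (or {0, -1}), which is why the line signal has no DC component.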
Although with three signal levels it is possible to carry $\log_2 3$ bits/symbol, in AMI only 1 bit/symbol is allocated, balancing
the advantages of spectral shaping, timing recovery and error detection. Hence, when studying the coding design for high-speed
interconnects, it is beneficial to investigate the coding inherent to modulation via the correlated signal levels as an alternative to
adding redundancy by means of channel coding. This characteristic is further explored in the discussion of duobinary
coding. As shown in Fig. 2, the resulting PSD has no DC content and a peak at 0.4/T when combined with the
rectangular pulse, which makes it attractive for bandwidth conservation. Furthermore, redundancy through bipolar violation indicating
error occurrences can be exploited by applying SeqDet rather than SymDet, as will be explained later in the paper.
¹ By timing recovery we mean i) clock or sampling frequency 1/T recovery and ii) phase or delay shift recovery.
3) Manchester (or biphase) line code: In Manchester code, ’1’ maps into a pulse containing positive and negative levels
while ’0’ maps to the same shape but with the inverted polarity. This code is very efficient for timing recovery as every symbol
interval includes a level transition. However a penalty is paid in terms of bandwidth because significant energy is placed beyond
1/2T .
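A minimal sketch of the Manchester mapping, with each symbol represented by two half-symbol levels. The polarity convention ('1' as +1 then -1) and the name `manchester_encode` are assumptions; the text only requires that '0' uses the inverted shape.

```python
def manchester_encode(bits):
    """Return two half-symbol levels per bit: '1' -> (+1, -1), '0' -> (-1, +1)."""
    levels = []
    for bit in bits:
        levels.extend((1, -1) if bit else (-1, 1))
    return levels

print(manchester_encode([1, 0, 0, 1]))
```

Every symbol contains a mid-symbol transition regardless of the data, which helps timing recovery but doubles the rate of level changes compared with NRZ.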
4) Miller or delay modulation code: In this line code, a data ’1’ codes into a Manchester pulse or its negative, maintaining
the continuity from the previous level. A data ’0’, represented by either a ’+’ or a ’-’ level over the whole symbol interval, continues
the previous level if it follows a Manchester pulse due to ’1’, or flips to the other level if the previous level was due to
’0’ [18]. Essentially, two different pulse shapes are used: Manchester and constant levels. The code generates at least one transition
every two symbols. The reader is referred to [18] for the exact derivation of the PSD of the Miller (also called delay) code.
This PSD is sharply peaked below 0.4/T and looks attractive for bandwidth conservation.
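The Miller rule can be sketched as a small state machine that tracks the current line level and the previous bit. The function name `miller_encode` and the initial state (level +1, previous bit '1') are assumptions made for illustration.

```python
def miller_encode(bits, level=1, prev_bit=1):
    """Return two half-symbol levels per bit.
    '1': start at the current level and switch at mid-symbol.
    '0': hold one level for the whole symbol; flip at the symbol boundary
         only when the previous bit was also '0'."""
    levels = []
    for bit in bits:
        if bit:
            levels.extend((level, -level))   # mid-symbol transition
            level = -level
        else:
            if prev_bit == 0:
                level = -level               # boundary transition between zeros
            levels.extend((level, level))
        prev_bit = bit
    return levels

print(miller_encode([1, 0, 0, 1]))
```

With this rule a level is never held for more than two symbol intervals, in line with the statement that at least one transition occurs every two symbols.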
5) Bipolar N zero suppressed coding: Bipolar N zero suppressed (BNZS) line coding, widely used in T1 and DS1 transmission
systems [19], was proposed to improve the synchronization properties of the bipolar line coding discussed above. In BNZS,
sequences of ’N’ consecutive zeros are replaced by a codeword that contains an intentional bipolar violation. By doing so, the
synchronization is improved as the maximum run length of zeros is reduced to N-1. It has been shown that this modification
of bipolar line coding makes the exact computation of the PSD very tedious [20]. However, the resulting PSD is similar to
the PSD of bipolar line coding.
III. END-TO-END DESIGN AND COMPLEXITY
The design of an electrical interconnect requires the consideration of multiple aspects of the end-to-end system [21][22].
It involves the choice and optimization of the following parameters in order to realize the required link capacity: target cost,
performance targets, number of lanes forming a link, serial rate of transceiver, modulation, equalization and coding of electrical
signals, material and form factor of the transceiver packaging, material and geometry for line card and backplane, routing and
impedance control of traces, size and density of connector.
These elements, schematically illustrated in Fig. 3, represent only a high-level view for the particular case of short-
reach electrical interconnect elements which are affected by design decisions. Depending on the application and particular
requirements, also other parameters will be included, such as power consumption, heat dissipation and the material process
in which transceivers are realized. For this discussion we will be focusing on the implications of two design aspects of the
interconnect, i.e., path loss and transceiver complexity.
Figure 3. End-to-end electrical interconnect.
A. Electrical interconnect loss
Electrical interconnects are invariably limited by the available bandwidth of the medium. Propagation in copper links,
typically laid out on a printed circuit board (PCB) translates into frequency-dependent signal attenuation and, consequently,
signal distortion due to ISI as well as cross-talk interference. The performance of the PCB traces is influenced by the selected
materials and the geometry. Material characteristics determine the frequency-dependent loss. Similarly, in the connector
subsystem, the material, the mating of the two parts and the footprint/breakout all affect the channel characteristics. An
electrical reflection caused by any of these elements will lead to a significant variation of the insertion loss profile of the
channel at high frequencies. An insertion loss variation will lead to signal integrity deterioration. Such challenges can be
mitigated with the use of advanced materials. Due to the resulting cost, however, this is not the preferred solution. In addition
to the loss, densely-spaced traces on a backplane result in increased cross-talk, which is further exacerbated by the use of
higher frequencies [23].
Consequently, the design of the appropriate modulation and line coding aims at efficient distortion mitigation and noise
resilience. Considering the aforementioned constraints of the high-speed interconnects, the goal is to limit the bandwidth
occupied by the signal, while maximizing the information carrying capacity. Balancing the choice of multilevel or advanced
versus basic modulation formats brings the cost and power consumption of additional circuitry on the transmitter and receiver
side.
B. Transceiver complexity
The benefits of advanced modulation and line coding formats come at the cost of increased transceiver complexity. The
complexity originates from the need to perform additional analog or digital operations on the signal before and/or after the
transmission through the interconnect medium. In addition to the equalization, multilevel and PR formats require pre- or
post-coding.
A basic transmitter with PR duobinary precoder is shown in the left-hand side of Fig. 4. The precoder simplifies decoding
and avoids error propagation. The implementation complexity lies in introducing the delay feedback loop and the logic gate
operating at the symbol rate. Without precoding, the complexity shifts to the duobinary decoder (post-coding), as shown in the
right-hand side of Fig. 4. In such a case a duobinary decoder requires feedback in the critical path including 2 levels of logic
and a feedback path which has to complete within a single symbol period. In systems without precoding, unbounded error
propagation due to feedback in the decoder may occur. In comparison, when precoding is implemented in the transmitter, both
the error propagation and the stringent requirement for feedback loop in the receiver are removed as depicted in the right-hand
side of Fig. 4.
Figure 4. Basic transceiver block diagram of PR duobinary.
A high-speed duobinary precoder can be implemented using a half rate precoder with parallel lanes, feeding two or four
parallel signals before multiplexing with data rates of one-half or one-quarter of the transmission bit rate. The final precoded
signal is obtained by multiplexing the precoder output bit by bit [24]. Regardless of the precoding stage, the increasing symbol
rates are challenging to the receiver circuitry. A viable solution to this challenge is an integration of the decoder with two
levels of demultiplexing, and placement of the decoding logic in-between the demultiplexer stages [25].
The performance benefits of the advanced modulation formats come with yet another implementation complexity, namely the
reception of the multilevel waveforms. In the case of NRZ modulation, the high-speed digital receivers are optimized for
reception of a binary signal. The handling of multiple signal levels in partial-response or e.g. 4-PAM formats leads to an
increase in the number of logic elements required to implement the receiver. The typical electrical interconnect is implemented
as a sub-element of a larger integrated circuit which precludes the use of analog-to-digital converters (ADCs). These ADCs are
characterized by high power consumption and require large circuit area to implement [26]. As an alternative, solutions tailored
to the particular modulation format have gained momentum. Custom receiver circuits have been shown for both duobinary
signals [27] as well as for 4-PAM modulation [28]. These designs show that high-symbol rate operation combined with
multilevel modulation requires advanced circuitry; the latter is realized in recent CMOS and BiCMOS technologies and with
broadband circuit impedance matching leading to more complex designs and smaller tolerances than in the case of traditional
NRZ modulation.
Figure 5. Evolution of short-reach electrical interconnects 1997-2016: a) interconnect bit rate and presentation year; b) number of equalization taps as a function
of interconnect bit rate; c) transceiver chip die area as a function of interconnect bit rate.
As a result of this growing circuit complexity, the growth in bit rates of short-reach electrical interconnects, which had been
increasing steadily throughout the first decade of the 21st century, appears to have saturated short of reaching the 100 Gbit/s
benchmark. This is shown in Fig. 5 a). The saturation of the interconnect speed comes in part from the limited capabilities to
equalize the signal before the transmission. As the serial rate of the interconnect is increased, the number of equalization taps
(combined: feedforward equalizer taps, DFE taps and poles in continuous-time linear equalizers) must be reduced, as shown
in Fig. 5 b). This limit on the equalization capability stems from the significant increase in transmitter chip area
(and the corresponding power consumption) associated with the high transmit data rate. This is shown in Fig. 5 c), where the
transmitter area in logarithmic scale is plotted as a function of transmit bit rate.
From the above analysis it is clear that in order to realize the high-speed interconnect targeting multiple 100 Gbit/s, it
is necessary to realize systems with limited equalization effort. The first step in this direction is to quantify the benefits of
applying advanced modulation.
IV. PRE-EQUALIZATION OF PARTIAL-RESPONSE SYSTEMS
Figure 6. Precoded PR system
We consider the precoded PR [7] system from Fig. 6, characterized by a polynomial $h_T(D) = 1 + \sum_{m>0} h_{T,m} D^m$ with
integer coefficients. At the transmitter, the precoder converts a sequence of i.i.d. L-ary digits $\{a_n\}$, that are uniformly distributed
over the set $\{0, 1, ..., L-1\}$, into a sequence $\{b_n\}$, according to
$$ b_n = \left[ a_n - \sum_{m>0} h_{T,m} b_{n-m} \right]_L \tag{1} $$
where $[x]_L$ denotes the modulo-L reduction of $x$ to the half-open interval $[0, L)$. We restrict our attention to the case where
$L$ is an integer power of 2. The resulting precoder output $\{b_n\}$ consists of i.i.d. L-ary digits that are uniformly distributed
over the set $\{0, 1, ..., L-1\}$. The sequence $\{b_n\}$ is mapped to the symbol sequence $\{d_n\}$ according to $d_n = 2b_n - L + 1$, so
that $d_n$ belongs to the L-PAM constellation $A_d = \{-(L-1), -(L-3), ..., L-3, L-1\}$; we denote $\sigma_d^2 = E[d_n^2] = \frac{L^2-1}{3}$.
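The precoding rule (1) and the L-PAM mapping can be sketched as follows. The modulo detector at the end is not quoted from the paper: it is an assumption obtained by combining (1) with the noiseless PR combination $w_n = \sum_m h_{T,m} d_{n-m}$ (with $h_{T,0} = 1$), which gives $a_n = [(w_n + (L-1) h_T(1))/2]_L$. All function names are hypothetical, and the precoder memory is assumed to start at $b_n = 0$ for $n < 0$.

```python
import numpy as np

def precode(a, h_T, L):
    """Eq. (1): b_n = [a_n - sum_{m>0} h_T[m] * b_{n-m}] mod L, with b_n = 0 for n < 0."""
    b = np.zeros(len(a), dtype=int)
    for n in range(len(a)):
        feedback = sum(h_T[m] * b[n - m] for m in range(1, len(h_T)) if n - m >= 0)
        b[n] = (a[n] - feedback) % L
    return b

def pam_map(b, L):
    """d_n = 2 b_n - L + 1, i.e. levels -(L-1), ..., L-1."""
    return 2 * b - L + 1

def pr_combine(d, h_T, L):
    """Noiseless PR signal w_n = sum_m h_T[m] * d_{n-m}; symbols before n = 0
    are taken as the mapping of b = 0, i.e. -(L-1)."""
    mem = len(h_T) - 1
    d_ext = np.concatenate([np.full(mem, -(L - 1)), d])
    return np.array([sum(h_T[m] * d_ext[n + mem - m] for m in range(len(h_T)))
                     for n in range(len(d))])

def modulo_detect(w, h_T, L):
    """Assumed memoryless detector: a_n = [(w_n + (L-1) * h_T(1)) / 2] mod L."""
    return ((w + (L - 1) * sum(h_T)) // 2) % L

rng = np.random.default_rng(0)
L, h_T = 4, [1, 1]                       # duobinary 4-PAM
a = rng.integers(0, L, 20)
d = pam_map(precode(a, h_T, L), L)
a_hat = modulo_detect(pr_combine(d, h_T, L), h_T, L)
print("recovered without error:", np.array_equal(a, a_hat))
```

In this noiseless setting each digit is recovered from a single PR sample, without feedback at the receiver, which illustrates why precoding avoids error propagation.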
The L-PAM symbols $\{d_n\}$ are applied to a linear pre-equalizer that operates at the symbol rate $1/T$, where $T$ stands for
the symbol interval. Denoting the pre-equalizer coefficients by $\{g_m\}$, the corresponding pre-equalizer transfer function is
$G(e^{j2\pi fT}) = \sum_m g_m e^{-j2\pi f m T}$. The output of the pre-equalizer is fed to a fixed unit-energy transmit filter $H_{tr}(f)$. Introducing
the notation $\langle X(e^{j2\pi fT}) \rangle = T \int_{-1/(2T)}^{1/(2T)} X(e^{j2\pi fT}) \, df$, the resulting transmit symbol energy $E_{tr}$ is obtained as
$$ E_{tr} = \sigma_d^2 \left\langle |G(e^{j2\pi fT})|^2 R_{tr}(e^{j2\pi fT}) \right\rangle = \sigma_d^2 \, \mathbf{g}^T \mathbf{R}_{tr} \mathbf{g} \tag{2} $$
where $\mathbf{g}$ is a vector containing the pre-equalizer coefficients, $\mathbf{R}_{tr}$ is a Toeplitz matrix determined by
$$ (\mathbf{R}_{tr})_{m,n} = \int |H_{tr}(f)|^2 e^{j2\pi f(m-n)T} \, df \tag{3} $$
and $R_{tr}(e^{j2\pi fT}) = \frac{1}{T} \sum_n \left| H_{tr}\left(f - \frac{n}{T}\right) \right|^2$. When $H_{tr}(f)$ is a unit-energy square-root Nyquist filter, $\mathbf{R}_{tr}$ becomes the identity
matrix, and $R_{tr}(e^{j2\pi fT}) = 1$. The transmitted signal enters a channel with transfer function $H_{ch}(f)$, and is affected by additive
white Gaussian noise (AWGN) with spectral density $N_0/2$. The received signal is applied to a fixed filter $H_{rec}(f)$. The receiver
filter output is sampled at the symbol rate at instants $nT + \tau$, and the resulting samples are multiplied by a scaling factor $1/\xi$.
Introducing $H_c(f) = H_{tr}(f) H_{ch}(f) H_{rec}(f)$, the scaled sample $z_n$ can be represented as
$$ z_n = \frac{1}{\xi} \sum_m d_{n-m} \left( \sum_k g_k h_{m-k} \right) + \nu_n \tag{4} $$
where $h_m = h_c(mT + \tau)$ is the sample of the impulse response $h_c(t)$ of $H_c(f)$, taken at $mT + \tau$. The variance of $\nu_n$ in (4)
is given by $\sigma_\nu^2 = \sigma^2/\xi^2$, with $\sigma^2 = (N_0/2) \int |H_{rec}(f)|^2 \, df$ denoting the noise variance at the output of the receiver filter. The
sampling delay $\tau$ is a design parameter, which affects the value of the coefficients $\{h_m\}$. We intend to select the coefficients
$\{g_m\}$ and the scaling factor $1/\xi$ such that $z_n$ in (4) is close to $w_n$ given by
$$ w_n = d_n + \sum_{m>0} h_{T,m} d_{n-m} \tag{5} $$
subject to the transmit power constraint (2). Note from (5) that we take for $w_n$ a specific linear combination of the current
and past symbols $\{d_m\}$, where $\{h_{T,m}\}$ denote the integer coefficients of the PR polynomial $h_T(D)$ that has been used in the
precoding operation (1); it is explained in section V how the receiver detects the symbol $a_n$ from a noisy version of $w_n$. The
special case where $h_T(D) = 1$ is referred to as FR signaling; in the case of FR, (1) and (5) reduce to $b_n = a_n$ and $w_n = d_n$,
respectively.
For the sake of practical implementation, we focus on a pre-equalizer with a finite number ($L_g$) of coefficients, i.e.,
$\mathbf{g} = (g_0, g_1, ..., g_{L_g-1})^T$; at the end of this section we point out that restricting our attention to a causal pre-equalizer represents
no loss of generality. Introducing the matrix $\mathbf{H}$ and the vector $\mathbf{h}_T$, with $(\mathbf{H})_{m,n} = h_{m-n}$ and $(\mathbf{h}_T)_m = h_{T,m}$, we rewrite (4)
as
$$ z_n = w_n + \sum_m d_{n-m} \left( \frac{1}{\xi} \mathbf{H}\mathbf{g} - \mathbf{h}_T \right)_m + \nu_n \tag{6} $$
where the second term in (6) denotes residual ISI. Assuming that the impulse response $h_c(t)$ has a finite duration, the coefficients
$h_m$ are zero for $m \notin \{-L_{h,min}, -L_{h,min}+1, ..., L_{h,max}\}$, so that (at most) $L_h = L_{h,min} + L_{h,max} + 1$ coefficients are nonzero;
note that $L_h$ depends on the duration of $h_c(t)$, while $L_{h,min}$ is a function of the sampling delay $\tau$. Hence, $\mathbf{H}$ and $\mathbf{h}_T$ have
nonzero rows only for the row index ranging from $-L_{h,min}$ to $L_{h,max} + L_g - 1$ and from $0$ to $L_T - 1$, respectively. Therefore,
the summation index $m$ in (6) can be restricted to the finite range $M_{fin} = \{-L_{h,min}, -L_{h,min}+1, ..., L_{h,max}+L_g-1\} \cup
\{0, 1, ..., L_T-1\}$. The closeness of $z_n$ to $w_n$ is expressed by the mean-square error (MSE) $E[(z_n - w_n)^2]$, given by
$$ E[e_n^2] \triangleq E[(z_n - w_n)^2] = \sigma_d^2 \left\| \frac{1}{\xi} \mathbf{H}\mathbf{g} - \mathbf{h}_T \right\|^2 + \frac{\sigma^2}{\xi^2} \tag{7} $$
In the following, we will select the pre-equalizer coefficients $\mathbf{g}$ and the scaling factor $\xi$ such that (7) is minimized under the
constraint (2).
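As a sanity check of the MSE expression (7), the sketch below compares it with a Monte Carlo estimate obtained by simulating the error $z_n - w_n$ from (6). The channel samples `h`, the taps `g`, the scaling `xi`, the noise variance and the integer offset `delay` (the position of the PR target inside the cascade of channel and pre-equalizer) are hypothetical values chosen for illustration, and FR signaling is used for the target.

```python
import numpy as np

def residual_isi(h, g, xi, h_T, delay):
    """Vector (1/xi) H g - h_T; H g is the convolution of h and g,
    and the PR target h_T is placed 'delay' samples into that cascade."""
    cascade = np.convolve(h, g) / xi
    target = np.zeros(len(cascade))
    target[delay:delay + len(h_T)] = h_T
    return cascade - target

def mse_eq7(isi, xi, sigma_d2, sigma2):
    """Eq. (7): sigma_d^2 ||(1/xi) H g - h_T||^2 + sigma^2 / xi^2."""
    return sigma_d2 * np.sum(isi ** 2) + sigma2 / xi ** 2

def mse_simulated(isi, xi, L, sigma2, n_sym=200_000, seed=0):
    """Empirical mean of (z_n - w_n)^2 for i.i.d. L-PAM symbols and Gaussian noise."""
    rng = np.random.default_rng(seed)
    d = 2 * rng.integers(0, L, n_sym) - L + 1
    err = np.convolve(d, isi, mode="valid")              # residual ISI term of (6)
    err = err + rng.normal(0.0, np.sqrt(sigma2) / xi, err.size)
    return np.mean(err ** 2)

L = 4
h = np.array([0.1, 1.0, 0.5, 0.2])    # hypothetical channel samples h_m
g = np.array([-0.1, 1.0, -0.4])       # hypothetical pre-equalizer taps
xi, sigma2, delay = 1.0, 0.01, 2
isi = residual_isi(h, g, xi, h_T=[1.0], delay=delay)
mse_theory = mse_eq7(isi, xi, (L ** 2 - 1) / 3, sigma2)
mse_sim = mse_simulated(isi, xi, L, sigma2)
print(mse_theory, mse_sim)
```

The agreement relies on the symbols being i.i.d. with zero mean and variance $\sigma_d^2$, so that the cross terms between different symbols average out.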
Before minimizing the MSE, we will turn (7) into an equivalent expression, which allows a geometrical interpretation.
Considering the singular-value decomposition (SVD) $\mathbf{H}\mathbf{R}_{tr}^{-0.5} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$, where $\mathbf{R}_{tr}^{-0.5}$ is the inverse of $\mathbf{R}_{tr}^{0.5}$, with
$\mathbf{R}_{tr}^{0.5}\mathbf{R}_{tr}^{0.5} = \mathbf{R}_{tr}$, we define the invertible transforms $\mathbf{g} = \mathbf{R}_{tr}^{-0.5}\mathbf{V} \cdot \mathbf{x}$ and $\mathbf{h}_T = \mathbf{U} \cdot \mathbf{q}$, which convert the MSE (7) into
$$ E[e_n^2] = \sigma_d^2 \sum_{m \in M_1} \left( \frac{1}{\xi} s_m x_m - q_m \right)^2 + \sigma_d^2 \sum_{m \in M_0} q_m^2 + \frac{\sigma^2}{\xi^2} \tag{8} $$
and the constraint (2) into
$$ \sigma_d^2 \sum_{m \in M_{fin}} x_m^2 = E_{tr} \tag{9} $$
where $M_0 \subset M_{fin}$ is the subset of indices for which the corresponding eigenvalues of $\mathbf{H}\mathbf{R}_{tr}^{-1}\mathbf{H}^T$ are zero, $M_1 = M_{fin} \setminus M_0$,
and $s_m = (\mathbf{\Sigma})_{m,m}$ for $m \in M_1$. Denoting the m-th column of $\mathbf{U}$ by $\mathbf{U}^{<m>}$, the first term in (8) depends on the projection
$\sum_{m \in M_1} q_m \mathbf{U}^{<m>}$ of $\mathbf{h}_T$ on the column space of $\mathbf{H}$, and is a function of both $\mathbf{x}$ and $\xi$; this term can be canceled by a proper
selection of the pre-equalizer taps. The second term in (8) is not affected by $\mathbf{x}$ nor $\xi$, and therefore represents the irreducible
part of the MSE; this term equals $\sigma_d^2$ times the squared magnitude of the component of $\mathbf{h}_T$ that is orthogonal to the column
space of $\mathbf{H}$. The sum of both these terms denotes the contribution from the residual ISI, and equals the first term of (7).
The third term in (8) represents the noise contribution, which is affected by the scaling factor $\xi$. Minimization of the MSE
(8) will yield optimum values of $(\mathbf{x}, \xi)$; having obtained $\mathbf{x}$, the actual optimum pre-equalizer coefficients $\mathbf{g}$ are computed as
$\mathbf{g} = \mathbf{R}_{tr}^{-0.5}\mathbf{V}\mathbf{x}$.
A suboptimum approach, adopted in [13], consists of minimizing the MSE for a fixed $\xi$, and then selecting $\xi$ such that the
constraint imposed by the transmitter is satisfied. When using for the MSE the expression (8) rather than (7), this approach
yields $x_m = \xi_{sub} q_m / s_m$ for $m \in M_1$ (which cancels the first term in (8)) and $x_m = 0$ for $m \in M_0$, with
$$ \xi_{sub}^2 \sigma_d^2 \sum_{m \in M_1} \frac{q_m^2}{s_m^2} = E_{tr} \tag{10} $$
The corresponding MSE is given by
$$ \left(E[e_n^2]\right)_{sub} = \sigma_d^2 \sum_{m \in M_0} q_m^2 + \mu \sigma_d^2 \sum_{m \in M_1} \frac{q_m^2}{s_m^2} \tag{11} $$
where $\mu = \sigma^2/E_{tr}$. Essentially, this solution minimizes the residual ISI under the transmit energy constraint. This approach is
not optimum, because during the optimization over $\mathbf{x}$ the coupling between $\mathbf{x}$ and $\xi$, introduced by the constraint (9), is ignored.
The approach that is optimum in terms of MSE involves the joint minimization of (8) w.r.t. x and ξ under the constraint (9).
The resulting MMSE solution is $x_m = \xi_{mmse} q_m s_m / (s_m^2 + \mu)$ for $m \in M_1$ and $x_m = 0$ for $m \in M_0$, with
$$\xi_{mmse}^2 \sigma_d^2 \sum_{m \in M_1} \frac{q_m^2 s_m^2}{(s_m^2 + \mu)^2} = E_{tr} \qquad (12)$$
The resulting minimum MSE is given by
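$$(E[(e_n)^2])_{mmse} = \mu^2 \sigma_d^2 \sum_{m \in M_1} \frac{q_m^2}{(s_m^2 + \mu)^2} + \sigma_d^2 \sum_{m \in M_0} q_m^2 + \mu \sigma_d^2 \sum_{m \in M_1} \frac{q_m^2 s_m^2}{(s_m^2 + \mu)^2} \qquad (13)$$
$$= \mu \sigma_d^2 \sum_{m \in M_1} \left( \frac{q_m^2}{s_m^2 + \mu} \right) + \sigma_d^2 \sum_{m \in M_0} q_m^2 \qquad (14)$$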
The sum of the first and second terms in (13) denotes the contribution from the residual ISI, which is larger than the corresponding contribution (first term in (11)) for the suboptimum pre-equalizer; however, as $\xi_{mmse}^2 > \xi_{sub}^2$, the MMSE pre-equalizer gives rise to a smaller noise contribution (third term in (13) smaller than second term in (11)). The net effect is a smaller total MSE for the MMSE solution (first term in (14) smaller than second term in (11)). At high signal-to-noise ratio (SNR) (i.e., $\mu \ll \min_{m \in M_1} s_m^2$), both approaches yield essentially the same MSE.
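The relations (8) and (10)-(14) can be checked numerically with a minimal Python sketch. The singular values `s`, the coordinates `q` (indices in $M_1$) and `q0` (indices in $M_0$), and the values of $\sigma_d^2$, $\sigma^2$ and $E_{tr}$ below are hypothetical illustration values, not taken from the considered interconnect; the function names are likewise introduced here for illustration only.

```python
def mse_general(s, q, q0, x, xi, sigma_d2, sigma2):
    """MSE (8) for arbitrary x (indices in M1) and scaling factor xi."""
    reducible = sigma_d2 * sum((sm * xm / xi - qm) ** 2
                               for sm, xm, qm in zip(s, x, q))
    irreducible = sigma_d2 * sum(qm ** 2 for qm in q0)
    return reducible + irreducible + sigma2 / xi ** 2


def suboptimum(s, q, q0, sigma_d2, sigma2, e_tr):
    """Suboptimum solution: xi from (10), MSE from (11)."""
    mu = sigma2 / e_tr
    weight = sum(qm ** 2 / sm ** 2 for sm, qm in zip(s, q))
    xi = (e_tr / (sigma_d2 * weight)) ** 0.5
    x = [xi * qm / sm for sm, qm in zip(s, q)]
    mse = sigma_d2 * sum(qm ** 2 for qm in q0) + mu * sigma_d2 * weight
    return x, xi, mse


def mmse(s, q, q0, sigma_d2, sigma2, e_tr):
    """MMSE solution: xi from (12), MSE from (14)."""
    mu = sigma2 / e_tr
    weight = sum(qm ** 2 * sm ** 2 / (sm ** 2 + mu) ** 2
                 for sm, qm in zip(s, q))
    xi = (e_tr / (sigma_d2 * weight)) ** 0.5
    x = [xi * qm * sm / (sm ** 2 + mu) for sm, qm in zip(s, q)]
    mse = (mu * sigma_d2 * sum(qm ** 2 / (sm ** 2 + mu) for sm, qm in zip(s, q))
           + sigma_d2 * sum(qm ** 2 for qm in q0))
    return x, xi, mse


# Hypothetical example values (assumptions, not from the paper).
s = [1.0, 0.5, 0.1]        # nonzero singular values s_m, m in M1
q = [0.8, 0.5, 0.3]        # coordinates q_m, m in M1
q0 = [0.05]                # coordinates q_m, m in M0
sigma_d2, sigma2, e_tr = 1.0, 0.01, 1.0

x_sub, xi_sub, mse_sub = suboptimum(s, q, q0, sigma_d2, sigma2, e_tr)
x_mmse, xi_mmse, mse_mmse = mmse(s, q, q0, sigma_d2, sigma2, e_tr)
print("suboptimum MSE:", mse_sub, " MMSE:", mse_mmse)
```

Substituting either solution back into (8) through `mse_general` reproduces the closed-form values, which gives a quick consistency check on (11) and (14).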
The MSE performance of the pre-equalizer depends on the sampling delay $\tau$ at the receiver. This delay can be decomposed as $\tau = n_s T + \epsilon T$, where $n_s = \lfloor \tau/T \rfloor$ and $0 \le \epsilon < 1$; $n_s T$ and $\epsilon T$ denote the integer delay and fractional delay, respectively. Basically, $n_s T$ and $\epsilon T$ should be selected such that the sampling delay compensates for the delay introduced by the transfer function $H_c(f)$ from the transmit filter input to the receive filter output. When the implementation of the sampling clock does not allow modifying the fractional delay $\epsilon T$, only $n_s T$ can be adjusted, by simply introducing the appropriate integer delay at the sampler. Considering a causal pre-equalizer $\mathbf{g} = (g_0, g_1, ..., g_{L_g-1})^T$ is without loss of generality: a non-causal FIR pre-equalizer can be turned into a causal pre-equalizer that yields the same performance, by introducing additional delay at the transmitter and applying the same additional delay to the sampler.
V. ERROR PERFORMANCE ANALYSIS
We consider a detector that ignores the presence of the residual ISI in (6), and therefore assumes $z_n = w_n + \nu_n$, with $w_n$ given by (5). It can be verified that
$$[w_n]_{2L} = [2a_n - (L-1)\, h_T(D)|_{D=1}]_{2L} \qquad (15)$$
so that the modulo-$2L$ reduction of $w_n$ depends only on the digit $a_n$. Based on the relation (15), SymDet can be performed on $[z_n]_{2L}$ [7]. The corresponding decision $\hat{a}_n$ is given by $\hat{a}_n = \alpha$, where $\alpha \in \{0, 1, ..., L-1\}$ minimizes $F([z_n]_{2L}, [w(\alpha)]_{2L})$, with $w(\alpha) = 2\alpha - (L-1)\, h_T(D)|_{D=1}$ and $F(x, y) = \min(|x-y|, 2L - |x-y|)$. The resulting symbol error probability $P_E = \Pr[\hat{a}_n \neq a_n]$ in the absence of residual ISI is well approximated by $P_E = 2Q(1/\sigma_\nu)$ [11], where $Q(x)$ is the complement of the cumulative distribution function of a zero-mean Gaussian random variable with unit variance. A better error performance is achieved by applying SeqDet, i.e., we look for the sequence $\{\hat{a}_n\}$ for which the corresponding sequence $\{\hat{w}_n\}$ minimizes the squared Euclidean distance $\sum_n (z_n - \hat{w}_n)^2$; SeqDet exploits the correlation that is present in $\{w_n\}$ due to PR signaling, and is implemented efficiently by means of the Viterbi algorithm. In the absence of residual ISI, the resulting $P_E$ is essentially proportional to $Q(d_{min}/(2\sigma_\nu))$, where $d_{min}^2$ denotes the minimum of the squared Euclidean distances between allowed sequences $\{w_n\}$ [11]; hence, for given $h_T(D)$, SeqDet yields a performance gain (expressed in dB) of $10 \log_{10}(d_{min}^2/4)$ over SymDet. In the case of FR, we have $w_n = 2a_n - (L-1)$, so that SymDet gives rise to $\hat{a}_n = \alpha$, where $\alpha \in \{0, 1, ..., L-1\}$ minimizes $|z_n - w(\alpha)|$; in the absence of residual ISI, this yields $P_E = 2\frac{L-1}{L} Q(1/\sigma_\nu)$. The actual error performance of the detectors described above is deteriorated by the presence of residual ISI. Here we investigate the performance of the SymDet, taking into account the residual ISI that results from a finite-length pre-equalizer. The performance of the Viterbi-based SeqDet will be assessed by means of computer simulations in section VI.
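The modulo-based SymDet rule built on (15) can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that are not stated in the text: the modulo-$2L$ reduction maps onto the interval $[0, 2L)$, and the function names `mod_2l`, `circ_dist` and `symdet` are introduced here for illustration only.

```python
def mod_2l(x, L):
    """Reduce x modulo 2L; the representative interval [0, 2L) is an assumption."""
    return x % (2 * L)


def circ_dist(x, y, L):
    """F(x, y) = min(|x - y|, 2L - |x - y|) for x, y reduced modulo 2L."""
    d = abs(x - y)
    return min(d, 2 * L - d)


def symdet(z, L, ht_at_1):
    """Symbol-by-symbol decision on [z]_2L.

    ht_at_1 is h_T(D) evaluated at D = 1 (1 for FR, 2 for DB, 4 for DDB).
    Returns the digit alpha in {0, ..., L-1} whose reduced level
    [w(alpha)]_2L is closest to [z]_2L in the circular sense.
    """
    z_red = mod_2l(z, L)

    def metric(alpha):
        w_alpha = 2 * alpha - (L - 1) * ht_at_1
        return circ_dist(z_red, mod_2l(w_alpha, L), L)

    return min(range(L), key=metric)


# Example: duobinary 4-PAM, noiseless level for digit 3 plus a small disturbance.
print(symdet(2 * 3 - 3 * 2 + 0.3, L=4, ht_at_1=2))
```

Because the decision only uses the circular distance, any sample that differs from $w(\alpha)$ by a multiple of $2L$ plus a disturbance smaller than 1 in magnitude is mapped to the same digit.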
Taking into account that the SymDet of the digit $a_n$ in the case of PR signaling is based on $[z_n]_{2L}$, a correct symbol decision is obtained when $z_n - w_n \in S$, with $S = \bigcup_{i \in \mathbb{Z}} (2iL - 1, 2iL + 1)$. The sample $z_n$ from (6) can be represented as $z_n = w_n + \mathrm{ISI}_n + \nu_n$, where
$$\mathrm{ISI}_n = \sum_m d_{n-m} e_m \qquad (16)$$
represents the residual ISI. The coefficients $e_m$ in (16) are obtained as $e_m = h_{tot,m} - h_{T,m}$, where $\{h_{tot,m}\}$ are the coefficients of the filter $H_{tot}(e^{j2\pi fT})$ from the pre-equalizer input to the scaled output of the receive filter, i.e.,
$$H_{tot}(e^{j2\pi fT}) = \frac{1}{\xi}\, G(e^{j2\pi fT}) \cdot H(e^{j2\pi fT}) \qquad (17)$$
with $H(e^{j2\pi fT}) = \sum_m h_m e^{-j2\pi fmT}$; note that $H_{tot}(e^{j2\pi fT})$ does not depend on the PAM constellation size. Let us denote by $\mathbf{d}_n$ the vector of data symbols that contribute to $\mathrm{ISI}_n$; in order to emphasize the dependence of $\mathrm{ISI}_n$ on $\mathbf{d}_n$, we write $\mathrm{ISI}_n = \mathrm{isi}(\mathbf{d}_n)$. The symbol error probability, defined as $P_E = \Pr[\hat{a}_n \neq a_n]$, can be expressed as $P_E = E[P_E(\mathbf{d}_n)]$, where the expectation is over the symbol vector $\mathbf{d}_n$, and $P_E(\mathbf{d}_n) = \Pr[\mathrm{isi}(\mathbf{d}_n) + \nu_n \notin S]$ is the symbol error probability conditioned on $\mathbf{d}_n$. One can easily verify that
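$$P_E(\mathbf{d}_n) = 1 - \sum_{i \in \mathbb{Z}} \left( Q\left( \frac{\Delta_{i,-}(\mathbf{d}_n)}{\sigma_\nu} \right) - Q\left( \frac{\Delta_{i,+}(\mathbf{d}_n)}{\sigma_\nu} \right) \right) \qquad (18)$$
where $\Delta_{i,-}(\mathbf{d}_n) = 2iL - 1 - \mathrm{isi}(\mathbf{d}_n)$ and $\Delta_{i,+}(\mathbf{d}_n) = 2iL + 1 - \mathrm{isi}(\mathbf{d}_n)$. The infinite summation over $i$ in (18) can be truncated to only a few terms; more specifically, when for given $\mathbf{d}_n$ we have $\mathrm{isi}(\mathbf{d}_n) \in [(2i_0 - 1)L, (2i_0 + 1)L]$, the summation index can be safely restricted to $i \in \{i_0 - 1, i_0, i_0 + 1\}$.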
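The truncated evaluation of (18) can be sketched as follows. This is a minimal Python illustration; the function names and the numerical values in the example are assumptions made here, and $Q(x)$ is computed from the complementary error function of the standard library.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_conditional(isi, sigma_nu, L):
    """Conditional error probability (18), truncated to i in {i0-1, i0, i0+1}."""
    # i0 is the index for which isi lies in [(2*i0 - 1)*L, (2*i0 + 1)*L].
    i0 = math.floor((isi + L) / (2 * L))
    p_correct = 0.0
    for i in range(i0 - 1, i0 + 2):
        delta_minus = 2 * i * L - 1 - isi
        delta_plus = 2 * i * L + 1 - isi
        p_correct += q_func(delta_minus / sigma_nu) - q_func(delta_plus / sigma_nu)
    return 1.0 - p_correct


def pe_conditional_i0_only(isi, sigma_nu):
    """Same quantity keeping only the i = 0 term, written without the
    subtraction from 1 (valid when |isi| < 1 and sigma_nu is small)."""
    return q_func((1 + isi) / sigma_nu) + q_func((1 - isi) / sigma_nu)


# Hypothetical example: isi = 0.2, sigma_nu = 0.3, 2-PAM (L = 2).
print(pe_conditional(0.2, 0.3, 2), pe_conditional_i0_only(0.2, 0.3))
```

Note that `pe_conditional` subtracts a number close to 1 from 1, so in double precision it cannot resolve error probabilities much below about 1e-16; for very low error rates the form without that subtraction is numerically preferable.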
Let us restrict our attention to the practically important case where the pre-equalizer has been designed such that no decision
errors occur when noise is absent, i.e., the eye opening at the receive filter output is not closed at the decision instants nT + τ .
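Denoting by $\mathrm{isi}_{max}$ the maximum of $|\mathrm{isi}(\mathbf{d}_n)|$ over all possible $\mathbf{d}_n$, the eye is open when $\mathrm{isi}_{max} < 1$ (implying that $|\mathrm{isi}(\mathbf{d}_n)| < 1$ for all $\mathbf{d}_n$). It follows from (16) that
$$\mathrm{isi}_{max} = (L-1) \sum_m |e_m| \qquad (19)$$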
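Assuming that $\mathrm{isi}_{max} < 1$ and $\sigma_\nu^2 \ll 1$, the conditional error probability $P_E(\mathbf{d}_n)$ is well approximated by keeping in (18) only the terms with $i = 0$, i.e.,
$$P_E(\mathbf{d}_n) = Q\left( \frac{1 + \mathrm{isi}(\mathbf{d}_n)}{\sigma_\nu} \right) + Q\left( \frac{1 - \mathrm{isi}(\mathbf{d}_n)}{\sigma_\nu} \right) \qquad (20)$$
Using the approximation (20) instead of the exact expression (18), we obtain
$$P_E = 2E\left[ Q\left( \frac{1 + \mathrm{isi}(\mathbf{d}_n)}{\sigma_\nu} \right) \right] \qquad (21)$$
where we have taken into account that the vectors $\mathbf{d}_n$ and $-\mathbf{d}_n$ have the same probability. In the absence of residual ISI, (21) reduces to $P_E = 2Q(1/\sigma_\nu)$.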
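For a small number of nonzero coefficients $e_m$, the expectation in (21) can be evaluated by brute-force enumeration. The following Python sketch assumes i.i.d. equiprobable symbols from the alphabet $\{-(L-1), ..., L-1\}$; the function names and the example coefficients are hypothetical.

```python
import itertools
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pam_alphabet(L):
    """L-PAM amplitudes -(L-1), -(L-3), ..., L-1."""
    return [2 * k - (L - 1) for k in range(L)]


def isi_max(e, L):
    """Peak ISI magnitude (19)."""
    return (L - 1) * sum(abs(em) for em in e)


def pe_exact(e, sigma_nu, L):
    """Average of (20) over all symbol vectors, which equals (21).

    Assumes i.i.d. equiprobable symbols (an assumption of this sketch).
    The cost grows as L**Ne, with Ne the number of nonzero coefficients.
    """
    taps = [em for em in e if em != 0.0]
    alphabet = pam_alphabet(L)
    total = 0.0
    for d in itertools.product(alphabet, repeat=len(taps)):
        isi = sum(dm * em for dm, em in zip(d, taps))
        total += q_func((1 + isi) / sigma_nu) + q_func((1 - isi) / sigma_nu)
    return total / L ** len(taps)


# Hypothetical residual-ISI coefficients, 2-PAM, sigma_nu = 0.25.
e_example = [0.10, -0.05, 0.02]
print(isi_max(e_example, 2), pe_exact(e_example, 0.25, 2))
```

The loop visits $L^{N_e}$ symbol vectors, which illustrates why the enumeration quickly becomes impractical as the number of nonzero coefficients grows.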
Let us denote by $M_e$ the set of indices $m$ for which $e_m$ in (16) is nonzero, and by $N_e$ the number of elements in $M_e$. The exact computation of the expectation $E[P_E(\mathbf{d}_n)]$ then involves a summation of $L^{N_e}$ terms, which becomes computationally prohibitive for large $N_e$. This problem can be circumvented by computing bounds on $P_E$ in the following way. First, we partition $M_e$ into the subsets $M_{large}$ and $M_{small}$, where $M_{large}$ contains the indices $m$ of the $N_1$ coefficients $e_m$ with the largest magnitudes, and $M_{small}$ contains the indices of the $N_2 = N_e - N_1$ remaining coefficients $e_m$ with the smaller magnitudes; we have $0 \le N_1 \le N_e$. Next, we decompose $\mathrm{ISI}_n$ as $\mathrm{ISI}_n = \mathrm{ISI}_{1,n} + \mathrm{ISI}_{2,n}$, where
$$\mathrm{ISI}_{1,n} = \sum_{m \in M_{large}} d_{n-m} e_m \qquad (22)$$
$$\mathrm{ISI}_{2,n} = \sum_{m \in M_{small}} d_{n-m} e_m \qquad (23)$$
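Denoting $\mathbf{d}_{1,n} = \{d_{n-m}, m \in M_{large}\}$ and $\mathbf{d}_{2,n} = \{d_{n-m}, m \in M_{small}\}$, we write $\mathrm{ISI}_{1,n} = \mathrm{isi}_1(\mathbf{d}_{1,n})$ and $\mathrm{ISI}_{2,n} = \mathrm{isi}_2(\mathbf{d}_{2,n})$. Taking into account that $Q(u+v) + Q(u-v)$ is an increasing function of $|v|$ when $u > 0$ and assuming $\mathrm{isi}_{max} < 1$, the error probability (21) can be bounded as $P_{E,low} \le P_E \le P_{E,up}$, where
$$P_{E,low} = 2E\left[ Q\left( \frac{1 + \mathrm{isi}_1(\mathbf{d}_{1,n})}{\sigma_\nu} \right) \right] \qquad (24)$$
$$P_{E,up} = E\left[ Q\left( \frac{\Delta_{up,+}}{\sigma_\nu} \right) + Q\left( \frac{\Delta_{up,-}}{\sigma_\nu} \right) \right] \qquad (25)$$
In (25), we have $\Delta_{up,+} = 1 + \mathrm{isi}_1(\mathbf{d}_{1,n}) + \mathrm{isi}_{2,max}$ and $\Delta_{up,-} = 1 + \mathrm{isi}_1(\mathbf{d}_{1,n}) - \mathrm{isi}_{2,max}$, with
$$\mathrm{isi}_{2,max} = (L-1) \sum_{m \in M_{small}} |e_m| \qquad (26)$$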
denoting the maximum of $|\mathrm{isi}_2(\mathbf{d}_{2,n})|$ over $\mathbf{d}_{2,n}$. As compared to (21), which involves a summation over $L^{N_e}$ terms, the expectations in (24) and (25) over $\mathbf{d}_{1,n}$ represent summations over only $L^{N_1}$ terms; the selection of $N_1$ is a trade-off between high accuracy (large $N_1$) and low computational complexity (small $N_1$). A looser upper bound on (21) is obtained as
$$P_E \le 2Q\left( \frac{1 - \mathrm{isi}_{max}}{\sigma_\nu} \right) \qquad (27)$$
where $\mathrm{isi}_{max}$ is given by (19), with the summation index restricted to $m \in M_e$. The bound (27) is obtained from (21) by assuming that $\mathrm{isi}(\mathbf{d}_n) = -\mathrm{isi}_{max}$ for all $\mathbf{d}_n$, and relates the error performance to the noise variance $\sigma_\nu^2$ and the eye opening $1 - \mathrm{isi}_{max}$ at the input of the detector.
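The bounds (24), (25) and (27) can be sketched in Python as follows. As before, i.i.d. equiprobable PAM symbols are assumed, and the function names and example coefficients are hypothetical; setting `n1` equal to the number of nonzero coefficients makes both bounds coincide with (21).

```python
import itertools
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_bounds(e, sigma_nu, L, n1):
    """Lower bound (24) and upper bound (25) using the n1 largest coefficients."""
    taps = sorted((em for em in e if em != 0.0), key=abs, reverse=True)
    large, small = taps[:n1], taps[n1:]
    isi2_max = (L - 1) * sum(abs(em) for em in small)          # (26)
    alphabet = [2 * k - (L - 1) for k in range(L)]
    low = up = 0.0
    for d1 in itertools.product(alphabet, repeat=len(large)):
        isi1 = sum(dm * em for dm, em in zip(d1, large))
        low += 2 * q_func((1 + isi1) / sigma_nu)
        up += (q_func((1 + isi1 + isi2_max) / sigma_nu)
               + q_func((1 + isi1 - isi2_max) / sigma_nu))
    n_terms = L ** len(large)
    return low / n_terms, up / n_terms


def pe_loose_bound(e, sigma_nu, L):
    """Eye-opening bound (27)."""
    isi_max = (L - 1) * sum(abs(em) for em in e)
    return 2 * q_func((1 - isi_max) / sigma_nu)


# Hypothetical coefficients with isi_max = 0.37 < 1 for 2-PAM.
e_example = [0.2, -0.1, 0.05, 0.02]
pe_low, pe_up = pe_bounds(e_example, 0.2, 2, n1=2)
pe_ref, _ = pe_bounds(e_example, 0.2, 2, n1=len(e_example))    # equals (21)
print(pe_low, pe_ref, pe_up, pe_loose_bound(e_example, 0.2, 2))
```

Increasing `n1` tightens both bounds at the cost of $L^{N_1}$ terms in the loop, which mirrors the accuracy versus complexity trade-off described above.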
In the case of FR signaling, the detection does not involve the modulo operation. In order to obtain the symbol error probability for given $L$ and $\{e_m\}$ for FR signaling, we first consider the error probability $P_E(\alpha) = \Pr[\hat{d}_n \neq d_n \,|\, d_n = \alpha]$ conditioned on the transmitted symbol, and next we average $P_E(\alpha)$ over $\alpha \in A_d$. The resulting error probability is obtained as $P_E = P_{E,in} + P_{E,out}$, with
$$P_{E,in} = \frac{2}{L} \sum_{\alpha \in A_{d,in}} E\left[ Q\left( \frac{1 + e_0 \alpha + \mathrm{isi}_0(\mathbf{d}_n^{(0)})}{\sigma_\nu} \right) \right] \qquad (28)$$
$$P_{E,out} = \frac{2}{L} E\left[ Q\left( \frac{1 + (L-1) e_0 + \mathrm{isi}_0(\mathbf{d}_n^{(0)})}{\sigma_\nu} \right) \right] \qquad (29)$$
where $A_{d,in} = \{-(L-3), -(L-5), ..., (L-3)\}$ is the set of inner constellation points, $\mathbf{d}_n^{(0)}$ collects the data symbols that contribute to $\mathrm{ISI}_n$ from (16) with the exception of the useful symbol $d_n$,
$$\mathrm{isi}_0(\mathbf{d}_n^{(0)}) = \sum_{m \neq 0} d_{n-m} e_m \qquad (30)$$
denotes the ISI caused by the symbols contained in $\mathbf{d}_n^{(0)}$, and the expectation in (28)-(29) is with respect to $\mathbf{d}_n^{(0)}$. In the absence of residual ISI, FR gives rise to $P_E = 2\frac{L-1}{L} Q(1/\sigma_\nu)$. Using a similar reasoning as for PR signaling, upper and lower bounds on $P_E$ are easily derived when $\mathrm{isi}_{max} < 1$, by bounding the individual terms in (28)-(29).
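The FR expressions (28)-(30) can be evaluated by enumeration in the same way. The sketch below assumes i.i.d. equiprobable PAM symbols; `e0` is the coefficient $e_0$ acting on the useful symbol and `e_isi` holds the remaining coefficients $e_m$, $m \neq 0$. All names and numbers are hypothetical.

```python
import itertools
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_fr(e0, e_isi, sigma_nu, L):
    """Symbol error probability P_E = P_E,in + P_E,out for FR signaling."""
    alphabet = [2 * k - (L - 1) for k in range(L)]
    inner = [a for a in alphabet if abs(a) < L - 1]   # A_d,in
    p_in = p_out = 0.0
    for d0 in itertools.product(alphabet, repeat=len(e_isi)):
        isi0 = sum(dm * em for dm, em in zip(d0, e_isi))        # (30)
        p_in += sum(q_func((1 + e0 * a + isi0) / sigma_nu) for a in inner)
        p_out += q_func((1 + (L - 1) * e0 + isi0) / sigma_nu)
    n_vectors = L ** len(e_isi)
    scale = 2.0 / (L * n_vectors)
    return scale * p_in + scale * p_out                # (28) + (29)


# Without residual ISI this reduces to 2 (L-1)/L Q(1/sigma_nu).
print(pe_fr(0.0, [], 0.3, 4), 2 * 3 / 4 * q_func(1 / 0.3))
# Hypothetical residual ISI for 4-PAM.
print(pe_fr(-0.02, [0.05, -0.03], 0.3, 4))
```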
VI. NUMERICAL RESULTS
In this section, we will derive numerical performance results, based on a channel transfer function obtained from simulation
of an electrical backplane interconnect including two traces on daughter boards, two high-speed backplane connectors and
a 10-cm long differential trace on a printed circuit board as indicated in Fig. 3. The unit-energy transmit and receive filter
impulse responses are obtained by truncating to 41 symbol intervals the impulse response $\sin(\pi t/T)/(\pi t/T)$ of a square-root
Nyquist filter with zero roll-off and a bandwidth of 50 GHz; the corresponding symbol rate $1/T$ equals 100 Gbaud, while the
transmit and receive filters give rise to $R_{tr}(e^{j2\pi fT}) = 1$ and $\sigma^2 = N_0/2$, respectively. The resulting $h_c(t)$ is confined to about
15 symbol intervals. The pre-equalizer performance results have been optimized not only over the pre-equalizer coefficients
{gn} and the scaling factor 1/ξ, but also over the sampling delay τ (which we restrict to be a multiple of T/10); the error
performance results correspond to the same selection of {gn}, 1/ξ and τ .
We will investigate finite-length MMSE pre-equalization. Besides FR signaling, we will consider PR signaling with polynomials $h_T(D) = 1 + D$ and $h_T(D) = (1 + D)^2 = 1 + 2D + D^2$, which will be referred to as duobinary (DB) and double duobinary (DDB), respectively.
A. Pre-equalizer performance
For FR, DB and DDB signaling, Fig. 7 shows 1/MSE as a function of Etr/N0 in the case of 2-PAM, with MSE denoting
the mean square error (14) after scaling the receiver filter output sample. Note that for these systems the transmit power is
given by Ptr = Etr/T , with 1/T = 100 Gbaud. For increasing Etr/N0, MSE converges to the last summation in (14), which
is caused by hT not belonging to the column space of H; this gives rise to the 1/MSE floor occurring at large Etr/N0 in Fig.
7. For moderate and large Etr/N0, 1/MSE considerably increases when going from 5 to 11 pre-equalizer taps: the residual
ISI is substantially reduced when using more pre-equalizer taps, yielding a much larger 1/MSE floor. We observe that PR
signaling significantly reduces MSE as compared to FR, with DB slightly outperforming DDB. Taking into account that for 2-PAM and 4-PAM we have $\sigma_d^2 = 1$ and $\sigma_d^2 = 5$, respectively, it follows from (14) that the curves of 1/MSE versus $E_{tr}/N_0$ for 4-PAM at 100 Gbaud (i.e., 4-PAM operating at 200 Gbit/s) are obtained by shifting downward by 7 dB the curves from Fig. 7.
Figure 7. 1/MSE as a function of Etr/N0 for 2-PAM at 100 Gbaud.
The superior pre-equalizer performance for PR signaling on the considered electrical interconnect is confirmed by Fig. 8, which shows the transfer functions $|H_T(e^{j2\pi fT})|^2$ and $|H_{tot}(e^{j2\pi fT})|^2$ (expressed in dB), assuming an 11-tap pre-equalizer and $E_{tr}/N_0 = 60$ dB. If the residual ISI were absent, we would have $H_{tot}(e^{j2\pi fT}) = H_T(e^{j2\pi fT})$. We observe from Fig. 8 that $|H_{tot}(e^{j2\pi fT})|^2$ shows some ripple as compared to $|H_T(e^{j2\pi fT})|^2$, which gives rise to residual ISI. The ripple is largest for FR, which is in agreement with FR having the lowest 1/MSE floor in Fig. 7.
Figure 8. Comparison of $H_T(e^{j2\pi fT})$ and $H_{tot}(e^{j2\pi fT})$ for $E_{tr}/N_0 = 60$ dB.
B. Error Performance for 2-PAM
We consider the symbol error probability $P_E$ resulting from SymDet, assuming MMSE pre-equalization with 5 taps and 11 taps, for FR, DB and DDB signaling; the constellation is 2-PAM. The left-hand side of Fig. 9 shows as a function of $E_{tr}/N_0$ the simulated error probability, along with the upper bound (25) on $P_E$ for the cases where $\mathrm{isi}_{max}$ from (19) does not exceed 1 (see the legend 'Pe,U' in the left-hand side of Fig. 9); when computing (25) we have selected $N_1$ such that the horizontal shift between the upper bound and the lower bound (24) at high $E_{tr}/N_0$ is less than about 0.5 dB, so that the upper bound can be considered as sufficiently tight. For FR signaling with 5-tap and 11-tap pre-equalization, we get $\mathrm{isi}_{max} > 1$, which results in a symbol error probability floor because of eye closure. We observe (see left-hand side of Fig. 9) that FR is significantly outperformed by both DB and DDB at moderate to high $E_{tr}/N_0$, and that DB performs better than DDB; this behavior is in agreement with the 1/MSE curves from Fig. 7 and the size of the vertical eye opening from Fig. 10, which shows the eye diagram of the signal at the output of the receive filter $H_{rec}(f)$ in Fig. 6, for $E_{tr} = 1$.
Figure 9. $P_E$ as a function of $E_{tr}/N_0$ for SymDet (left-hand side) and SeqDet (right-hand side) (2-PAM, 100 Gbaud).
[Figure 10: six eye-diagram panels, FR (Lg = 5), DB (Lg = 5), DDB (Lg = 5), FR (Lg = 11), DB (Lg = 11) and DDB (Lg = 11); horizontal axis ticks from −0.4 to 0.2, vertical axis ticks from −0.5 to 0.5.]
Figure 10. The eye-diagram in the high SNR regime (2-PAM, 100 Gbaud).
Simulation results regarding the symbol error performance for SeqDet using DB and DDB signaling are shown in the right-hand side of Fig. 9 as a function of $E_{tr}/N_0$. Whereas for SymDet DB performs better than DDB, we see that DDB outperforms DB when SeqDet is applied. The benefit from SeqDet is larger for DDB than for DB, because the former yields the larger minimum squared Euclidean distance $d_{min}^2$ between allowed sequences $\{w_n\}$: for DB and DDB we have $d_{min}^2 = 8$ and $d_{min}^2 = 16$, respectively.
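As a small worked example, the asymptotic SeqDet gain $10 \log_{10}(d_{min}^2/4)$ from section V can be evaluated for the two quoted distances. The function name is hypothetical, and the resulting figures apply to the ISI-free case only.

```python
import math


def seqdet_gain_db(d_min_sq):
    """Asymptotic gain of SeqDet over SymDet, 10*log10(d_min^2 / 4), in dB."""
    return 10 * math.log10(d_min_sq / 4)


gain_db = seqdet_gain_db(8)     # duobinary, d_min^2 = 8
gain_ddb = seqdet_gain_db(16)   # double duobinary, d_min^2 = 16
print(f"DB: {gain_db:.2f} dB, DDB: {gain_ddb:.2f} dB")
```

In the absence of residual ISI this gives roughly 3 dB for DB and 6 dB for DDB, which is consistent with DDB benefiting more from SeqDet.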
It should be noted that we limited the simulations to an error rate level of about $10^{-10}$, because the reliable simulation of lower error rates would require an excessively long computation time. As the high-speed electrical interfaces (such as IEEE 400 Gbit/s Ethernet) are expected to operate at a BER of $10^{-13}$ for systems without forward error correction and of $10^{-15}$ with forward error correction [29], the reliable error rate simulation at the intended operating point is highly problematic in terms of computation time. The recommended line of action for BER verification is the extrapolation from the available simulated error performance curves. For a given constellation size and PR signaling, such extrapolation is valid when $\mathrm{isi}_{max} < 1$, because in this case an error floor cannot occur, neither for SymDet nor for SeqDet; for 2-PAM operating at 100 Gbit/s, this condition is fulfilled for DB and DDB, but not for FR.
C. Error performance for 4-PAM
Here we investigate the error performance for 4-PAM. In a first scenario, the 4-PAM transmission operates at the same
symbol rate (100 Gbaud) and the same bandwidth (50 GHz) as the 2-PAM transmission considered before, but achieves twice
the bitrate (200 Gbit/s for 4-PAM, 100 Gbit/s for 2-PAM). As in this first scenario the 2-PAM and 4-PAM transmissions operate
at the same baudrate, their error performance will be shown as a function of Etr/N0; note that Ptr = Etr/T . In a second
scenario, we consider a 4-PAM bitrate of 100 Gbit/s, in which case the 4-PAM transmission operates at half the baudrate
and half the bandwidth (50 Gbaud and 25 GHz, respectively) as compared to the 100 Gbit/s 2-PAM transmission; hence, the
channel is less dispersive in the case of 4-PAM. As in this second scenario the 2-PAM and 4-PAM transmissions operate at the
same bitrate, it is convenient to compare their error performance for given Eb/N0, with Eb = Etr/ log2(L) representing the
transmitted energy per bit (for 2-PAM, we have Eb/N0 = Etr/N0); note that Ptr = EbRb, with Rb = log2(L)/T denoting
the bitrate.
Figure 11. PE as a function of Etr/N0 for 4-PAM operating at 100 Gbaud.
Figure 12. PE as a function of Eb/N0 for 4-PAM operating at 50 Gbaud with 5-tap (left-hand side) and 11-tap (right-hand side) pre-equalizer.
The 4-PAM error performance corresponding to the first scenario (i.e., 100 Gbaud) is shown in Fig. 11, for FR, DB and DDB signaling, considering both SymDet and SeqDet for the latter two. Taking into account that 2-PAM and 4-PAM yield $\sigma_d^2 = 1$ and $\sigma_d^2 = 5$, respectively, it follows from section IV that, for given $E_{tr}/N_0$, a given number of pre-equalizer taps and a given signaling format, the optimum pre-equalizer taps and scaling factors $(\mathbf{g}_{4\text{-PAM}}, \xi_{4\text{-PAM}})$ and $(\mathbf{g}_{2\text{-PAM}}, \xi_{2\text{-PAM}})$ for 4-PAM and 2-PAM are related by $(\mathbf{g}_{4\text{-PAM}}, \xi_{4\text{-PAM}}) = \frac{1}{\sqrt{5}} (\mathbf{g}_{2\text{-PAM}}, \xi_{2\text{-PAM}})$. Hence, 4-PAM gives rise to a noise power at the input of the decision device (i.e., after scaling the receive filter output sample) that is 7 dB larger than for 2-PAM operating at the same baudrate. Considering (19), the ISI peak power $\mathrm{isi}_{max}^2$ for 4-PAM is roughly 9.5 dB larger than for 2-PAM, so that 4-PAM yields a smaller eye opening. This explains the increased symbol error probability when moving from 2-PAM to 4-PAM at a given baudrate. We observe that $L_g = 5$ with SeqDet yields an unacceptable error floor, caused by the large ISI peak power; with SymDet, performance is even worse (results not displayed). For $L_g = 11$, only DB yields $\mathrm{isi}_{max} < 1$, so that both SymDet and SeqDet of DB do not exhibit an error floor; for FR and DDB with SymDet, an error floor is visible. Using SeqDet, the performance for $L_g = 11$ is improved compared to SymDet, with DB outperforming DDB for $P_E < 10^{-7}$.
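The 7 dB and 9.5 dB figures quoted for the first scenario follow from the PAM symbol energy and from (19); a short Python check is given below. It uses the standard expression $\sigma_d^2 = (L^2 - 1)/3$ for equiprobable amplitudes $\pm 1, \pm 3, ...$, which reproduces the values 1 and 5 stated in the text; the function names are hypothetical.

```python
import math


def pam_symbol_energy(L):
    """Mean-square value of equiprobable L-PAM amplitudes +-1, +-3, ..., +-(L-1)."""
    return (L * L - 1) / 3


def noise_penalty_db(L, L_ref=2):
    """Increase of the scaled noise power: xi scales with 1/sqrt(sigma_d^2)."""
    return 10 * math.log10(pam_symbol_energy(L) / pam_symbol_energy(L_ref))


def isi_peak_penalty_db(L, L_ref=2):
    """Increase of the ISI peak power isi_max^2, using the (L - 1) factor in (19)."""
    return 20 * math.log10((L - 1) / (L_ref - 1))


print(f"noise penalty 4-PAM vs 2-PAM: {noise_penalty_db(4):.2f} dB")
print(f"ISI peak power penalty 4-PAM vs 2-PAM: {isi_peak_penalty_db(4):.2f} dB")
```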
[Figure 13: six eye-diagram panels, FR (Lg = 5), DB (Lg = 5), DDB (Lg = 5), FR (Lg = 11), DB (Lg = 11) and DDB (Lg = 11); horizontal axis ticks from −0.4 to 0.2, vertical axis ticks from −1 to 1.]
Figure 13. The eye-diagram in the high SNR regime for 4-PAM operating at 50 Gbaud with 5 and 11-tap pre-equalizer.
Fig. 12 shows the 50 Gbaud 4-PAM error performance related to the second scenario, for Lg = 5 (left-hand side) and
Lg = 11 (right-hand side), respectively. For Lg = 5, only DDB has isimax > 1, causing an error floor for both SymDet and
SeqDet. We observe that for Lg = 5 with SymDet, FR outperforms DB (2 dB difference at low PE), whereas DB yields the
better performance when SeqDet is used (3-4 dB better than FR at low PE). The residual ISI is substantially reduced when
taking Lg = 11, so that isimax < 1 and, hence, error floors are absent for FR, DB and DDB. For Lg = 11 with SymDet, DB is
slightly better than FR (1.5 dB difference at low PE), which in turn outperforms DDB; error performance is further improved
by means of SeqDet, with DB slightly outperforming DDB (less than 0.5 dB difference at low PE). The error performance is
confirmed by the size of the vertical eye opening in the eye-diagram of the signal at the output of the receive filter, shown in
Fig. 13 for Etr = 1.
D. Error performance comparison
Let us define the power efficiency as the average transmit power $P_{tr}$ needed to achieve $P_E = 10^{-9}$; the smaller $P_{tr}$, the more efficient the considered scheme. Using $P_{tr} = E_b R_b = E_{tr}/T$, the power efficiency of a particular scheme can be derived from the corresponding curve showing $P_E$ versus $E_b/N_0$ or $E_{tr}/N_0$ when $N_0$ is known. Denoting by $P_{ref}$ the power efficiency for precoded DB 4-PAM with SeqDet at 100 Gbit/s and an 11-tap pre-equalizer, Table I shows the relative power efficiency $P_{tr}/P_{ref}$ (expressed in dB) for the different modulation schemes; the entry "- - -" indicates the occurrence of an error probability floor, which either exceeds $10^{-9}$ or yields an unacceptably large transmit power to reach $P_E = 10^{-9}$. Fig. 12 shows that, for precoded DB 4-PAM with SeqDet at 100 Gbit/s and an 11-tap pre-equalizer, $P_E = 10^{-9}$ is achieved at $E_b/N_0 = 23.4$ dB, yielding $(P_{ref})_{dBm} = 23.4 + 110 + (N_0)_{dBm/Hz}$; for instance, when $N_0 = -140$ dBm/Hz, we have $P_{ref} = -6.6$ dBm.

Table I
RELATIVE TRANSMIT POWER $P_{tr}/P_{ref}$ AT A TARGET $P_E$ OF $10^{-9}$

| Scheme     | 2-PAM, 100 Gbit/s, 5 taps | 2-PAM, 100 Gbit/s, 11 taps | 4-PAM, 100 Gbit/s, 5 taps | 4-PAM, 100 Gbit/s, 11 taps | 4-PAM, 200 Gbit/s, 11 taps |
| FR         | - - -                     | - - -                      | 5.7 dB                    | 3.3 dB                     | - - -                      |
| DB SymDet  | 10.5 dB                   | 7.0 dB                     | 7.3 dB                    | 2.8 dB                     | 19.1 dB                    |
| DDB SymDet | 18.4 dB                   | 8.5 dB                     | - - -                     | 6.6 dB                     | - - -                      |
| DB SeqDet  | 5.7 dB                    | 3.9 dB                     | 2.4 dB                    | 0.0 dB                     | 13.7 dB                    |
| DDB SeqDet | 4.8 dB                    | 1.8 dB                     | - - -                     | 0.1 dB                     | 14.7 dB                    |
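The conversion from $E_b/N_0$ to the reference transmit power uses $P_{tr} = E_b R_b$ in logarithmic units. A minimal Python sketch, with a hypothetical function name, reproduces the quoted example.

```python
import math


def transmit_power_dbm(eb_n0_db, bitrate_bps, n0_dbm_per_hz):
    """P_tr in dBm from Eb/N0 (dB), bitrate (bit/s) and N0 (dBm/Hz)."""
    return eb_n0_db + 10 * math.log10(bitrate_bps) + n0_dbm_per_hz


# Values quoted in the text: Eb/N0 = 23.4 dB, Rb = 100 Gbit/s, N0 = -140 dBm/Hz.
p_ref_dbm = transmit_power_dbm(23.4, 100e9, -140.0)
print(f"P_ref = {p_ref_dbm:.1f} dBm")
```

The term $10 \log_{10}(R_b)$ equals 110 dB for $R_b = 100$ Gbit/s, which is the constant appearing in the expression for $(P_{ref})_{dBm}$.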
Let us first focus on 2-PAM operating at 100 Gbit/s. Considering a 5-tap pre-equalizer and using SymDet, DB yields the best performance ($P_{tr}/P_{ref}$ = 10.5 dB) among the modulations considered; using SeqDet, DDB is to be preferred ($P_{tr}/P_{ref}$ = 4.8 dB), yielding an improvement of 5.7 dB over SymDet. Increasing the number of pre-equalizer taps to 11, the best modulation when using SymDet is DB ($P_{tr}/P_{ref}$ = 7.0 dB), whereas SeqDet improves the power efficiency by 5.2 dB for DDB ($P_{tr}/P_{ref}$ = 1.8 dB). Hence, moving from 5 taps to 11 taps and from SymDet to SeqDet yields performance gains in the order of 3 dB and 5 dB, respectively.
In the case of 4-PAM operating at 100 Gbit/s, the best schemes for 5-tap pre-equalization with SymDet and SeqDet are FR ($P_{tr}/P_{ref}$ = 5.7 dB) and DB ($P_{tr}/P_{ref}$ = 2.4 dB), respectively, with the latter providing a 3.3 dB gain over the former. When using 11 taps, DB is the best modulation for both SymDet ($P_{tr}/P_{ref}$ = 2.8 dB) and SeqDet ($P_{tr}/P_{ref}$ = 0.0 dB), with the latter performing 2.8 dB better than the former. Note that at 100 Gbit/s, for a given number of taps and a given detection method, the best schemes for 4-PAM outperform the best schemes for 2-PAM, by roughly 4.5 dB and 2 dB for SymDet and SeqDet, respectively.
Finally, we consider 4-PAM operating at 200 Gbit/s, using 11-tap pre-equalization. Table I shows that DB is the best scheme, for both SymDet ($P_{tr}/P_{ref}$ = 19.1 dB) and SeqDet ($P_{tr}/P_{ref}$ = 13.7 dB), with the latter offering a 5.4 dB performance advantage. Compared to the best schemes for 100 Gbit/s with 5-tap (11-tap) pre-equalization, doubling the bitrate from 100 Gbit/s to 200 Gbit/s gives rise to a power penalty of 13.4 dB (16.3 dB) for SymDet and 11.3 dB (13.7 dB) for SeqDet.
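The comparisons drawn from Table I can be reproduced with a short Python sketch. The table entries are copied from Table I, with `None` standing for an error floor; the column labels and helper names are hypothetical, and FR is grouped with the SymDet schemes since it is detected symbol by symbol.

```python
COLUMNS = ["2-PAM/100G/5", "2-PAM/100G/11",
           "4-PAM/100G/5", "4-PAM/100G/11", "4-PAM/200G/11"]

# Relative transmit power P_tr/P_ref in dB; None marks an error floor.
TABLE_I = {
    "FR":         [None, None, 5.7, 3.3, None],
    "DB SymDet":  [10.5, 7.0, 7.3, 2.8, 19.1],
    "DDB SymDet": [18.4, 8.5, None, 6.6, None],
    "DB SeqDet":  [5.7, 3.9, 2.4, 0.0, 13.7],
    "DDB SeqDet": [4.8, 1.8, None, 0.1, 14.7],
}
SYMDET = ["FR", "DB SymDet", "DDB SymDet"]
SEQDET = ["DB SeqDet", "DDB SeqDet"]


def best(column, schemes):
    """Return (power in dB, scheme) of the most power-efficient scheme."""
    idx = COLUMNS.index(column)
    candidates = [(TABLE_I[s][idx], s) for s in schemes
                  if TABLE_I[s][idx] is not None]
    return min(candidates)


def best_100g(taps, schemes):
    """Best 100 Gbit/s scheme over both constellations for a given tap count."""
    return min(best(f"{pam}/100G/{taps}", schemes) for pam in ("2-PAM", "4-PAM"))


def penalty_200g(taps, schemes):
    """Power penalty (dB) of 200 Gbit/s versus the best 100 Gbit/s scheme."""
    return best("4-PAM/200G/11", schemes)[0] - best_100g(taps, schemes)[0]


for taps in (5, 11):
    print(taps, "taps:",
          round(penalty_200g(taps, SYMDET), 1), "dB (SymDet),",
          round(penalty_200g(taps, SEQDET), 1), "dB (SeqDet)")
```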
Considering the performance parameters of the analyzed schemes as well as the associated complexity of implementation, a prototype has been built realizing the high-speed electrical interconnect employing duobinary modulation. A 5-tap transversal filter has been used as a pre-equalizer to compensate for the losses of a 10-cm long backplane channel [30]. To the best of the authors' knowledge, the pre-equalization method presented in [30] offers equalization at the highest demonstrated baud rate. It is realized by employing analog microwave techniques for signal processing functions such as delay and high-speed gain cells for applying equalizer tap values. The realization of this equalizer topology has been shown to operate at a BER of $5.3 \times 10^{-12}$ at rates up to 84 Gbit/s [27]. The limitation in bit rate of that particular demonstrator stemmed primarily from the frequency-dependent losses of the backplane channel at high frequencies and from signal reflections leading to channel transfer function variations which could not be compensated by the pre-equalizer.
VII. CONCLUSIONS
In this contribution, we have investigated limited-complexity pre-equalization for FR and PR signaling in the context of
high-rate data transmission on electrical interconnects. We have presented the mathematical framework for deriving the MMSE
pre-equalizer coefficients under an average transmit power constraint. For a specific representative interconnect, we have
determined the symbol error performance for various combinations of data rate (100 Gbit/s, 200 Gbit/s), type of signaling (FR,
DB, DDB), constellation size (2-PAM, 4-PAM), detection method (SymDet, SeqDet) and pre-equalizer complexity (5 taps, 11
taps). The various schemes have been compared in terms of the required transmit power in order to achieve a symbol error
probability of $10^{-9}$.
Our main findings indicate that a performance improvement can be achieved by using more equalization taps and/or
advanced detection schemes, hence yielding increased transmitter and receiver complexity, respectively. Furthermore, the error
performance strongly depends on the type of modulation. We have shown that, for a given bitrate, higher-order PAM modulation,
e.g., 4-PAM, requires less equalization effort and allows simple detection schemes. For instance, for 4-PAM operating at 100
Gbit/s with 5-tap equalization and SymDet, FR signaling outperforms DB and DDB signaling, whereas for 2-PAM operating
at 100 Gbit/s PR signaling is mandatory for achieving acceptable error performance.
This study illustrates the need for carefully selecting the constellation size, the signaling format, the detection method and
the pre-equalizer complexity in order to achieve a satisfactory error performance for transmission at 100 Gbit/s and beyond on
electrical interconnects. Yet, targeting very high-speed links of 200 Gbit/s and beyond, a large number of equalizer taps and
a high operating SNR would be required; the associated circuit complexity and high power consumption can be avoided only
by a better design of the electrical interconnect, e.g., by using materials with less high-frequency attenuation.
VIII. ACKNOWLEDGEMENTS
Part of this research has been funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science
Policy Office. M. Guenach and B. Kozicki would like to thank the Flanders Innovation & Entrepreneurship organization
(VLAIO, former IWT) for providing funds to perform this work.
REFERENCES
[1] F. Karinou, R. Borkowski, D. Zibar, I. Roudas, K. G. Vlachos, and I. T. Monroy, "Advanced Modulation Techniques for High-Performance
Computing Optical Interconnects" IEEE Journal of Selected Topics in Quantum Electronics, vol. 19, no. 2, March-April 2013.
[2] A. Benner, "Optical Interconnect Opportunities in Supercomputers and High End Computing", Optical Fiber Communication Conference (OFC),
Los Angeles, CA, USA , 4-8 March 2012.
[3] D. Law, D. Dove, J. D’Ambrosia, M. Hajduczenia, M. Laubach, and S. Carlson, "Evolution of ethernet standards in the IEEE 802.3 working
group", IEEE Communications Magazine, vol. 51, no. 8, pp. 88-96, August 2013.
[4] J. Fan, X. Ye, J. Kim, B. Archambeault, and A. Orlandi, "Signal Integrity Design for High-Speed Digital Circuits: Progress and Directions",
IEEE Transactions on Electromagnetic Compatibility, vol. 52, no. 2, pp. 392-400, May 2010.
[5] D. R. Stauffer, J. T. Mechler, M. A. Sorna, K. Dramstad, C. R. Ogilvie, A. Mohammad, and J. D. Rockrohr, "High Speed Serdes Devices and
Applications", Springer, 2009.
[6] J. H. Sinsky, M. Duelk, and A. Adamiecki, "High-speed electrical backplane transmission using duobinary signaling", IEEE Transactions on
Microwave Theory and Techniques, vol. 53, no. 1, pp. 152-160, January 2005.
[7] P. Kabal and S. Pasupathy, "Partial-response signaling", IEEE Transactions on Communications, vol. 23, no. 9, pp. 921-934, September 1975.
[8] H. Kobayashi, "Correlative Level Coding and Maximum Likelihood Decoding", IEEE Transactions on Information Theory, vol. 17, no. 5, pp.
586-594, September 1971.
[9] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm", IEEE Transactions on Information Theory,
vol. 13, no. 2, pp. 260-269, April 1967.
[10] J. Liu and X. Lin, "Equalization in high-speed communication systems", IEEE Circuits and Systems Magazine, vol. 4, no. 2, pp. 4-17, Second
quarter 2004.
[11] J. G. Proakis, "Digital Communications", McGraw-Hill, 4th edition, 2000.
[12] L. Moeller, A. Awny, and J. Junio, "80 Gbit/s Decision Feedback Equalizer for Intersymbol Interference Limited Channels", in Optical Fiber
Communication Conference/National Fiber Optic Engineers Conference (OFC/NFOEC), Anaheim, California, March 2013.
[13] M. R. Ahmadi, J. Moon, and R. Harjani, "Constrained Partial Response Receivers for High-Speed Links", IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 55, no. 10, pp. 1006-1010, October 2008.
[14] M. Bichan and A. C. Carusone, "A 6.5 Gbit/s Backplane Transmitter with 6-tap FIR Equalizer and Variable Tap Spacing", IEEE Custom
Integrated Circuits Conference (CICC), San Jose California, September 2008.
[15] J. H. Sinsky, A. Adamiecki, and M. Duelk, "10-Gb/s electrical backplane transmission using duobinary signaling", IEEE MTT-S International
Microwave Symposium Digest, June 2004.
[16] V. Balan, J. Caroselli, J.-G. Chern, C. Chow, R. Dadi, C. Desai, L. Fang, D. Hsu, P. Joshi, H. Kimura, C. Y. Liu, T.-W. Pan, R. Park, C. You, Y.
Zeng, E. Zhang, and F. Zhong, "A 4.8–6.4-Gb/s Serial Link for Backplane Applications Using Decision Feedback Equalization", IEEE Journal
of Solid-State Circuits, vol. 40, no. 9, pp. 1957-1967, September 2005.
[17] D. Lewis, "SerDes Architectures and Applications", DesignCon, Austin, TX, USA, February 2004.
[18] M. Hecht and A. Guida, "Delay modulation", Proceedings of the IEEE, vol. 57, no. 7, pp. 1314-1316, July 1969.
[19] M. Gast, "T1: A survival Guide", O’Reilly Media, 2001.
[20] W. Debus, "General method for calculating the spectrum of a zero substitution coded signal", IEEE Transaction on Communication Technology,
vol. 27, no. 11, pp. 1637-1643, November 1979.
[21] T. Beukema, "Design considerations for high-data-rate chip interconnect systems", IEEE Communications Magazine, vol. 48, no. 10, pp. 174-183,
October 2010.
[22] A. Healey and C. Liu, "Modulation, Equalization, and Forward Error Correction Coding Technologies for a 56 Gbps Chip-to-Module Link",
DesignCon, Santa Clara California, January 2014.
[23] E. Bogatin, “Signal and Power Integrity Simplified”, 2nd edition, Prentice Hall, 2009.
[24] K. Murata, K. Yonenaga, Y. Miyamoto, and Y. Yamane, "Parallel precoder IC module for 40 Gbit/s optical duobinary transmission systems",
Electronics Letters, vol. 36, no. 18, pp. 1571-1572, August 2000.
[25] T. De Keulenaer, J. De Geest, G. Torfs, J. Bauwelinck, Y. Ban, J. Sinsky, and B. Kozicki, "56+ Gbit/s Serial Transmission using Duobinary
Signaling", DesignCon, Santa Clara, January 2015.
[26] B. Murmann, "The Race for the Extra Decibel: A Brief Review of Current ADC Performance Trajectories", IEEE Solid-State Circuits Magazine,
vol. 7, no. 3, pp. 58-66, Summer 2015.
[27] T. De Keulenaer, G. Torfs, Y. Ban, R. Pierco, R. Vaernewyck, A. Vyncke, Z. Li, J. H. Sinsky, B. Kozicki, X. Yin, and J. Bauwelinck, "84 Gbit/s
SiGe BiCMOS duobinary serial data link including Serialiser/Deserialiser (SERDES) and 5-tap FFE", Electronics Letters, vol. 51, no. 4, pp.
343-345, February 2015.
[28] K. D. Sadeghipour, P. D. Townsend, and P. Ossieur, "Design of a sample-and-hold analog front end for a 56Gb/s PAM-4 receiver using 65nm
CMOS", IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, May 2015.
[29] P. Anslow, "Error performance objective for 400GbE", IEEE 400 Gb/s Ethernet Study Group, November 2013.
[30] Y. Ban, T. De Keulenaer, Z. Li, J. Van Kerrebrouck, J. H. Sinsky, B. Kozicki, J. Bauwelinck, and G. Torfs, "A Wide-Band, 5-Tap
Transversal Filter With Improved Testability for Equalization up to 84 Gbit/s", IEEE Microwave and Wireless Components Letters, vol. 25, no.
11, pp. 739-741, November 2015.
Mamoun Guenach is a research scientist with Nokia Bell Labs. He received the degree of engineer in electronics and communications from the Ecole
Mohamadia d’Ingénieurs in Morocco. Following that, he moved to the faculty of applied sciences at the Université Catholique de Louvain (UCL), Belgium,
where he received an M.Sc. degree in electricity and a Ph.D. degree in applied sciences. He served as a post-doctoral researcher at Ghent University, where
he has been a part-time visiting professor since 2015.
Lennert Jacobs received the M.Sc. degree and the Ph.D. degree in electrical engineering from Ghent University, Ghent, Belgium, in 2006 and 2012,
respectively. He is currently serving as a post-doctoral researcher in the Department of Telecommunications and Information Processing at Ghent University.
His research interests are in fading channels, MIMO techniques, signal processing, and modulation and coding for digital communications.
Bartek Kozicki is a research department head at Nokia Bell Labs. He received his Ph.D. degree from Osaka University. He worked at NTT Network
Innovation Laboratories from 2008 to 2010. Since 2011 he has been with Bell Labs, focusing on the design and modeling of high-speed interconnects. His
current interests are network architectures and control platforms. Dr. Kozicki holds over 20 patents and has authored over 50 papers.
Marc Moeneclaey (M.Sc. 1978, Ph.D. 1983) is Full Professor at the Telecommunications and Information Processing Department, Ghent University,
Belgium. His main research interests are in statistical communication theory. He has authored more than 400 scientific papers and co-authored the book Digital
communication receivers – Synchronization, channel estimation, and signal processing (J. Wiley, 1998). He is co-recipient of the Mannesmann Innovations
Prize 2000.
