A 5.9mW/Gb/s 7Gb/s/pin 8-Lane Single-Ended RX with Crosstalk Cancellation Scheme using a XCTLE and 56-tap XDFE in 32nm SOI CMOS by Cevrero, Alessandro et al.
A. Cevrero1,§, C. Aprile2,§, P.A. Francese1, U. Bapst1, C. Menolfi1, M. Braendli1,
M. Kossel1, T. Morf1, L. Kull1, H. Yueksel1, I. Oezkaya1, Y. Leblebici2, V. Cevher2, and T. Toifl1
1 IBM Research – Zurich, Rueschlikon, Switzerland, 2 EPFL, Lausanne, Switzerland
§ both authors contributed equally to this work
Abstract
This work reports an 8-lane single-ended RX featuring compact, low-power far-end crosstalk (FEXT) cancellation circuits. The RX data path includes a cross continuous-time linear equalizer (XCTLE) that removes FEXT from the nearest aggressors within the channel bundle. Residual post-cursor FEXT is suppressed by a direct-feedback 7x8-tap cross decision-feedback equalizer (XDFE). A CTLE and an 8-tap DFE equalize single-ended channels with 28dB insertion loss at the Nyquist frequency without TX FFE. The circuit, fabricated in 32nm SOI CMOS, was measured to receive 7Gb/s/pin PRBS11 data at BER < 10^-12 with a 12.5% UI margin. It occupies 300x350µm² and has an energy efficiency of 5.9mW/Gb/s.
Introduction
Over the past decade, aggregate I/O bandwidth requirements have increased at a rate of approximately 2x to 3x every 2 years [1]. Single-ended signaling improves the aggregate data rate, delivering nearly twice the performance of similar buses that use two differential lines per signal. Unfortunately, single-ended PCB traces with reduced lane-to-lane spacing suffer from increased crosstalk (xtalk) noise caused by electromagnetic coupling. A significant challenge is to ensure proper signal transmission over single-ended wires at rates previously attainable only with differential pairs. In this work, a powerful equalization method is proposed that combines a cross continuous-time linear equalizer (XCTLE) and a multi-tap cross decision-feedback equalizer (XDFE). Since far-end crosstalk (FEXT) is approximately proportional to the derivative of the channel response, FEXT(ω) = -jωβH(ω) [2], a XCTLE equalizes xtalk by differentiating the received signals from the nearest neighbors and adding them with appropriate gains (G0, G1) to match the xtalk strength β, as proposed in [2]. Compared to [2], the implemented RX does not require wider spacing between bundle pairs, since residual error terms are suppressed by the XDFE. Furthermore, the XDFE compensates non-trivial xtalk patterns generated by connectors and via arrays. Only the combination of XCTLE and XDFE results in error-free data for the channel investigated in this work. Although RXs with multiple XDFE taps are commonly used in ADC-based 100/10GBASE-T transceivers, they are not yet used for chip-to-chip links owing to their increased demand for power and area. A low-power analog 56-tap XDFE is implemented using the switched-capacitor (SC) approach proposed in [3].
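The cancellation idea behind the XCTLE can be illustrated numerically. The sketch below is not the authors' model: it assumes a hypothetical single-pole channel H, a hypothetical coupling coefficient `BETA_S`, and a first-order RC high-pass as the differentiator, and it ignores crosstalk onto the aggressor lane itself. Under those assumptions it compares the FEXT remaining after cancellation with the uncancelled FEXT, which suggests why a real (non-ideal) differentiator leaves a residual that grows with frequency.

```python
import math

def channel(f_hz, f_pole_hz=1.0e9):
    """Hypothetical single-pole channel H(f); a stand-in, not the measured channel."""
    return 1.0 / (1.0 + 1j * f_hz / f_pole_hz)

def fext(f_hz, beta_s):
    """FEXT model quoted in the text: FEXT(w) = -j * w * beta * H(w)."""
    w = 2.0 * math.pi * f_hz
    return -1j * w * beta_s * channel(f_hz)

def residual_fext(f_hz, beta_s, gain, tau_s):
    """FEXT left on the victim after adding gain * HP(w) * (received aggressor).

    HP(w) = j*w*tau / (1 + j*w*tau) is a first-order RC high-pass. The received
    aggressor is modelled as H(w) times the same unit-amplitude TX tone.
    """
    w = 2.0 * math.pi * f_hz
    hp = 1j * w * tau_s / (1.0 + 1j * w * tau_s)
    return fext(f_hz, beta_s) + gain * hp * channel(f_hz)

BETA_S = 20e-12         # assumed coupling coefficient beta, in seconds
TAU_S = 29.16e-12       # assumed high-pass time constant, in seconds
GAIN = BETA_S / TAU_S   # chosen so that gain * tau matches beta

for f in (0.5e9, 1.0e9, 3.5e9):
    ratio = abs(residual_fext(f, BETA_S, GAIN, TAU_S)) / abs(fext(f, BETA_S))
    print(f"{f / 1e9:.1f} GHz: residual/FEXT = {ratio:.3f}")
```

In this simplified model the ratio equals ωτ/sqrt(1 + (ωτ)^2) regardless of H, so cancellation is best well below the high-pass corner; this is only a plausible reading of the "residual error terms" mentioned above, not a statement about the measured circuit.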
Architecture
Fig. 1 shows the architecture of our RX circuit, which is intended for use in source-synchronous links. It consists of 8 single-ended data lanes and 1 shared differential clock lane. The reference voltage Vref is extracted from the common mode of the differential clock using a low-pass filter. The received signal is terminated to Vdd = 1V (giving DC levels of 1V and 500mV at RXin), using T-coils for bandwidth enhancement in the product-level ESD protection circuit. The signals on the victim and adjacent aggressor lanes are processed by a XCTLE, which uses two single-ended high-pass RC filters to differentiate the aggressor signals. The xtalk cancellation signals and the forward signal are weighted by 3 VGAs, which set the xtalk cancellation target, before being summed. The XCTLE also performs single-ended to differential conversion. The xtalk-equalized signal passes into a 2-stage CTLE, which provides up to 17dB of peaking at 3.5GHz with -3.7dB DC gain. The CTLE output is then fed to an integrating amplifier, which connects to the 8-tap SC DFE and the 7x8-tap SC XDFE, for a total of 64 taps per lane. The 56 XDFE cells are driven by FIFO data from the 7 aggressor lanes. A 1:4 demux outputs quarter-rate data to a digital correlator/PRBS checker, which is used to adjust all RX parameters (latch offsets, DFE and XDFE coefficients, and CTLE and XCTLE settings).
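The tap and rate figures in the architecture description follow from simple bookkeeping, sketched below. The constant names are illustrative only, and the peak-gain line assumes that the 17dB of peaking is measured relative to the DC gain, which the text does not state explicitly.

```python
DATA_RATE_BPS = 7e9            # per-pin data rate from the text
DFE_TAPS = 8                   # taps cancelling the lane's own post-cursor ISI
AGGRESSOR_LANES = 7            # the other lanes of the 8-lane bundle
XDFE_TAPS_PER_AGGRESSOR = 8    # "7x8-tap" XDFE
DEMUX_RATIO = 4                # 1:4 demux

xdfe_taps = AGGRESSOR_LANES * XDFE_TAPS_PER_AGGRESSOR   # XDFE taps per lane
taps_per_lane = DFE_TAPS + xdfe_taps                     # DFE + XDFE taps per lane
ui_ps = 1e12 / DATA_RATE_BPS                             # unit interval in ps
nyquist_ghz = DATA_RATE_BPS / 2 / 1e9                    # Nyquist frequency in GHz
demux_rate_gbps = DATA_RATE_BPS / DEMUX_RATIO / 1e9      # rate of each demux output

# Assumption: peaking is quoted relative to the DC gain.
ctle_peak_gain_db = -3.7 + 17.0

print(xdfe_taps, taps_per_lane)
print(f"UI = {ui_ps:.1f} ps, Nyquist = {nyquist_ghz} GHz, demux output = {demux_rate_gbps} Gb/s")
print(f"CTLE gain at the peak (assumed definition) = {ctle_peak_gain_db:.1f} dB")
```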
Fig. 2 shows the XCTLE circuit diagram. It uses two passive high-pass RC filters (R = 972Ω, C = 30fF) to implement the differentiators. The R and C values were chosen to keep the return loss below -10dB up to 4GHz at each of the 50Ω-terminated RX inputs. The VGA bias currents are binary weighted with 4-bit resolution. The XCTLE dissipates only 0.56mW/Gb/s. Including the CTLE, the analog front-end has an energy efficiency of 1.56mW/Gb/s.
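As a rough check of how the quoted R and C values behave as a differentiator, the sketch below evaluates an ideal first-order RC high-pass. It ignores loading, parasitics, and the 50Ω termination, so it is an idealization rather than the implemented circuit. The power conversion assumes the mW/Gb/s figures apply per lane at 7Gb/s, which is an interpretation, not something the text states.

```python
import math

R_OHM = 972.0      # from the text
C_F = 30e-15       # from the text
DATA_RATE_GBPS = 7.0

def highpass(f_hz, r_ohm=R_OHM, c_f=C_F):
    """Ideal first-order RC high-pass response jwRC / (1 + jwRC)."""
    jwrc = 1j * 2.0 * math.pi * f_hz * r_ohm * c_f
    return jwrc / (1.0 + jwrc)

def power_mw(efficiency_mw_per_gbps, rate_gbps=DATA_RATE_GBPS):
    """Power implied by an energy efficiency at the given data rate (assumed per lane)."""
    return efficiency_mw_per_gbps * rate_gbps

corner_hz = 1.0 / (2.0 * math.pi * R_OHM * C_F)   # -3 dB corner of the high-pass
h_nyq = highpass(3.5e9)

print(f"corner frequency = {corner_hz / 1e9:.2f} GHz")
print(f"|HP| at 3.5 GHz = {abs(h_nyq):.3f}, "
      f"phase = {math.degrees(math.atan2(h_nyq.imag, h_nyq.real)):.1f} deg")
print(f"XCTLE power = {power_mw(0.56):.2f} mW, AFE power = {power_mw(1.56):.2f} mW")
```

In this idealized view the corner sits above the 3.5GHz Nyquist frequency, so the filter approximates a differentiator over most of the signal band, with a phase that falls short of the ideal 90 degrees as the frequency approaches the corner.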
The DFE core is shown in Fig. 3. The DFE runs at full rate for improved area efficiency. The continuous-time signal equalized by the CTLE is amplified by a current-integrating stage over 1/2 UI. The absence of samplers is advantageous because it avoids kT/C noise, at the cost of a 0.9dB loss due to the 1/2 UI integration window. The analog DFE correction is performed by adding charge to the integration node with a digitally programmable SC DAC. The SC implementation relaxes the timing of the DFE loop compared with a current-summation DFE [3]. Each capacitive DAC has 6-bit resolution with 1 LSB = 250aF (Cmax = 15.75fF) and is implemented with M1-M2 finger capacitors. To cover a large correction range, tap 1 uses 3 SC cells connected in parallel. To close the DFE tap-1 timing with reasonable margin, the data representation is kept in pre-charged dynamic logic format from the offset-programmable StrongARM latch to the input of each SC cell. Each DFE core drives 8 DFE cells and 7x8 XDFE cells (8 cells per victim). Each lane includes an additional offset-programmable latch (error/amplitude sampler) for RX-internal eye measurement and DFE tap calibration.
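Two of the numbers in the DFE description can be reproduced with short calculations. The text does not say how the 0.9dB figure was derived; one calculation that lands close to it is the gain of an ideal boxcar integrator of length 1/2 UI evaluated at the Nyquist frequency, which is what `window_loss_db` computes. The DAC helper simply scales the quoted LSB; the function names and the `cells` parameter are illustrative.

```python
import math

DATA_RATE_BPS = 7e9
LSB_F = 250e-18     # 1 LSB = 250 aF, from the text
DAC_BITS = 6

def window_loss_db(f_hz, t_int_s):
    """Gain in dB, relative to DC, of an ideal boxcar integrator of length t_int."""
    x = math.pi * f_hz * t_int_s
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))

def dac_capacitance_f(code, cells=1):
    """Capacitance switched by `cells` parallel SC cells, each programmed to `code`."""
    max_code = 2 ** DAC_BITS - 1
    if not 0 <= code <= max_code:
        raise ValueError("code out of range for a 6-bit DAC")
    return cells * code * LSB_F

ui_s = 1.0 / DATA_RATE_BPS
loss_db = window_loss_db(DATA_RATE_BPS / 2.0, ui_s / 2.0)
c_max_f = dac_capacitance_f(63)
c_tap1_max_f = dac_capacitance_f(63, cells=3)   # tap 1 uses 3 cells in parallel

print(f"half-UI integration gain at Nyquist = {loss_db:.2f} dB")
print(f"Cmax per cell = {c_max_f * 1e15:.2f} fF, tap-1 maximum = {c_tap1_max_f * 1e15:.2f} fF")
```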
Fig. 1. 8-lane single-ended RX architecture with XDFE and XCTLE.
[Fig. 2 inset labels: FEXT signal before XCTLE (scaled) [4.231mV]; FEXT signal after XCTLE [−3.748mV].]
Fig. 2. Proposed XCTLE circuit diagram.
Fig. 3. DFE core with fast tap-1 feedback.
Measurement Results
The dies were flip-chip mounted on a high-frequency, low-loss substrate (LCP) that is itself embedded in a rigid metallic frame with impedance-matched high-frequency coaxial connectors. The RX was connected to a 72cm channel bundle (Rogers PCB) with a lane spacing of 1.5 times the lane width (s = 1.5w = 142µm). The signal path includes 2 daughter boards, four 5mm thru-via arrays and 4 Erni MicroSpeed connectors, which create severe FEXT. The signal loss including cables, connectors and package was about 28dB at 3.5GHz, with FEXT from adjacent lanes 4dB lower. A 3-lane measurement was performed owing to limitations of the measurement equipment. Three uncorrelated 7Gb/s NRZ streams (PRBS7 on the aggressors, PRBS11 on the victim) were sent over 3 adjacent lanes. The correlator/PRBS checker was used to adjust the DFE and XDFE coefficients by driving the correlation with the postcursor channel taps to zero (Fig. 4, left). The BER bathtub curves are shown in Fig. 4 (right). With the aggressors turned off, the RX eye is open with a horizontal margin of 40% UI at 10^-12 BER. Once the 2 aggressors are switched on, the link no longer operates error free (10^-4 BER). After xtalk cancellation is turned on, the eye is reasonably open with a 12.5% UI margin, showing that both a XCTLE and a XDFE are necessary to ensure error-free operation of the RX. The vertical eye margins, measured by sweeping the data latch offset and reading out the internal error counter, are 22.4mVppdiff and 64mVppdiff at 10^-8 BER with and without xtalk, respectively. The internal data eyes displayed in Fig. 5 were generated by sweeping the data horizontally with an Agilent phase generator and sweeping the amplitude-programmable latch offset vertically with an R2R voltage DAC. The measured power efficiency of the RX is 5.9mW/Gb/s with a 1V supply at the package.
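The horizontal margins and the loss figures in the measurement results can be put into absolute terms with a few lines of arithmetic. The sketch below assumes that "4dB lower" means each adjacent-lane FEXT path sits 4dB below the through-channel loss at 3.5GHz, and that the two aggressors are equally strong and uncorrelated so that their powers add. Both are simplifications introduced here, not statements from the measurement.

```python
import math

DATA_RATE_BPS = 7e9
UI_PS = 1e12 / DATA_RATE_BPS

def margin_ps(fraction_of_ui):
    """Convert a horizontal eye margin given as a fraction of a UI into picoseconds."""
    return fraction_of_ui * UI_PS

def signal_to_xtalk_db(signal_db, fext_db, n_aggressors):
    """Signal-to-crosstalk ratio for n equal, uncorrelated aggressors (power sum)."""
    total_fext_db = fext_db + 10.0 * math.log10(n_aggressors)
    return signal_db - total_fext_db

SIGNAL_DB = -28.0             # through-channel loss at 3.5 GHz, from the text
FEXT_DB = SIGNAL_DB - 4.0     # assumed reading of "4dB lower"

print(f"12.5% UI = {margin_ps(0.125):.1f} ps, 40% UI = {margin_ps(0.40):.1f} ps")
print(f"signal/xtalk at 3.5 GHz, 1 aggressor  = {signal_to_xtalk_db(SIGNAL_DB, FEXT_DB, 1):.2f} dB")
print(f"signal/xtalk at 3.5 GHz, 2 aggressors = {signal_to_xtalk_db(SIGNAL_DB, FEXT_DB, 2):.2f} dB")
```

Under these assumptions the crosstalk from two aggressors is nearly as strong as the signal at the Nyquist frequency, which is consistent with the link failing when cancellation is off.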
[Figure 4 panels. Left: correlation (-1 to 1) versus postcursor tap (1 to 7), with XDFE OFF and XDFE ON. Right: log(BER) from 1e-2 to 1e-12 versus phase position [%UI] (25 to 100), for no xtalk (40% margin) and for XCTLE/XDFE OFF and ON settings (12.5% margin).]
Fig. 4. Measured bathtub plots (right) and correlation with postcursor taps with and without XDFE (left).
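Driving the correlation between the residual error and past aggressor bits to zero is commonly done with a sign-sign LMS loop. The sketch below shows that generic technique on a toy model with a single aggressor; it is not claimed to be the adaptation algorithm used on the chip. The function name, step size, crosstalk taps, and the assumption of a known signal amplitude are all hypothetical.

```python
import random

def adapt_xdfe(true_taps, n_symbols=20000, mu=0.001, amplitude=1.0, seed=1):
    """Sign-sign LMS estimate of post-cursor crosstalk taps from one aggressor.

    true_taps: hypothetical crosstalk tap values, in units of the signal amplitude.
    Returns the adapted XDFE weights.
    """
    rng = random.Random(seed)
    n_taps = len(true_taps)
    weights = [0.0] * n_taps
    history = [1] * n_taps       # past aggressor bits, newest first
    for _ in range(n_symbols):
        victim = rng.choice((-1, 1))
        # Received sample: victim symbol plus crosstalk from past aggressor bits.
        received = amplitude * victim + sum(c * a for c, a in zip(true_taps, history))
        # XDFE correction: subtract the current crosstalk estimate.
        corrected = received - sum(w * a for w, a in zip(weights, history))
        decision = 1 if corrected >= 0 else -1
        # Sign of the error against the expected level (role of an error sampler).
        err_sign = 1 if corrected - amplitude * decision >= 0 else -1
        # Each weight moves until the error no longer correlates with its aggressor bit.
        weights = [w + mu * err_sign * a for w, a in zip(weights, history)]
        history = [rng.choice((-1, 1))] + history[:-1]
    return weights

example_taps = [0.30, 0.04, 0.08, 0.0]   # hypothetical values, not measured data
adapted = adapt_xdfe(example_taps)
print([round(w, 2) for w in adapted])
```

Once the weights settle, the sign of the error is uncorrelated with each past aggressor bit, which is the condition that a correlation plot against postcursor taps makes visible.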
[Figure 5 panels. Channel attenuation: amplitude [dB] (0 to -60) versus frequency [GHz] (0 to 6), showing the insertion loss of ch3 and the FEXT from ch2 and ch4. Eye diagrams: horizontal axis is phase position [%UI] (0 to 100). Panel annotations: SC-XDFE ch2 [0 0 0 0 0 0 0 0], SC-XDFE ch4 [0 0 0 0 0 0 0 0]; SC-XDFE ch2 [30 4 8 0 0 -1 0 0], SC-XDFE ch4 [28 4 6 2 -2 -1 -1 0], 73.6mV @ 10e-8 symbols; SC-DFE [84 -21 -10 -15 -15 -11 -6 0], 108.8mV @ 10e-8 symbols.]
Fig. 5. Channel attenuation (top left) and received eye diagrams with silent aggressors (top right), xtalk cancellation off (bottom left), xtalk cancellation on (bottom right). The programmed SC DAC code is also shown.
[Figure 6 annotations: 300 µm, 350 µm, 33 µm; SC cells.]
Fig. 6. Layout of RX and die photograph.
The layout of the fabricated circuit is shown in Fig. 6; the RX macro measures 300x350µm².
Acknowledgment
This work was supported in part by the European Commission under grant ERC Future Proof and by the Swiss Science Foundation under grants SNF 200021-146750 and SNF CRSII2-147633.
References
[1] “ISSCC 2014 technology trends,” [Online].
[2] T. Oh, et al., “4x12 Gb/s 0.96 pJ/b/lane analog-IIR crosstalk cancellation and signal reutilization receiver for single-ended I/Os in 65 nm CMOS,” VLSI, 2012.
[3] T. Toifl, et al., “A 2.6 mW/Gbps 12.5 Gbps RX With 8-Tap
Switched-Capacitor DFE in 32 nm CMOS,” JSSC, Apr. 2012.
