This work reports an 8-lane single-ended RX featuring compact, low-power far-end crosstalk (FEXT) cancellation circuits. The RX data path includes a cross continuous-time linear equalizer (XCTLE) that removes FEXT from the nearest aggressors within the channel bundle. Residual post-cursor FEXT is suppressed by a direct-feedback 7x8-tap cross decision-feedback equalizer (XDFE). A CTLE and an 8-tap DFE equalize single-ended channels with 28dB insertion loss at the Nyquist frequency without TX FFE. The circuit, fabricated in 32nm SOI CMOS, was measured to receive 7Gb/s/pin PRBS11 data at BER < 10⁻¹² with 12.5%UI margin. It occupies 300x350μm².
Introduction
Over the past decade, aggregate I/O bandwidth requirements have increased at a rate of approximately 2x to 3x every 2 years [1]. Single-ended signaling improves the aggregate data rate, offering nearly twice the performance of similar buses that use two differential lines per signal. Unfortunately, single-ended PCB traces with reduced lane-to-lane spacing suffer from increased crosstalk (xtalk) noise caused by electromagnetic coupling. A significant challenge is to ensure proper signal transmission over single-ended wires at rates previously attainable only with differential pairs. In this work, a powerful equalization method is proposed that combines a cross continuous-time linear equalizer (XCTLE) and a multi-tap cross decision-feedback equalizer (XDFE). Since far-end crosstalk (FEXT) is approximately proportional to the time derivative of the channel response, FEXT(ω)=-jωβH(ω) [2], an XCTLE equalizes xtalk by differentiating the received signals from the nearest neighbors and adding them with appropriate gains (G0, G1) to match the xtalk strength β, as proposed in [2]. Compared to [2], the implemented RX does not require wider spacing between bundle pairs, since residual error terms are suppressed by the XDFE. Furthermore, the XDFE compensates nontrivial xtalk patterns generated by connectors and via arrays. Only the combination of XCTLE and XDFE results in error-free data for the channel investigated in this work. Although receivers with multiple XDFE taps are commonly used in ADC-based 100/10GBASE-T transceivers, they are not yet used for chip-to-chip links owing to their increased demand for power and area. A low-power analog 56-tap XDFE is implemented using the switched-capacitor (SC) approach proposed in [3].
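As a rough time-domain illustration of the FEXT model above, the following Python sketch treats FEXT(ω)=-jωβH(ω) as a scaled time derivative of the neighbor's waveform and cancels it by differentiating the received aggressor signal, as an XCTLE would. The one-pole low-pass standing in for H, the coupling value `beta`, the oversampling ratio and all function names are illustrative assumptions, not values or circuits taken from this work.

```python
import math
import random

def lowpass_nrz(bits, samples_per_ui=16, alpha=0.15):
    """NRZ waveform through a one-pole IIR low-pass (assumed stand-in for H)."""
    y, out = 0.0, []
    for b in bits:
        level = 1.0 if b else -1.0
        for _ in range(samples_per_ui):
            y += alpha * (level - y)
            out.append(y)
    return out

def derivative(x):
    """Backward-difference derivative with dt = 1 sample."""
    return [0.0] + [x[i] - x[i - 1] for i in range(1, len(x))]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

rng = random.Random(1)
n_bits = 400
victim_clean = lowpass_nrz([rng.getrandbits(1) for _ in range(n_bits)])
aggr_clean = lowpass_nrz([rng.getrandbits(1) for _ in range(n_bits)])

beta = 0.3  # assumed coupling strength, in units of one sample period

# Each lane receives its own signal plus -beta * d/dt of its neighbor.
victim_rx = [v - beta * d for v, d in zip(victim_clean, derivative(aggr_clean))]
aggr_rx = [a - beta * d for a, d in zip(aggr_clean, derivative(victim_clean))]

# XCTLE step: differentiate the *received* aggressor, scale by G = beta, add.
gain = beta
victim_eq = [v + gain * d for v, d in zip(victim_rx, derivative(aggr_rx))]

# What is left after cancellation is a second-order term, -beta^2 * d2(victim)/dt2,
# because the received aggressor is itself contaminated by the victim.
err_before = rms([r - c for r, c in zip(victim_rx, victim_clean)])
err_after = rms([e - c for e, c in zip(victim_eq, victim_clean)])
print(f"xtalk rms before: {err_before:.4f}, after: {err_after:.4f}")
```

In this toy model the cancellation is not perfect: the leftover second-order term is one example of the kind of residual error that a decision-feedback stage can then remove.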
Architecture
Fig.1 shows the architecture of our RX circuit, which is intended for use in source-synchronous links. It consists of 8 single-ended data lanes and 1 shared differential clock lane. The reference voltage Vref is extracted from the common mode of the differential clock using a low-pass filter. The received signal is terminated to Vdd=1V (1V and 500mV DC levels at RXin), using T-coils in the product-level ESD protection circuit for bandwidth enhancement. The signals on the victim and adjacent aggressor lanes are processed by an XCTLE, which uses two single-ended high-pass RC filters to differentiate the aggressor signals. The xtalk cancellation and forward signals are weighted by 3 VGAs, which set the xtalk cancellation target, before being summed. The XCTLE also performs single-ended to differential conversion. The xtalk-equalized signal passes into a 2-stage CTLE, which provides up to 17dB peaking at 3.5GHz with -3.7dB DC gain. The CTLE output is then fed to an integrating amplifier, which connects to the 8-tap SC DFE and the 7x8-tap SC XDFE, resulting in 64 taps per lane in total. The 56 XDFE cells are driven by FIFO data from the 7 aggressor lanes. A 1:4 demux outputs quarter-rate data to a digital correlator/PRBS checker for adjusting all RX parameters (latch offsets, DFE and XDFE coefficients, CTLE and XCTLE settings).
Fig.2 shows the XCTLE circuit diagram. It contains two passive high-pass RC filters (R=972Ω, C=30fF) that implement the differentiators. The R and C values have been chosen such that they provide return loss below -10dB up to 4GHz at each of the 50Ω-terminated RX inputs. The VGA bias currents are binary weighted with 4-bit resolution. The XCTLE dissipates only 0.56mW/Gb/s. Including the CTLE, the analog front-end has an energy efficiency of 1.56mW/Gb/s.
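The numbers quoted for the XCTLE can be cross-checked with a few lines of arithmetic. The sketch below computes the corner frequency of the RC high-pass from the stated R and C, compares its gain at the 3.5GHz Nyquist frequency with that of an ideal differentiator, and tallies the taps per lane. Treating the filter as an ideal first-order high-pass, and reading the mW/Gb/s figures as per-lane values at 7Gb/s, are assumptions made here for illustration.

```python
import math

R_OHM = 972.0          # from the text
C_FARAD = 30e-15       # from the text
DATA_RATE_GBPS = 7.0   # from the text

# First-order high-pass corner: f_c = 1 / (2*pi*R*C), roughly 5.46 GHz.
f_corner_hz = 1.0 / (2.0 * math.pi * R_OHM * C_FARAD)
nyquist_hz = DATA_RATE_GBPS * 1e9 / 2.0

def highpass_gain(f_hz):
    """|jwRC / (1 + jwRC)| of an ideal first-order RC high-pass (assumed model)."""
    x = f_hz / f_corner_hz
    return x / math.sqrt(1.0 + x * x)

def differentiator_gain(f_hz):
    """|jwRC| of an ideal differentiator with the same RC time constant."""
    return f_hz / f_corner_hz

# How far the RC filter falls short of an ideal differentiator at Nyquist, in dB.
shortfall_db = 20.0 * math.log10(highpass_gain(nyquist_hz) / differentiator_gain(nyquist_hz))

# Tap bookkeeping: 8 DFE taps plus 8 XDFE taps for each of 7 aggressors.
taps_per_lane = 8 + 7 * 8

# Power implied by the quoted efficiencies (assumed per lane at 7 Gb/s).
xctle_mw = 0.56 * DATA_RATE_GBPS
front_end_mw = 1.56 * DATA_RATE_GBPS

print(f"corner frequency: {f_corner_hz / 1e9:.2f} GHz")
print(f"shortfall vs ideal differentiator at Nyquist: {shortfall_db:.2f} dB")
print(f"taps per lane: {taps_per_lane}")
print(f"XCTLE: {xctle_mw:.2f} mW, analog front-end: {front_end_mw:.2f} mW")
```

Because the corner sits above the Nyquist frequency, the filter behaves approximately as a differentiator across the signal band, which is what the FEXT model calls for.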
The DFE core is shown in Fig.3. The DFE runs at full rate for improved area efficiency. The continuous-time signal equalized by the CTLE is amplified by a current-integrating stage for 1/2 UI. The absence of samplers is advantageous, as it avoids kT/C noise at the cost of a 0.9dB loss due to integration over a 1/2 UI time window. The analog DFE correction is performed by adding charge to the integration node with a digitally programmable SC-DAC. The SC implementation relaxes the timing of the DFE loop compared with a current-summation DFE [3]. Each capacitive DAC has 6-bit resolution with 1LSB=250aF (Cmax=15.75fF) and is implemented with M1-M2 finger caps. To cover a large correction range, tap 1 uses 3 SC cells connected in parallel. To close the DFE tap-1 timing with reasonable margin, the data representation is kept in pre-charged dynamic logic format from the offset-programmable StrongARM latch to the input of each SC cell. Each DFE core drives 8 DFE cells and 7x8 XDFE cells (8 cells per victim lane). Each lane includes an additional offset-programmable latch (error/amplitude sampler) for RX-internal eye measurement and DFE tap calibration.
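Two of the figures in this paragraph follow from short calculations, sketched below in Python. The first part reproduces Cmax from the 6-bit resolution and 250aF LSB. The second part models the integrating stage as an ideal boxcar integrator and evaluates its gain at the Nyquist frequency relative to DC, which gives a loss close to the quoted 0.9dB; the text does not say how that figure was derived, so the boxcar reading is an assumption.

```python
import math

DAC_BITS = 6        # from the text
LSB_F = 250e-18     # 250 aF, from the text

def dac_capacitance(code):
    """Capacitance selected by a DAC code, assuming ideal binary weighting."""
    if not 0 <= code < 2 ** DAC_BITS:
        raise ValueError("code out of range for a 6-bit DAC")
    return code * LSB_F

c_max_f = dac_capacitance(2 ** DAC_BITS - 1)   # 63 * 250 aF = 15.75 fF
tap1_max_f = 3 * c_max_f                       # tap 1 uses 3 cells in parallel

def window_gain_db(window_ui, freq_over_baud=0.5):
    """Gain of a boxcar integrator of width window_ui (in UI) at a given
    frequency (as a fraction of the baud rate), relative to DC, in dB.
    The magnitude response is sin(x)/x with x = pi * f * T_window."""
    x = math.pi * freq_over_baud * window_ui
    return 20.0 * math.log10(math.sin(x) / x)

half_ui_loss_db = window_gain_db(0.5)   # 1/2 UI window, evaluated at Nyquist
full_ui_loss_db = window_gain_db(1.0)   # full-UI window, for comparison

print(f"Cmax = {c_max_f * 1e15:.2f} fF, tap-1 range = {tap1_max_f * 1e15:.2f} fF")
print(f"1/2 UI window: {half_ui_loss_db:.2f} dB, 1 UI window: {full_ui_loss_db:.2f} dB")
```

Under this model a full-UI window would attenuate the Nyquist component by several dB more, which is one way to see why a shorter integration window is attractive.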
Measurement Results
The dies were flip-chip mounted on a high-frequency, low-loss substrate (LCP) that is itself embedded in a rigid metallic frame including impedance-matched high-frequency coaxial connectors. The RX was connected to a 72cm channel bundle (Rogers PCB) with lane spacing equal to 1.5 times the lane width (s=1.5w=142μm). To create severe FEXT, the signal path includes 2 daughter boards, four 5mm through-via arrays and 4 Erni MicroSpeed connectors. The signal loss including cables, connectors and package was about 28dB at 3.5GHz, with FEXT from adjacent lanes 4dB lower. A 3-lane measurement was performed owing to limitations of the measurement equipment. Three uncorrelated 7Gb/s NRZ streams (PRBS7 on the aggressors, PRBS11 on the victim) were sent over 3 adjacent lanes. The correlator/PRBS checker was used to adjust the DFE and XDFE coefficients, driving the correlation with the post-cursor channel taps to zero (Fig.4, left). The BER bathtub curves are shown in Fig.4 (right). With the aggressors turned off, the RX eye is open with a horizontal margin of 40% at 10⁻¹² BER. Once the 2 aggressors are switched on, the link no longer operates error free (10⁻⁴ BER). After xtalk cancellation is turned on, the eye is reasonably open with a 12.5%UI margin, showing that both an XCTLE and an XDFE are necessary to ensure error-free operation of the RX. The vertical eye margins, measured by sweeping the data latch offset and reading out the internal error counter, are 22.4mV ppdiff and 64mV ppdiff at 10⁻⁸ BER with and without xtalk, respectively. The internal data eyes displayed in Fig.5 were generated by sweeping the data horizontally with an Agilent phase generator and vertically by sweeping the amplitude-programmable latch offset with an R2R voltage DAC. The measured power efficiency of the RX is 5.9mW/Gb/s with a 1V supply at the package. The layout of the fabricated circuit, whose RX macro measures 300x350μm², is shown in Fig.6.
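The coefficient adjustment described above, in which the correlation between the error and past data is driven to zero, can be illustrated with a sign-sign LMS loop. The text does not name the adaptation algorithm, so sign-sign LMS is an assumption here, as are the channel tap values, the step size `mu`, the three-tap depth, the noise-free channel and every identifier in this Python sketch. The aggressor history stands in for the FIFO data and is assumed error free.

```python
import random

H_MAIN = 1.0                   # assumed main cursor, also used as error target
ISI = [0.25, -0.12, 0.06]      # assumed victim post-cursor taps
XTALK = [0.20, -0.10, 0.05]    # assumed aggressor post-cursor (FEXT) taps

def sign(x):
    return (x > 0) - (x < 0)

def adapt_taps(n_symbols=20000, mu=0.002, seed=7):
    """Sign-sign LMS for DFE (w) and XDFE (u) taps on a noise-free toy channel."""
    rng = random.Random(seed)
    k = len(ISI)
    w = [0.0] * k        # DFE coefficients
    u = [0.0] * k        # XDFE coefficients
    d_tx = [1] * k       # transmitted victim symbols, newest first
    d_hat = [1] * k      # victim decisions, newest first
    a_hist = [1] * k     # aggressor symbols (FIFO data), newest first
    inner_eye = []
    for n in range(n_symbols):
        d = rng.choice((-1, 1))
        a = rng.choice((-1, 1))
        # Received sample: main cursor + post-cursor ISI + post-cursor FEXT.
        y = H_MAIN * d
        y += sum(ISI[i] * d_tx[i] for i in range(k))
        y += sum(XTALK[i] * a_hist[i] for i in range(k))
        # Subtract the feedback from past victim decisions and aggressor data.
        z = y - sum(w[i] * d_hat[i] for i in range(k))
        z -= sum(u[i] * a_hist[i] for i in range(k))
        dec = 1 if z >= 0 else -1
        err = sign(z - H_MAIN * dec)
        # Correlate the error sign with past data; each tap steps until the
        # average correlation with its own post-cursor is zero.
        for i in range(k):
            w[i] += mu * err * d_hat[i]
            u[i] += mu * err * a_hist[i]
        if n >= n_symbols - 1000:
            inner_eye.append(abs(z))
        d_tx = [d] + d_tx[:-1]
        d_hat = [dec] + d_hat[:-1]
        a_hist = [a] + a_hist[:-1]
    return w, u, min(inner_eye)

w, u, eye = adapt_taps()
print("DFE taps: ", [round(v, 3) for v in w])
print("XDFE taps:", [round(v, 3) for v in u])
print("worst-case inner eye over the last 1000 symbols:", round(eye, 3))
```

In this toy setting the DFE and XDFE coefficients settle near the assumed ISI and FEXT post-cursors, with a small dither of the order of the step size.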
