Vitesse Semiconductor, Somerset, NJ
Ethernet, SONET and wireless base-station systems transfer data across backplanes of up to 1m in length using NRZ signaling at 1.25 to 3.125Gb/s. Attempts to double the capacity of these systems by transmitting 5Gb/s NRZ data over the same FR4 backplanes result in a closed eye. The causes are high-frequency attenuation of the traces, crosstalk between connector pins, and reflections from vias acting as open-ended stubs, so adaptive equalization is needed at either the TX or the RX. Achieving 5Gb/s operation over existing backplanes using NRZ signaling, as opposed to alternate modulation schemes [1], has been analyzed and will soon be standardized by the Optical Internetworking Forum (OIF) [2]. The NRZ backplane transceiver presented here employs TX pre-emphasis and RX high-frequency boost to partially compensate for high-frequency attenuation, and a DFE to cancel ISI without amplifying crosstalk.
The RX (Fig. 3.1.1) has a linear front end comprising a continuous-time FFE, an amplifier with a 2-stage VGA, and a 3-tap DFE. This is followed by a bang-bang CDR and a 1:16 DEMUX. Sign-sign LMS algorithms are used for adapting the coefficients of the FFE, DFE, VGA and DC-offset cancellation circuit. Error signals for the LMS algorithm are generated by slicing the equalizer output at the target ±1 levels. These slicers are in addition to the main slicer, which is set at the zero level to detect the polarity of the received data. The LMS algorithm runs at 1/16 of the data rate to save power. The transmission characteristics of the backplane change only gradually with temperature, so the adaptation algorithms can run slowly.
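The sign-sign LMS update described above can be sketched in a few lines for the DFE taps. This is a behavioral illustration only, not the circuit in the paper: the function names, the step size `mu`, the one-sample-per-symbol model and the update at every symbol (rather than at 1/16 of the data rate) are all assumptions made for the sketch.

```python
def sign(x):
    """1-bit slicer: +1 for non-negative inputs, -1 otherwise."""
    return 1 if x >= 0 else -1


def sign_sign_lms_dfe(samples, num_taps=3, mu=0.005, target=1.0):
    """Adapt DFE tap weights with a sign-sign LMS update.

    samples : one received sample per symbol, before the DFE summer
    num_taps: number of feedback taps (3 in the text)
    mu      : adaptation step size (hypothetical value)
    target  : ideal equalized level, i.e. the +/-1 target in the text
    """
    taps = [0.0] * num_taps
    history = [1] * num_taps  # past decisions, most recent first
    for x in samples:
        # Subtract the ISI estimated from the previous decisions.
        y = x - sum(w * d for w, d in zip(taps, history))
        decision = sign(y)  # main slicer at the zero level
        # Error slicer: is y above or below the target level of its polarity?
        err = sign(y - decision * target)
        # Sign-sign update: only the signs of the error and the data are used.
        taps = [w + mu * err * d for w, d in zip(taps, history)]
        history = [decision] + history[:-1]
    return taps
```

Because only signs enter the update, each tap moves by exactly `mu` per step and dithers around its final value instead of settling, which is acceptable when the channel drifts slowly.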
The FFE is realized by summing the outputs of a fixed-gain DC path and a variable-gain AC path, resulting in a continuous-time IIR filter with a single tunable zero that is adapted to control the amount of high-frequency boost. A continuous-time IIR structure is chosen over the conventional FIR structure to save power and chip area. The first stage of the AC path is a differential pair degenerated by an RC network (Fig. 3.1.2). Its response is optimized to match the high-frequency roll-off of a backplane trace. The variable coefficient is realized using cross-coupled differential pairs with variable tail currents. The FFE provides a boost of up to 10dB at half the NRZ data rate.
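To see how summing a DC path and a variable-gain AC path yields a single tunable zero, consider a simplified model in which the AC path is a first-order high-pass with gain k and pole frequency fp, so that H(s) = 1 + k(s/wp)/(1 + s/wp) = (1 + (1 + k)s/wp)/(1 + s/wp). The first-order AC path, the unity DC-path gain and the numeric values below are assumptions for illustration; the actual response of the RC-degenerated stage is not given in the text.

```python
import math


def ffe_gain_db(freq_hz, ac_gain, pole_hz):
    """Gain in dB (relative to DC) of H(s) = 1 + k*(s/wp)/(1 + s/wp)."""
    s = 1j * freq_hz / pole_hz  # normalized complex frequency s/wp
    h = 1 + ac_gain * s / (1 + s)  # DC path plus high-pass AC path
    return 20 * math.log10(abs(h))


def ffe_zero_hz(ac_gain, pole_hz):
    """Zero frequency of the summed response: fz = fp / (1 + k)."""
    return pole_hz / (1 + ac_gain)


if __name__ == "__main__":
    # Hypothetical pole at 5 GHz; sweep the AC-path gain and report the
    # boost at 2.5 GHz, half the 5 Gb/s NRZ data rate.
    for k in (0.5, 1.0, 2.0, 4.0):
        print(k, ffe_zero_hz(k, 5e9), ffe_gain_db(2.5e9, k, 5e9))
```

In this model, raising the AC-path gain moves the zero down in frequency and raises the boost, whose high-frequency limit is 20*log10(1 + k).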
The 2-stage VGA following the FFE has a variable gain range of 14dB. The variable gain is realized using a cross-coupled differential pair with variable tail currents (Fig. 3.1.2) in parallel with a differential pair with a fixed tail current. The VGA output is fed to a buffer followed by a pre-amp and a chain of flip-flops. The decision values from the slicers are fed back as differential currents to the buffer output through weighted differential pairs to implement the DFE. A differential DC current is also injected into the buffer output for offset cancellation. To detect the error of the equalized signal relative to the ideal ±1 values, the output of the buffer is fed to 2 comparators whose slicing levels are offset from zero.
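As a behavioral sketch of how the two offset comparators can steer the VGA gain and the offset cancellation, the following model derives a 1-bit error from the data polarity and the comparator at the matching target level, then applies sign-sign updates. The update equations, the step size `mu` and the function names are assumptions; the text states only that sign-sign LMS is used for these loops.

```python
def error_sign(y, target=1.0):
    """Return (data, err) for one equalized sample y.

    data: polarity from the main slicer at zero
    err : +1 if y lies above the target level of its polarity, else -1
    """
    data = 1 if y >= 0 else -1
    above_pos = y > target    # comparator sliced at +target
    above_neg = y > -target   # comparator sliced at -target
    err = 1 if (above_pos if data > 0 else above_neg) else -1
    return data, err


def adapt_gain_and_offset(samples, mu=0.005, target=1.0):
    """Sign-sign adaptation of a gain and an offset (hypothetical model)."""
    gain, offset = 1.0, 0.0
    for x in samples:
        y = gain * x - offset
        data, err = error_sign(y, target)
        gain -= mu * err * data  # amplitude too large: reduce the gain
        offset += mu * err       # signal sits too high: subtract more offset
    return gain, offset
```

For an input that swings between 0.7 and -0.3, for example, this model settles near a gain of 2 and an offset of 0.4, which maps the two levels onto +1 and -1.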
The entire transceiver operates from a 1.2V supply. All stages of the receiver use differential pairs with active inductor loads to extend the bandwidth (Fig. 3.1.2) and maintain constant DC gain over process and temperature variations. The gates of the active inductor stages are biased above the supply using an on-chip high-voltage generator (Fig. 3.1.3). A voltage doubler (Mn1,2, Mp1,2) generates a DC voltage V2x close to twice the supply voltage. The high-voltage bias HVdd fed to the amplifiers (Fig. 3.1.2) should be such that the input transistors M1 and the load transistors M2 operate in the saturation region. To accomplish this, feedback is used around ML,R (a replica of the load transistors ML in the amplifiers) in the high-voltage bias generator (Fig. 3.1.3) such that its source voltage is stabilized to the desired common-mode level of the amplifiers, Vcm. VH, the gate voltage of ML,R, is filtered and used to bias the amplifier loads (ML). High-voltage (thick-oxide) transistors are used in the voltage doubler, the feedback circuit, and the ripple filters.
The full-rate CDR uses a bang-bang phase detector because its high gain at zero phase error suppresses static phase error due to charge-pump offset and enables superior phase alignment for re-timing the data symbols [3]. The bang-bang CDR measures the phase error with a single slicer having a zero threshold. In the locked condition, the falling edge of the VCO clock is aligned to the zero crossings of the data, and the rising edge of the clock retimes the data. In the absence of data, the CDR locks to a reference clock using a phase-frequency detector to train the VCO to within 200ppm of the target data rate. The FFE without the DFE opens the data eye enough to permit the CDR to achieve lock. This guarantees start-up even over the worst backplanes. A 3-stage differential ring VCO with active inductor broadbanding loads (Fig. 3.1.2) is used to conserve area. To minimize CDR jitter, the gain of the VCO is kept low by using range switching during a power-on calibration cycle.
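The early/late decision of a bang-bang phase detector can be illustrated with an Alexander-style model that compares an edge sample with the data samples on either side of it. The text does not give the detector's logic, so the three-sample formulation, the first-order phase update and the names `bang_bang_pd` and `track_phase` are assumptions made for this sketch.

```python
def bang_bang_pd(prev_data, edge, curr_data):
    """Return -1 (clock early), +1 (clock late) or 0 (no data transition).

    prev_data, curr_data: consecutive data samples (rising clock edge)
    edge                : sample taken between them (falling clock edge)
    """
    if prev_data == curr_data:
        return 0  # no transition, so no phase information
    # The edge sample still shows the old bit if the clock is early.
    return -1 if edge == prev_data else 1


def track_phase(bits, phase=0.3, step=0.01):
    """First-order loop: move the clock phase (in UI) by a fixed step."""
    for prev, curr in zip(bits, bits[1:]):
        # Ideal NRZ waveform: a late clock (phase > 0) samples the new bit
        # at the nominal crossing instant, an early clock the old one.
        edge = curr if phase > 0 else prev
        phase -= step * bang_bang_pd(prev, edge, curr)
    return phase
```

The output depends only on the sign of the phase error, not its size, so in lock the phase dithers by one step around the data zero crossing.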
The TX (Fig. 3.1.4) consists of a 16:1 MUX that serializes the input data. Symbol-rate discrete-time filtering (pre-emphasis) of the data stream is realized by weighted summation of 3 delayed versions of the output data. The filter tap weights can be digitally programmed with 4b resolution. The transmit amplitude can be digitally programmed from 200 to 800mVppd. A 2^7-1 PRBS generator and a training pattern generator are included in the TX.
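A behavioral sketch of the symbol-rate pre-emphasis filter, driven by a 2^7-1 PRBS, is shown below. The generator polynomial (x^7 + x^6 + 1 is a common choice), the mapping of a 4b code to a weight of code/15, the separate sign per tap and the example tap settings are all assumptions; the text specifies only 3 taps with 4b resolution.

```python
def prbs7(n, seed=0x7F):
    """Generate n bits of a 2^7-1 PRBS (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F  # any non-zero 7-bit seed works
    out = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return out


def pre_emphasis(bits, tap_codes=(15, 3, 0), tap_signs=(1, -1, 1)):
    """3-tap symbol-rate FIR on +/-1 symbols with 4-bit tap codes (0..15)."""
    weights = [s * c / 15 for s, c in zip(tap_signs, tap_codes)]
    delay = [0] * len(weights)  # delay line, most recent symbol first
    out = []
    for b in bits:
        delay = [1 if b else -1] + delay[:-1]
        out.append(sum(w * d for w, d in zip(weights, delay)))
    return out


if __name__ == "__main__":
    print(pre_emphasis(prbs7(16)))
```

With a negative second tap, the symbol that follows a transition is transmitted at a larger amplitude than repeated symbols, which boosts the high-frequency content attenuated by the backplane.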
The VCO (Fig. 3.1.4) in the TX CMU uses a differential inductor with an NMOS cross-coupled pair for positive feedback. It has range switching using a switched-capacitor array and fine tuning using a MOS accumulation varactor. The buffer following the VCO has a resonant tank load identical to that of the VCO to ensure that its resonance frequency tracks the VCO output frequency. Resonant-mode operation reduces the buffer's power dissipation and improves the output duty cycle. On startup, a frequency detection loop selects the appropriate range for the VCO and the buffer. The CMU then switches to a phase-locked mode that maintains lock to the reference frequency over environmental variations.
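The start-up range selection can be pictured as choosing the switched-capacitor code whose tank resonance lies closest to the target frequency, leaving the varactor to cover the remainder. The sketch below uses the ideal LC resonance formula f = 1/(2*pi*sqrt(L*C)). The component values, the 3-bit array and the exhaustive search are assumptions; the text does not describe how the frequency detection loop performs its search.

```python
import math


def lc_freq_hz(l_h, c_f):
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2 * math.pi * math.sqrt(l_h * c_f))


def select_band(target_hz, l_h, c_fixed_f, c_unit_f, num_bits=3):
    """Return the capacitor code whose band center is closest to the target.

    Each code step adds one unit capacitor (c_unit_f) in parallel with the
    fixed tank capacitance (c_fixed_f).
    """
    def error(code):
        return abs(lc_freq_hz(l_h, c_fixed_f + code * c_unit_f) - target_hz)

    return min(range(2 ** num_bits), key=error)


if __name__ == "__main__":
    # Hypothetical tank: 1 nH, 0.8 pF fixed, 50 fF per step.
    print(select_band(5e9, 1e-9, 0.8e-12, 50e-15))
```

Applying the same code to the buffer tank keeps its resonance aligned with the VCO band, as the text describes.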
