Abstract-This paper proposes an adaptation engine for a 2× blind-sampling ADC-based receiver. The proposed adaptive engine uses a triangular desired waveform, instead of two fixed desired levels, to shape the equalizer output in spite of the blind nature of the sampling. The measured results confirm that the adaptive engine restores a 5 Gb/s eye subjected to 13 dB of attenuation at the Nyquist frequency to an equivalent of 320 mV of vertical opening. The receiver consumes 192 mW, of which 78 mW is used by the digital CDR.
I. INTRODUCTION
THE increasing demand for higher data rates over legacy backplane channels with limited bandwidth has introduced severe signal degradation due to inter-symbol interference (ISI) in the received signal. To recover data from this severely degraded signal, high equalization levels are required [1]. While analog equalization can be used in binary CDR's, as shown in Fig. 1, the use of an ADC as the sampler provides another layer of equalization in the digital domain. The combined analog and digital equalization can be used to recover data from channels with higher attenuation (Fig. 1(b)). Digital equalizers are easy to design and are portable across technology nodes because they can be implemented in RTL. In addition, digital equalizers consume less power as technology advances and are more robust to PVT variations.
As shown in Fig. 2, the sampling clock in ADC-based receivers can either track the phase of the incoming data or ignore it, in which case a blind (asynchronous) clock is used. In a phase-tracking system, as shown in Fig. 2(a), a digital phase detector compares the phase of the incoming data with the phase of the sampling clock. A low-pass filter then sends digital control bits to a digitally-controlled oscillator (DCO) or a phase interpolator (PI) in order to adjust the phase of the sampling clock [2]. In this system, there is a feedback loop containing both digital and analog components and, as a result, the delay of the feedback loop plays an important role in the stability of the system [3]. During the design, the delay of both the digital and the analog blocks in the loop must be taken into account, which complicates the mixed-signal design. On the other hand, a blind-sampling CDR [4], as shown in Fig. 2(b), eliminates the feedback path and hence is unconditionally stable. This allows for independent design of the ADC and the remaining digital building blocks.
As mentioned earlier, the main advantage of an ADC-based CDR is the availability of extra equalization in the digital domain. This extra equalization can be implemented either as a feed-forward equalizer (FFE) or as a decision-feedback equalizer (DFE). An FFE [5] boosts both the signal and the noise at high frequencies. This noise, in the case of an ADC-based CDR, includes the ADC quantization noise, which may limit the performance. A DFE for a blind-sampling CDR is proposed in [6] to address this noise enhancement. In [6], the DFE coefficients are obtained manually by measuring the pulse response of the channel and subtracting it from a desired pulse response, where the latter is defined so as not to contain any ISI. This approach, however, does not lend itself easily to adaptation unless the data communication is interrupted or initiated by a training sequence so as to obtain the channel pulse response. To overcome this limitation, we propose [7] an adaptive DFE where the DFE coefficients are obtained during data transmission, i.e., without interruption by a training sequence. We have further integrated this adaptive DFE with the rest of the building blocks to demonstrate a complete receiver, as shown in Fig. 3.
0018-9200/$26.00 © 2011 IEEE
We explain the details of the ADC design [5], the feed-forward CDR [4], and the DFE as it was presented in [6] in the background section. The details of the proposed adaptive DFE will be discussed in Section III, followed by simulation and measurement results in Section IV and conclusions in Section V.
II. BACKGROUND

A. ADC
Flash ADC's are known to have a higher conversion rate than other ADC architectures. The implemented CDR requires a sampling rate of 10 GS/s, which is provided by four time-interleaved 5-bit flash ADC's, each sampling at 2.5 GS/s. The ADC sampling clocks are generated by a 4-phase clock divider (Fig. 4), which is driven by an external 5 GHz clock source. The divider is the only CML component in the system. While time interleaving increases the aggregate sampling rate, it reduces the input bandwidth of the ADC because it increases the input capacitance. To reduce the input capacitance of each ADC, an interpolating flash ADC [8] was used in this design, which reduces the number of pre-amplifiers (PA's) that load the input node. Fig. 5 shows an overall block diagram of an interpolating flash ADC. The PA's amplify the difference between the input signal and the reference voltages. For a typical 5-bit flash ADC, a total of 31 PA's are required at the front-end. In this interpolating design, we use a total of 17 PA's instead, relying on a resistive ladder to generate the remaining 14 levels. The PA and resistive-ladder outputs are then latched and sent to a thermometer-to-binary encoder.
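As a rough illustration of the final encoding step described above, the following Python sketch models an ideal 5-bit flash conversion with 31 decision levels followed by a thermometer-to-binary encoder. The input range, the uniform reference spacing, and the function names are assumptions made for illustration; the sketch does not model the interpolation or the actual encoder circuit.

```python
def thermometer_to_binary(comparators):
    """Convert a thermometer code (list of 0/1, lowest reference first) to an integer.

    Counting the ones is one simple encoding choice (an assumption here);
    it tolerates isolated bubble errors in the thermometer code.
    """
    return sum(1 for bit in comparators if bit)


def flash_adc_code(vin, vmin=-0.5, vmax=0.5, bits=5):
    """Ideal flash conversion with 2**bits - 1 uniformly spaced reference levels.

    vmin and vmax are hypothetical input-range limits, not values from the paper.
    """
    n_levels = 2**bits - 1                      # 31 decision levels for 5 bits
    step = (vmax - vmin) / 2**bits
    refs = [vmin + (k + 1) * step for k in range(n_levels)]
    thermo = [1 if vin > r else 0 for r in refs]
    return thermometer_to_binary(thermo)


# 17 pre-amplifier levels plus 14 interpolated levels give the 31 levels of a 5-bit flash.
assert 17 + 14 == 2**5 - 1
```

Counting ones is only one of several possible encoder styles; it is used here because it keeps the sketch short.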
It is desirable for a PA to have a high gain, as this reduces the effect of latch offset and the probability of metastability [9]. The gain offered by a continuous-time PA is not sufficient for high-speed applications due to the inherent trade-off posed by the gain-bandwidth product of the PA [10]. We instead use a Strong-Arm regenerative PA, as shown in Fig. 6, where the overdrive recovery is improved by resetting the previous state of the amplifier.
In an interpolating flash ADC, the PA's must be linear; otherwise, the interpolated values will not correspond to the correct intermediate reference voltages. The implemented regenerative PA has a high gain, and its output easily enters a nonlinear region. To demonstrate this point, Fig. 7(a) shows the outputs of two adjacent PA's, PA(N) and PA(N+1), when the input voltage lies between their two reference voltages but is closer to the reference of PA(N). As a result, the output of PA(N) has a smaller slope magnitude than that of PA(N+1). The difference in slope causes the interpolated voltage to initially become negative (which is the expected correct value) and then move towards zero as the outputs of the PA's saturate. This would be an incorrect interpolated value and may send the following latch into a metastable state.
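The saturation effect described above can be reproduced with a very simple behavioral model. The Python sketch below assumes that each regenerative PA output grows exponentially from its initial input difference and clips at the supply rails, and that the interpolated value is the mid-point of the two PA outputs; the time constant, rail value, and input offsets are hypothetical numbers chosen only to make the effect visible.

```python
import math


def pa_output(v_in_minus_ref, t, tau=1.0, rail=1.0):
    """Behavioral regenerative amplifier: exponential growth clipped at the rails."""
    v = v_in_minus_ref * math.exp(t / tau)
    return max(-rail, min(rail, v))


def interpolated(v_n, v_n1, t):
    """Mid-point interpolation between the outputs of two adjacent PA's."""
    return 0.5 * (pa_output(v_n, t) + pa_output(v_n1, t))


# The input lies between the two references and is closer to that of PA(N),
# so PA(N) sees a small positive difference and PA(N+1) a larger negative one.
V_N, V_N1 = 0.01, -0.03

early = interpolated(V_N, V_N1, t=2.0)  # both outputs still in the linear region
late = interpolated(V_N, V_N1, t=6.0)   # both outputs clipped at opposite rails

print(early)  # negative: the correct interpolated polarity
print(late)   # zero: the polarity information is lost after saturation
```

In this model the interpolated value is only meaningful while both outputs are still growing, which is the window in which an interpolating stage would need to sample.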
To overcome this problem, we have added another regenerative amplifier, denoted by IA in Fig. 8, with its sampling aperture occurring after the PA's aperture and before the settling of their outputs. The timing diagram of this modified structure is shown in Fig. 7(b). The IA performs interpolation by amplifying the transient output difference of two PA's while it is still valid. The same clock that triggers the PA's also triggers the IA's. The amplifying window of the IA's is delayed with respect to that of the PA's by making two of the transistors of Fig. 6 smaller in the IA's than the corresponding transistors in the PA's.
B. Feed-Forward Blind-Sampling CDR
A feed-forward blind-sampling CDR was implemented similar to that in [4]. Fig. 9(a) shows a simplified block diagram of the CDR, where the ADC samples the data at twice the data rate and a digital phase detector (PD) calculates the sampling phase of the ADC with respect to the incoming data. Fig. 9(b) shows the method of phase recovery using linear interpolation [4]. The samples are first arranged in groups of three, with one sample being shared between two adjacent groups. The position of a possible zero-crossing with respect to the first sample of the group is calculated using linear interpolation. A digital low-pass filter averages these instantaneous zero-crossing positions and produces an average phase. Depending on the instantaneous and average phases, the sliced value of one of the three samples in the group is selected as the recovered bit. The linear interpolation in the PD requires smooth data transitions for accurate phase recovery. While a loss of 5 dB or more in typical channels is sufficient for this purpose, an anti-aliasing filter [11] has to be integrated with the receiver for shorter channels or when the CDR operates at lower data rates, where channel attenuation drops significantly.
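The phase-recovery step described above can be sketched in a few lines of Python. The sketch assumes three consecutive samples spaced half a UI apart, locates a sign change between adjacent samples, and linearly interpolates the zero-crossing position relative to the first sample. The first-order averaging filter, its gain `alpha`, and the function names are illustrative assumptions, and phase wrap-around is ignored for simplicity.

```python
def zero_crossing_position(s0, s1, s2):
    """Return the zero-crossing position in UI relative to s0, or None.

    The three samples are assumed to be spaced 0.5 UI apart (2x sampling).
    """
    samples = (s0, s1, s2)
    for i in range(2):
        a, b = samples[i], samples[i + 1]
        if a * b < 0:                 # sign change: a transition lies between a and b
            frac = a / (a - b)        # linear interpolation between the two samples
            return 0.5 * (i + frac)
    return None                       # no transition in this group


def average_phase(positions, alpha=0.1, start=0.0):
    """First-order low-pass filter over the instantaneous positions (assumed form)."""
    avg = start
    for p in positions:
        if p is not None:             # groups without a transition carry no phase information
            avg += alpha * (p - avg)
    return avg


print(zero_crossing_position(-1.0, 1.0, 1.0))    # 0.25
print(zero_crossing_position(0.5, 0.25, -0.75))  # 0.625
```

A group without a sign change returns `None`, reflecting that only data transitions carry phase information in this scheme.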
A frequency offset between the transmitter data rate and the receiver sampling rate causes the samples to drift within the UI. Whenever the sampling phase moves one UI forward (backward), one sample needs to be inserted (dropped). The CDR produces a control signal, which is sent to the FIFO to add or drop the extra bit (refer to Fig. 3). In this paper, the FIFO data is read out at the exact rate of the incoming data and, hence, the FIFO never overflows or underflows. In a commercial product, flow control in the data link layer is needed to adjust the data throughput so that the FIFO does not overflow or underflow.
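To give a feel for how often an add or drop event occurs, the following Python sketch accumulates the sampling-phase drift caused by a frequency offset and counts each wrap of one UI. The sign convention (a positive offset moves the phase forward and triggers an insert), the starting phase, and the function name are assumptions for illustration only.

```python
def slip_events(ppm_offset, n_ui, phase0=0.5):
    """Count insert/drop events over n_ui unit intervals.

    ppm_offset: frequency offset in parts per million (sign convention assumed).
    Returns (inserts, drops).
    """
    drift = ppm_offset * 1e-6      # phase drift per UI, in UI
    phase = phase0
    inserts = drops = 0
    for _ in range(n_ui):
        phase += drift
        if phase >= 1.0:           # phase moved one UI forward: insert a sample
            phase -= 1.0
            inserts += 1
        elif phase < 0.0:          # phase moved one UI backward: drop a sample
            phase += 1.0
            drops += 1
    return inserts, drops


print(slip_events(50, 100_000))    # (5, 0): one event every 20,000 UI at 50 ppm
print(slip_events(-50, 100_000))   # (0, 5)
```

At an offset of 50 ppm, for example, the phase sweeps one full UI every 20,000 UI, so add/drop events are rare compared with the bit rate.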
C. DFE for Blind-Sampling CDR
The structure of the digital DFE depends on the sampling scheme. In a phase-tracking CDR, the sampling is performed at the eye center of the incoming data. This fixed sampling phase implies that the main cursor and the first post-cursor ISI are fixed for a given channel; thus, the ISI replica generation block simply provides a constant DFE coefficient, as illustrated in Fig. 10(a). On the other hand, in a blind-sampling CDR, as the sampling phase sweeps the UI, the values of the main cursor and the first post-cursor ISI change, as shown in Fig. 10(b). This implies that the ISI replica generation should take the sampling phase into account and dynamically change the DFE coefficients accordingly.
To address the variable DFE coefficients of different sampling phases, the authors in [6] propose dividing the UI into eight bins (as shown in Fig. 11(a)) and choosing an appropriate DFE coefficient from a look-up table based on where the sampling phase falls within one UI. Fig. 11(b) shows the simplified full-rate implementation of the DFE for the CDR as proposed in [6]. As can be seen from this figure, two look-up tables produce the phase-dependent DFE coefficients for even and odd samples based on the sampling phase. Since the samples are half a UI apart, the corresponding DFE coefficients are shifted by four in the look-up table. The CDR uses three-sample windows to calculate the sampling phase. The samples are arranged such that two of the samples correspond to the current UI while the other corresponds to the previous UI. Hence, for the implementation of the 1-tap DFE, the recovered bits of the two preceding UIs are both required to remove the first post-cursor ISI from the current-UI samples and the previous-UI sample, respectively. The measurements in [6] show that the manual 1-tap DFE is only capable of equalizing up to 13.3 dB of attenuation. For typical channels with higher attenuation, a 2-tap DFE or a combination of the 1-tap DFE with a linear equalizer should be used. Theoretically, the DFE combined with the FFE presented in [5] is capable of equalizing channels with up to 28 dB of attenuation.

Fig. 12 shows the block diagram of the conventional LMS adaptation engine for a phase-tracking CDR. In this diagram, the received signal is taken at a discrete time that corresponds to the center of the UI, and the equalized signal and the recovered bits correspond to the same interval. The core of the adaptive engine consists of subtracting the equalized signal from a reference level to produce an error signal. This error signal is then correlated with the previous recovered bit to produce the DFE coefficient for a 1-tap DFE.
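The conventional LMS loop for a phase-tracking receiver can be sketched as follows. This is a minimal behavioral model, not the engine of Fig. 12 itself: the channel is assumed to have a main cursor `h0` and a single post-cursor `h1`, the reference level `d_lev` is assumed equal to the main cursor, and all symbol names and numeric values are hypothetical.

```python
import random


def lms_dfe(bits, h0=1.0, h1=0.4, d_lev=1.0, mu=0.01):
    """Adapt a 1-tap DFE coefficient with a conventional LMS update.

    bits: transmitted symbols in {-1, +1}. Returns the adapted coefficient.
    """
    c = 0.0
    prev_tx = prev_rx = 1              # assumed known starting bit
    for d in bits:
        x = h0 * d + h1 * prev_tx      # received sample at the eye center
        y = x - c * prev_rx            # equalized sample: ISI replica subtracted
        d_hat = 1 if y >= 0 else -1    # recovered bit
        e = d_lev * d_hat - y          # reference level minus equalized signal
        c -= mu * e * prev_rx          # correlate the error with the previous bit
        prev_tx, prev_rx = d, d_hat
    return c


rng = random.Random(0)
data = [rng.choice((-1, 1)) for _ in range(3000)]
print(lms_dfe(data))                   # approaches the post-cursor value h1 = 0.4
```

With these assumptions the error reduces to the residual post-cursor ISI times the previous bit, so each update moves the coefficient a fraction `mu` of the way toward `h1`.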
III. PROPOSED ADAPTATION ENGINE
If we limit the channel to one that produces only one post-cursor ISI, the received signal can take one of four values, as depicted in Fig. 13(a). After the ISI is removed, the equalized signal can only take one of two values. In fact, these two values are used as the reference level in Fig. 12. For a CDR with a blind clock, on the other hand, the choice of the reference level is more complicated, as illustrated in Fig. 13(b). In this case, the sampling clock is not phase-aligned to the center of the UI, and hence the equalized signal at the sampling phase may assume any of four possible values. The two values corresponding to no transition do not depend on the sampling phase, whereas the two that correspond to data transitions do. With transition filtering, the adaptation engine can use either set as the desired levels. However, in this design we only use the phase-dependent desired levels, because proper operation of the phase detector requires equalization of the edge samples. The phase-dependent desired levels provide a reference for the samples near the zero-crossings and thus can guide the adaptation engine to equalize those samples.
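The four-level and two-level cases for center sampling can be checked by enumerating the bit combinations. In the Python sketch below, `h0` and `h1` are hypothetical main-cursor and first post-cursor values, and `c` is the DFE coefficient; none of these numbers come from the paper.

```python
from itertools import product


def received_levels(h0, h1):
    """All values of h0*d(n) + h1*d(n-1) for bits in {-1, +1}."""
    return sorted({h0 * d + h1 * p for d, p in product((-1, 1), repeat=2)})


def equalized_levels(h0, h1, c):
    """Levels after a 1-tap DFE subtracts c*d(n-1) from the received sample."""
    return sorted({h0 * d + h1 * p - c * p for d, p in product((-1, 1), repeat=2)})


print(received_levels(1.0, 0.25))          # [-1.25, -0.75, 0.75, 1.25]
print(equalized_levels(1.0, 0.25, 0.25))   # [-1.0, 1.0]
```

When `c` equals the post-cursor value, the four received levels collapse to the two levels that serve as the fixed reference in the phase-tracking case.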
To accommodate these phase-dependent desired levels, we propose the modified LMS engine shown in Fig. 14. The desired-level generator block in this diagram produces a desired level corresponding to the sampling phase. The only remaining problem is that the DFE coefficient has to change with the sampling phase, and if the adaptation speed is lower than the rate at which the sampling phase changes, then the adaptation may not converge to its final value for that phase. To resolve this issue, eight registers are used to store the DFE coefficients, as in [6], but they are updated dynamically. At each sampling phase, only the corresponding DFE coefficient is updated. In this way, each coefficient reaches its final value corresponding to that sampling phase. This may require several passes of the sampling phase through that phase bin.

Fig. 15 shows the detailed implementation of the adaptation engine. To reduce the area and power overhead of the adaptation, only two consecutive ADC samples are used. Based on these samples and the sampling phase, the desired waveform generator block produces two phase-dependent desired levels, which correspond to sampling phases 1/2 UI apart. These desired levels are then compared with the corresponding equalized samples. The resulting errors are multiplied by the adaptation loop gain and the previous recovered bit. DFE coefficient updates are produced after a transition filter removes the errors that do not correspond to data transitions. Two 1:8 DMUX's use the sampling phase to select the two accumulators that store the corresponding DFE coefficients to be updated. The DFE coefficient select block (DCS) then selects the two DFE coefficients that are used in the DFE adders.

The shape of the desired waveform can be derived from an equalized eye by dividing the UI into 8 bins and then averaging the samples that fall in each bin. One drawback of this averaging scheme is the extra hardware required to store and update the desired waveform. Another drawback is the formation of interacting adaptation and desired-waveform generation loops, which can cause unpredictable behavior. As an example, if the adaptation starts with zero initial conditions, the eye opening at the output of the DFE would be small, producing in turn small desired levels. As a result, the adaptation will not be able to work properly and the eye opening will not improve.
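One adaptation step of a phase-binned LMS engine of this kind might look like the following Python sketch. It assumes eight phase bins per UI, one coefficient register per bin, a transition filter that discards samples whose recovered bit equals the previous bit, and a sign convention in which the ISI replica is the coefficient times the previous bit. The function names, the loop gain `mu`, and the numeric values are hypothetical.

```python
N_BINS = 8


def phase_bin(phase_ui):
    """Map a sampling phase in UI to one of N_BINS phase bins."""
    return int(phase_ui * N_BINS) % N_BINS


def paired_bin(k):
    """Bin of the second sample, which is half a UI (four bins) away."""
    return (k + N_BINS // 2) % N_BINS


def update(coeffs, phase_ui, y, d_lev, d_hat, d_prev, mu=0.01):
    """One adaptation step; only the coefficient of the current phase bin moves.

    y: equalized sample, d_lev: desired magnitude for this phase,
    d_hat: recovered bit, d_prev: previous recovered bit (both in {-1, +1}).
    """
    if d_hat == d_prev:                 # transition filter: skip non-transition samples
        return coeffs
    k = phase_bin(phase_ui)
    e = d_lev * d_hat - y               # desired level minus equalized sample
    coeffs[k] -= mu * e * d_prev        # correlate the error with the previous bit
    return coeffs


coeffs = [0.0] * N_BINS
update(coeffs, 0.30, y=0.2, d_lev=0.5, d_hat=1, d_prev=-1, mu=0.1)
print(coeffs)                           # only bin 2 has moved (to about 0.03)
```

Because each phase bin keeps its own register, a slowly sweeping sampling phase updates one coefficient at a time, and a coefficient resumes adapting whenever the phase returns to its bin.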
Another way to produce the desired waveform is to use a fixed, pre-defined shape with adjustable amplitude to accommodate different input power levels. A triangular waveform is a suitable candidate because it is consistent with the linear interpolation performed by the PD. In other words, if the adaptation converges perfectly so that the equalized eye becomes diamond-shaped, then the error in the PD due to the linear interpolation should be minimal.
It is possible to merge the two methods described above to produce the desired levels. First, we let the engine adapt based on a pre-defined desired waveform and then switch to the averaging technique. Fig. 16 compares the performance of this combined approach against that of a triangular waveform only. It can be seen that the receiver jitter tolerance is better with the averaging scheme whenever high levels of random noise or jitter are added to the received signal or the receiver clock, respectively. In the actual implementation, we used the triangular desired waveform because of its simplicity and lower overhead compared to the other method. The desired waveform generator is shown in Fig. 17. Two dynamic look-up tables calculate the desired levels for two samples that are 1/2 UI apart, based on a stored triangular waveform and the sampling phase. The height of the triangular waveform is adjusted based on the incoming data amplitude. The ADC samples that are closer to the center of the eye are rectified and averaged to produce an approximation of the incoming data amplitude.
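A triangular desired waveform with adjustable height can be sketched as follows. The Python code assumes eight phase bins, a triangle that is zero at the eye edges and reaches the estimated amplitude at the eye center, and an amplitude estimate obtained by rectifying and averaging the samples within a small window around the center. The exact triangle definition, the window width, and the function names are assumptions, not details taken from Fig. 17.

```python
N_BINS = 8


def triangular_levels(amplitude, n_bins=N_BINS):
    """Desired magnitude per phase bin: zero at the eye edges, peak at the center."""
    levels = []
    for k in range(n_bins):
        phase = (k + 0.5) / n_bins                 # bin center, in UI from the eye edge
        levels.append(amplitude * (1.0 - abs(2.0 * phase - 1.0)))
    return levels


def estimate_amplitude(samples, phases, window=0.125):
    """Rectify and average the samples taken near the eye center (phase 0.5 UI)."""
    near = [abs(s) for s, p in zip(samples, phases) if abs(p - 0.5) <= window]
    return sum(near) / len(near) if near else 0.0


print(triangular_levels(1.0))
# [0.125, 0.375, 0.625, 0.875, 0.875, 0.625, 0.375, 0.125]
```

Scaling the stored triangle by the estimated amplitude keeps the desired levels proportional to the incoming signal swing, which is the role of the height adjustment described above.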
The limited bandwidth and nonlinearity of the analog front-end (AFE) and the quantization noise of the ADC may adversely affect the adaptation or the equalization. The bandwidth limitation of the AFE can be absorbed into the channel loss; thus, it only reduces the equalization range of the 1-tap DFE. Both nonlinearity and quantization noise can be represented as additive noise and, as a result, they can also reduce the equalization range, as they degrade the received signal on top of the ISI degradation. For a random bit sequence, however, the adaptation loop remains almost unaffected, because it finds the DFE coefficients by correlating the equalization error with the previous bit and averages out any high-speed uncorrelated variation caused by the quantization noise and nonlinearity.
IV. SIMULATION AND MEASUREMENT RESULTS
The channels used in measurement consist of two FR4 daughter cards with 5-inch traces each and a backplane with adjustable trace length. The total lengths of the FR4 channels are 26 inches and 34 inches, corresponding to insertion losses of 9.9 dB and 13.3 dB at the Nyquist frequency of 2.5 GHz (Fig. 18(a) and (b)).
Fig. 18. Channel insertion loss for (a) a 26-inch and (b) a 34-inch FR4 channel. (c) S21 of the channels used in simulation for Table I.
The functional simulations were performed in Simulink using event-driven modeling [12] to increase simulation speed. The pulse responses of the channels, extracted from measured S-parameters, were used in the simulation to emulate channel attenuation. The effect of the adaptive 1-tap DFE on the vertical and horizontal eye opening of the received signal and on the BER of the receiver for different channels is presented in Table I. Although the 1-tap DFE is not able to open the eye for the lossy channels, the adaptation improves the BER. Fig. 19 compares the simulated jitter tolerance of the receiver with the adaptive DFE (this work) against the manual DFE (based on [6]). In both simulations, PRBS7 is used and the target BER differs from the one used in measurements. A frequency offset of 50 ppm is introduced between the receiver and transmitter clock frequencies to emulate blind sampling. In addition, Gaussian random jitter of 0.17 UI and 0.23 UI is introduced to the transmitter and the receiver clock, respectively. The simulation results confirm that the adaptation is achieved with little or no loss in performance (jitter tolerance) in the 34-inch channel. To find the limit of adaptation, the manual DFE coefficients were swept for a given channel, and the set of coefficients that reduced the receiver BER to below the target was compared to the adapted coefficients in the adaptive DFE. It was observed that the manual 1-tap DFE is able to reach the target BER for channels with up to 14.8 dB of attenuation, whereas at that attenuation the adaptive DFE, in spite of the convergence of its coefficients, was unable to achieve the target BER. Although the adaptive DFE falls behind the manual DFE by 1.5 dB, it automatically provides DFE coefficients that are otherwise quite time-consuming to find.
The receiver test chip was implemented in Fujitsu's 65 nm CMOS process. The die photo is shown in Fig. 20. The ADC and the digital CDR including all the test structures occupy an area of m and m respectively. A simplified measurement setup is shown in Fig. 21. A Centellax board generating PRBS7 at 5 Gb/s was used as the data source. The output amplitude of the PRBS generator did not cover the ADC's input range; therefore, we used a wideband amplifier with a gain of 7 dB after the PRBS generator. Based on the on-chip PRBS checker, the receiver operates at 5 Gb/s at the measurement target BER. Fig. 22 shows the reconstructed eye diagrams of the received data before and after the 1-tap DFE equalization. A small frequency offset between the receiver and the transmitter was used so that the sampling points sweep the UI. The samples from the ADC and the DFE were extracted and post-processed to produce the eye diagrams. For the 34-inch channel, the adaptive DFE is able to open the otherwise closed eye of the received data by 320 mV.
The learning curves of the DFE coefficients are shown in Fig. 23. Coefficients 1 to 4 are shown in the first row and coefficients 5 to 8 in the second row. It can be seen that the DFE coefficients converge in around 80 . The implemented adaptation engine uses 2 out of 16 ADC samples to perform the adaptation. The adaptation speed can be increased by utilizing more samples at the expense of more hardware and power consumption. Increasing the adaptation loop gain can also speed up the adaptation; however, this may cause the coefficients to drift whenever a non-random bit sequence is received.
The measured receiver jitter tolerance is plotted and compared with simulation results in Fig. 24. Sinusoidal jitter was applied to the transmitted data by modulating the clock frequency of the PRBS board. An Agilent E8257D signal generator was used as the clock source; because the maximum modulation frequency that this signal generator supports is 8 MHz, the jitter tolerance measurement was limited to this frequency. It can be seen that at 8 MHz the receiver tolerates 0.29 UI and 0.2 UI of sinusoidal jitter for the 26-inch and the 34-inch channels, respectively. Finally, a performance summary is presented in Table II.
V. CONCLUSION
An adaptive DFE for a 2× blind-sampling ADC-based CDR was described. The adaptation engine, which provides the DFE coefficients, uses phase-dependent desired levels for adaptation. A triangular waveform was used as the ideal reference waveform to guide the adaptation. While the CDR cannot provide error-free operation at 5 Gb/s for the 34-inch FR4 channel without equalization, it does provide a jitter tolerance of 0.2 UI at the measurement target BER after adaptive equalization. The receiver consumes 192 mW, of which 114 mW is consumed by the flash ADC and 78 mW by the digital blocks. It is possible to reduce the overall power consumption by using fractional sampling architectures [13] or by reducing the ADC power consumption using different ADC architectures such as SAR.
