ADC-based receivers process the received data in the digital domain, eliminating much of the analog front end. In addition, a feed-forward blind architecture [1, 2] removes the feedback loop between the digital and analog domains, so the ADC and the digital CDR can be designed and simulated independently. Previous works [1, 2] sampled the incoming data at 2 and 1.45 samples per UI to achieve 5Gb/s and 6.875Gb/s, respectively. To further increase the data rate to 10Gb/s, we sample at the baud rate (1 sample per UI). Existing baud-rate architectures [3] rely on a phase-tracking clock to sample at the middle of the data eye. This paper presents a blind baud-rate CDR fabricated in 65nm CMOS. At 10Gb/s, the CDR achieves a high-frequency jitter tolerance of 0.19UIpp with ±300ppm of frequency offset. The CDR also operates successfully with +1000ppm offset, which amounts to sampling below the baud rate.
Figure 7.4.1 illustrates the main challenge of the blind baud-rate sampling approach. Given an ideal channel, the data eye is open over a sampling-phase range of 1UI. With blind sampling, however, a frequency offset between the data and the receiver clock causes the baud-rate sampling phase to drift continuously across a 1UI window. When a sample falls near the window boundary, any high-frequency jitter may push it outside the 1UI phase range, causing lost data bits (i.e., zero jitter tolerance). To increase jitter tolerance, we introduce a controlled amount of ISI into the data with a rectangular filter, implemented as an integrate-and-dump (I&D) circuit [4] in the receiver front end. Convolving a 1UI rectangular filter with the ideal channel spreads the pulse response to 2UI. With a perfect DFE that cancels all post-cursor ISI, the eye is then open over a 1.5UI phase range, so when the blind samples shift beyond the 1UI window, a jitter margin of 0.5UIpp remains. A 2UI rectangular filter increases this margin to 1UIpp and yields an eye opening that is symmetric with respect to the blind sampling window. For these reasons, our design uses a 2UI I&D. The front end consists of four interleaved I&D and ADC blocks, each operating at 2.5GS/s. Each I&D circuit integrates over a 1UI interval and samples the result, which the ADC converts into a 5b digital value; the ADC outputs are demuxed into 16 parallel samples. The digital CDR adds adjacent 5b 1UI I&D samples to synthesize 6b 2UI I&D samples. Because the ADC resolution is limited to 5 bits, forming the 2UI I&D samples in the analog domain before the ADC would forfeit this added bit of resolution.
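The 1.5UI and 2UI eye-opening ranges can be checked numerically. The Python sketch below is illustrative only and rests on stated assumptions: an ideal channel carrying a 1UI NRZ pulse, a perfect DFE that removes all post-cursor ISI, no noise, and a simple openness criterion (main cursor larger than the worst-case sum of pre-cursors). The helper names `id_pulse` and `eye_open_range` are hypothetical and do not come from the design.

```python
# Hypothetical helpers for illustration; ideal channel, perfect DFE, no noise.


def id_pulse(t: float, n_ui: float) -> float:
    """Peak-normalized pulse response of a 1UI NRZ pulse on [0, 1) through an
    n_ui-wide integrate-and-dump filter. Time t is in UI; the value is the
    overlap of the integration window [t - n_ui, t] with the pulse [0, 1]."""
    overlap = min(t, 1.0) - max(t - n_ui, 0.0)
    return max(overlap, 0.0)


def eye_open_range(n_ui: float, step: float = 1e-3) -> float:
    """Width (in UI) of the sampling-phase range over which the eye is open,
    assuming all post-cursor ISI is cancelled. The eye counts as open when
    the main cursor exceeds the worst-case sum of pre-cursors."""
    span = n_ui + 1.0  # the pulse response is nonzero on (0, n_ui + 1)
    n_pre = int(span) + 1  # enough pre-cursor taps to cover the span
    open_steps = 0
    for i in range(int(round(span / step))):
        phi = (i + 0.5) * step  # sample at bin midpoints
        main = id_pulse(phi, n_ui)
        # Later bits start m UI after the current one: pre-cursors h_{-m}.
        pre = sum(id_pulse(phi - m, n_ui) for m in range(1, n_pre))
        if main > pre:
            open_steps += 1
    return open_steps * step


for n in (1, 2):
    width = eye_open_range(n)
    # Blind samples sweep a 1UI window; whatever is left over is jitter margin.
    print(f"{n}UI I&D: eye open over {width:.2f}UI, "
          f"jitter margin {width - 1.0:.2f}UIpp")
```

The sweep uses bin midpoints so that no sample lands exactly on a breakpoint of the piecewise-linear pulse response; the results reproduce the 0.5UIpp and 1UIpp margins quoted for the 1UI and 2UI filters.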
The digital CDR contains a feedback loop consisting of a data interpolator, a speculative 2-tap DFE, a speculative Mueller-Muller phase detector (MMPD), and a conventional 2nd-order loop filter. The data interpolator estimates the desired sample at the average interpolation phase ($\phi_{avg}$) by linearly combining neighbouring blind samples. Given a negative frequency offset (i.e., data slower than the baud rate), an interpolated sample is dropped each time the phase completes a 1UI rotation. A positive frequency offset produces cases where no blind sample lies between two desired samples; the interpolator resolves these cases by interpolating twice between the two closest blind samples. The DFE compensates for post-cursor ISI from both the channel and the I&D filter. As shown in Fig. 7.4.1, recovering data from an ideal channel with a 2UI I&D filter requires one DFE tap; a more attenuating channel may require more taps (we use two taps in this design). The DFE is implemented speculatively to reduce feedback-loop latency. DFE adaptation is not included in this design. The MMPD [5, 6] is described later.
Figure 7.4.3 shows the circuit details of the 1UI I&D front end. This circuit introduces controlled ISI into the ADC input and also operates as a frequency-scalable anti-aliasing filter [4]. The I&D block consists of a single source-degenerated transconductance stage that integrates the signal on the input capacitance of the ADCs. Four clock pulses synchronize the I&D blocks through three phases: integrate, hold, and reset. Skew between the clock phases degrades jitter tolerance in two ways. First, clock skew appears effectively as high-frequency periodic jitter. Second, because each block integrates over its clock pulse width, skew causes gain mismatch among the interleaved I&D blocks. To compensate for clock skew, each of the four CML-to-CMOS converters includes a variable delay of up to 20ps.
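As a rough illustration of the drop and double-interpolation behaviour of the data interpolator, the sketch below resamples blind samples with a linear interpolator driven by a constant phase step. This is a simplification under stated assumptions: in the actual CDR the interpolation phase comes from the loop filter ($\phi_{avg}$) and 16 samples are processed in parallel, whereas here the data UI, expressed in receiver clock periods, is held fixed. `interpolate_blind` is a hypothetical name.

```python
import math


def interpolate_blind(blind, ui_in_samples, phase0=0.0):
    """Produce one data sample per UI from free-running (blind) samples.

    blind: samples taken once per receiver clock period.
    ui_in_samples: data UI in receiver clock periods; a hypothetical constant
        stand-in for the loop-filter phase. Above 1 means the data is slower
        than the sampling clock; below 1, faster.
    phase0: starting interpolation position (>= 0), in receiver clock periods.
    Returns (data, lefts): the interpolated samples and, for each one, the
    index of the left blind sample it was interpolated from.
    """
    if ui_in_samples <= 0 or phase0 < 0:
        raise ValueError("ui_in_samples must be positive, phase0 non-negative")
    data, lefts = [], []
    pos = phase0
    while True:
        i = math.floor(pos)
        if i + 1 >= len(blind):
            break  # no right-hand neighbour left
        frac = pos - i  # weight of the right-hand neighbour
        data.append((1.0 - frac) * blind[i] + frac * blind[i + 1])
        lefts.append(i)
        pos += ui_in_samples  # advance by one data UI
    return data, lefts


blind = [2.0 * n for n in range(10001)]  # a ramp, so linear interpolation is exact
slow, lefts_slow = interpolate_blind(blind, 1.0003)  # data 300ppm slower
fast, lefts_fast = interpolate_blind(blind, 0.9997)  # data 300ppm faster

skipped = sorted(set(range(len(blind) - 1)) - set(lefts_slow))
repeats = len(lefts_fast) - len(set(lefts_fast))
print(f"slow data: {len(slow)} samples, skipped intervals {skipped}")
print(f"fast data: {len(fast)} samples, {repeats} intervals used twice")
```

At 300ppm the interpolation phase completes a 1UI rotation roughly every 1/300ppm ≈ 3333 UI. With slower data, a blind interval is skipped at each rotation (viewed as one candidate per blind interval, that is the dropped interpolated sample); with faster data, a pair of blind samples is interpolated twice at each rotation.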
The maximum vertical eye opening occurs when the main cursor, $h_0$, is sampled at time $T$; this is our desired sampling phase. This sampling phase yields a first post-cursor $h_1$ equal to the main cursor and zero pre-cursor ISI, which lets us fully benefit from the DFE and minimizes the need for an FFE. To locate this phase, we estimate the MMPD function $F = h_0 - h_1$ at every sample and drive it to zero through the feedback loop. Since our actual sampling phase is blind, we impose the desired phase on the interpolation phase instead.
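For the ideal-channel case, the link between the maximum eye opening and the MMPD zero crossing can be sketched directly from the pulse response. The code below is a hypothetical, noise-free model: a 1UI NRZ pulse through a 2UI I&D, peak-normalized, with the phase `phi` measured in UI from the start of the bit, so `phi = 1` corresponds to time $T$. `mm_pd` and `eye_height` are illustrative names. It sweeps the phase, finds where the DFE-assisted eye is tallest, and evaluates $F = h_0 - h_1$ around that point.

```python
# Hypothetical model: ideal channel, 2UI I&D, no noise, post-cursors cancelled.


def id_pulse(t: float, n_ui: float = 2.0) -> float:
    """Peak-normalized pulse response of a 1UI NRZ pulse through an
    n_ui integrate-and-dump filter (t in UI)."""
    overlap = min(t, 1.0) - max(t - n_ui, 0.0)
    return max(overlap, 0.0)


def mm_pd(phi: float) -> float:
    """Mueller-Muller function F = h0 - h1 when the main cursor is sampled
    phi UI after the bit starts."""
    return id_pulse(phi) - id_pulse(phi + 1.0)


def eye_height(phi: float) -> float:
    """Vertical opening when an ideal DFE cancels the post-cursors: main
    cursor minus the worst-case sum of pre-cursors h_{-1}, h_{-2}, h_{-3}."""
    return id_pulse(phi) - sum(id_pulse(phi - m) for m in (1, 2, 3))


phases = [i / 100 for i in range(301)]
best = max(phases, key=eye_height)
print(f"max eye at phi = {best:.2f}UI, F there = {mm_pd(best):+.2f}")
for phi in (0.5, 1.0, 1.5):
    print(f"phi = {phi:.1f}UI: F = {mm_pd(phi):+.2f}")
```

In this model $F$ rises monotonically from negative to positive through $\phi = 1$UI, where $h_1 = h_0$ and $h_{-1} = 0$, so driving $F$ to zero lands on the peak eye opening. With a real channel the pulse shape, and therefore the exact lock point, will differ.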
It is known [5] that $h_0$ and $h_1$ can be estimated by the expected values $E[x_k A_k]$ and $E[x_k A_{k-1}]$, respectively, where $x_k$ is the data sample (in our design, the interpolated I&D sample) and $A_k$ is the recovered bit. To reduce loop latency, we instead estimate $h_0$ by the expected value of $x_{k-1} A_{k-1}$. The resulting MMPD output, $(x_{k-1} - x_k)A_{k-1}$, can be evaluated speculatively: the PD output is ready in the same cycle in which $A_{k-1}$ becomes available.
The DC transfer characteristic shows the gain errors before and after skew correction. We reduce the relative gain error by adjusting the phase delays between the four divided clocks, as described in Fig. 7.4.3. The right diagram shows the eye after the 1UI I&D and ADC blocks (the 2UI I&D is performed in the digital domain). Since four ADCs are interleaved in this design, we superimpose the four output eyes to construct the worst-case eye diagram. Despite the mismatch between the interleaved front-end blocks, the digital CDR recovers the data, as the jitter tolerance results in Fig. 7.4.6 show. Because the 2UI I&D produces a null at the Nyquist frequency (5GHz), this design is not limited by the channel attenuation at Nyquist. Instead, it is limited by the combined channel and I&D attenuation at half the Nyquist frequency (2.5GHz), which is a less restrictive constraint. In our measurements, the total attenuation at 2.5GHz is approximately 10dB. Figure 7.4.6 shows the jitter tolerance curves for a 10Gb/s PRBS-7 input with zero and ±300ppm frequency offset; the worst-case high-frequency jitter tolerance is 0.19UIpp. We also demonstrate successful operation when sampling below the baud rate with a jitter tolerance curve for +1000ppm frequency offset, which is close to the loop-filter limit of this design.
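The claim that the speculative MMPD output $(x_{k-1} - x_k)A_{k-1}$ averages to $h_0 - h_1$ can be checked with a short Monte Carlo run. This Python sketch is an illustration under stated assumptions: an ideal channel whose 1UI NRZ pulse passes through a 2UI I&D, independent equiprobable ±1 bits, no noise, and error-free decisions ($A_k$ equal to the transmitted bits). It does not model the speculative timing itself, and `mmpd_average` is a hypothetical name.

```python
import random

# Hypothetical simulation: ideal channel, 2UI I&D, no noise, correct decisions.


def id_pulse(t: float, n_ui: float = 2.0) -> float:
    """Peak-normalized pulse response of a 1UI NRZ pulse through an
    n_ui integrate-and-dump filter (t in UI)."""
    overlap = min(t, 1.0) - max(t - n_ui, 0.0)
    return max(overlap, 0.0)


def mmpd_average(phi: float, n_bits: int = 20000, seed: int = 1) -> float:
    """Average of the MMPD output (x_{k-1} - x_k) * A_{k-1}.

    x_k is the I&D sample of bit k taken phi UI after that bit starts,
    including ISI from neighbouring bits; A_k are random +/-1 bits, assumed
    to be recovered without error."""
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
    # h[m]: response at the sampling instant to a bit sent m UI earlier
    # (m > 0: post-cursor, m < 0: pre-cursor).
    h = {m: id_pulse(phi + m) for m in range(-3, 4)}
    x = [
        sum(hm * bits[k - m] for m, hm in h.items() if 0 <= k - m < n_bits)
        for k in range(n_bits)
    ]
    total = sum((x[k - 1] - x[k]) * bits[k - 1] for k in range(1, n_bits))
    return total / (n_bits - 1)


for phi in (0.5, 1.0, 1.5):
    h0, h1 = id_pulse(phi), id_pulse(phi + 1.0)
    print(f"phi = {phi:.1f}UI: estimate {mmpd_average(phi):+.3f}, "
          f"h0 - h1 = {h0 - h1:+.3f}")
```

Because the bits are independent, $E[x_{k-1}A_{k-1}] = E[x_k A_k] = h_0$ and $E[x_k A_{k-1}] = h_1$, so the estimate tracks $h_0 - h_1$: negative when sampling early, near zero at $\phi = 1$UI (time $T$), and positive when sampling late.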
