Burst-mode clock and data recovery circuits (BMCDRs) are widely used in passive optical networks (PON) [1] and as a replacement for conventional CDRs in clock-forwarding links to reduce power [2]. In PON, a single CDR performs clock and data recovery for several burst sequences, each originating from a different source. As a result, the BMCDR is required to lock to an incoming data stream within tens of UIs (for example, 40ns in GPON). Previous works use either injection locking [3, 4] or a gated VCO [5, 6] to achieve this fast locking. In both cases, the control voltage of the CDR's VCO is generated by a reference PLL with a matching VCO to guarantee accurate frequency locking. However, any component mismatch between the two VCOs results in a frequency offset between the reference PLL frequency and the CDR's VCO frequency, and hence in a reduction of the CDR's tolerance for consecutive identical digits (CID). For example, [7] reports a frequency offset of over 20MHz (2000ppm) for 10Gb/s operation. We present a BMCDR that is based on phase interpolation (PI), eliminating the possibility of a local frequency offset between the reference and the recovered clock. We demonstrate 1 to 6Gb/s operation in 65nm CMOS with a locking time of less than 1UI.
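The link between frequency offset and CID tolerance can be illustrated with a short calculation. The sketch below is a simplification and is not taken from the cited works: it assumes that, with no data transitions to re-align the clock, the phase error grows linearly by the fractional frequency offset per bit, and that the sampling point may drift by at most 0.5UI. The function names and the 0.5UI budget are illustrative assumptions.

```python
def offset_ppm(delta_f_hz: float, bit_rate_hz: float) -> float:
    """Fractional frequency offset in parts per million."""
    return delta_f_hz / bit_rate_hz * 1e6


def max_cid_bits(ppm: float, phase_budget_ui: float = 0.5) -> int:
    """Longest transition-free run before the drift exceeds the budget.

    Assumes the phase error grows by (ppm * 1e-6) UI per bit and ignores
    jitter and static phase error. The 0.5 UI default is an assumed,
    optimistic budget, not a measured value.
    """
    drift_per_bit_ui = ppm * 1e-6
    # The small epsilon guards against floating-point round-down.
    return int(phase_budget_ui / drift_per_bit_ui + 1e-9)


# The 20MHz offset at 10Gb/s quoted from [7].
ppm = offset_ppm(20e6, 10e9)
print(f"{ppm:.0f} ppm -> about {max_cid_bits(ppm)} bits of CID")
```

Under these assumptions, a 2000ppm offset uses up the whole 0.5UI budget after roughly 250 identical bits; any jitter or static phase error would reduce this further.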
The principle of operation of the proposed PI-based BMCDR is illustrated in Fig. 8.7.1. Since the main function of a BMCDR is to align the rising edge of the clock with the data transition, we use the data to sample-and-hold (S/H) two quadrature clocks ($\mathrm{CK_I}$ and $\mathrm{CK_Q}$) with a pair of dual-edge-triggered S/H circuits. The samples of $\mathrm{CK_I}$ and $\mathrm{CK_Q}$ at a data transition, denoted by $\beta$ and $\alpha$ respectively, are then used to interpolate between $\mathrm{CK_I}$ and $\mathrm{CK_Q}$. This, as shown in Fig. 8.7.1, produces a recovered clock ($\mathrm{CK_{REC}}$) with a rising edge that is aligned with the data transition. To demonstrate this analytically, let $\mathrm{CK_I}(t) = \sin(2\pi f t)$ and $\mathrm{CK_Q}(t) = -\cos(2\pi f t)$. A data transition at $t = t_0$ yields the PI coefficients $\alpha = \mathrm{CK_Q}(t_0) = -\cos(2\pi f t_0)$ and $\beta = \mathrm{CK_I}(t_0) = \sin(2\pi f t_0)$. As a result, $\mathrm{CK_{REC}}(t) = \beta\,\mathrm{CK_Q}(t) - \alpha\,\mathrm{CK_I}(t) = \sin(2\pi f (t - t_0))$, which is a clock whose rising zero crossing coincides with the data transition.
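The identity above can be checked numerically. The following is a minimal sketch using ideal sinusoidal clocks, as in the derivation; the function names, the 6GHz clock frequency and the transition time are illustrative choices, and the real circuit is of course analog rather than sampled.

```python
import math


def ck_i(t: float, f: float) -> float:
    """In-phase clock, modelled as an ideal sine."""
    return math.sin(2 * math.pi * f * t)


def ck_q(t: float, f: float) -> float:
    """Quadrature clock, modelled as an ideal negative cosine."""
    return -math.cos(2 * math.pi * f * t)


def pi_coefficients(t0: float, f: float) -> tuple[float, float]:
    """Sample both clocks at the data transition t0 (the S/H action)."""
    alpha = ck_q(t0, f)
    beta = ck_i(t0, f)
    return alpha, beta


def ck_rec(t: float, f: float, alpha: float, beta: float) -> float:
    """Recovered clock: beta*CK_Q(t) - alpha*CK_I(t)."""
    return beta * ck_q(t, f) - alpha * ck_i(t, f)


f = 6e9                 # assumed clock frequency for the example
t0 = 0.37 / f           # arbitrary data-transition time
alpha, beta = pi_coefficients(t0, f)

# Compare against sin(2*pi*f*(t - t0)) over two clock periods.
worst = max(
    abs(ck_rec(t0 + k / (64 * f), f, alpha, beta)
        - math.sin(2 * math.pi * k / 64))
    for k in range(128)
)
print(f"worst-case deviation from sin(2*pi*f*(t - t0)): {worst:.2e}")
```

The deviation stays at floating-point rounding level for any choice of t0, which reflects that the two sampled values alone carry the full phase information of the transition.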
In our implementation, $\mathrm{CK_I}$ and $\mathrm{CK_Q}$ are provided externally; however, in an integrated system $\mathrm{CK_I}$ and $\mathrm{CK_Q}$ may be provided by a PLL or be generated from the forwarded clock. When a received clock is used, there is no frequency mismatch between $\mathrm{CK_I}$/$\mathrm{CK_Q}$ and the clock embedded in the data. When a PLL is used, a small frequency offset may be present between $\mathrm{CK_I}$/$\mathrm{CK_Q}$ and the input due to a frequency error in the PLL reference. For sufficient transition density, the PI's output phase is updated at every data transition, hence any frequency offset between the data and the reference clock is tracked.
Dual-edge sampling is implemented by connecting the outputs of two S/Hs operating at opposite data edges. Each S/H is implemented as a master/slave configuration triggered by either the rising or the falling edge of the data. The S/Hs are able to track clock signals of up to 6GHz, and as such no extra capacitor besides the existing parasitic capacitance is required. A buffer is inserted between the pass transistors and the transmission gates to prevent charge sharing and to avoid any kickback from the output transmission gates to the input pass gates. To provide rail-to-rail levels for switching, a CML-to-CMOS converter is placed before the S/Hs.
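The tracking argument can be sketched with a simple behavioural model. It assumes that the phase error is reset to zero at every data transition (rising or falling, as with dual-edge sampling) and otherwise grows linearly with the frequency offset; circuit delays and jitter are ignored. The bit pattern and the function name are hypothetical, and the 100MHz offset at 6Gb/s is used only as an example operating point.

```python
def peak_phase_error_ui(bits, bit_rate_hz, clk_freq_hz):
    """Peak |phase error| in UI, observed just before each PI update.

    Behavioural assumption: the error is zeroed at every data transition
    and grows by (f_clk - f_bit) / f_bit UI per bit until the next one.
    """
    drift_per_bit = (clk_freq_hz - bit_rate_hz) / bit_rate_hz
    error = 0.0
    peak = 0.0
    for prev, cur in zip(bits, bits[1:]):
        error += drift_per_bit        # one more bit period has elapsed
        peak = max(peak, abs(error))
        if cur != prev:               # transition: the S/H resamples the clocks
            error = 0.0
    return peak


# Hypothetical pattern whose longest run of identical bits is 7.
bits = [0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1]
peak = peak_phase_error_ui(bits, bit_rate_hz=6e9, clk_freq_hz=6.1e9)
print(f"peak phase error before an update: {peak:.3f} UI")
```

In this model the error never accumulates beyond the longest transition-free run, which is why transition density rather than burst length sets the tolerable offset. For a run of 7 bits, the longest run in a PRBS7 sequence, the peak is 7/60 UI, or about 0.12UI.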
The circuit implementation of the analog PI is shown in Fig. 8.7.3. A differential transconductance stage is used to convert the coefficients ($\alpha$ and $\beta$) to currents. The tail currents of $\mathrm{CK_I}$ and $\mathrm{CK_Q}$ are used as weighting factors in the PI. The operation $-\alpha\,\mathrm{CK_I} + \beta\,\mathrm{CK_Q}$ is performed in current mode at the nodes of the resistors and then converted to an output voltage, $\mathrm{CK_{REC}}$. Since $\alpha$ and $\beta$ are differential signals, their signs can be changed. This allows the PI to create phases from 0° to 360°. Immunity to charge injection from the S/H is enhanced by providing the inputs to the PI differentially.
Figure 8.7.4 shows the block diagram of the entire receiver along with the recovered data for a 6Gb/s PRBS10 input. The output of the clock recovery unit is buffered to drive the input of the clock divider and flip-flop. To compensate for the delay of the buffer in the clock path, a replica of the $\mathrm{CK_{REC}}$ buffer is placed in the data path. The recovered clock and data are buffered further (not shown) so that they can be observed off-chip by direct probing. The measured peak-to-peak clock jitter is 24ps and the eye opening for the recovered data is 500mV. The CDR recovers a 6Gb/s PRBS7 sequence with a BER of less than $10^{-12}$ in the presence of a 100MHz frequency offset between $\mathrm{CK_I}$/$\mathrm{CK_Q}$ and the input.
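The 0° to 360° coverage of the PI follows from the signs of the two weights. As an illustration only, the sketch below treats the PI as an ideal weighted sum of the two quadrature clocks and computes the output phase in phasor form; the function name is hypothetical, and gain mismatch and nonlinearity of the transconductance stage are not modelled.

```python
import math


def pi_output_phase_deg(alpha: float, beta: float) -> float:
    """Phase (0..360 degrees) of -alpha*CK_I + beta*CK_Q.

    With CK_I = sin(x) and CK_Q = -cos(x):
        -alpha*sin(x) - beta*cos(x) = A*sin(x - phi),
    where A*cos(phi) = -alpha and A*sin(phi) = beta.
    """
    return math.degrees(math.atan2(beta, -alpha)) % 360.0


# Sweep the data-transition phase in 30-degree steps.
for target in range(0, 360, 30):
    theta = math.radians(target)
    alpha = -math.cos(theta)   # sample of CK_Q at the transition
    beta = math.sin(theta)     # sample of CK_I at the transition
    phase = pi_output_phase_deg(alpha, beta)
    print(f"target {target:3d} deg  alpha {alpha:+.3f}  beta {beta:+.3f}"
          f"  output {phase:7.3f} deg")
```

Each of the four sign combinations of the two weights selects one quadrant, so a single pair of signed, differential weights is enough to place the output phase anywhere on the circle.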
Figure 8.7.5 shows the measured characteristics of the PI. The input data is phase shifted and the corresponding incremental phase shift of the phase interpolator output is recorded at 4GHz and 6GHz. At both frequencies, the recovered clock is shown for 30° input phase steps. The maximum deviation from ideal interpolation is 6.5° at 4GHz and 2° at 6GHz. The PI's latency and unequal delay between the clock path and the data path may produce a phase offset in the CDR. At 6Gb/s this delay was simulated to be less than one tenth of a UI and as such was left uncompensated.
To measure the time it takes for the clock edge to align with the data edge (locking time), a deliberate frequency offset is introduced. This frequency offset, combined with a long CID, introduces a phase jump that allows the CDR locking behaviour to be observed (Fig. 8.7.6). In the 1Gb/s and 6Gb/s cases, the clock is delayed by the PI in order to align with the data edge. In the 2.5Gb/s case, the clock is reversed, whereas in the 4Gb/s case the clock is advanced. A Centellax OTB3P1A PRBS generator is used to create PRBS10 patterns from 1 to 6Gb/s. An Agilent DCA-J 81600C digital communication analyzer (with 8611A 20GHz electrical module) is used to capture sharp phase transitions in the recovered clock.
The chip is fabricated in a 65nm CMOS process. The receiver area is 250×70μm², of which 50×70μm² is occupied by the clock recovery circuitry (S/H and PI). The chip operates from a 1.2V supply and consumes a total power of 22mW (excluding output drivers), of which 3.8mW is consumed by the clock recovery circuit.
