Abstract: This brief presents a 5-Gbit/s quarter-rate clock and data recovery (CDR) circuit built around a new phase-rotating phase-locked loop (RPLL). The RPLL employs a split-tuned architecture to decouple the tradeoff between RPLL bandwidth and power consumption. The uncertainty of phase interpolation due to the non-deterministic characteristics of the phase frequency detector (PFD) is eliminated by employing a PFD synchronizer (PFDS). Hence, the RPLL performs precise, seamless phase adjustment. The CDR, implemented in a digital 65 nm CMOS technology, shows 5.5-ps rms and 47.2-ps peak-to-peak jitter in the recovered clock and a bit error rate of $10^{-12}$ while consuming 10.3 mW from a 1.2-V supply.
Introduction
The rapidly growing input-output (IO) bandwidth demand in optical communications, backplane routing and chip-to-chip interconnects has raised the importance of high-throughput serial transceiver integrated circuits (ICs). As a result, multiple serial links are now integrated into large digital systems as an IO circuit block. Hence, reducing the power consumption and area of each serial link is critical in the development of high-performance serial transceivers. The clock and data recovery (CDR) circuit is the most essential building block of the serial link, since it largely determines power consumption and performance. Fig. 1 shows the dual-loop CDR architecture, which is the most popular structure in multi-channel serial links because an extra receiver (RX) can be appended simply by adding a CDR loop while sharing the phase-locked loop (PLL). Furthermore, it is easy to implement, and the stability of the PLL is not coupled to that of the CDR loop [1]. The PLL generates multiple clock phases, which are distributed to each RX channel and used by the phase interpolator and phase rotator to align the sampling phases with the incoming data. Despite the simplicity of this architecture, a high-resolution phase interpolator operating over a wide frequency range requires extraordinary area and power [2]. In addition, global multi-phase clock distribution and the phase interpolators consume considerable power as the number of channels increases, even though the PLL is shared.
To overcome these problems, phase-rotating PLLs (RPLLs) have been presented [3, 4, 5]. They reduce area and power consumption by merging the generation, interpolation and rotation of phases in a single PLL feedback path. However, the XOR phase detector in the adjustable PLL [3, 4] has a narrow lock-in range and is susceptible to duty-cycle variation. In addition, an auxiliary PFD loop is needed for initial lock acquisition. Although the phase-rotating method with dual phase frequency detectors (PFDs) in the DP-PLL [5] requires neither an auxiliary PFD loop nor perfect duty-cycle correction, it is vulnerable to wrong coarse phase adjustment because the PFDs may capture different clock edges depending on the moment at which they are activated. Moreover, these designs employ the conventional supply-regulated VCO to achieve low jitter, so the tradeoff between PLL bandwidth and power dissipation poses several challenges for low-power RPLL design.
In this paper, a multi-PFD RPLL is proposed and applied to a 5-Gbit/s CDR circuit. The proposed RPLL uses a phase-rotating technique with a PFD synchronizer (PFDS) that eliminates the drawbacks associated with conventional phase-rotating methods. Furthermore, to decouple the tradeoff between PLL bandwidth and regulator power consumption, the split-tuned architecture is employed [8]. The multiple-pass loop VCO architecture is adopted to increase the operating frequency of the proposed RPLL [10, 11]. Section 2 describes the CDR architecture and presents the stability analysis and other general considerations related to the jitter characteristics. Section 3 provides the implementation details of the proposed RPLL. The test results are reported in Section 4. Finally, conclusions are drawn in Section 5.
Proposed CDR architecture
The detailed block diagram of the proposed CDR circuit is shown in Fig. 2. It consists of two main components, an analog RPLL and a digital CDR loop. To ensure sufficient sampling margin for the samplers, quarter-rate data sampling is used. The frequency tracking range is extended by adopting a second-order digital loop filter (DLF).
With the reference frequency input ($\Phi_{ref}$), the RPLL generates eight clock phases ($\Phi_1$ to $\Phi_8$), evenly spaced by 45°, whose frequency is 1/4 of the input data rate. The phase shift is introduced in the phase rotator controller, which is digitally controlled by the 6-bit loop filter output. The eight VCO phases enter the receiver samplers, $SD_{0:3}$ for data sampling and $SE_{0:3}$ for edge sampling. The eight sampler outputs are then de-multiplexed to 32 samples by an 8:32 de-multiplexer operating at one quarter of the VCO frequency. To alleviate speed requirements in the downstream digital circuitry, the subsequent circuitry is also clocked with this quarter-rate recovered clock RXCLK. The data samples $D_{0:15}$ and edge samples $E_{0:15}$ go through 16 bang-bang phase detectors (BBPDs), which detect the polarity of each of the 16 phase comparisons. The resulting error signals are resolved by a majority vote to produce a 3-level (early/late/no transition) phase error signal. The phase error is filtered by a second-order digital loop filter consisting of a proportional and an integral path. The phase position is then encoded as a 6-bit digital control signal with which the RPLL tracks the phase and frequency error, which closes the CDR loop.
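The phase detection and majority vote described above can be illustrated with a minimal Python sketch. It assumes the usual Alexander-style decision on a data/edge/data triple; the function names, the sign convention (+1 for early, -1 for late) and the use of one extra data sample to close the last comparison are assumptions for illustration, not details taken from the fabricated design.

```python
def bbpd(d_prev, edge, d_next):
    """One bang-bang decision from a data/edge/data sample triple.

    Returns +1 (clock early), -1 (clock late) or 0 (no data transition).
    The sign convention is an assumption made for this sketch.
    """
    if d_prev == d_next:
        return 0  # no transition, so no phase information
    # The edge sample still equals the previous bit: the clock sampled too early.
    return 1 if edge == d_prev else -1


def majority_vote(data, edges):
    """Reduce the per-bit decisions to a single 3-level phase error.

    `data` must hold len(edges) + 1 samples so that the last edge sample
    has a following data bit to compare against (assumed here).
    """
    total = sum(bbpd(data[i], edges[i], data[i + 1]) for i in range(len(edges)))
    return (total > 0) - (total < 0)  # sign of the vote: +1, 0 or -1


if __name__ == "__main__":
    data = [i % 2 for i in range(17)]      # alternating pattern, 17 data samples
    print(majority_vote(data, data[:16]))  # every edge sample equals the previous bit
```

A tie between early and late votes is reported as 0 in this sketch, the same value as "no transition"; how the actual circuit resolves ties is not stated in the text.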
Unlike the conventional dual-loop CDR, the VCO output serves as the recovered clock instead of the phase interpolator output. Also, the RPLL is embedded within the CDR loop and hence the loop response of the RPLL influences the loop dynamics of the CDR loop. The following analysis discusses the criterion for choosing the bandwidth of the RPLL and CDR loop to guarantee the stability and to improve the jitter characteristics.
A. Stability analysis
To ensure the stability of the second-order CDR loop, as described in [6], the phase change due to the proportional branch must dominate the phase change due to the integral branch. The interaction between the two branches can be described using a stability factor $\xi$, defined as the ratio of the output phase change caused by the proportional branch to the output phase change caused by the integral branch. The stability factor can be expressed as
where $K_P$ and $K_I$ represent the gains of the proportional and the integral branches, respectively, and $\Delta\Phi$ is the phase step of the RPLL, which equals $1\,\mathrm{UI}/2^{6}$ in the prototype (UI is the clock period of the VCO). In this work, a large stability factor ($K_P/K_I = 128$) is set to guarantee stable operation of the CDR.
Although a large stability factor ensures the stability of the second-order CDR loop, a further stability condition has to be verified for the proposed CDR because the RPLL and the CDR loop interact. In the proposed architecture, the phase rotator is placed inside the RPLL feedback path, so the RPLL acts simply as a low-pass filter for the phase rotation. For the RPLL not to affect the stability of the CDR loop, it should respond more quickly than the maximum rate of phase change of the CDR loop. In other words, the RPLL bandwidth should be larger than the maximum bandwidth of the CDR loop. This condition can be expressed as
where $DF$ is the decimation factor used to reduce the dithering jitter to one phase step, and $T_{RXCLK}$ is the update interval of the CDR loop. Substituting the design parameters $T_{RXCLK} = 4\,\mathrm{UI}$ and $DF = 1$, the PLL bandwidth needs to be much larger than 2.5 MHz to ensure stability of the CDR loop. Since the RPLL's output frequency is the same as that of its input reference clock, it can operate at very high bandwidth. The simulated RPLL bandwidth is 120 MHz, which easily satisfies the stability requirement in (2).
B. Jitter characteristics
Jitter tolerance is the maximum input jitter that the CDR loop can tolerate without an increase in the bit error rate at a given jitter frequency. In the low jitter frequency region, the integral path can respond to the input jitter as well as the proportional path, causing the output phase slewing to be quadratic. This slewing makes the maximum tolerable jitter amplitude roll off at a −40 dB/dec slope. As the jitter frequency increases, the integral path changes negligibly and the output phase slews linearly at a rate proportional to the proportional step ($K_P \cdot \Delta\Phi$). In the high jitter frequency region, the CDR is unable to track the jitter, so a jitter amplitude larger than the sampling margin (ideally 0.5 UI) will cause a bit error regardless of jitter frequency. Thus, the tracking bandwidth $f_T$, defined as the maximum frequency of the input jitter that can be tracked by the CDR, is given by the following equation [7]
where $f_0$ is the nominal bit rate of the input data and $\alpha$ is the transition density defined with respect to a period of $T_{RXCLK}$. Clearly, boosting the proportional step leads to a larger tracking bandwidth and improves jitter tolerance at all frequencies. Another important metric of CDR performance is jitter generation, which is the jitter component created by a CDR even if its input is jitter-free. The nonlinear nature of the BBPD causes the CDR output clock to dither when the CDR is in lock. This dithering appears as dither jitter and is the major component of the jitter generation of the CDR. Unfortunately, the amount of dither jitter is proportional to the proportional step, resulting in a direct tradeoff between jitter generation and jitter tolerance. Moreover, because the stability of the RPLL and that of the digital CDR loop are coupled, the proposed CDR faces an additional stability constraint when its jitter tolerance is optimized. As indicated in equation (2), the maximum CDR bandwidth is also proportional to the proportional step. Thus, to improve jitter tolerance, the RPLL bandwidth should be large enough to meet the stability condition.
All this underlines the importance of selecting an appropriate proportional step and a wide RPLL bandwidth. Although a linear PD can decouple the tradeoff between jitter generation and jitter tolerance, the conversion of a time difference to a digital word requires complicated circuits with a large area and power penalty [7]. Furthermore, a linear PD cannot alleviate the stability requirement between the RPLL and the digital CDR loop. Therefore, the proposed CDR adopts a BBPD, which detects only the polarity of the phase error, together with trim means to sweep the proportional and integral gains. Assuming a transition density of 1, $K_P = 1$, and $K_I = 2^{-7}$, the simulated tracking bandwidth of the proposed CDR is close to 19.5 MHz.
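To make the roles of the proportional and integral paths concrete, the following Python sketch models a generic second-order digital loop filter driven by the 3-level phase error. It is a behavioral illustration only: the class name, the floating-point accumulators and the wrap-around to a 64-step code are assumptions, and decimation ($DF$) and pipeline latency are ignored. The default gains are the values quoted above ($K_P = 1$, $K_I = 2^{-7}$).

```python
class DigitalLoopFilter:
    """Behavioral proportional-integral filter producing a 6-bit phase code.

    This is a generic sketch, not the RTL of the fabricated design.
    """

    def __init__(self, kp=1.0, ki=2 ** -7, steps=64):
        self.kp = kp          # proportional gain, in phase steps per vote
        self.ki = ki          # integral gain, in phase steps per vote
        self.steps = steps    # 2**6 phase positions per VCO period
        self.integral = 0.0   # integral (frequency) accumulator
        self.phase = 0.0      # phase accumulator, in phase steps

    def update(self, err):
        """Apply one 3-level phase error (+1, 0 or -1) and return the phase code."""
        self.integral += self.ki * err
        self.phase = (self.phase + self.kp * err + self.integral) % self.steps
        return int(self.phase)


if __name__ == "__main__":
    dlf = DigitalLoopFilter()
    print(dlf.kp / dlf.ki)               # ratio of the two gains
    print([dlf.update(1) for _ in range(4)])
```

With these gains a single vote moves the phase by one full step through the proportional path but by only 1/128 of a step through the integral path, which is the sense in which the proportional branch dominates.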
Multi-PFD phase rotating PLL design
Jitter degradation due to power supply noise is a main performance-limiting factor. Because scaling of the CMOS process dictates an increasing VCO gain, the impact of supply noise on PLLs is further increased. As is generally observed, the VCO is the block most sensitive to supply noise, and hence a low-dropout regulator that shields the VCO from the supply noise is commonly used [3, 4, 5]. However, this conventional approach suffers from the tradeoff between PLL bandwidth and power consumption. Since the regulator introduces additional poles in the PLL loop, the regulator bandwidth should be much wider than the PLL bandwidth to maintain PLL stability, resulting in excessive power consumption. The RPLL needs a sufficiently wide bandwidth for CDR stability and jitter tolerance, which further increases power consumption by raising the required regulator bandwidth.
The drawbacks of the conventional supply regulation techniques described above can be solved with the use of a split-tuned architecture. Fig. 3 shows the overall block diagram of the proposed multi-PFD RPLL, which enables a wide RPLL bandwidth with low power dissipation. It consists of a phase rotator controller, a low-pass loop filter realized using passive components $R_1$, $C_1$, and $C_2$, a switched $G_m$-$C_3$ integrator, a replica-biased ($I_{REP}$) supply regulator, and a 4-stage split-tuned VCO that is controlled through separate paths: a high-gain low-bandwidth coarse control path ($V_C$) and a low-gain high-bandwidth fine control path ($V_F$). The replica biasing helps the RPLL achieve effective supply noise rejection with minimum power and area penalty [8]. In the coarse control path, the switched $G_m$-$C_3$ integrator uses a 1:512 duty-cycle clock signal, SCK, to effectively increase the integrator time constant. The switched $G_m$-$C_3$ integrator generates the coarse control voltage, which is buffered through a low-dropout regulator to produce the supply voltage of the VCO.
The placement of the regulator in the low-bandwidth coarse control path decouples the tradeoff between PLL bandwidth and power consumption [8]. By designing the crossover frequency of the coarse control path to be much smaller than the RPLL zero frequency, the coarse control path has little effect on the RPLL phase margin or bandwidth. Hence, the RPLL loop dynamics are mainly dictated by the low-gain high-bandwidth fine control path. This architecture decouples the RPLL bandwidth from the regulator bandwidth, thereby alleviating the above tradeoff.
A. Phase adjustment
In Fig. 3, phase rotation is controlled by four PFDs, two PFD multiplexers (MUXs), two weighted charge pumps (CPs), and a PFDS. The phase adjustment is set by a 6-bit digital input consisting of 2 MSBs, MSB<0:1>, and 4 LSBs, LSB<0:3>. The upper PFD clock inputs are $\Phi_1$ and $\Phi_5$, and the lower PFD clock inputs are $\Phi_3$ and $\Phi_7$. Coarse phase adjustment is achieved by using the 2-to-1 MUXs to select the two working PFDs. The two selected PFDs compare their input clocks with the reference clock $\Phi_{ref}$. Fine phase adjustment is achieved by multiplying the outputs of the CPs by a weighting factor $\beta$ and summing the weighted currents. However, phase adjustment using multiple PFDs has a potential problem caused by the non-deterministic edge-capture characteristics of the PFD. Fig. 4 shows that a phase adjustment may end up in two different locking states with the same input setting, depending on the moment at which the working PFDs are selected; Fig. 4(a) and Fig. 4(b) show the two cases, which differ in the clock edges that the PFDs capture. To avoid the phase interpolation error due to the non-deterministic characteristics of the PFD, the enable timing of the PFD select signal should be controlled [9]. Fig. 5 shows the block diagram of the PFDS, which generates the PFD select signals at the right time. Flip-flops clocked by a 90°-delayed clock set up the PFD select signals so that the next rising edge of the input clock is captured. Because the PFD select signals are always activated during the desired selection period, no undesired interpolation occurs.
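The coarse/fine split of the 6-bit code can be sketched in Python as follows. The sketch assumes ideal linear interpolation, an MSB-to-quadrant ordering of 0° to 90°, 90° to 180°, 180° to 270° and 270° to 360°, and a fine weight of LSB/16; these choices, and all names in the code, are illustrative assumptions rather than details given in the text. The 800 ps period corresponds to the 1.25 GHz clock, giving the nominal 12.5 ps step.

```python
VCO_PERIOD_PS = 800.0  # period of the 1.25 GHz clock
STEPS = 64             # 2 MSBs (coarse) x 4 LSBs (fine)


def decode_phase_code(code):
    """Map a 6-bit code to PFD selections, CP weights and the ideal phase.

    Quadrant ordering and linear weighting are assumptions of this sketch.
    """
    if not 0 <= code < STEPS:
        raise ValueError("phase code must be in 0..63")
    msb, lsb = code >> 4, code & 0xF
    # Upper MUX chooses between phi1 (0 deg) and phi5 (180 deg);
    # lower MUX chooses between phi3 (90 deg) and phi7 (270 deg).
    upper = "phi1" if msb in (0, 3) else "phi5"
    lower = "phi3" if msb in (0, 1) else "phi7"
    beta = lsb / 16  # fine weighting factor
    if msb in (0, 2):
        # Quadrant starts on the upper phase and ends on the lower phase.
        w_upper, w_lower = 1 - beta, beta
    else:
        # Quadrant starts on the lower phase and ends on the upper phase.
        w_upper, w_lower = beta, 1 - beta
    return {
        "upper": upper,
        "lower": lower,
        "w_upper": w_upper,
        "w_lower": w_lower,
        "phase_deg": code * 360 / STEPS,
        "delay_ps": code * VCO_PERIOD_PS / STEPS,
    }


if __name__ == "__main__":
    for code in (0, 1, 24, 63):
        print(code, decode_phase_code(code))
```

Only one upper and one lower PFD are active at a time in this model, so a change of quadrant swaps exactly one PFD; that swap is the moment at which the select timing matters.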
B. VCO design
The VCO, shown in Fig. 6, employs a four-stage split-tuned ring oscillator composed of pseudo-differential delay cells. To achieve high-frequency operation at low regulated supply voltages, the multiple-loop architecture is employed [10, 11]. This technique adds a secondary loop that provides an additional entry to decrease the slew time of the output nodes. The secondary loop is implemented by adding a set of additional inputs, ins+ and ins−, to every stage and switching these earlier than the primary inputs. The supply voltage of the VCO, $V_C$, serves as its high-gain coarse tuning voltage. Tunable CMOS-based inverters are employed in each delay cell for fine tuning of the VCO frequency, as shown in Fig. 6(b).
To keep the fine control gain much lower than that of the coarse control, the secondary-loop tunable inverter is split into an x1 inverter of fixed strength and another x1 inverter tuned by the fine control voltage $V_F$. Since the fine control path needs a wide bandwidth, the tunable inverter is implemented with a simple NMOS-only V/I converter. The fine control voltage is applied to all four delay cells to preserve equal spacing between phases, which makes this architecture suitable for multi-phase clock recovery. Two small cross-coupled inverters are added to guarantee stable oscillation and to achieve pseudo-differential clock phases. The simulated coarse and fine gains of the VCO are approximately 3.4 GHz/V and 500 MHz/V, respectively. A self-biased VCO buffer similar to the one employed in [3] is used to level-shift the delay cell output to a rail-to-rail signal with 50% duty cycle.
Experimental results
The proposed dual-loop CDR circuit was fabricated in a 65 nm, one-poly seven-metal CMOS technology. The layout and the die microphotograph of the CDR are shown in Fig. 7. The core of the CDR circuit measures 0.6 mm × 0.4 mm. The measured phase delay characteristic is shown in Fig. 8. The measured delay curve is monotonic, with a phase step ranging from 9 ps to 17 ps around the nominal value of 800 ps/64 = 12.5 ps. Fig. 9 shows the measured eye diagram of the retimed parallel data at 1.25 Gb/s (= 5 Gb/s / 4) and the recovered clock operating at 1.25 GHz. With a 5 Gb/s $2^{15} - 1$ PRBS data input, the measured bit error rate (BER) is less than $10^{-12}$. This BER verifies the proper functionality of the proposed PFDS. The jitter histogram of the recovered clock is shown in Fig. 10, where the rms and peak-to-peak jitter are measured to be 5.5 ps and 47.2 ps, respectively. Table I compares the key performance of the proposed multi-PFD RPLL with other published RPLLs. Operating at 1.25 GHz, the RPLL consumes 1.4 mW from a 1.2 V supply, of which 0.54 mW is consumed in the VCO and regulator. This is the lowest power among the RPLLs reported in [3, 4, 5].
Conclusion
A 5-Gbit/s CDR circuit with a multi-PFD RPLL has been presented. With the proposed split-tuned VCO architecture, the RPLL achieves both higher-frequency oscillation and wider bandwidth with low power consumption. Furthermore, by using multi-PFD phase-interpolating techniques with the PFDS, precise and correct phase adjustment is accomplished. Measurements verify a recovered clock jitter of 5.5 ps rms and a BER of less than $10^{-12}$ while the CDR draws 10.3 mW from the 1.2-V supply.
