# A 5Gbit/s CMOS Clock and Data Recovery Circuit

Tan Kok-Siang, Mohd Shahiman Sulaiman, Tan Soon-Hwei, Mamun B I Reaz, F Mohd-Yasin

Abstract – This paper presents a half-rate 5Gb/s clock and data recovery (CDR) circuit. Data is retimed by a linear phase detector (PD) that exhibits practically no systematic offset over the frequency band of interest. The circuit was designed in a 0.18-$\mu$m CMOS process and occupies an active area of $0.2 \times 0.32$ mm<sup>2</sup>. The CDR exhibits an RMS jitter of $\pm 1.2$ ps and a peak-to-peak jitter of 5 ps. The power dissipation is 97 mW from a 1.8-V supply.

## I. INTRODUCTION

Data transmitted at gigahertz rates is inherently distorted once it travels beyond a certain distance. The received data and clock signals are noisy, asynchronous, and jittery, and are therefore difficult to extract. To receive data reliably at gigahertz speed, both the embedded clock and the data have to be regenerated before they are passed to the signal-processing circuitry. For this reason, the clock and data recovery (CDR) circuit plays a significant role. Designing a CDR is among the most challenging tasks because the block defines the performance of the overall chip-to-chip transceiver system.

A common way to realize a CDR is with a phase-locked loop (PLL). The PLL adjusts the frequency of the recovered clock and compensates for process, temperature, and supply-voltage variations. The PLL requires a phase detector (PD); a linear PD exhibits low jitter in the locked condition, but it suffers from non-linearity for non-uniform data patterns and requires an external loop filter. The linear PD is also highly sensitive to mismatches, and its maximum operating frequency is limited by the speed of its flip-flops. This work addresses these linearity and speed issues through the simple approach of using analog logic rather than conventional digital flip-flops.
A half-rate architecture is chosen to reduce VCO sensitivity, which results in better jitter performance and lower power dissipation.

This paper describes the design of a 5-Gbps CMOS CDR for inter-chip communications that employs a linear half-rate PD, an integrated filter, and a wide-tuning-range interpolation-based ring oscillator. The remainder of the paper is organized as follows. The CDR architecture and circuit design are discussed in Section II. Results and CDR circuit performance are described in Section III. Conclusions are presented in Section IV.

## II. CDR ARCHITECTURE AND CIRCUIT DESIGN

Half-rate architectures have been reported in the open literature [1]-[3]. The design in [1] uses a binary phase detector. Although such a detector is easy to design, it can produce high jitter because of the large ripple on the VCO control signal, particularly in a system with a very sensitive (high-gain) VCO. The designs in [2] and [3] are good CDRs, but the large peak-to-peak jitter of [2] and the high power consumption of [3] are undesirable. The circuit presented in this paper has a significantly smaller output jitter than [2]; it uses a linear phase detector, which is easier to design than the binary phase detectors in [1] and [2], and avoids data dependency.

### A. Phase Detector

Fig. 1 shows the architecture of the half-rate CDR. A linear phase detector provides a linear characteristic that reduces jitter in the locked condition compared with a binary (bang-bang) phase detector. The half-rate property of the PD also relaxes the VCO design because the VCO can run at one-half the input data rate, which translates to lower jitter. In addition, the dynamic power consumption is significantly reduced.

Fig. 1 Linear half-rate architecture

The linear phase detector automatically retimes the data in the middle of the bit period, like a Hogge phase detector, and de-multiplexes the input data to generate two half-rate signals, $D_A$ and $D_B$ [4].
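
The de-multiplexing step can be pictured with a purely behavioral sketch. The Python below is an illustration only, not the circuit described in this paper: the function names are hypothetical, and the assignment of even-numbered bits to $D_A$ and odd-numbered bits to $D_B$ is an assumption (the source does not state which clock edge feeds which stream). It also works out the bit period and half-rate clock frequency implied by the 5-Gb/s data rate.

```python
# Behavioral sketch of half-rate de-multiplexing (illustrative, not the paper's circuit).

BIT_RATE_BPS = 5e9                       # 5 Gb/s input data rate (from the paper)
BIT_PERIOD_PS = 1e12 / BIT_RATE_BPS      # 200 ps per bit
HALF_RATE_CLOCK_HZ = BIT_RATE_BPS / 2    # 2.5 GHz: the VCO runs at half the data rate


def demux_half_rate(bits):
    """Split a full-rate bit list into two half-rate streams.

    Assumption: D_A holds the bits sampled on one clock edge (even indices)
    and D_B the bits sampled on the opposite edge (odd indices).
    """
    d_a = bits[0::2]
    d_b = bits[1::2]
    return d_a, d_b


def mux_full_rate(d_a, d_b):
    """Interleave two half-rate streams back into a full-rate stream."""
    out = []
    for a, b in zip(d_a, d_b):
        out.extend([a, b])
    # An odd-length input leaves one extra bit in D_A.
    if len(d_a) > len(d_b):
        out.append(d_a[-1])
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    d_a, d_b = demux_half_rate(data)
    print("D_A:", d_a)
    print("D_B:", d_b)
    print("recombined matches input:", mux_full_rate(d_a, d_b) == data)
```

Each half-rate stream changes at most once per 400 ps, which is why the sampling elements only need a 2.5-GHz clock.
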
The two half-rate signals are then combined through a multiplexer to obtain the full-rate output data stream, $D_{out}$. The phase detector generates *Error* and *Ref* signals to eliminate the data-dependency problem and the dead-zone issue. For this reason, the architecture of the XOR/XNOR gate used in the phase detector is crucial. A source-coupled logic (SCL) XOR is not suitable for a low supply voltage because it stacks four transistor stages between VDD and GND. It also requires level shifters because its inputs sit at different DC levels. The XOR architecture in Fig. 2 is more appropriate than the SCL XOR for low-supply-voltage technology [5]. However, its inputs B+ and B- are connected to two different transistors, which creates two current paths and leads to a phase offset. The symmetric XOR gate proposed in [3] solves this problem, but it requires an on-chip V<sub>ref</sub> that must track PVT variations to remain stable, which may increase circuit complexity. In addition, the common-mode current source in that circuit might draw different currents because of a mismatch in the size of M<sub>out</sub> between the *Error* and *Ref* XOR gates. This difference causes a phase offset, and careful design and layout are required to minimize it. In this work, an XNOR gate without V<sub>ref</sub> (Fig. 3) is used to minimize the phase offset. This choice also takes into account that the next stage is a PMOS charge-pump input.

Tan Kok-Siang, Mohd Shahiman Sulaiman, Tan Soon-Hwei, Mamun B I Reaz, and F Mohd-Yasin are with the VLSI Research Group, Multimedia University, 63100 Cyberjaya, Selangor, Malaysia.

Fig. 2 XOR/XNOR gate with different propagation delays

For the XNOR gate illustrated in Fig.
3, the gates of $M_1$ and $M_2$ are connected to nodes A and B, respectively, which eliminates the need for $V_{ref}$. Unlike the symmetric XOR gate in [3], this reduces circuit complexity and makes the circuit operating frequency less dependent on parasitic capacitance. The level converters for the *Error* and *Ref* signals carry identical currents to avoid different values of threshold voltage, $V_{th}$. This ensures minimal systematic mismatch in the charge pump.

Fig. 3 Symmetric XNOR gate without V<sub>ref</sub>

Fig. 4 shows the phase detector characteristic. It demonstrates linear behavior and the absence of a dead zone. The linear detectable range extends roughly from 54° (30 ps) to 306° (170 ps), a span of about 252° (140 ps). The linearity of this phase detector results in minimal charge-pump activity and small ripple on the control line in the locked condition, which significantly improves jitter performance.

Fig. 4 Linear phase detector characteristic

### B. Voltage-Controlled Oscillator (VCO)

The VCO is a three-stage differential ring oscillator. Because the CDR, and hence the VCO, targets a wide tuning range, a differential delay cell with resistive loads (Fig. 5) is chosen for this work.

Fig. 5 Transistor level of a delay cell

The delay cell consists of a fast path, a slow path, and an additional slow path [6]. Simulations suggest that this delay cell has a 3.8-GHz tuning range, which exceeds that of the differential delay cell employed in [3]. In the delay cell used in this work (Fig. 5), two additional transistors, $M_5$ and $M_6$, are connected back to back to form an additional latch-type slow path, which provides the option of further increasing the cell's delay and hence the VCO tuning range. Maintaining a constant voltage swing at the output is desirable so that delay cells can easily be cascaded in many stages. With the total current, $I_{total}$, held fixed, the voltage drops across $R_1$ and $R_2$ are constant. The concept behind the circuit (Fig.
5) is based on the equation

$$I_{total} = I_{fast} + I_{slow1} + I_{slow2} \tag{1}$$

where $I_{fast}$ is the total current in the fast path, and $I_{slow1}$ and $I_{slow2}$ are the total currents in the slow path and the additional slow path, respectively. $I_{slow1}$ is made equal to $I_{slow2}$. Consequently, $I_{M9}$ and $I_{M11}$ each equal one-half of $I_{M7}$, and $I_{M10}$ and $I_{M12}$ each equal one-half of $I_{M8}$, so that Equation (1) is satisfied. The oscillation frequency is set by steering current between the fast and the slow paths. The low supply voltage limits the possibility of stacking differential pairs under transistors M<sub>1</sub>-M<sub>2</sub>, M<sub>3</sub>-M<sub>4</sub>, and M<sub>5</sub>-M<sub>6</sub>. As a result, the current variation that controls the delay time is implemented through mirror arrangements driven by PMOS differential pairs in the coarse and fine control cells (Fig. 6). In other words, the VCO employs a current-folding technique.

Fig. 7 illustrates the linearity of the VCO oscillation frequency as a function of the fine control signal for temperatures from 0 to 100°C. This linearity helps to reduce jitter at the output. The small-signal gain of the delay cell is 4.4 dB at 2.5 GHz, which is considered moderate; a higher gain might result in higher unwanted phase noise [7].

Fig. 6 Coarse and fine control cells

Fig. 7 Fine control gain

### C. Loop Filter

A second-order loop filter is used to ensure the stability of the CDR, with the third pole placed far from the origin to filter out high-frequency noise. Jitter peaking in a PLL can be reduced by increasing the damping ratio, $\zeta$, which turns the CDR into an overdamped system [8]. However, this may result in a long acquisition time. Therefore, the values of $\zeta$, the resistor $R$, and the capacitor $C_1$ should be chosen carefully based on the loop bandwidth.
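
The trade-off between jitter peaking and damping can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the design procedure of this paper: it evaluates the loop-bandwidth and jitter-peaking approximations that follow as Equations (2) and (3), and adds the damping ratio $\zeta = (R/2)\sqrt{K_{VCO}K_{PD}I_{CP}C_1}$ from textbook second-order charge-pump PLL theory, which the paper itself does not state. All component values are hypothetical and chosen only to give round numbers.

```python
import math


def loop_bandwidth(k_vco, k_pd, i_cp, r):
    """Loop bandwidth in rad/s, per the approximation of Equation (2)."""
    return k_vco * k_pd * i_cp * r


def jitter_peaking(k_vco, k_pd, i_cp, r, c1):
    """Jitter peaking as a linear ratio, per the approximation of Equation (3)."""
    return 1.0 + 1.0 / (loop_bandwidth(k_vco, k_pd, i_cp, r) * r * c1)


def damping_ratio(k_vco, k_pd, i_cp, r, c1):
    """Damping ratio of a second-order charge-pump loop.

    Assumption: standard textbook expression, not given in the paper.
    """
    return 0.5 * r * math.sqrt(k_vco * k_pd * i_cp * c1)


# Hypothetical values, NOT taken from the paper.
K_VCO = 2 * math.pi * 1e9      # VCO gain, rad/s per volt
K_PD = 1 / (2 * math.pi)       # phase detector gain, per radian
I_CP = 100e-6                  # charge-pump current, A
R = 1e3                        # loop-filter resistor, ohm
C1 = 100e-12                   # loop-filter capacitor, F

if __name__ == "__main__":
    for c1 in (C1, 2 * C1, 4 * C1):
        w_bw = loop_bandwidth(K_VCO, K_PD, I_CP, R)
        jp = jitter_peaking(K_VCO, K_PD, I_CP, R, c1)
        zeta = damping_ratio(K_VCO, K_PD, I_CP, R, c1)
        print(f"C1={c1 * 1e12:.0f} pF  w_BW={w_bw:.3e} rad/s  "
              f"JP={jp:.4f}  zeta={zeta:.3f}")
```

Under these assumptions the two quantities are linked by $JP \approx 1 + 1/(4\zeta^2)$, so a larger $C_1$ at fixed $R$ raises $\zeta$ and lowers the peaking while leaving the bandwidth unchanged, at the cost of the longer acquisition time noted above.
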
The loop bandwidth of the Type II CDR is given by

$$\omega_{BW} \approx K_{VCO} K_{PD} I_{CP} R \tag{2}$$

where $I_{CP}$ is the charge-pump current, $K_{VCO}$ is the VCO gain, $K_{PD}$ is the phase detector gain, and $R$ is the resistor in the loop filter. The loop bandwidth is thus the total system gain multiplied by $R$ and is not a function of the capacitor $C_1$. Equation (2) is used to determine the value of $R$. The value of $C_1$ is determined from the amount of jitter peaking, $JP$, allowed in the system, using

$$JP \approx 1 + \frac{1}{K_{VCO}K_{PD}I_{CP}R^{2}C_{1}} = 1 + \frac{1}{\omega_{BW}RC_{1}} \tag{3}$$

Equation (3) also shows that increasing $C_1$ while the loop bandwidth, $\omega_{BW}$, remains fixed reduces jitter peaking.

## III. RESULTS

The CDR was designed in a 0.18-µm CMOS process and occupies an active area of 0.2 x 0.32 mm<sup>2</sup>. The maximum power dissipation of the CDR is 97 mW from a 1.8-V supply at 5 Gb/s. Fig. 8 shows the spectrum of the recovered clock in response to a 5-Gbps data sequence. The maximum lock time of the CDR is less than 150 ns. The RMS jitter and peak-to-peak jitter for an 11-bit pseudo-random bit sequence (PRBS) input are 1.03 ps and 5 ps, respectively. The RMS jitter is plotted against the number of bits of the PRBS input in Fig. 9. The RMS jitter does not exceed 1.2 ps for PRBS inputs of 5 to 24 bits: it is around 0.3 ps for a 5-bit PRBS and peaks at 1.2 ps for a 17-bit PRBS.

To give more insight into the jitter in the presence of a modulated noise signal, the RMS jitter is plotted against modulated noise frequency in Fig. 10 for an 11-bit PRBS input. The average clock jitter is around 2.83 ps for the 11-bit PRBS signal. The RMS jitter decreases gradually as the modulated noise frequency increases. This is because the CDR is able to track high modulated noise frequencies.

Fig. 8 Spectrum of recovered clock

Fig. 9 RMS jitter against number of bits of PRBS input

Fig.
10 RMS jitter against modulated noise frequency for 11-bit PRBS

## IV. CONCLUSIONS

A 5-Gb/s CMOS clock and data recovery circuit has been designed in 1.8-V TSMC 0.18-µm technology for the PCI-Express standard. The circuit achieves a 1.2-ps RMS jitter with a maximum lock time of 150 ns and consumes 97 mW at full speed.

## ACKNOWLEDGEMENT

This work was supported in part by Intel Corporation (Malaysia) through an Intel Research Grant.

## REFERENCES

- [1] M. Rau et al., "Clock/Data Recovery PLL using Half-Frequency Clock", IEEE J. Solid-State Circuits, Vol. 32, pp. 1156-1159, July 1997.
- [2] L. Sang-Hyun et al., "A 5Gb/s 0.25um CMOS Jitter-Tolerant Variable-Interval Oversampling Clock/Data Recovery Circuits", ISSCC 2002 Digest of Technical Papers, pp. 463-465, February 2002.
- [3] J. Savoj, B. Razavi, "A 10Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector", IEEE J. Solid-State Circuits, Vol. 36, pp. 761-767, May 2001.
- [4] C.R. Hogge, "A Self Correcting Clock Recovery Circuit", Journal of Lightwave Technology, Vol. LT-3, No. 6, pp. 1312-1314, 1985.
- [5] S.J. Song et al., "A 4Gb/s CMOS Clock and Data Recovery Circuit Using 1/8-Rate Clock Technique", IEEE J. Solid-State Circuits, Vol. 38, pp. 1213-1219, July 2003.
- [6] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, Singapore, 2001.
- [7] A. Hajimiri, S. Limotyrakis, T.H. Lee, "Jitter and Phase Noise in Ring Oscillators", IEEE J. Solid-State Circuits, Vol. 34, pp. 790-804, 1999.
- [8] L.M. DeVito, "A Versatile Clock Recovery Architecture and Monolithic Implementation", Monolithic Phase-Locked Loops and Clock Recovery Circuits, Theory and Design, New York: IEEE Press, 1996.