Abstract-A hybrid analog-digital quarter-rate clock and data recovery (CDR) circuit that achieves a wide tracking range and excellent frequency and phase tracking resolution is presented in this paper. A split-tuned analog phase-locked loop (PLL) provides the eight equally spaced phases needed for quarter-rate data recovery, and the digital CDR loop precisely adjusts the phase of the PLL output clocks to facilitate plesiochronous clocking. The CDR employs a second-order digital loop filter and combines delta-sigma modulation with the analog PLL to achieve sub-picosecond phase resolution and better than 2 ppm frequency resolution. A test chip fabricated in a 0.18 µm CMOS process achieves BER $< 10^{-12}$ and consumes 14 mW while operating at 2 Gb/s. The tracking range is greater than 5000 ppm and 2500 ppm at 10 kHz and 20 kHz modulation frequencies, respectively, making this CDR suitable for systems employing spread-spectrum clocking.
I. INTRODUCTION

SCALING of CMOS technology has progressed relentlessly for the past several decades and brought unprecedented benefits to digital integrated circuits (ICs). In order for the improvements of individual ICs to benefit the overall system, the performance of inter-chip communication circuits must also scale accordingly. To meet this growing demand for large input-output (IO) bandwidth, systems such as network switches and processor-memory interfaces employ hundreds of IO links to achieve an aggregate IO bandwidth in excess of 100 Gb/s. A simplified block diagram of such an IO link is depicted in Fig. 1. It consists of a transmitter, a channel, and a receiver. The transmitter sends the data to the receiver over a channel, typically a printed circuit board trace or a coaxial cable. Since the clock is embedded in the data, the receiver needs to recover both the clock and the data from the incoming serial data. The design of such a receiver is the focus of this paper. In particular, the design issues of the clock and data recovery (CDR) circuit, such as limited frequency/phase acquisition and tracking range, recovered clock (RCK) jitter, and sensitivity to intrinsic noise sources, are addressed. The proposed CDR architecture employs delta-sigma modulation and a second-order digital loop filter and achieves a wide tracking range without degrading the phase noise performance. This design also uses digital circuitry for high-resolution phase interpolation and obviates the need for a high-resolution analog phase interpolator that is sensitive to process, voltage, and temperature. The high phase and frequency resolution achieved by this architecture enables both low dithering jitter and precise tracking of frequency-modulated input data.
This paper is organized as follows. The issues of conventional CDRs are first highlighted through a brief review of two popular CDR architectures in Section II. The proposed CDR architecture that seeks to overcome the drawbacks of prior CDRs is presented in Section III. The implementation details of the two main components of the proposed architecture, namely the digital clock and data recovery loop and the phase-locked loop, are discussed in Sections IV and V, respectively. The measured results obtained from the prototype chip are presented in Section VI and the key contributions of this paper are summarized in Section VII.
II. PRIOR ART
A. Dual-Loop CDR
One of the most commonly used CDR architectures is the dual-loop structure shown in Fig. 2 [1]-[3]. It consists of a cascade of two loops, namely, a core phase-locked loop (PLL) and a peripheral clock and data recovery (CDR) loop. The PLL generates multiple phases, which are used by the phase interpolator to introduce a controlled phase shift in the recovered clock (RCK). The quantized phase error output of the bang-bang phase detector (!!PD) drives the finite state machine (FSM), which controls the phase interpolator through a digital control word. The negative feedback of the CDR loop forces the recovered clock phase to the middle of the received data. Conceptually, the state machine can be viewed as a simple roll-over integrator that facilitates unlimited phase shifting for plesiochronous clocking between the transmitter and the receiver. Even though the simplicity of this architecture has led to its widespread usage, it has three main disadvantages: 1) excessive clock jitter due to the nonlinearity of the phase interpolator; 2) tightly coupled jitter generation and jitter tolerance parameters; and 3) a large area and power penalty for multi-phase clock recovery. These issues are elaborated next.
0018-9200/$25.00 © 2008 IEEE
The steady state of the bang-bang controlled CDR is a limit cycle whose amplitude and frequency are determined by the feedback loop delay. This oscillatory steady state manifests itself as recovered clock dithering jitter that is proportional to the feedback loop delay and the phase resolution of the interpolator. Even though a decimation filter can reduce this dithering jitter to one phase step [3], a small phase step is still needed to minimize the dithering jitter of the recovered clock. However, designing a high-resolution (small phase step) phase interpolator is a challenging task. As discussed in [4] and [5], the output of the phase interpolator depends not only on the input digital control word, but also on the input rise time, the phase separation between the interpolator inputs, and the interpolator output time constant. The nonlinearity resulting from any of these sub-optimal interpolator parameters introduces large phase jumps, as illustrated in the representative transfer function of the phase interpolator shown in Fig. 3. Ideally, the minimum phase step is equal to one LSB of the interpolator, but interpolator nonlinearity can introduce a much larger phase jump that severely degrades the recovered clock jitter. It is important to note that the integral nonlinearity (INL) of the phase interpolator is not a concern since it is suppressed by the large feedback loop gain. Overcoming the differential nonlinearity of the phase interpolator is a challenging design task and typically requires considerable area and power. In addition to the phase interpolator design issues discussed thus far, this architecture also suffers from tightly coupled jitter generation and jitter tolerance parameters.
In applications employing spread-spectrum clocking such as serial-ATA [6], it is important to maximize the CDR's ability to track a large frequency difference between the transmitter and the receiver [7]. The presence of any such frequency difference results in an increasing (or decreasing) phase error between the incoming data and the recovered clock. In order to acquire lock, the CDR needs to continuously advance (or delay) the phase of the recovered clock so as to reduce the phase error. Consequently, the maximum tolerable frequency error, referred to as frequency tolerance (FT), directly depends on the maximum rate of phase change. It is easily calculated to be

$$\mathrm{FT} = \frac{\Phi_{PI}\,\alpha_T}{DF \cdot T_{update}} \tag{1}$$

where $\Phi_{PI}$ is the resolution of the phase interpolator, $T_{update}$ is the update period of the state machine, $DF$ is the decimation factor used to reduce the dithering jitter to one phase step, and $\alpha_T$ is the transition density of the incoming data. Substituting typical parameter values, with $\Phi_{PI}$ and $T_{update}$ expressed in terms of the incoming bit period $T$, results in a maximum frequency tolerance of about 490 parts per million (ppm). In practice, frequency tolerance is somewhat smaller due to jitter on both the incoming data and the recovered clock. On the other hand, the tracking bandwidth, defined as the maximum frequency of the input sinusoidal jitter that can be tracked by the CDR, is given by

$$f_{track} = \frac{\Phi_{PI}\,\alpha_T}{2\pi A_J \cdot DF \cdot T_{update}} \tag{2}$$

where $A_J$ is the amplitude of the input sinusoidal jitter. As indicated by (1) and (2), both the frequency tolerance and the tracking bandwidth can be improved by increasing the step size of the interpolator. This requirement contradicts the need for a smaller phase step to minimize dithering jitter. In addition to the conflicting requirements on the interpolator step size, the dual-loop CDR also requires additional hardware to generate multiple equally spaced clock phases. For example, to use the dual-loop CDR for sub-data-rate clock recovery, either additional interpolators [8] or a delay-locked loop [9] is needed to generate multiple clock phases, thus incurring both area and power penalties.
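The conflict between frequency tolerance and dithering jitter can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the loop moves at most one interpolator step per decimated update and that only updates coinciding with a data transition are useful. The function names and the numeric values (1/64 UI step, update every 4 UI, decimation by 4, transition density 0.5, 1 UI jitter amplitude) are hypothetical choices that happen to land near the 490 ppm figure quoted above; they are not taken from the text.

```python
import math


def frequency_tolerance_ppm(phase_step_ui, update_period_ui, decimation, transition_density):
    """Largest frequency offset (ppm) the interpolator loop can follow.

    Assumes at most one phase step per decimated update, and that only
    updates coinciding with a data transition carry phase information.
    """
    max_phase_rate = phase_step_ui * transition_density / (decimation * update_period_ui)
    return max_phase_rate * 1e6


def tracking_bandwidth_hz(tolerance_ppm, jitter_amplitude_ui, bit_rate_hz):
    """Highest sinusoidal-jitter frequency tracked before the loop slews.

    A sinusoid A*sin(2*pi*f*t) has peak slope 2*pi*f*A, which must not
    exceed the maximum phase slew rate of the loop (in UI per second).
    """
    max_slew_ui_per_s = tolerance_ppm * 1e-6 * bit_rate_hz
    return max_slew_ui_per_s / (2.0 * math.pi * jitter_amplitude_ui)


# Hypothetical values (assumptions, not from the paper).
ft_ppm = frequency_tolerance_ppm(1 / 64, 4, 4, 0.5)
bw_hz = tracking_bandwidth_hz(ft_ppm, 1.0, 2e9)
print(f"frequency tolerance: {ft_ppm:.0f} ppm")
print(f"tracking bandwidth at 1 UI jitter: {bw_hz / 1e3:.0f} kHz")

# Doubling the step size doubles the tolerance, at the cost of a coarser phase step.
print(f"with 1/32 UI step: {frequency_tolerance_ppm(1 / 32, 4, 4, 0.5):.0f} ppm")
```

Under these assumptions both figures scale linearly with the phase step, which is the coupling between jitter tolerance and jitter generation that the paragraph describes.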
In summary, the dual-loop CDR offers the advantages of a relatively simple implementation and uncoupled stability of the core PLL and peripheral CDR loops. However, the difficulty in implementing a high-resolution phase interpolator that operates over a wide range of frequencies, the direct tradeoff between jitter generation and jitter tolerance, and the power penalty for multi-phase clock recovery mandate a new architecture to overcome these limitations. A phase averaging CDR, described next, is one such architecture that seeks to improve phase interpolator resolution and is also suitable for multi-phase clock recovery with smaller area and lower power.
B. Phase Averaging CDR
An improved architecture proposed by Larsson [10] increases phase resolution and mitigates the effects of the interpolator's discrete phase jumps via phase averaging. A block diagram of the Larsson CDR is shown in Fig. 4. In this architecture the interpolator is placed inside the PLL feedback path and the voltage-controlled oscillator (VCO) output serves as the recovered clock instead of the phase interpolator output. This architecture offers two important benefits. First, it is well suited for multi-phase clock recovery since the VCO provides the required multiple equally spaced clock phases, thereby obviating multiple power-hungry and area-inefficient phase interpolators. Second, phase jumps and phase quantization error at the output of the interpolator are suppressed by the low-pass transfer function of the PLL. In order to illustrate this filtering action, the phase interpolator controlled by the digital control word is modelled as a summing block as shown in Fig. 5, where $\Phi_Q$ represents the phase quantization error, $H_{PLL}(s)$ denotes the low-pass transfer function of the PLL, and $S_{\Phi_Q}(f)$ is the one-sided phase quantization error power spectral density (PSD) given by

$$S_{\Phi_Q}(f) = \frac{\Delta^2}{12}\cdot\frac{2}{F_{REF}} = \frac{\Delta^2}{6F_{REF}} \tag{3}$$

where $\Delta$ is the step size of the phase interpolator and $F_{REF}$ is the input reference frequency of the PLL. Since timing noise due to both inter-symbol interference and intrinsic noise sources such as thermal and flicker noise causes the digital loop filter output to change by several LSBs, we assume the phase quantization error is uniformly distributed and has a flat spectrum. Clearly, the PLL suppresses phase quantization error outside its bandwidth, thereby improving the phase resolution. The filtered quantization error PSD, assuming a second-order PLL response with natural frequency $\omega_n$ and damping factor $\zeta$, can be written as

$$S_{\Phi_{out}}(f) = \frac{\Delta^2}{6F_{REF}}\left|\frac{2\zeta\omega_n s+\omega_n^2}{s^2+2\zeta\omega_n s+\omega_n^2}\right|^2_{s=j2\pi f} \tag{4}$$

Since the PLL suppresses out-of-band noise, this architecture simplifies the design of the phase interpolator. In other words, high output phase resolution can be achieved with a coarse interpolator phase step $\Delta$. This improvement in phase resolution can be seen by calculating the variance of the output phase as below:

$$\sigma^2_{\Phi_{out}} = \int_0^{F_{REF}/2} S_{\Phi_{out}}(f)\,df \tag{5}$$

$$\sigma^2_{\Phi_{out}} \approx \frac{\Delta^2}{6F_{REF}}\cdot\frac{\omega_n}{2}\left(\zeta+\frac{1}{4\zeta}\right) \tag{6}$$
Using (6), we can conclude that a critically damped PLL with a bandwidth of one tenth the reference frequency reduces the root mean square (rms) value of the output phase quantization error by approximately a factor of 2 compared to that of a conventional phase interpolator with no phase averaging. Unfortunately, this improvement only doubles with every fourfold reduction of the PLL bandwidth. Consequently, a very low PLL bandwidth is needed to achieve a high phase resolution using this approach. This requirement conflicts with the high PLL bandwidth needed to suppress VCO phase noise and reduces the effectiveness of this architecture. Even though it is possible to design low phase noise VCOs, such designs dissipate exorbitantly large power due to the well-understood tradeoff between phase noise and power consumption [11].
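The factor-of-2 improvement and its square-root dependence on bandwidth can be reproduced numerically. The sketch below is an illustration under stated assumptions: it takes the PLL as a second-order loop with one zero, interprets "bandwidth" as the 3-dB bandwidth of the closed-loop response, treats the quantization error as white up to half the reference frequency, and integrates with a simple midpoint rule. The function names are hypothetical.

```python
import math


def h_mag_sq(f, f_n, zeta=1.0):
    """|H(j*2*pi*f)|^2 for H(s) = (2*zeta*wn*s + wn^2) / (s^2 + 2*zeta*wn*s + wn^2)."""
    x = f / f_n
    num = 1.0 + (2.0 * zeta * x) ** 2
    den = (1.0 - x * x) ** 2 + (2.0 * zeta * x) ** 2
    return num / den


def natural_freq_from_3db(f_3db, zeta=1.0):
    """Natural frequency that yields the requested 3-dB closed-loop bandwidth."""
    a = 1.0 + 2.0 * zeta ** 2
    return f_3db / math.sqrt(a + math.sqrt(a * a + 1.0))


def rms_improvement(bw_over_fref, zeta=1.0, steps=20000):
    """Ratio of unfiltered to PLL-filtered rms quantization error (flat input PSD)."""
    f_ref = 1.0                       # normalized reference frequency
    f_n = natural_freq_from_3db(bw_over_fref * f_ref, zeta)
    f_max = f_ref / 2.0               # white quantization noise occupies 0..f_ref/2
    df = f_max / steps
    filtered = sum(h_mag_sq((k + 0.5) * df, f_n, zeta) for k in range(steps)) * df
    return math.sqrt(f_max / filtered)


wide = rms_improvement(1 / 10)        # bandwidth = f_ref / 10, critically damped
narrow = rms_improvement(1 / 40)      # bandwidth reduced by a further factor of 4
print(f"rms improvement at f_ref/10: {wide:.2f}x")
print(f"rms improvement at f_ref/40: {narrow:.2f}x")
```

With these assumptions the improvement at one tenth of the reference frequency comes out a little below 2, and a fourfold bandwidth reduction roughly doubles it, in line with the discussion above.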
The phase averaging CDR, as outlined earlier, eases the design of the phase interpolator and is inherently suitable for multi-phase clock recovery. However, its effectiveness is severely limited by a bandwidth conflict: a low PLL bandwidth is needed to filter the phase interpolator quantization error, while a high PLL bandwidth is needed to suppress the phase noise of the VCO. In view of these drawbacks of the phase averaging CDR, an architecture that seeks to achieve wide PLL bandwidth while achieving excellent phase and frequency resolution is presented in Section III.
III. PROPOSED CDR ARCHITECTURE
The proposed architecture employs: 1) delta-sigma modulation to alleviate the noise bandwidth tradeoff and 2) a second-order digital loop filter to improve frequency-tracking range. A delta-sigma modulator shapes the quantization error of the phase interpolator to high frequency so that the loop dynamics of the PLL can filter the phase noise of the interpolator more efficiently. As a result of noise shaping and phase filtering, the proposed CDR achieves sub-picosecond phase resolution and about 2 ppm frequency resolution. The proportional-integral loop filter improves the tracking range of the CDR to better than 2500 ppm, thus making it suitable for applications employing spread-spectrum clocking. The proposed CDR is also amenable to multi-phase clock recovery by tapping multiple phases directly out of the VCO.
A. Overall Operation
A block diagram of the proposed architecture is shown in Fig. 6 [12]. The CDR has two main components: an analog PLL and a digital CDR loop. The PLL's main function is to generate evenly spaced multi-phase clocks that drive interleaved receiver samplers. There are eight such clock phases and samplers, four for clock recovery and four for data recovery. These eight sampler outputs are down-sampled by 4 to alleviate speed requirements in the downstream digital circuitry, albeit at the expense of some tracking bandwidth. The additional delay due to down-sampling increases dithering jitter in conventional CDRs. However, in phase averaging CDRs the feedback loop delay is dominated by the finite bandwidth of the phase-generating PLL. A bang-bang phase detector generates 3-level phase error information by performing early/late detection and a simple majority vote on the 32 incoming samples. This phase error is filtered by a digital loop filter consisting of a proportional and an integral path to produce a 14-bit filter output. Given the difficulty of implementing a 14-bit phase interpolator with good linearity, a fully digital CDR controller that takes advantage of the phase filtering characteristics of the PLL is employed. The 14-bit loop filter output is quantized to three levels ($-1$, 0, $+1$) by a second-order delta-sigma modulator (DSM). This 3-level output drives a phase rotator, which converts the DSM output of $-1$, 0, and $+1$ to phase delay, no change, and phase advance, respectively. The phase rotator also ensures unlimited phase capture range and consequently accommodates plesiochronous clocking. The phase rotator is implemented as a one-hot 8-bit circular shift register whose output is used to select one out of the eight VCO phases. The selected phase is fed back to the PFD after dividing its frequency by four.
The low-pass phase transfer function of the PLL filters the shaped quantization noise of the delta-sigma modulator and, in combination with the digital control of the CDR loop, appropriately aligns the VCO clock phases to optimally sample the incoming data. In other words, the delta-sigma modulator combined with the analog PLL functions as a very high resolution phase interpolator. Since the phase interpolation is implemented by phase selection and filtering, this architecture completely eliminates the problems associated with conventional phase interpolators.
One common problem of conventional interpolators is their sensitivity to process, voltage, and temperature (PVT) variations. Since the proposed interpolator is realized by a combination of digital control circuitry and a simple multiplexer, it is immune to PVT variations. Furthermore, the rotating nature of the phase interpolator used in this design guarantees an average gain of $2\pi$ radians over the total number of control bits, as illustrated in Fig. 7. Therefore, if we assume that jitter from the incoming data is constant, and since all of the other components of the loop gain are set by digital circuitry, the CDR's loop dynamics are immune to PVT variations.
The phase and frequency resolution achieved by this architecture is considered next. By virtue of delta-sigma dithering and PLL filtering, phase resolution is ideally given by

$$\Delta\Phi = \frac{1\,\mathrm{UI}}{N_{ph}\cdot 2^{B}} \tag{7}$$

where 1 UI is equal to the VCO clock period, $N_{ph}$ is the number of VCO phases, and $B$ is the number of bits in the loop filter output. Quantitatively, there are $2^{B}$ phases in between the adjacent phases of the VCO. In practice, however, incomplete filtering of the shaped noise lowers the resolution from this ideal value, and frequency resolution is determined by the rate at which the VCO phase can be updated by the CDR loop. Assuming the integral gain to be $K_I$, the CDR loop can increment/decrement the VCO phase by $K_I$ phase steps every update period, $T_{update}$. In the proposed architecture, the update rate is 4 times slower than the VCO clock frequency. Consequently, the frequency resolution can be expressed as

$$\Delta f = \frac{K_I\,\Delta\Phi}{T_{update}} \tag{8}$$

$$\Delta f = \frac{K_I}{4\,N_{ph}\,2^{B}}\times 10^{6}\ \text{ppm} \tag{9}$$

Using parameters from the prototype, better than 2 ppm frequency resolution is achieved. Note that it is also possible to achieve higher phase/frequency resolution by simply choosing a larger number of loop filter bits at the expense of a proportional reduction in tracking bandwidth.
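As a quick numerical check of the resolution figures, the sketch below assumes that each of the eight VCO phase intervals is subdivided into 2^B steps by the 14-bit loop filter word, that the integral path moves the phase by one such step per update, and that updates occur every four VCO periods. The 500 MHz VCO frequency follows from quarter-rate operation at 2 Gb/s; the function names and the exact form of the expressions are assumptions made for illustration.

```python
def phase_resolution_ps(vco_period_ps, num_phases, filter_bits):
    """Ideal phase step: one VCO phase interval split into 2**filter_bits parts."""
    return vco_period_ps / (num_phases * 2 ** filter_bits)


def frequency_resolution_ppm(num_phases, filter_bits, integral_gain=1, update_div=4):
    """Frequency step when the loop adds `integral_gain` phase steps per update.

    The update period is `update_div` VCO periods, so the fractional frequency
    change is (phase step in UI) * integral_gain / update_div.
    """
    step_ui = 1.0 / (num_phases * 2 ** filter_bits)
    return integral_gain * step_ui / update_div * 1e6


vco_period_ps = 1e12 / 500e6          # quarter-rate clock for 2 Gb/s data
dphi = phase_resolution_ps(vco_period_ps, num_phases=8, filter_bits=14)
dfreq = frequency_resolution_ppm(num_phases=8, filter_bits=14)
print(f"ideal phase resolution: {dphi * 1e3:.1f} fs")
print(f"frequency resolution:   {dfreq:.2f} ppm")
```

Under these assumptions the ideal phase step is well below a picosecond and the frequency step is just under 2 ppm, consistent with the figures stated in the text. Adding one bit to the loop filter word halves both.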
B. Phase and Frequency Tracking
The tracking bandwidth of this CDR depends on both the input jitter amplitude and frequency, as shown earlier by (2). This equation, derived for a dual-loop CDR, is also valid for the proposed architecture. It indicates that if the input jitter has a large amplitude or if it varies at a high frequency, the CDR slews, and as a result the output phase cannot track the input jitter. In this architecture, the integral path in the loop filter extends the tracking bandwidth of the CDR. In the presence of a large phase error, the bang-bang phase detector is overloaded and outputs a long string of $+1$'s or $-1$'s. The integrator accumulates this continuous stream of identical outputs and drives the VCO toward frequency lock. In the prototype, with an integral gain equal to one, the integral loop moves the VCO center frequency in steps of about 2 ppm. The tracking range of the CDR, defined as the range of input frequencies the CDR can track without losing lock, is equal to

$$\pm\frac{1}{2}\cdot 2^{N_{INT}-1}\cdot\Delta f\ \text{ppm} \tag{10}$$

where $N_{INT}$ is the width of the integrator output and $\Delta f$ is the frequency resolution. An extra factor of one half in this equation accommodates the fact that only about half the full-scale range of the delta-sigma modulator is used in this design (see footnote 2). Using a 14-bit integrator and a frequency resolution of approximately 2 ppm, the CDR has 7780 ppm of frequency tracking range. This rather large tracking range is valid only if the input data frequency varies at a slow rate. If the frequency varies at a rate faster than one frequency step $\Delta f$ per update period, the integral loop will not be able to move the VCO frequency fast enough to track it. As a result, the integral loop slews and the phase error grows quadratically, causing the CDR to eventually lose lock.
Footnote 2: The full-scale input to the 3-level DSM overloads the internal quantizer, degrades noise shaping significantly, and may even cause instability. It is common practice to limit the DSM input through scaling or by some other means [13].
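The quoted tracking range can be approximated with a short calculation. This is a sketch under assumptions: the integrator is taken to be a signed word, so it spans 2^(N-1) codes on each side of zero, only half of that span is used, and each code shifts the frequency by one resolution step. The function name and the 1.9 ppm step (a rounded value consistent with "better than 2 ppm") are illustrative choices.

```python
def tracking_range_ppm(integrator_bits, freq_resolution_ppm, usable_fraction=0.5):
    """One-sided frequency tracking range of the integral path, in ppm.

    Assumes a signed integrator (2**(bits-1) codes per side) of which only
    `usable_fraction` is exercised to keep the delta-sigma modulator out of overload.
    """
    codes_per_side = 2 ** (integrator_bits - 1)
    return usable_fraction * codes_per_side * freq_resolution_ppm


print(f"range with 1.9 ppm steps: {tracking_range_ppm(14, 1.9):.0f} ppm")
print(f"range with 2.0 ppm steps: {tracking_range_ppm(14, 2.0):.0f} ppm")
```

With a 14-bit integrator the result lands close to the 7780 ppm stated in the text when the step is taken as 1.9 ppm, which suggests the round figure of 2 ppm is an upper bound on the actual step.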
C. Lock Range
We have thus far discussed the tracking properties of the proposed CDR. However, this analysis is based on the assumption that the CDR is in phase lock. When the CDR is not phase locked, and if the frequency difference between the incoming data and the local VCO frequency is small, the proportional path will acquire phase lock without cycle slipping. In other words, if the phase error resulting from the frequency error varies at a rate slower than the phase tracking range of the proportional path, phase lock will be achieved. Therefore, the lock range, defined as the range of frequencies within which the CDR acquires phase lock without cycle slipping, is given by

$$\pm K_P\cdot\Delta f\ \text{ppm} \tag{11}$$

where $K_P$ is the proportional gain, equal to 128 in the prototype chip, and $\Delta f$ is the frequency resolution. This lock range is adequate to accommodate the frequency tolerance of commercial crystal oscillators.
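The lock-range argument can be expressed as a comparison of two phase slew rates. The sketch below assumes the proportional path can move the phase by K_P fine steps per update, so that it can cancel a frequency offset of up to K_P times the frequency resolution; the function names and the 1.9 ppm step are illustrative assumptions rather than values taken from the text.

```python
def lock_range_ppm(proportional_gain, freq_resolution_ppm):
    """Largest frequency offset the proportional path alone can follow, in ppm."""
    return proportional_gain * freq_resolution_ppm


def locks_without_cycle_slip(offset_ppm, proportional_gain, freq_resolution_ppm):
    """True when the phase drift from the offset is slower than the proportional slew."""
    return abs(offset_ppm) < lock_range_ppm(proportional_gain, freq_resolution_ppm)


limit = lock_range_ppm(128, 1.9)
print(f"lock range: +/-{limit:.0f} ppm")
for offset in (50, 200, 400):          # hypothetical test offsets
    print(offset, "ppm ->", locks_without_cycle_slip(offset, 128, 1.9))
```

Offsets beyond this limit are not necessarily untrackable, since the integral path eventually pulls the frequency in, but lock is then no longer guaranteed to occur without cycle slips.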
D. Stability Analysis
The proposed architecture contains two feedback loops, with the PLL embedded within the global digital clock and data recovery loop. The following analysis discusses the criterion for choosing the bandwidth of each of these loops to guarantee unconditional stability. The stability of the phase-locked loop is addressed later; the PLL is assumed to be stable for the analysis here. The inherently nonlinear nature of the digital clock and data recovery loop precludes the use of well-known stability-analysis techniques, such as Nyquist plots, that are available for linear systems. However, as described by Walker [14], the stability of the second-order digital CDR loop can be guaranteed by ensuring that the output phase change due to the proportional path dominates the phase change due to the integral path. Analogous to the damping factor in linear systems, the stability factor $\xi$, defined as the ratio of the output phase change due to the proportional path to the output phase change due to the integral path, can be used to quantify the stability of the nonlinear CDR loop:

$$\xi = \frac{\Delta\Phi_{prop}}{\Delta\Phi_{int}} \tag{12}$$

A large stability factor ensures that loop dynamics are dominated by the proportional path, thereby achieving an over-damped response.
Even though the stability of a second-order CDR loop is guaranteed by a large stability factor, the stability analysis of the proposed CDR is complicated by the interaction of two feedback loops, namely the PLL and the digital CDR loop. Consider the block diagram of the proposed CDR shown in Fig. 8, in which the PLL is depicted simply as a low-pass filter. From this figure, one can see that the PLL loop is embedded within the digital CDR loop. Therefore, in order for the PLL not to affect CDR stability, the PLL bandwidth should be larger than the maximum rate of phase change of the digital CDR loop. This condition is expressed in (13). Substituting typical parameters from the prototype provided earlier, the PLL bandwidth needs to be much larger than 60 kHz to ensure stability of the complete CDR. Since the bandwidth of a PLL operating with update rates of several hundred megahertz is at least a few megahertz, the condition in (13) can be easily met. While (13) represents the lower bound on PLL bandwidth, the upper bound is set by the filtering required to sufficiently suppress DSM and VCO noise. The use of the delta-sigma modulator, verified by simulation results presented later, allows for a larger PLL bandwidth than in Larsson's architecture, so that VCO noise can be filtered out without exacerbating the recovered clock jitter due to the shaped quantization error.
IV. DIGITAL CLOCK AND DATA RECOVERY LOOP DESIGN
This section presents the implementation details of the digital CDR loop. A detailed block diagram of the multi-phase digital clock and data recovery loop is shown in Fig. 9. The PLL-based phase interpolator provides eight equally spaced clock phases to four data samplers and four edge samplers. This multi-phase approach reduces the maximum on-chip clock frequency to one quarter of the input data rate. The eight samples, four edge and four data, are further de-multiplexed to 32 samples by two stages of de-multiplexers operating at one half and one quarter of the VCO frequency, respectively. The subsequent digital circuitry is also clocked with this slow quarter-rate recovered clock SCK, whose frequency is nominally equal to the PLL reference frequency. The 32 samples go through 16 early/late decoders to generate 16 early/late/hold signals, which are then resolved by a majority vote to produce a 3-level phase error signal. The phase error is then filtered by the digital loop filter with a proportional and an integral gain of 128 and 1, respectively. A 14-bit accumulator is used to implement the integral control, and the proportional gain of 128 is realized by arithmetically shifting the phase error signal left by 7 bit positions. The resulting 14-bit filter output is quantized to 3 levels by the second-order error-feedback DSM, whose output drives the phase rotator. The 8-bit phase rotator output ([8:1]) selects one of the eight output phases of the VCO.
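The proportional-integral loop filter described above can be modelled behaviourally in a few lines. The following is a minimal sketch, not the prototype's logic: it assumes a signed 14-bit word and saturating arithmetic (the text does not say whether the accumulator saturates or wraps), and the class and method names are hypothetical.

```python
class DigitalLoopFilter:
    """Behavioural model of a PI loop filter driven by a 3-level phase error."""

    def __init__(self, kp_shift=7, ki=1, width=14):
        self.kp_shift = kp_shift                 # proportional gain = 2**7 = 128
        self.ki = ki                             # integral gain = 1
        self.max_code = 2 ** (width - 1) - 1     # signed 14-bit limits (assumed)
        self.min_code = -(2 ** (width - 1))
        self.accumulator = 0

    def _saturate(self, value):
        # Saturation is an assumption made for this sketch.
        return max(self.min_code, min(self.max_code, value))

    def update(self, phase_error):
        """Advance one SCK cycle; phase_error must be -1, 0, or +1."""
        self.accumulator = self._saturate(self.accumulator + self.ki * phase_error)
        proportional = phase_error * (1 << self.kp_shift)   # left shift by 7 bits
        return self._saturate(self.accumulator + proportional)


loop_filter = DigitalLoopFilter()
for err in (+1, +1, 0, -1):
    print(err, "->", loop_filter.update(err))
```

The proportional term responds immediately and disappears when the error returns to zero, whereas the accumulator retains the running sum, which is what lets the integral path hold a frequency offset.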
A. CDR Building Blocks
The sensitivity and the delay of the front-end data/edge samplers determine the maximum operating speed of the CDR. The need to regenerate the high-speed, small-amplitude input data signal to a rail-to-rail output signal makes the design of the samplers difficult. A sampler based on a pair of series-connected sense amplifiers, shown in Fig. 10(a), is employed in this design [15]. A schematic of the sense amplifier (SA) is presented in Fig. 10(b) [16]. The data evaluation time and sensitivity are important performance metrics of the data/edge samplers. The series-connected sense amplifiers (Fig. 10(a)) increase the data evaluation time to 4 bit periods by decoupling the evaluation periods of the two sense amplifiers. The sensitivity is improved by minimizing the offset of the first sense amplifier through the choice of reasonably large input devices and careful layout that reduces both transistor mismatch and parasitic capacitances. Additionally, the input devices of the second sense amplifier are scaled down to improve regeneration speed, and a symmetric latch [17] is used to minimize hysteresis in the second-stage sense amplifier. The eight sampler outputs are de-multiplexed to produce 16 data samples and 16 edge samples that are processed by a set of 16 bang-bang (!!) phase detectors [18]. The intermediate edge sample between two adjacent data samples is used to determine whether the clock is early or late. In the absence of a data transition, the hold signal is activated. To reduce the complexity of the digital loop filter, these 16 sets of early/late/hold signals are resolved by a majority vote to produce a 3-level phase error signal.
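The early/late/hold decision and the majority vote can be illustrated with a small behavioural model. This is a sketch under assumptions: the sign convention (early = -1, late = +1), the use of the last data sample of the previous block to form the first comparison, and all function names are choices made here for illustration and are not specified in the text.

```python
EARLY, HOLD, LATE = -1, 0, +1   # sign convention is an assumption


def early_late(d_prev, edge, d_next):
    """Decide early/late/hold from two data samples and the edge sample between them."""
    if d_prev == d_next:
        return HOLD              # no transition, so no timing information
    # Edge sample still shows the old bit: the clock sampled before the transition.
    return EARLY if edge == d_prev else LATE


def majority_vote(decisions):
    """Collapse many early/late/hold decisions into one 3-level phase error."""
    total = sum(decisions)
    return (total > 0) - (total < 0)


def phase_error(last_data, data, edges):
    """edges[i] is assumed to lie between data[i-1] and data[i]; last_data precedes data[0]."""
    previous = [last_data] + list(data[:-1])
    return majority_vote(early_late(p, e, d) for p, e, d in zip(previous, edges, data))


data = [1, 0] * 8                       # 16 data samples, a transition at every bit
edges_before = [0] + data[:-1]          # every edge sample equals the older bit
edges_after = list(data)                # every edge sample equals the newer bit
print(phase_error(0, data, edges_before))   # all decisions early
print(phase_error(0, data, edges_after))    # all decisions late
print(phase_error(0, [0] * 16, [0] * 16))   # no transitions: hold
```

Reducing the sixteen decisions to a single 3-level value keeps the loop filter input one symbol wide, at the cost of discarding how strong the majority was.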
The delta-sigma modulator (DSM) used in this design employs a single-loop second-order error-feedback architecture with a 3-level internal quantizer [13]. The 3-level DSM output drives the phase rotator, which is implemented by the circular shift register (CSR) shown in Fig. 11. On power-up, the CSR is reset to a 2-hot state in which the outputs of the first two stages are set high while the rest of the stages are reset low. As explained later, setting the first two stages high is needed to guarantee glitch-free phase switching. A nonzero DSM output shifts the register contents left or right according to its sign. The CSR contents are held in the same state if the DSM output is equal to 0.
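A behavioural model helps show how a 3-level error-feedback modulator and a rotating phase pointer interact. The sketch below is illustrative only: the quantizer thresholds at half of full scale, the `full_scale` scaling, the mapping of +1 to an increasing phase index, and the use of a single modulo-8 index in place of the 2-hot shift register are all assumptions, and the class names are hypothetical.

```python
class ErrorFeedbackDSM:
    """Second-order error-feedback modulator with a 3-level quantizer (-1, 0, +1)."""

    def __init__(self, full_scale):
        self.full_scale = full_scale   # input value equivalent to a constant +1 output
        self.e1 = 0                    # quantization error one sample ago
        self.e2 = 0                    # quantization error two samples ago

    def step(self, x):
        # Feeding back -2*e1 + e2 gives a (1 - z^-1)^2 noise transfer function.
        v = x - 2 * self.e1 + self.e2
        if 2 * v >= self.full_scale:
            y = 1
        elif 2 * v <= -self.full_scale:
            y = -1
        else:
            y = 0
        error = y * self.full_scale - v
        self.e2, self.e1 = self.e1, error
        return y


class PhaseRotator:
    """Index-based stand-in for the circular shift register (simplification)."""

    def __init__(self, num_phases=8):
        self.num_phases = num_phases
        self.index = 0

    def step(self, y):
        self.index = (self.index + y) % self.num_phases   # wraps in both directions
        return self.index


dsm = ErrorFeedbackDSM(full_scale=8)   # small full scale keeps the example readable
rotator = PhaseRotator()
outputs = [dsm.step(2) for _ in range(800)]   # constant input of one quarter full scale
for y in outputs:
    rotator.step(y)
print("mean DSM output:", sum(outputs) / len(outputs))
print("selected phase index:", rotator.index)
```

The average of the 3-level stream equals the input divided by full scale, so the rotator advances at a fractional rate that the PLL then smooths into a fine phase shift.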
B. Glitch-Free Phase Switching Multiplexer
The operation of the CDR relies on glitch-free phase switching of the clock that is fed back to the PFD through the divider (see Fig. 6). In a practical implementation, however, glitches can occur on the phase multiplexer output, which can then drive the PLL out of lock and result in a complete failure of the overall CDR. This problem is illustrated in Fig. 12. When the current feedback clock switches to an advanced phase in the shaded region, where the two clock phases take on different values, a glitch occurs on the selected phase, as marked on the waveform. This spurious glitch injects a large phase error into the PLL, resulting in large jitter on the recovered clock or possibly driving the PLL out of lock. In other phase-switching applications such as high-frequency dividers, these glitches are avoided by using slow rise times for the control signal [19] or by synchronizing the control signal with the feedback clock [20] or with the latest phase [5]. Slow rise times are susceptible to process variations, and resynchronization approaches limit operation speed due to the feedback loop delay. In [21], a retimer circuit is used to synchronize the control signal in a feedforward way to avoid glitches. Even though this method is robust, it incurs a large area and power penalty when extended to switching 8 phases. In this design, a fully digital control is combined with the retimer circuit in [21] to achieve glitch-free operation.
The operation of the digital phase-switching control circuit is explained with the aid of its schematic shown in Fig. 13 and the associated timing diagram shown in Fig. 14. The eight phases are split into two sets of four even and four odd phases. The select control signals, EVEN and ODD, are generated so that the phases in one set are switched while the output phase is selected from the other set. Consequently, even if glitches occur during the phase transition at the output of the unselected multiplexer, they do not appear at the final output. One of the outputs of the two multiplexers is then selected by a glitch-free retimer circuit based on an even/odd select control signal. When the DSM output is either $+1$ or $-1$, this select signal simply alternates between 0 and 1 and, therefore, can be generated by dividing the clock (SCK) by 2. In order to account for a zero DSM output, an additional exclusive-NOR gate is used to realize the conditional divide-by-2 operation. Further, the clocked NAND gates synchronize the select signal to the negative edge of the clock. Having discussed the details of the digital CDR loop implementation, the PLL design is presented in Section V.
V. PHASE-LOCKED LOOP DESIGN
A major challenge in the design of PLLs in deep submicron processes is the degraded noise sensitivity due to the high gain of the VCO. The continued scaling of CMOS processes to deep submicron dimensions provides transistors with a very high unity current gain frequency ($f_T$). However, this scaling requires similar shrinking of the supply voltage to ensure transistor reliability. As a result of these two scaling trends, voltage-controlled ring oscillators designed in deep submicron processes to operate over a wide frequency range have very large gain. In such oscillators, a reduction in gain typically comes at the expense of a reduced operating frequency range. A block diagram of a split-tuned PLL that breaks the tradeoff between the operating range and the VCO gain is shown in Fig. 15. This architecture enables both a wide operating range and reduced VCO gain. The PLL consists of a phase frequency detector (PFD), a level shifter (SHFT), a charge pump (CP), a loop filter consisting of an RC network, an integrator, two voltage-to-current (V2I) converters, a 4-stage split-tuned current-controlled ring oscillator (CCO) controlled by separate high-gain coarse and low-gain fine inputs, and a divider in the feedback path. The PFD compares the frequency and phase of the reference clock (REF) with the frequency and phase of the feedback clock to generate the error output in the form of digital up (UP) and down (DN) pulses. These output pulses are level shifted to minimize clock feedthrough in the charge pump. The CP converts the UP/DN pulses into an analog current that is converted to a voltage via the passive loop filter. The output of the loop filter serves as the fine control voltage. A separate frequency-tracking loop (referred to as the coarse loop hereafter) integrates the voltage across the loop filter capacitor and drives the VCO toward frequency lock [22], [23]. The integrator is implemented as a first-order filter. Note that the coarse loop also biases the output of the charge pump to a pre-defined voltage, irrespective of the operating frequency. Voltage-to-current converters (V2Is), as discussed later, are used to linearize the VCO transfer characteristics and also to suppress the sensitivity of loop dynamics to resistor variation in the loop filter.
A. PLL Stability Analysis
Stability of the split-tuned PLL is complicated by the additional coarse tuning loop. In fact, this PLL behaves as a fourth-order control loop and as a result, ensuring unconditional stability is more difficult compared to conventional third-order PLLs. However, it is still possible to perform a stability analysis using conventional methods. The loop gain of the PLL can be (16) where is the charge-pump bias current, N is the feedback divider ratio, and are the bandwidths of fine, and coarse V2Is respectively, and are the coarse and fine gains of the VCO, 3 respectively. Equation (15) assumes the output impedance of the stage is infinite for simplicity. In practice, the finite output impedance moves the integrator pole from DC to a slightly higher frequency. Note that the fine loop gain shown in (14) is same the as the loop gain of conventional third-order PLLs. By designing the cross-over frequency of the coarse loop to be much smaller than the zero frequency , the coarse loop has little effect on the overall loop phase margin. Hence, the loop dynamics of the proposed PLL are determined by the fine loop. The design procedure involves choosing the fine loop parameters to meet bandwidth and phase margin requirements and then adjusting the coarse loop parameters, and , to reduce the effect of the coarse loop on the overall loop dynamics. The simulated loop gain frequency responses using the PLL loop parameters in Table I [24] is shown in Fig. 16 . Fig. 16(a) depicts the loop gain magnitude response of the fine and coarse loops. Note that the cross-over frequency of the coarse loop gain is much smaller than the zero frequency of the fine loop gain. This is achieved by designing the coarse loop integrator with a large time constant . The effect of the coarse loop on the overall PLL loop dynamics is examined by using the loop gain frequency response shown Fig. 16(b) . 
The solid line represents the full PLL loop gain response, which is the sum of the coarse loop and the fine loop responses. This plot shows that the PLL has a cross-over frequency of about 6 MHz and has a phase margin and gain margin of 65° and 30 dB, respectively. In order to evaluate the effect of the coarse loop on the loop dynamics of the complete PLL, the frequency response of just the fine loop is overlaid on the overall (coarse + fine) loop response in Fig. 16(b). This plot indicates negligible gain and phase margin degradation of the overall loop due to the coarse loop.
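The claim that a slow coarse loop barely affects the phase margin can be checked numerically with a minimal sketch. The model below is an assumption, not the loop gain of (14)-(16): the fine loop is taken as a generic third-order charge-pump PLL (two integrations, one zero, one pole), the coarse path is taken to add one further integration, and the V2I poles are ignored. All numeric values (`F_CROSS`, `w_zero`, `w_pole`, `w_coarse`) are hypothetical and are not taken from Table I.

```python
import numpy as np

def fine_loop_gain(w, k_fine, w_zero, w_pole):
    """Assumed third-order PLL loop gain: two integrators, one zero, one pole."""
    s = 1j * w
    return k_fine * (1 + s / w_zero) / (s**2 * (1 + s / w_pole))

def total_loop_gain(w, k_fine, w_zero, w_pole, w_coarse):
    """Fine loop plus an assumed coarse path that adds one more integration.

    w_coarse is the (hypothetical) frequency below which the coarse path
    dominates the fine path.
    """
    return fine_loop_gain(w, k_fine, w_zero, w_pole) * (1 + w_coarse / (1j * w))

def crossover_and_phase_margin(loop_gain, w_lo=2 * np.pi * 1e3, w_hi=2 * np.pi * 1e9):
    """Bisect on log-frequency for |L| = 1 (assumes |L| falls monotonically)."""
    lo, hi = np.log(w_lo), np.log(w_hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if abs(loop_gain(np.exp(mid))) > 1.0:
            lo = mid
        else:
            hi = mid
    w_c = np.exp(0.5 * (lo + hi))
    pm_deg = 180.0 + np.degrees(np.angle(loop_gain(w_c)))
    return w_c / (2 * np.pi), pm_deg

# Hypothetical design values (assumptions for illustration only).
F_CROSS = 6e6                      # target fine-loop cross-over frequency, Hz
w_c = 2 * np.pi * F_CROSS
w_zero = w_c / 4                   # loop filter zero placed below cross-over
w_pole = 4 * w_c                   # parasitic pole placed above cross-over
w_coarse = 2 * np.pi * 10e3        # coarse path made much slower than the zero
# Scale the gain so that the fine loop crosses unity at F_CROSS.
k_fine = w_c**2 * abs(1 + 1j * w_c / w_pole) / abs(1 + 1j * w_c / w_zero)

f_fine, pm_fine = crossover_and_phase_margin(
    lambda w: fine_loop_gain(w, k_fine, w_zero, w_pole))
f_total, pm_total = crossover_and_phase_margin(
    lambda w: total_loop_gain(w, k_fine, w_zero, w_pole, w_coarse))

print(f"fine only : fc = {f_fine / 1e6:.2f} MHz, PM = {pm_fine:.2f} deg")
print(f"fine+coarse: fc = {f_total / 1e6:.2f} MHz, PM = {pm_total:.2f} deg")
```

Under these assumptions the two phase margins differ by a small fraction of a degree. Raising `w_coarse` toward `w_zero` in the sketch shows the degradation that the design procedure is meant to avoid.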
B. PLL Noise Analysis
Noise is an important design concern in a phase-locked loop. The intrinsic and extrinsic noise sources such as thermal/flicker noise and the shaped quantization error of the DSM, respectively, along with the interference from supply and substrate noise, appear as phase noise on the output clock phases. This phase noise manifests itself as uncertainty in the zero-crossing times of the output clocks, referred to as clock jitter, and degrades the bit error rate of the receiver. Hence, it is of paramount importance to minimize PLL output phase noise. A small-signal model of the PLL depicting all the intrinsic noise sources is shown in Fig. 17. Each of the noise sources is represented by its power spectral density (PSD). For example, the model includes the current noise PSD of the charge pump output current and the voltage noise PSD of the loop filter resistor. The feedback network noise PSD represents the noise of the divider, the phase-selecting multiplexer, and the buffers used at the output of the MUX. The loop filter transfer functions of the fine and coarse loops are given by (17) and (18), respectively.
The noise sources depicted in Fig. 17 are shaped differently by the PLL loop, as determined by the noise transfer function (NTF) associated with each of them. For example, the charge pump noise is low-pass filtered and the VCO phase noise is high-pass shaped, both with a bandwidth equal to that of the PLL. In other words, increasing the bandwidth of the PLL exacerbates the charge pump noise and suppresses VCO noise, and vice versa. Given this tradeoff, the PLL bandwidth is carefully optimized to minimize the total output phase noise given by (19), where each individual term is equal to the product of the noise PSD and the squared magnitude of the corresponding NTF.
The final result obtained from noise bandwidth optimization simulations is presented in Fig. 18 . The PLL output phase noise is shown by the thick solid line. Other lines indicate the noise contribution from individual noise sources as depicted in the figure. The PLL phase noise at low frequency is dominated by the charge pump and the feedback network, while the VCO noise is the dominant source at mid-to-high frequencies. Note that the delta-sigma dithering noise is sufficiently suppressed despite the use of a reasonably large PLL bandwidth of about 6 MHz.
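The bandwidth tradeoff described above can be illustrated with a minimal sketch. It does not reproduce the optimization behind Fig. 18: it assumes a first-order loop response, a flat in-band noise PSD standing in for the charge pump and feedback network, and a VCO phase noise PSD that falls as 1/f². The names and values (`S_INBAND`, `K_VCO`, the sweep ranges) are hypothetical. Under these assumptions the integrated noise is proportional to `S_INBAND * f_bw + K_VCO / f_bw`, which is minimized at `f_bw = sqrt(K_VCO / S_INBAND)`.

```python
import numpy as np

def integrated_phase_noise(f_bw, s_inband, k_vco, f):
    """Integrate the output phase noise PSD for a given loop bandwidth.

    Each contribution is the source PSD times the squared magnitude of an
    assumed first-order NTF: low-pass for in-band noise, high-pass for the VCO.
    """
    lowpass_sq = 1.0 / (1.0 + (f / f_bw) ** 2)
    highpass_sq = (f / f_bw) ** 2 / (1.0 + (f / f_bw) ** 2)
    s_out = s_inband * lowpass_sq + (k_vco / f**2) * highpass_sq
    # Trapezoidal integration over the (log-spaced) frequency grid.
    return np.sum(0.5 * (s_out[1:] + s_out[:-1]) * np.diff(f))

# Hypothetical noise levels (assumptions for illustration only).
S_INBAND = 1e-12                  # flat in-band PSD, rad^2/Hz
K_VCO = S_INBAND * (4e6) ** 2     # VCO PSD = K_VCO / f^2, rad^2/Hz

f = np.logspace(2, 10, 4001)              # offset frequencies, Hz
bandwidths = np.logspace(5.5, 7.5, 81)    # candidate loop bandwidths, Hz
totals = np.array(
    [integrated_phase_noise(bw, S_INBAND, K_VCO, f) for bw in bandwidths])

best_bw = bandwidths[np.argmin(totals)]
expected_bw = np.sqrt(K_VCO / S_INBAND)
print(f"swept optimum    : {best_bw / 1e6:.2f} MHz")
print(f"analytic optimum : {expected_bw / 1e6:.2f} MHz")
```

The swept optimum lands on the grid point closest to the analytic value. A real design would replace the first-order NTFs with those of the actual loop and include the shaped delta-sigma quantization noise as an additional term.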
C. PLL Building Blocks
A detailed description of the PLL building blocks in Fig. 15 is presented here. The phase frequency detector (PFD) employs the popular 3-state machine architecture and is implemented using the latch-based structure [25]. Small cross-coupled inverters are used at the output to generate fully differential UP and DN signals. These PFD outputs are level shifted to minimize glitches on the differential charge pump output that would otherwise be caused by the feed-through of rail-to-rail UP/DN signals. The level shifters (SHFT) are implemented using diode-clamped common-source amplifiers. An important concern in the design of the charge pump in our PLL is the folding of the shaped high-frequency noise [26]. This folding is caused by the nonlinearity of the charge pump resulting from current mismatch. There are two major sources of UP/DN current mismatch in a charge pump. First, the inherent mismatch between the UP (pMOS) and DN (nMOS) current sources introduces a static error. Second, a varying charge pump output voltage modulates the pMOS and nMOS current sources differently, resulting in a current mismatch that depends on the operating frequency of the PLL. In the split-tuned PLL, however, the coarse loop drives the charge-pump output voltage to a pre-defined value that is independent of the operating frequency. As a result, the UP/DN current mismatch due to a varying charge pump output voltage is eliminated.
The charge-pump circuit shown in Fig. 19 uses replica biasing to suppress the static current mismatch. The bias for the pMOS current source is derived from the nMOS current source by a slow feedback loop. As a result, the static UP/DN current mismatch is suppressed by the loop gain of the feedback. The integrator in Fig. 15 is implemented using a folded-cascode transconductor and a poly-poly integrating capacitor. As mentioned earlier, a large coarse loop integrator time constant is needed to ensure stability and to suppress the coarse loop's impact on PLL loop dynamics. In order to reduce the area penalty imposed by a large capacitor, the transconductance is minimized by using weak positive feedback. The simulated transconductance is equal to 5 and the output impedance is roughly 72 MΩ. Both the coarse and fine voltages are converted to current signals using the V2I converters shown in Fig. 20. The coarse V2I converter shown in Fig. 20(a) uses negative feedback to maximize the operating frequency range of the PLL. The single-stage folded-cascode feedback amplifier employs complementary differential input pairs to achieve a rail-to-rail input common-mode range and nearly rail-to-rail output swing. Since the amplifier bandwidth does not influence the coarse loop operation, low bias currents are used to reduce power consumption. On the other hand, the V2I converter in the fine loop needs a large bandwidth so that it minimally impacts PLL loop dynamics. The open-loop V2I converter shown in Fig. 20(b) is used to achieve the required large bandwidth. The simulated V2I bandwidth is more than 15 times the PLL bandwidth. There are two advantages of using resistor-based V2I converters. First, the linear transfer characteristic⁴ of the V2I suppresses the gain variation of the VCO. The simulated coarse and fine VCO gain variation is less than 10% over the whole operating range.
Second, by matching the V2I resistors to the loop filter resistor, the loop gain variation due to loop filter resistor variations is cancelled by an equal and opposite change in the VCO gain. As a result, the sensitivity of PLL dynamics to resistor variations is suppressed. Behavioral simulation results presented in Fig. 21 show that with a ±25% variation of the loop filter resistance, the PLL loop bandwidth varies from 4.5 MHz to 7.5 MHz when the V2I converters are not used (see Fig. 21(a)). However, with the use of V2I converters, both the loop bandwidth and phase margin remain constant over a broad range of loop filter resistance variations (see Fig. 21(b)).
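The resistor cancellation can be illustrated with a first-order estimate. The sketch below is not the behavioral simulation of Fig. 21. It assumes the textbook charge-pump PLL approximation in which the loop bandwidth is proportional to the product of charge-pump current, loop filter resistance, and VCO gain, and it assumes the V2I makes the effective VCO gain inversely proportional to a resistor that tracks the loop filter resistor. The fine VCO gain of 200 MHz/V and the 6 MHz bandwidth come from the text; `R_NOM`, `N_DIV`, `I_CP`, and `K_CCO` are hypothetical.

```python
import math

def loop_bandwidth_hz(i_cp, r_filter, kvco_hz_per_v, n_div):
    """First-order estimate: f_bw ~ Icp * R * Kvco / (2 * pi * N)."""
    return i_cp * r_filter * kvco_hz_per_v / (2 * math.pi * n_div)

# Values partly assumed for illustration.
F_BW_NOM = 6e6        # nominal loop bandwidth, Hz (from the text)
KVCO_NOM = 200e6      # fine VCO gain, Hz/V (from the text)
R_NOM = 10e3          # hypothetical loop filter resistance, ohms
N_DIV = 8             # hypothetical feedback divider ratio
# Charge-pump current chosen so the nominal bandwidth is 6 MHz.
I_CP = 2 * math.pi * N_DIV * F_BW_NOM / (R_NOM * KVCO_NOM)
# Hypothetical current-to-frequency gain of the oscillator, Hz/A.
K_CCO = KVCO_NOM * R_NOM

scales = (0.75, 1.00, 1.25)   # +/-25% resistor variation

# Without V2I: VCO gain is fixed, so bandwidth follows the resistor.
without_v2i = [loop_bandwidth_hz(I_CP, R_NOM * s, KVCO_NOM, N_DIV) for s in scales]

# With V2I: the V2I resistor tracks the loop filter resistor, so the
# effective VCO gain K_CCO / R falls as R rises and the product is constant.
with_v2i = [loop_bandwidth_hz(I_CP, R_NOM * s, K_CCO / (R_NOM * s), N_DIV) for s in scales]

for s, a, b in zip(scales, without_v2i, with_v2i):
    print(f"R x {s:.2f}: without V2I {a / 1e6:.2f} MHz, with V2I {b / 1e6:.2f} MHz")
```

Under these assumptions the estimate without V2I converters spans 4.5 MHz to 7.5 MHz, consistent with the range quoted for Fig. 21(a), while the matched-resistor case stays at the nominal bandwidth.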
A four-stage split-tuned ring oscillator composed of pseudo-differential delay cells, shown in Fig. 22, is used in this design [27]. The delay cell is a current-starved inverter in which the current source is split into a 4X device and a 1X device controlled by coarse and fine control signals, respectively. The simulated coarse and fine gains of the VCO are 1200 MHz/V and 200 MHz/V, respectively, and the operating range is 50-1300 MHz. This wide tuning range is made possible by the ability to reduce the charging current to extremely small levels. A close look at the delay cell reveals that the effective output load capacitor (not shown explicitly in Fig. 22) is charged by the controllable current source, while it is discharged by the combination of the discharging current determined by the input voltage and the strength of the cross-coupled latch. Typically, at low-to-medium operating frequencies, the delay cell output is pulled down faster than it is pulled up to the supply voltage. As a result, the output duty cycle is distorted due to grossly asymmetric rise and fall times. However, it is important to generate 50% duty cycle clocks to sample the input data in order to reduce the timing margin degradation in the multi-phase clock recovery scheme employed in this design.

⁴ The output current of the two V2Is is a linear function of the input voltage, since the transfer gain is set by linear resistors.
A buffer circuit that corrects the asymmetric rise and fall times of the delay cell to achieve a 50% duty cycle output is shown in Fig. 23. The pseudo-differential input is buffered by dual-path differential-to-single-ended amplifiers. The input capacitance of these amplifiers is minimized to reduce loading on the delay cell. Note that the tail current source is omitted to improve the switching speed of the input differential pair. The amplifier outputs are then buffered by large inverters to drive the receiver samplers with fast rise/fall times. A feed-forward path consisting of a push-pull amplifier is used to suppress the duty cycle distortion caused by changes in the inverter threshold voltages with process, voltage, and temperature variations. The simulated duty cycle of the VCO output is nominally 50% with less than 1% variation over the whole operating range.
VI. EXPERIMENTAL RESULTS
The test chip was fabricated in a 0.18 μm CMOS process and the die photo is shown in Fig. 24. A large portion of the PLL area is occupied by the capacitor of the integrator. The active die area is 0.8 mm². The die was packaged in a standard 64-pin TQFP plastic package. The packaged chip is attached to a 4-layer test board through a clamp screw that mechanically presses the package so that its leads contact the solder pads on the test board.
The recovered quarter-rate data and clock for a 2 Gb/s input data stream are shown in Fig. 25. The measured bit error rate (BER) is better than 10⁻¹². The measured tracking range of the CDR when the PLL reference clock is modulated with a 20 kHz triangular wave is better than 2500 ppm. This frequency tracking range is measured without any degradation of the BER from its nominal value of 10⁻¹². The jitter histogram of the PLL operating at 500 MHz, shown in Fig. 26, indicates an rms clock jitter of 5.3 ps. When the CDR loop is turned on, the recovered clock jitter degrades to 28 ps, as shown by the jitter histogram in Fig. 27. This rather large recovered clock jitter is due to a larger than expected PLL bandwidth. The measured PLL bandwidth was about 32 MHz, which is much larger than the target bandwidth of 6 MHz. Consequently, the PLL does not sufficiently filter the shaped noise of the delta-sigma modulator, thereby severely degrading the recovered clock jitter. The complete performance of the prototype CDR is summarized in Table II.

TABLE II PERFORMANCE SUMMARY

VII. SUMMARY
A wide tracking range hybrid analog-digital CDR architecture that is capable of operating over a wide frequency range is presented. The proposed design incorporates several techniques: an analog PLL with split-tuning breaks the tradeoff between wide operating range and small VCO gain; a V2I converter minimizes the effects of loop filter resistor variations on PLL loop dynamics; and a delta-sigma modulator in the digital CDR loop truncates the filter output and shapes the quantization noise to high frequency, where it is subsequently filtered by the PLL. Noise-shaping and phase filtering techniques are suitable for achieving precise phase and frequency resolution in digital CDR loops. The use of a second-order digital loop filter enables better than 2500 ppm of tracking range when the input data is modulated with a 20 kHz triangular wave.
The proposed design techniques are validated by the prototype chip fabricated in a 0.18 μm CMOS process.

He is currently an Associate Professor of electrical engineering in the School of Engineering and Applied Sciences at Harvard University, Cambridge, MA. After a brief stint as a Senior Design Engineer at Accelerant Networks, Inc. in Beaverton, OR, he joined the faculty at Harvard as an Assistant Professor in January 2002. His research interests span several areas: high-speed, low-power link design; mixed-signal circuits for communications; ultra-low-power hardware for wireless sensor networks; and co-design of circuits and computer architecture for high-performance and embedded processors to address PVT variability and power consumption that plague nanoscale CMOS technologies.
Un-Ku Moon
