Abstract: A digital system's clocks must have not only low jitter but also well-controlled duty cycles in order to facilitate versatile clocking techniques. Power-supply noise is often the most common and dominant source of jitter on a phase-locked loop's (PLL) output clock. Jitter can be minimized by regulating the supply to the PLL's noise-sensitive analog circuit blocks in order to filter out supply noise. This paper introduces a PLL-based clock generator intended for use in a high-speed, highly integrated system-on-a-chip design. The generator produces clocks with accurate duty cycles and phase relationships by means of a high-speed divider design. The PLL also achieves a power-supply rejection ratio (PSRR) greater than 40 dB while operating at frequencies exceeding 4 GHz. This level of noise rejection exceeds that of earlier designs and is obtained by combining passive and active filtering of the PLL's analog supply voltage. The PLL system has been integrated in a 0.15-µm single-poly five-metal digital CMOS technology. The measured performance indicates that at a 4-GHz output frequency the circuit achieves a PSRR greater than 40 dB. The peak cycle-to-cycle jitter is 25 ps at a 700-MHz output frequency and a 2.8-GHz voltage-controlled oscillator (VCO) frequency with a 500-mV step on the regulator's 3.3-V supply. The total power dissipated by the prototype is 130 mW, and its active area is 1.48 × 1.00 mm².
I. INTRODUCTION

High-speed digital systems may experience voltage supply variations as large as 10% in both their internal and I/O supply levels. As a result, supply and substrate noise are often the major causes of jitter in a phase-locked loop's (PLL) output clock [1]. This paper describes a high-performance clock generator designed for use in a high-speed system-on-a-chip (SOC) operating at frequencies exceeding 2 GHz [2]. In particular, it focuses on the design and implementation of the system's PLL and the divider stage that follows it. The PLL achieves a power-supply rejection ratio (PSRR) exceeding 40 dB while operating at 4 GHz. This high PSRR is obtained with a high-bandwidth voltage regulator that provides a clean, nominally noise-free supply to the PLL's sensitive analog circuit blocks. Previous work has shown that regulating the supply to these noise-sensitive blocks is an effective way to minimize a PLL's sensitivity to supply noise. Those designs, however, have generally been limited to PSRRs of no more than 25 dB and maximum output frequencies below 2 GHz [1], [3].
Publisher Item Identifier S 0018-9200(01)08222-1.
A specialized divider stage that enables robust, accurate generation of quadrature clocks is also presented. The use of quadrature clocks enables some blocks in the SOC to operate at twice the SOC's 1-GHz core frequency while avoiding the need to route a 2-GHz clock throughout the entire chip. This approach substantially reduces both power and complexity, even though the maximum internal frequency of the PLL is four times that of the SOC's core. The divider's primary advantage is its ability to produce quadrature clocks that unconditionally maintain their phase relationship without a dedicated reset signal.
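The frequency plan implied above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the helper name `clock_plan` is hypothetical, and it simply assumes a 4-GHz VCO, an ideal divide-by-4 that produces quadrature outputs, and local clocks formed by XORing the two quadrature phases, as described for the clock tree in Section II-B.

```python
# Illustrative frequency-plan check (hypothetical helper, not from the paper).
# Assumes an ideal divide-by-4 whose two outputs are 90 degrees apart and a
# local double-rate clock formed as clk XOR clkq.

def clock_plan(f_vco_hz: float, divide_ratio: int = 4) -> dict:
    f_core_hz = f_vco_hz / divide_ratio    # frequency of clk and clkq
    t_core_s = 1.0 / f_core_hz             # core clock period
    quad_skew_s = t_core_s / 4.0           # 90 degrees = one quarter of a core period
    f_local_hz = 2.0 * f_core_hz           # clk XOR clkq toggles twice per core period
    return {
        "f_core_hz": f_core_hz,
        "quadrature_skew_s": quad_skew_s,
        "f_local_hz": f_local_hz,
        "vco_period_s": 1.0 / f_vco_hz,
    }

plan = clock_plan(4e9)
print(plan)
```

With a 4-GHz VCO this gives 1-GHz core clocks, a 250-ps quadrature offset, and 2-GHz local clocks. Note that the 90° offset equals exactly one VCO period, which is one reason a divide-by-4 is a natural way to obtain quadrature phases.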
II. PHASE-LOCKED LOOP AND DIVIDER DESIGN
A block diagram of the clock generator is shown in Fig. 1. All digital blocks in the SOC, including those in the PLL, are powered from a 1.2-V supply. The regulator is powered from a dedicated 3.3-V supply that is shared at the board level with a portion of the SOC's I/O circuit blocks. The regulator filters this supply and produces a constant, nominally noise-free supply of approximately 2.0 V, which powers the PLL's noise-sensitive charge pump and voltage-controlled oscillator (VCO).
A. PLL Design
The PLL is based on the classic charge-pump topology [4]. The phase-frequency detector (PFD) employs a dual D-flip-flop topology and is powered from the SOC's 1.2-V supply. Because the charge pump (CP) and VCO are referenced to the 2.0-V regulated supply, a level-shifter circuit translates signals from the 1.2-V domain into the 2.0-V domain [3]. To allow the PLL to operate over a wide range of input reference frequencies, the PFD and level-shifter blocks are designed to handle input frequencies exceeding 300 MHz.
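The PFD logic itself is not detailed here. As an illustration only, the following Python sketch models the textbook behavior usually associated with a dual D-flip-flop PFD: a rising reference edge sets UP, a rising feedback edge sets DN, and an AND gate resets both once both are set. Reset delay and the level shifter are ignored, and the function name `pfd_pulses` is hypothetical.

```python
from typing import List, Optional, Tuple

def pfd_pulses(ref_edges: List[float], fb_edges: List[float]) -> List[Tuple[str, float, float]]:
    """Idealized dual D-flip-flop (tri-state) phase-frequency detector.

    A rising REF edge sets UP, a rising FB edge sets DN, and once both are set
    the AND-gate reset clears them (reset delay ignored). Returns the net
    (signal, start, end) pulses; a pulse still open at the end is dropped.
    """
    # Merge both edge streams in time order; at equal times "fb" sorts first.
    events = sorted([(t, "ref") for t in ref_edges] + [(t, "fb") for t in fb_edges])
    up_start: Optional[float] = None
    dn_start: Optional[float] = None
    pulses: List[Tuple[str, float, float]] = []
    for t, src in events:
        # A flip-flop that is already set ignores further edges (D tied high).
        if src == "ref" and up_start is None:
            up_start = t
        elif src == "fb" and dn_start is None:
            dn_start = t
        if up_start is not None and dn_start is not None:
            # Both set: the earlier one produced the net correction pulse.
            if up_start < dn_start:
                pulses.append(("UP", up_start, t))
            elif dn_start < up_start:
                pulses.append(("DN", dn_start, t))
            up_start = dn_start = None
    return pulses

print(pfd_pulses([0.0, 10.0, 20.0], [2.0, 12.0, 18.0]))
# [('UP', 0.0, 2.0), ('UP', 10.0, 12.0), ('DN', 18.0, 20.0)]
```

Because a set flip-flop ignores further edges, extra reference edges simply hold UP asserted; this is what gives the structure its frequency-detection behavior in addition to phase detection.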
A schematic of the charge pump that follows the level shifter is shown in Fig. 2. A pair of switches controls the current flow to the charge pump's output. To attenuate switching errors that might otherwise reach the sensitive output node, these switches are placed on the source side of the current-source devices. Dummy devices are employed to reduce both charge-injection and clock-feedthrough errors, and two additional devices in Fig. 2 ensure a fast turn-off of the switched current paths [5]. Matching between the up and down current paths is further improved by balancing the loading on the charge pump's control signals, UP and DN, and their complements; this is accomplished with the dummy loads indicated in the figure.
The VCO, shown in Fig. 3, consists of a three-stage single-ended ring oscillator controlled through a current-mirror topology. A level shifter following the ring oscillator converts its output into a full-swing digital clock. Because the regulator provides a nominally noise-free supply to the VCO, and because the desired output frequency range is large (800 MHz to 4 GHz), cascoding is not employed in the current mirror. The simulated VCO gain is approximately 5 GHz/V. To enable the generation of quadrature clocks and to simplify duty-cycle control, the VCO operates at four times the SOC's core frequency.
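A few back-of-the-envelope numbers follow from the VCO figures quoted above. The Python sketch below assumes a linear tuning curve with the ~5-GHz/V simulated gain and the usual 1/(2·N·t_d) frequency of an N-stage ring; both are simplifications, and the helper names are hypothetical.

```python
# Back-of-the-envelope VCO numbers (illustrative sketch).
# Assumptions: a linear tuning curve with the ~5-GHz/V simulated gain quoted
# in the text, and an N-stage ring whose frequency is 1 / (2 * N * t_d).

K_VCO_HZ_PER_V = 5e9        # simulated VCO gain from the text
F_MIN_HZ = 800e6            # lower end of the tuning range
F_MAX_HZ = 4e9              # upper end of the tuning range
N_STAGES = 3                # three-stage single-ended ring

def control_span_v(f_min_hz: float, f_max_hz: float, k_vco: float) -> float:
    """Control-voltage swing needed to cover the tuning range (linear model)."""
    return (f_max_hz - f_min_hz) / k_vco

def stage_delay_s(f_osc_hz: float, n_stages: int) -> float:
    """Per-stage delay of an n-stage ring oscillating at f_osc_hz."""
    return 1.0 / (2 * n_stages * f_osc_hz)

span_v = control_span_v(F_MIN_HZ, F_MAX_HZ, K_VCO_HZ_PER_V)   # about 0.64 V
td_fast_s = stage_delay_s(F_MAX_HZ, N_STAGES)                  # about 41.7 ps
td_slow_s = stage_delay_s(F_MIN_HZ, N_STAGES)                  # about 208 ps
shift_per_mv_hz = K_VCO_HZ_PER_V * 1e-3                        # about 5 MHz per mV

print(f"span = {span_v:.2f} V, t_d = {td_fast_s*1e12:.1f} to {td_slow_s*1e12:.1f} ps")
print(f"1 mV on the control node shifts the VCO by {shift_per_mv_hz/1e6:.1f} MHz")
```

The last figure illustrates why disturbances must be kept away from the oscillator: with a gain this large, millivolt-level noise coupling onto the control node translates into megahertz-level frequency errors.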
B. Quadrature Clock Generator
The PLL provides two 1-GHz, 50% duty-cycle clocks, clk and clkq in Fig. 1, that are phase shifted with respect to one another by 90°. As noted in the introduction, quadrature clocks simplify the generation of the local 2-GHz clocks required in sections of the SOC that are double-pumped to achieve high microarchitectural performance. Each 2-GHz clock is generated by XORing clk and clkq at the output of the clock tree, before the latches. To ensure correct sequencing of operations within the SOC, a phase shift as close to 90° as possible must be maintained between clk and clkq at all times, including the first few clock cycles after power-up, before chip reset is deasserted. As shown in Fig. 1, both clocks are distributed to all sections of the SOC through a balanced, buffered H-tree.
The quadrature clocks, clk and clkq, are produced by the divide-by-4 structure shown in Fig. 4(a). The divider operates much like a controlled five-stage ring oscillator; a timing diagram is shown in Fig. 4(b). Initially, clk is assumed high, clk_in high, and B, clkq, and D low. When clk_in switches low, clk is driven low by the inverter following D, while clkq is held low by the inverter following B. At the next rising edge of clk_in, signal B is driven high. The subsequent falling edge of clk_in forces clkq to go low. When clk_in rises, signal D is driven low by the inverter driven by clkq. A 50% duty cycle for each output is achieved by keeping the ratio between the loading and the drive strength at each node constant. The 90° phase shift is guaranteed by the structure of the divider itself, so no reset signal is required. Moreover, because of its simplicity and its dynamic topology, the divider achieves a very high operating speed (on the order of 6 GHz). Since the divider sits at the output of the PLL's VCO, it always receives a clock, so its dynamic nodes are never left unclocked.
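The divider in Fig. 4 is a dynamic circuit, and its transistor-level behavior is not reproduced here. Instead, the following Python sketch swaps in a different, purely behavioral structure, a two-stage Johnson (twisted-ring) counter, to illustrate the two properties the text relies on: a divide-by-4 can deliver two 50% duty-cycle outputs exactly 90° apart, and XORing them yields a 50% duty-cycle clock at twice their frequency. The function names are hypothetical.

```python
from typing import List, Tuple

def quadrature_divide_by_4(n_edges: int, start: Tuple[int, int] = (0, 0)) -> List[Tuple[int, int]]:
    """Behavioral divide-by-4 modeled as a 2-stage Johnson counter.

    This is a stand-in for illustration, not the paper's dynamic divider.
    On every rising edge of clk_in: q0 <- NOT q1, q1 <- q0.
    Returns the (clk, clkq) pair after each of n_edges input edges.
    """
    q0, q1 = start
    samples = []
    for _ in range(n_edges):
        q0, q1 = 1 - q1, q0
        samples.append((q0, q1))
    return samples

def xor_clock(samples: List[Tuple[int, int]]) -> List[int]:
    """Local double-rate clock formed as clk XOR clkq."""
    return [clk ^ clkq for clk, clkq in samples]

waves = quadrature_divide_by_4(8)
print("clk  :", [c for c, _ in waves])     # [1, 1, 0, 0, 1, 1, 0, 0]
print("clkq :", [q for _, q in waves])     # [0, 1, 1, 0, 0, 1, 1, 0]
print("xor  :", xor_clock(waves))          # [1, 0, 1, 0, 1, 0, 1, 0]
```

Each output repeats every four input edges and is high for two of them (50% duty cycle), and clkq lags clk by one input period, i.e., 90°. Because a two-stage Johnson counter has only four states and all of them lie on the same cycle, it also settles into correct quadrature from any power-up state, loosely mirroring the reset-free property claimed for the actual divider.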
III. VOLTAGE REGULATOR DESIGN
A block diagram of the regulator is shown in Fig. 5. A bandgap generator employing parasitic PNP transistors produces a constant voltage reference. This reference is amplified by an operational amplifier (op amp) stage that produces the 2.0-V regulated supply. A second op amp stage, configured as a unity-gain buffer, tracks this regulated voltage and powers the level-shifter circuit shown in Fig. 1. Because the level shifter forms the interface between the noisy 1.2-V digital domain and the nominally quiet 2.0-V analog domain, giving it a separately buffered supply improves the isolation between the two domains. The current-generator block provides the nominally constant currents used to bias the op amps.
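The text gives neither the bandgap voltage nor the feedback network of the first op amp stage. Assuming, purely for illustration, a conventional non-inverting stage with a resistive divider and a bandgap reference of about 1.2 V, the divider ratio needed to reach the 2.0-V regulated supply follows directly. The helper name `feedback_ratio` is hypothetical.

```python
# Illustrative only: the bandgap voltage and the resistive feedback network
# are ASSUMPTIONS; the text gives neither.
V_BANDGAP_V = 1.2   # assumed bandgap reference voltage
V_REG_V = 2.0       # regulated analog supply quoted in the text

def feedback_ratio(v_ref: float, v_out: float) -> float:
    """R2/R1 for a non-inverting stage with v_out = v_ref * (1 + R2/R1)."""
    if v_out < v_ref:
        raise ValueError("a non-inverting stage cannot attenuate")
    return v_out / v_ref - 1.0

r2_over_r1 = feedback_ratio(V_BANDGAP_V, V_REG_V)   # about 0.667
gain = 1.0 + r2_over_r1                             # closed-loop gain, about 1.667
# Low-frequency error or noise on the reference is scaled by this same gain;
# the unity-gain second stage passes V_REG_V through to the level shifter.
print(f"R2/R1 = {r2_over_r1:.3f}, closed-loop gain = {gain:.3f}")
```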
A. Op Amp Design
Each op amp is implemented using the folded-cascode topology in order to provide the largest possible bandwidth without consuming an excessive amount of headroom. A schematic is shown in Fig. 6 . To ensure that the fully differential cascode devices remain heavily saturated in the presence of supply noise, supply droop, temperature variations, etc., gain boosting is employed [6] . With gain boosting, the nominal simulated dc gain of each op amp exceeds 70 dB. The benefit of gain boosting is underscored by the fact that in its absence the simulated PSRR may degrade by more than 10 dB.
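To put the 70-dB figure in perspective, the sketch below converts it to a linear gain and estimates the resulting static regulation error with the standard feedback relation 1/(1 + Aβ). The feedback factor β = 0.6 is an assumption corresponding to a hypothetical 1.2-V reference and the 2.0-V output; it is not given in the text.

```python
def db_to_ratio(db: float) -> float:
    """Convert a voltage-gain or rejection figure in dB to a linear ratio."""
    return 10 ** (db / 20.0)

def static_error(a_ol: float, beta: float) -> float:
    """Relative closed-loop error of a feedback amplifier: 1 / (1 + A*beta)."""
    return 1.0 / (1.0 + a_ol * beta)

A_OL = db_to_ratio(70.0)         # ~3162 V/V, the simulated dc gain quoted in the text
BETA = 1.2 / 2.0                 # ASSUMED feedback factor (hypothetical 1.2-V reference)
V_OUT = 2.0                      # regulated output voltage

error_v = static_error(A_OL, BETA) * V_OUT    # ~1.05 mV of static error
ripple_factor = db_to_ratio(10.0)             # 10 dB less PSRR = ~3.16x more ripple

print(f"A = {A_OL:.0f} V/V, static error = {error_v*1e3:.2f} mV, "
      f"10-dB PSRR loss = {ripple_factor:.2f}x")
```

The last line restates the gain-boosting argument in linear terms: losing more than 10 dB of PSRR would let more than three times as much supply disturbance through to the regulated output.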
As illustrated in Fig. 5, the outputs of op amps 1 and 2 drive the gates of the two common-source output devices. The common-source topology is employed to avoid a further reduction in the final regulated output voltage, since a source-follower output stage would sacrifice a gate-source voltage of headroom. The primary tradeoff of this configuration, however, is that it introduces an additional pole and zero into each loop's transfer characteristic. Stability over all operating conditions is ensured, in part, by the feedforward compensation capacitors shown in Fig. 5. Although these capacitors provide adequate phase margin under typical operating conditions, the phase margin of this topology nevertheless varies with the current drawn by the PLL. To ensure that the system remains stable over all possible operating conditions, current-source devices, also shown in Fig. 5, are added to each stage's output. The current drawn by these devices is made inversely proportional to the current drawn by the PLL: as the PLL's output frequency decreases, the total current drawn by these current sources increases, and vice versa. This approach effectively keeps the load driven by the regulator relatively constant. In the worst case, this technique increases the total power dissipation by approximately 15%.
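The constant-load idea can be sketched with a simple model. Every current value below is hypothetical (the text gives none), and the compensation law is assumed to be exactly complementary, which is only one simple way to realize a current that falls as the PLL's current rises; the actual circuit's law, and the 15% worst-case power overhead quoted above, are not reproduced.

```python
# Hypothetical constant-load sketch; every current value here is an ASSUMPTION.
I_STATIC_A = 5e-3         # frequency-independent PLL bias current (assumed)
K_A_PER_HZ = 5e-12        # dynamic PLL current per hertz of VCO frequency (assumed)
F_MAX_HZ = 4e9            # highest VCO frequency

def pll_current_a(f_vco_hz: float) -> float:
    """PLL current drawn from the regulator, modeled as linear in frequency."""
    return I_STATIC_A + K_A_PER_HZ * f_vco_hz

# Hold the regulator's load at its maximum-frequency value.
I_TARGET_A = pll_current_a(F_MAX_HZ)

def compensation_current_a(f_vco_hz: float) -> float:
    """Current of the added current-source devices: falls as the PLL current rises."""
    return max(0.0, I_TARGET_A - pll_current_a(f_vco_hz))

for f in (0.8e9, 2e9, 4e9):
    total = pll_current_a(f) + compensation_current_a(f)
    print(f"{f/1e9:.1f} GHz: PLL {pll_current_a(f)*1e3:.1f} mA, "
          f"comp {compensation_current_a(f)*1e3:.1f} mA, total {total*1e3:.1f} mA")
```

Because the sum stays fixed, the regulator sees the same load (and hence the same pole positions) at every VCO frequency, at the cost of burning extra current at low frequencies.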
Each stage's transfer characteristic also contains a right-half-plane zero that further degrades the system's phase margin. Stability could be improved further by either removing this zero or moving it into the left half-plane, which could be accomplished by placing a resistor in series with each compensation capacitor [7]. This was not done in this design, however, because the system's phase margin was already sufficient and the added resistor would increase the overall complexity.
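The effect of the series resistor mentioned above can be seen from the standard expression for the zero of a Miller-compensated gain stage with a nulling resistor R_z in series with the compensation capacitor C_c: s_z = 1/(C_c(1/g_m - R_z)). The sketch below evaluates it. The transconductance and capacitance values are hypothetical, and the expression is the textbook small-signal result rather than an analysis of this particular regulator.

```python
import math

def zero_rad_per_s(gm_s: float, cc_f: float, rz_ohm: float = 0.0) -> float:
    """Zero of a Miller-compensated stage with a nulling resistor.

    s_z = 1 / (cc * (1/gm - rz)). A positive result is a right-half-plane
    zero, a negative result is a left-half-plane zero, and rz == 1/gm pushes
    the zero to infinity.
    """
    denom = cc_f * (1.0 / gm_s - rz_ohm)
    if denom == 0.0:
        return math.inf
    return 1.0 / denom

GM_S = 10e-3      # hypothetical output-device transconductance (10 mS)
CC_F = 2e-12      # hypothetical compensation capacitance (2 pF)

for rz in (0.0, 1.0 / GM_S, 200.0):
    print(f"Rz = {rz:6.1f} ohm -> s_z = {zero_rad_per_s(GM_S, CC_F, rz):.3g} rad/s")
```

With no resistor the zero sits at +g_m/C_c in the right half-plane; R_z = 1/g_m removes it, and larger values move it into the left half-plane, where it adds rather than subtracts phase.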
The source-follower devices in Fig. 5 are employed to isolate the common-source output devices from the 3.3-V supply. The gates of the source followers are biased through a simple RC circuit implemented with transistors, which further improves supply-noise rejection. The regulated output voltages are further decoupled from the supply by decoupling capacitors placed between the regulated outputs and the substrate. Simulations indicate that, under nominal operating conditions, the regulator achieves a PSRR exceeding 50 dB for a 500-mV voltage step on its 3.3-V supply with a total decoupling capacitance of approximately 1.2 nF. The minimum simulated PSRR over process, temperature, and supply is 43 dB. Halving the decoupling capacitance degrades the PSRR by approximately 3 dB.
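The PSRR figures above translate into residual disturbance amplitudes as follows. The sketch applies the definition PSRR = 20·log10(ΔV_supply/ΔV_out) to a 500-mV step. Because the text quotes lower bounds on PSRR, the results are upper bounds on the residual, and the 47-dB case is only an approximation of the nominal figure minus the roughly 3-dB loss quoted for halved decoupling.

```python
def residual_step_v(supply_step_v: float, psrr_db: float) -> float:
    """Step amplitude reaching the regulated output for a given PSRR in dB."""
    return supply_step_v / 10 ** (psrr_db / 20.0)

STEP_V = 0.5   # 500-mV step on the 3.3-V supply, as in the text
cases = {
    "nominal (50 dB)": 50.0,
    "worst simulated corner (43 dB)": 43.0,
    "half decoupling, ~3 dB lower (47 dB)": 47.0,
}
for label, psrr_db in cases.items():
    print(f"{label}: {residual_step_v(STEP_V, psrr_db) * 1e3:.2f} mV")
```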
IV. EXPERIMENTAL RESULTS
The process technology used to implement the prototype offered both low-threshold-voltage devices, designed to operate at 1.5 V, and high-threshold-voltage devices, intended for use in I/O circuit blocks. Most of the PLL employs low-voltage devices, while the op amps, the charge pump, and the current-mirror section of the VCO use high-voltage devices. All capacitors, including the decoupling capacitors in Fig. 5 and the PLL's loop-filter capacitors, are implemented with high-voltage devices, except for two capacitors in the regulator, which are implemented using low-voltage devices. A power-up control circuit prevents these low-voltage devices from operating beyond their tolerable limits. Unless noted otherwise, all resistors are implemented using nonsilicided p+ polysilicon.
The PLL's jitter performance was characterized using a high-speed sampling oscilloscope. The PLL output following the divide-by-4 is transmitted off chip through an open-drain 50-Ω driver. This configuration allows signals up to 1 GHz to be measured. However, because of a mismatch between the board-level transmission line and the scope's terminating input impedance, jitter measurements were performed with a maximum VCO frequency of 2.8 GHz and an output frequency of 700 MHz. With a nominally quiet supply, the peak-to-peak jitter is approximately 22 ps at an output frequency of 700 MHz and a VCO frequency of 2.8 GHz; a histogram is shown in Fig. 7. This level of jitter is higher than expected and is attributed in part to a large amount of noise contributed by the open-drain driver's external power supply.
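The text does not state exactly which statistic its histograms report. As a hedged illustration of the usual definitions, the sketch below computes period peak-to-peak jitter, peak cycle-to-cycle jitter, and peak-to-peak time-interval error (TIE) from a list of rising-edge timestamps such as those a sampling oscilloscope can capture. The function name and the synthetic edge times are hypothetical.

```python
from typing import Dict, Sequence

def jitter_stats(edges: Sequence[float], nominal_period: float) -> Dict[str, float]:
    """Common jitter metrics from rising-edge times (any consistent time unit)."""
    if len(edges) < 3:
        raise ValueError("need at least three edges")
    periods = [b - a for a, b in zip(edges, edges[1:])]
    # Cycle-to-cycle jitter: change in period between adjacent cycles.
    c2c = [abs(b - a) for a, b in zip(periods, periods[1:])]
    # TIE: deviation of each edge from an ideal grid anchored at the first edge
    # (a simplification; a best-fit ideal clock is often used instead).
    tie = [t - (edges[0] + i * nominal_period) for i, t in enumerate(edges)]
    return {
        "period_pp": max(periods) - min(periods),
        "c2c_peak": max(c2c),
        "tie_pp": max(tie) - min(tie),
    }

# Synthetic edges in picoseconds around a 1000-ps nominal period.
print(jitter_stats([0, 1000, 2010, 2995, 4000], 1000))
# {'period_pp': 25, 'c2c_peak': 25, 'tie_pp': 15}
```

The three metrics can differ noticeably on the same data, which is why the quantity behind a quoted jitter number (peak-to-peak versus cycle-to-cycle) matters when comparing results.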
The noise-rejection performance of the prototype was measured using the on-chip noise generator shown in Fig. 8. A sharp square wave on the regulator's supply voltage is assumed to represent the worst-case switching noise generated by the SOC [3], [8]. Since PLLs are sensitive to high-frequency supply noise, the noise generator is designed to produce edges whose transition times are shorter than the minimum output period of the VCO (250 ps at 4 GHz) [3], [9], [10]. The PLL's output jitter increases to approximately 25 ps when a 500-mV step is injected onto the regulator's 3.3-V supply. This result agrees with simulation and, as explained below, indicates a measured supply rejection greater than 40 dB. Fig. 9(a) shows the measured output histogram in the presence of noise.
Determining the regulator's performance directly would require measuring the noise that appears on the regulated output voltage in Fig. 1 in response to noise injected onto the regulator's 3.3-V supply. However, because this noise was expected to have low amplitude and high-frequency content, only the dc value of the regulated output voltage was measured. To overcome this limitation, the regulator's PSRR is estimated from a combination of measurement and simulation results. Simulations indicate that a 1-mV step on the voltage regulator's output causes a 1-ps increase in the PLL's peak-to-peak jitter. This result, together with the known amplitude of the injected noise, allows the peak noise on the PLL's internal regulated supply to be estimated. The approach relies on the fact that a noise step injected onto a PLL's supply causes an instantaneous shift in the PLL's output frequency, as well as longer-term transient effects. Because the PLL's loop bandwidth is relatively low, the VCO's initial frequency shift is assumed to be the same in closed-loop and open-loop operation. For example, as described earlier, a 500-mV noise step on the regulator's supply causes approximately a 5-ps increase in the PLL's peak jitter. Using the 1-ps/mV relationship obtained from simulation, the peak noise step on the PLL's internal supply is estimated to be approximately 5 mV, which yields an estimated PSRR of approximately 40 dB. Fig. 9(b) shows the PLL's output clock cycle variation, measured using a single-shot acquisition, when a noise step with an amplitude of 1.1 V and a frequency of 500 kHz is injected onto the regulator's supply. In this case, the peak cycle-to-cycle jitter increases from 29 to 50 ps. Even with this large amount of noise, the regulator's supply rejection exceeds 37 dB.
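The estimation procedure described above amounts to two steps: convert the observed jitter increase into an equivalent internal supply step using the simulated sensitivity, then compare that step with the injected one. A minimal sketch, with a hypothetical function name and the linear 1-ps/mV sensitivity taken from the text:

```python
import math

PS_PER_MV = 1.0   # simulated jitter sensitivity quoted in the text (1 ps per mV)

def estimated_psrr_db(injected_step_mv: float, jitter_increase_ps: float,
                      ps_per_mv: float = PS_PER_MV) -> float:
    """Estimate PSRR from the jitter increase, assuming a linear sensitivity."""
    internal_step_mv = jitter_increase_ps / ps_per_mv   # inferred noise on the regulated supply
    return 20.0 * math.log10(injected_step_mv / internal_step_mv)

print(f"{estimated_psrr_db(500.0, 5.0):.1f} dB")   # 500 mV / 5 mV = 100 -> 40.0 dB
```

The estimate is only as good as the linear, simulation-derived sensitivity and the assumption that the VCO's initial frequency shift is the same in closed-loop and open-loop operation.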
The experimental prototype, consisting of the PLL and regulator, has an active area of 1.48 mm², including a total decoupling capacitance of approximately 1.2 nF. The total power dissipation for a VCO frequency of 4 GHz and an input clock reference frequency of 33 MHz is approximately 132 mW. The regulator's drop-out performance is illustrated in Fig. 10 for the case when the PLL's VCO is operating at approximately 4 GHz. In this case, the minimum supply voltage for which the regulated output voltage is still constant is approximately 2.6 V.
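The drop-out measurement can be restated as a simple headroom budget. The sketch below uses only the values quoted above (2.0-V output, about 2.6-V minimum supply at a 4-GHz VCO frequency, 3.3-V nominal supply). The helper name `in_regulation` is hypothetical, and treating the 2.6-V point as a hard threshold is a simplification.

```python
V_OUT_V = 2.0            # regulated output
V_SUPPLY_MIN_V = 2.6     # minimum supply for a constant output at ~4 GHz
V_SUPPLY_NOM_V = 3.3     # nominal regulator supply

dropout_v = V_SUPPLY_MIN_V - V_OUT_V          # about 0.6 V across the regulator
margin_v = V_SUPPLY_NOM_V - V_SUPPLY_MIN_V    # about 0.7 V of downward excursion allowed

def in_regulation(v_supply: float) -> bool:
    """True if the supply is at or above the measured minimum (hard threshold)."""
    return v_supply >= V_SUPPLY_MIN_V

print(f"dropout = {dropout_v:.1f} V, margin = {margin_v:.1f} V")
print(in_regulation(3.3 - 0.5))   # True: a 500-mV dip from 3.3 V, if applied downward, leaves 2.8 V
```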
A die photo of the experimental prototype is shown in Fig. 11 . Mirror symmetry was maintained in the differential regions of the chip in order to enhance the rejection of common-mode disturbances. Furthermore, digital circuits are physically separated from the analog circuits to the extent possible. The measured performance of the PLL regulator system is summarized in Table I. 
