A high-speed triangular-modulated spread-spectrum clock generator using a fractional phase-locked loop is presented. The fractional division is implemented by a nested fractional topology, which is constructed from a dual-modulus divide-by-(N-1/16)/N divider that divides the VCO output to define a first division period, and a fractional control circuit that establishes a second division period to produce the overall fractional division. The dual-modulus divider uses a delay-locked-loop network to achieve phase compensation. Operating at 3.2 GHz, the circuit achieves a measured peak power reduction of around 16 dB for a deviation of 0.37% and a modulation frequency of 33 kHz. The circuit occupies 1.4 × 1.4 mm² in a 0.18-μm CMOS process and consumes 52 mW.
Key words: spread spectrum clock generation, fractional phase-locked loop, delay-locked loop, phase compensation, fractional divider
Introduction
In electronic products, electromagnetic interference (EMI) must be controlled to meet the maximum level allowed by regulation [1]. The most obvious way to reduce the electromagnetic energy radiated from a product is to shield the enclosure in which the electronic circuits are placed. With this method, the added hardware and assembly increase the cost of the product. A less obvious method for high-speed systems, called spread spectrum clock generation (SSCG), is to modulate the system clock of the computing devices so that the radiated power level in a given bandwidth is lowered. This technique is effective and popular because the system clock is one of the major contributors to EMI and the cost to the system is minimal.
SSCG is a special case of frequency modulation [2]. The basic idea of SSCG is to slightly modulate the frequency of the clock signal so that its energy is dispersed over a small, controllable frequency range. With spread spectrum modulation, the energy peak in the spectrum is reduced. Accordingly, SSCG can offer low-EMI signals. A specific application is serial ATA [3], which applies a triangular-modulated down-spread spectrum technique with a frequency deviation of less than 0.5% to the clock generator. The common technique for SSCG is to insert the modulation into a phase-locked loop (PLL). The frequency can be modulated by imposing a signal on the control node of a voltage-controlled oscillator (VCO) [4], [5], or by using a fractional-N technique to change the divider ratio [6]-[10]. Usually, an oversampling Δ-Σ modulator is used to interpolate the control signal of the programmable divider [11], [12]. Although this is a commonly used method for integrated applications, it considerably increases the design complexity.
In this work, a nested fractional divider is adopted in a PLL to perform the modulation for SSCG applications. In the coarse fractional part, a dual-modulus divide-by-(N-1/16)/N divider divides the VCO output to define a first division period. Periodic tones are reduced by phase error cancellation, the method employed in the digiphase synthesizer. The phase compensation is implemented with a delay-locked-loop (DLL) structure and carried out before the phase-frequency detector [13]. The fine fractional part establishes a second period spanning multiple first periods and provides a selected number of fractional control signals to produce the overall fractional division.
System Configuration

Traditional Fractional-N Frequency Synthesizers with Δ-Σ Modulation
Since the required frequency deviation of the SSCG is small, a conventional integer-N PLL-based synthesizer would need a narrow channel spacing (i.e., a low reference frequency). As a result, a popular SSCG configuration is based on a Δ-Σ modulated fractional-N frequency synthesizer with a periodic modulation profile, such as the one shown in Fig. 1 [6]-[10]. The Δ-Σ modulator modulates the integer-N divider and produces a triangular waveform with a small deviation as the control signal. With the Δ-Σ modulator, high resolution can be achieved for a given reference frequency with a reasonable control signal.
PLL Using a Nested Fractional Topology
In this work, the fractional divider comprises a coarse-fractional counter (CFC) programmed to count first periods, and a fine-fractional counter (FFC) that provides the selected number of fractional control signals upon receipt of the CFC output.
Copyright © 2008 The Institute of Electronics, Information and Communication Engineers
The CFC and FFC form a nested topology of two fractional counters that provides the needed fractionality. As shown in Fig. 2, the coarse-fractional divider divides the VCO output signal by (N-1/16) or N. The output of the CFC defines a first division period. The FFC establishes a second period spanning multiple first periods and, at the terminal count of each second period, provides a selected number of fine-fractional control signals to the coarse-fractional divider control to cause division by a different number. The FFC comprises two counters: a divide-by-16 counter and a divide-by-F counter, where F is programmable and smaller than sixteen, i.e., F ∈ {0, · · ·, 15}. The combination of these counters results in a fractional multiple of the reference frequency determined by F/256, as described below. Both the divide-by-16 counter and the divide-by-F counter receive the signal f_d from the dual-modulus divide-by-(N-1/16)/N prescaler. Clocking the fine-fractional counters creates a second period that is sixteen times longer than the basic division period. Providing divide-by-(N-1/16) pulses during the second period produces the fractionality. After providing its programmed number of pulses to the prescaler, the divide-by-F counter waits until it is preset by the divide-by-16 counter to once again provide its programmed number of pulses. Thus the divide-by-16 counter operates as the denominator and the divide-by-F counter operates as the numerator of a fraction [14].
As long as the divide-by-F counter has not yet counted down to 0, the prescaler divides by N-1/16. The divide-by-F counter therefore steps down to 0 when the VCO has generated F × (N-1/16) pulses. At that moment the divide-by-16 counter has stepped down by F counts; that is, its content is 16-F, and the scaling factor of the dual-modulus prescaler is switched to N. An expression applicable to the fractional PLL where the divider divides by N-1/16 and N is the following:
$$N_{div} = \frac{F\left(N - \frac{1}{16}\right) + (16 - F)\,N}{16} = N - \frac{F}{256} \quad (1)$$
where N_div is the resulting fractional divisor. In this case, the fractional part of the division ratio is set by the input F of the FFC. A triangular frequency modulation profile drives the fractional divider for down-spread clock generation; the concept is illustrated in Fig. 3, where f_m is the modulation frequency. The frequency deviation, relative to the unmodulated output frequency, can be represented by:
$$\delta = \frac{F_{max}}{256\,N}$$
where F_max denotes the maximum value of F.
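As a quick numerical check of the nested division, the sketch below averages the two prescaler moduli over one second period and evaluates the relative down-spread. It assumes N = 16 and a 200-MHz reference, as reported for the measured chip, together with F_max = 15; the function names are illustrative and do not come from the paper.

```python
from fractions import Fraction

def n_div(n, f):
    """Average division ratio over one second period of 16 first periods."""
    # F first periods divide by N - 1/16, the remaining 16 - F divide by N.
    total_vco_cycles = f * (n - Fraction(1, 16)) + (16 - f) * n
    return total_vco_cycles / 16

def deviation(n, f_max):
    """Relative frequency deviation between F = 0 and F = f_max."""
    return (n_div(n, 0) - n_div(n, f_max)) / n_div(n, 0)

N, F_MAX, F_REF_HZ = 16, 15, 200e6   # assumed operating point

print(n_div(N, F_MAX))                     # exact average ratio as a fraction
print(float(deviation(N, F_MAX)) * 100)    # deviation in percent
print(float(n_div(N, F_MAX)) * F_REF_HZ)   # lowest output frequency in Hz
```

With these values the deviation evaluates to 15/4096, or about 0.37%, which agrees with the spread ratio quoted for the measurements.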
Linear Analysis
Like any Δ-Σ modulated PLL, the output of the circuit in Fig. 2 exhibits quantization phase error. The proposed spread-spectrum configuration has a first-order modulation characteristic. The equivalent divider with fractionality can be modeled as an ideal fractional divider plus a quantization noise source, as shown in Fig. 4. The transfer function from this phase error source to the VCO output is given as,
where φ_q denotes the input quantization noise and φ_o the output, F(s) is the impedance of the loop filter, K_PD is the gain of the phase detector and charge pump, and K_VCO is the VCO sensitivity. Note that other noise sources are not shown in Fig. 4. The transfer function for quantization noise exhibits a low-pass characteristic. Therefore, quantization noise outside the loop bandwidth is attenuated by the low-pass filtering function of the loop. Table 1 summarizes the comparison between Fig. 1 and Fig. 2. For the same SSCG specifications, the scheme of Fig. 1 requires an 8-bit Δ-Σ modulator to satisfy the requirement in Eq. (1) and results in a lower reference frequency than Fig. 2. If a first-order Δ-Σ modulator is employed in Fig. 1, the main divider is a dual-modulus divider with division ratios of M-1 and M, where M is equal to 16N in this case. Compared with the scheme in Fig. 1, the modulator in Fig. 2 consists merely of a 4-bit divider and a divide-by-16 divider, both of which are simpler. However, it needs a more complex main divider, as shown in Fig. 5. Thus, a reduction in overall circuit area may not be the main benefit. The important features of the proposed SSCG scheme are reflected in the performance of the PLL. First, since the loop bandwidth is generally limited by the reference frequency, which in turn limits the tuning speed, raising the reference frequency permits a wider loop bandwidth and faster switching. As a rule of thumb, the loop bandwidth should be at least 10 times lower than the reference frequency to avoid the sampling effect of the PFD [15]. The speed can also be increased by a large reference frequency even for the same bandwidth. Second, since the quantization noise is shaped at the oversampling frequency, i.e., the higher reference frequency, it is attenuated more strongly by the filtering characteristic mentioned previously.
Inside the loop bandwidth, the quantization noise is low enough, owing to noise shaping, that it does not affect the overall phase noise. The key to choosing the loop bandwidth is to ensure that quantization noise at high frequency is sufficiently attenuated. When a large reference frequency is used, the energy of the quantization noise is moved to higher frequencies and is filtered thoroughly by the PLL.
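The reference-frequency argument can be made concrete with a short calculation. The sketch below assumes a 3.2-GHz output, N = 16, M = 16N for the scheme of Fig. 1, and the factor-of-ten rule of thumb quoted above; the helper name and the resulting numbers are illustrative, not values reported in the paper.

```python
def max_loop_bandwidth(f_out_hz, divide_ratio, margin=10):
    """Return (reference frequency, bandwidth ceiling) for a nominal divide ratio."""
    f_ref = f_out_hz / divide_ratio
    return f_ref, f_ref / margin

F_OUT_HZ = 3.2e9   # output frequency
N = 16             # nominal ratio of the nested fractional scheme (Fig. 2)
M = 16 * N         # nominal ratio assumed for the delta-sigma scheme (Fig. 1)

ref_dsm, bw_dsm = max_loop_bandwidth(F_OUT_HZ, M)
ref_nested, bw_nested = max_loop_bandwidth(F_OUT_HZ, N)

print(f"Fig. 1 scheme: f_ref = {ref_dsm / 1e6:.1f} MHz, bandwidth <= {bw_dsm / 1e6:.2f} MHz")
print(f"Fig. 2 scheme: f_ref = {ref_nested / 1e6:.1f} MHz, bandwidth <= {bw_nested / 1e6:.2f} MHz")
```

Under these assumptions the nested scheme allows a reference frequency, and therefore a bandwidth ceiling, sixteen times higher than the Δ-Σ scheme.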
Summarized Features of the Proposed SSCG
Main Circuit Descriptions
Divide-by-(N-1/16)/N Dual-Modulus Divider
The divide-by-(N-1/16)/N dual-modulus divider is based on the circuit in Fig. 5. It consists of a divide-by-(N-1)/N dual-modulus integer divider, a 16-phase phase generator, a phase selector, and a 4-bit accumulator. Depending on the logic value at MC, the division ratio is N-1/16 (MC = 1) or N (MC = 0).
If MC is low, the division ratio of the divide-by-(N-1/16)/N divider is N and the control bits of the phase selector are constant, so the phase selector simply picks one of its sixteen input signals. The resulting output frequency is thus f_vco/N. The divide-by-(N-1) operation in the divide-by-(N-1)/N divider is enabled by the overflow of the accumulator. If MC is high, the phase control block is active. A detailed timing diagram of the phase interpolation and tuning is depicted in Fig. 6. The output frequency, f_d, and the VCO frequency, f_vco, are related as follows:
$$f_d = \frac{f_{vco}}{N - \frac{1}{16}}$$
Thus, the period T_d can be calculated as
$$T_d = \left(N - \frac{1}{16}\right) T_{vco}$$
where T_vco is the VCO period. The instantaneous timing error due to the divide-by-N is determined by
Similarly, the instantaneous timing error due to the divide-by-(N-1) is determined by ΔT
Since the timing error sequence can be predicted from the accumulator, timing correction is possible if the appropriate phase is added in the direction opposite to the timing error sequence. Note that N is set to 16 in this work.
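The interplay between the accumulator, the phase selector, and the integer divider can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the accumulator is taken to step down by one each output period, its state is taken to be the selected phase index, phase k is taken to lag phase 0 by k/16 of a VCO period, and a wrap of the accumulator triggers one divide-by-(N-1) cycle. The actual ordering in Fig. 5 and Fig. 6 may differ.

```python
from fractions import Fraction

def edge_times(n, cycles, phases=16):
    """Output edge times, in units of T_vco, for MC = 1 (behavioral model)."""
    acc = 0      # accumulator state, also used as the selected phase index
    count = 0    # VCO cycles counted so far by the integer divider
    times = []
    for _ in range(cycles):
        wrap = acc == 0                  # accumulator wraps from 0 to phases - 1
        acc = (acc - 1) % phases
        count += (n - 1) if wrap else n  # swallow one VCO cycle on a wrap
        times.append(count + Fraction(acc, phases))
    return times

times = edge_times(16, 40)
periods = [later - earlier for earlier, later in zip(times, times[1:])]
print(set(periods))   # every output period equals (N - 1/16) T_vco
```

In this model every output period is the same length, N - 1/16 VCO periods, even though the integer divider itself alternates between N and N-1, which is the effect the phase compensation is meant to achieve.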
Phase Generator
A DLL is a circuit that synchronizes its output to its input. It consists of a phase detector (PD), a charge pump (CP), a loop filter (LF), and a voltage-controlled delay line (VCDL). The VCDL is controlled by a filtered control voltage and the adjusted output clock is fed back to the PD. In many applications, a DLL is used to provide evenly spaced clock phases along the delay line. If the input frequency of the DLL is the VCO frequency, however, it is difficult to implement such a high-speed loop with low power consumption, let alone meet a high-resolution requirement. The time resolution of a DLL is limited to the unit cell delay. Higher resolution calls for faster technologies, which are rather expensive. One way to overcome the resolution limit is to use two uniform DLLs, which increases the resolution to a fraction of the intrinsic gate delay. In Fig. 7, DLL1 and DLL2, made with the same number of delay elements but different time intervals, are used to precisely generate the required offset. To increase the resolution of the phase generator, the offset between the DLLs should be only a fraction of the delay of the basic cell. Owing to the symmetry of the array, an arrangement like the one in Fig. 7 is duplicated by the delay cell in DLL2. Through the feedback operation, the closed loop in DLL1 tends to insert a delay of 4T_vco between its two inputs for clock synchronization, while DLL2 provides 7/8 of the delay of DLL1. Each DLL generates 8-phase outputs through eight delay cells in the VCDL. When correctly locked, the unit cell delays t_da and t_db in DLL1 and DLL2 equal T_vco/2 and 7T_vco/16, respectively. The time bin of such a circuit is
$$t_{bin} = t_{da} - t_{db} = \frac{T_{vco}}{2} - \frac{7\,T_{vco}}{16} = \frac{T_{vco}}{16}$$
Although the scheme of Fig. 7 can provide the required bin size, choosing the signals with the needed phases is a more practical matter. The time delay of each node relative to the input (t_0) is represented by t_n, which equals n·t_bin. Node signals with consecutive indices can be found if n ∈ {42, 43, · · ·, 70}. In this work, the 16-phase signals t_49, t_50, · · ·, t_64 are chosen as the outputs of the phase generator because they are driven by the same delay cell (d_b) and are better matched. In addition, the delay cells driving the phases t_n for n > 64 can be used as dummy elements with a virtual supply voltage to save power. The 16-phase output waveforms provided by the phase generator are shown in Fig. 8.
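The range of usable node indices can be explored numerically. Since t_da = 8·t_bin and t_db = 7·t_bin, a node reached through a cells of type d_a and b cells of type d_b has index n = 8a + 7b. The sketch below enumerates these indices; the ranges a ≤ 7 and b ≤ 8 are assumptions chosen for illustration and are not taken from Fig. 7, and the function names are hypothetical.

```python
def node_indices(a_max=7, b_max=8):
    """Indices n = 8a + 7b of the node delays t_n = n * t_bin."""
    return {8 * a + 7 * b for a in range(a_max + 1) for b in range(b_max + 1)}

def run_containing(indices, start):
    """Bounds of the run of consecutive indices that contains `start`."""
    low = high = start
    while low - 1 in indices:
        low -= 1
    while high + 1 in indices:
        high += 1
    return low, high

nodes = node_indices()
print(run_containing(nodes, 49))                   # bounds of the consecutive run
print(all(n in nodes for n in range(49, 65)))      # are t_49 ... t_64 all available?
```

Under these assumed ranges the consecutive run extends from 42 to 70, which matches the interval quoted above and contains the sixteen chosen phases t_49 through t_64.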
Divide-by-15/16 Dual-Modulus Prescaler
The dual-modulus prescaler is the high-frequency building block in the PLL. The circuit, shown in Fig. 9, divides the frequency of the VCO output signal by 15 or 16 depending on the logic value of the mode control (MC1) [15]. It consists of a synchronous divide-by-3/4 counter as the first stage and an asynchronous divide-by-4 counter as the second stage. The circuits in the first stage are fully differential, while single-ended logic circuits are used in the second stage. To reduce the supply noise, an emitter-coupled-logic (ECL)-like differential logic is used in the high-speed stage [16]. The toggle flip-flops are built from the true single-phase clocking (TSPC) DFFs of [17].
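The way the two stages combine into a ratio of 15 or 16 can be sketched with a simple count. The model assumes that, in divide-by-15 mode, the first stage divides by 3 during exactly one of the four second-stage states and by 4 otherwise; which state triggers the shortened cycle is an arbitrary choice here and is not taken from Fig. 9.

```python
def prescaler_ratio(divide_by_15):
    """VCO cycles per output cycle of a divide-by-3/4 stage followed by divide-by-4."""
    vco_cycles = 0
    for stage2_state in range(4):                      # asynchronous divide-by-4 stage
        shorten = divide_by_15 and stage2_state == 0   # assumed trigger state
        vco_cycles += 3 if shorten else 4              # synchronous divide-by-3/4 stage
    return vco_cycles

print(prescaler_ratio(True), prescaler_ratio(False))
```

The count gives 3 + 4 + 4 + 4 = 15 VCO cycles in one mode and 4 × 4 = 16 in the other.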
VCO
The VCO, shown in Fig. 10, is a push-pull LC oscillator. The cross-coupled NMOS and PMOS differential pairs, connected in positive feedback, generate a negative resistance that compensates the parasitic parallel resistance of the LC tank so that oscillation can occur.
Experimental Results
The proposed SSCG circuit was fabricated in a 0.18-μm N-well CMOS technology. Fig. 11 shows the microphotograph of the chip, which has an area of 1.4 × 1.4 mm². The circuit is fully integrated and operates from a 1.8-V supply voltage. The modulation frequency (f_m) is 33 kHz. With a reference frequency of 200 MHz, the PLL provides a 3.2-GHz output frequency. Figure 12 shows the measured spectra of the 3.2-GHz output signal without spreading and with a spread ratio of −0.37%. The peak amplitude reduction is around 16 dB. The measured waveforms are shown in Fig. 13. Without spreading, the output clock has an rms jitter of 3.5 ps and a peak-to-peak jitter of 19.9 ps, as shown in Fig. 13(a). With spread-spectrum operation, the clock has an rms jitter of 7.5 ps and a peak-to-peak jitter of 43.8 ps, as shown in Fig. 13(b). Table 2 summarizes the performance of the proposed SSCG circuit alongside several comparable designs from recent years.
Conclusion
In this paper, a fractional-PLL-based SSCG circuit with triangular modulation and an LC VCO, fabricated in a 0.18-μm CMOS process, is presented. The triangular modulation signal is integrated into the fractional divider of the PLL, which uses a nested fractional topology (CFC and FFC) to achieve the required fractionality. The CFC and FFC control the PLL to synthesize a frequency that is a fractional multiple of the reference frequency. The measured spectra show that the peak amplitude of the clock is attenuated and that the proposed architecture achieves the spread-spectrum function as expected.
