This paper presents a DLL-based clock multiplier with a novel spur reduction technique. By randomly selecting a delay line with a pseudo-random number generator (PRNG), the proposed scheme reduces the output spurs caused by delay-cell mismatches. A rotational digitally controlled delay line (DCDL) is also proposed to generate clock edges seamlessly even during random delay-line switching. The clock multiplier is designed in a 0.18 µm CMOS process and achieves a 5 to 11 dB spur reduction while consuming 169.4 µW at 16 MHz. The core area is 0.608 mm².
Introduction
The clock multiplier is an important building block for clock and frequency generation in wireless communication systems and systems-on-chip (SoCs) [1, 2]. The phase-locked loop (PLL) is a widely used architecture for clock multiplication. However, a PLL is basically a two-pole system, with one pole from the loop filter and the other from the oscillator, so there is a tradeoff between bandwidth and stability. Because the choice of loop bandwidth is limited, the PLL inevitably suffers from jitter accumulation [3]. The delay-locked loop (DLL), on the other hand, is a single-pole system, since the loop filter voltage changes the output delay instantly. Therefore, compared with a PLL, a DLL typically exhibits less jitter accumulation and a shorter settling time [4, 5].
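The jitter-accumulation argument can be illustrated with a toy Monte Carlo model. This is a minimal sketch, not the authors' analysis: it assumes independent Gaussian timing noise per output edge (the 1 ps value and the function name `accumulated_error` are hypothetical) and treats the oscillator in the open-loop limit, which overstates accumulation in a real PLL whose loop eventually corrects the phase. The DLL case restarts from a clean reference edge every 16 edges, as in a ×16 multiplier.

```python
import numpy as np


def accumulated_error(stage_jitter, stages_per_ref=None):
    """Timing error of successive output edges.

    stage_jitter: array (trials, n_edges) of independent per-edge timing noise (s).
    stages_per_ref=None: open-loop oscillator limit; every edge inherits the error
        of the previous one, so the errors add up as a random walk.
    stages_per_ref=k: delay line relaunched by a clean reference edge every k edges,
        so the errors only add up within one reference period.
    """
    if stages_per_ref is None:
        return np.cumsum(stage_jitter, axis=1)
    trials, n = stage_jitter.shape
    blocks = stage_jitter.reshape(trials, n // stages_per_ref, stages_per_ref)
    return np.cumsum(blocks, axis=2).reshape(trials, n)


rng = np.random.default_rng(0)
sigma = 1e-12                                          # assumed 1 ps jitter per edge
noise = rng.normal(0.0, sigma, size=(2000, 1024))      # 2000 trials x 1024 output edges
osc = accumulated_error(noise)                         # open-loop oscillator limit
dll = accumulated_error(noise, stages_per_ref=16)      # x16 DLL: 16 edges per f_ref period
# Theory for the last edge: sqrt(1024) = 32 sigma versus sqrt(16) = 4 sigma.
print(osc[:, -1].std() / sigma, dll[:, -1].std() / sigma)
```

The DLL error stays bounded because each reference edge discards the history, which is the property the single-pole argument relies on.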
DLL-based clock multiplication uses an edge-combining technique [6] that combines multiple clock edges tapped from the delay chain. Therefore, any delay-cell mismatch caused by process variation or inadequate layout produces spurious tones at the output [7, 8]. For example, if one delay cell is slower than the others, the delay difference appears in the output waveform periodically, with the period of the reference clock. In the frequency domain, the spurs therefore appear around the output frequency at offsets equal to harmonics of the reference frequency. The static phase offset caused by charge-pump current mismatch in an analog DLL also contributes to the spur level.
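The link between a periodic edge-timing error and spurs at f_out ± m·f_ref can be made concrete with a short NumPy sketch. It is an illustration under stated assumptions, not the authors' simulation setup: one output edge per tapped phase, a shared control voltage that the locked loop scales so the line spans exactly one reference period, and the small-angle phase-modulation approximation. The function names and the placement of the slow cell are hypothetical.

```python
import numpy as np


def locked_edge_errors(cell_delays, t_ref):
    """Timing error of each tapped edge after the DLL has locked.

    Assumption: the shared control voltage scales every cell by the same factor,
    so the total line delay equals t_ref while relative mismatches remain.
    """
    d = np.asarray(cell_delays, dtype=float)
    d = d * (t_ref / d.sum())
    actual = np.concatenate(([0.0], np.cumsum(d)[:-1]))  # edge k is launched after cells 0..k-1
    ideal = np.arange(d.size) * t_ref / d.size           # evenly spaced target positions
    return actual - ideal


def spur_dbc(edge_errors, f_out):
    """Spurs at f_out +/- m*f_ref for m = 1 .. N/2-1 (small-angle PM estimate)."""
    n = edge_errors.size
    phase = 2 * np.pi * f_out * edge_errors                  # timing error as output phase (rad)
    peak_dev = 2 * np.abs(np.fft.fft(phase))[1:n // 2] / n   # peak phase deviation per harmonic
    with np.errstate(divide="ignore"):
        return 20 * np.log10(peak_dev / 2)                   # each sideband carries half of it


t_ref, f_out = 1e-6, 16e6            # 1 MHz reference, x16 output as in this design
cells = np.full(16, t_ref / 16)
ideal_spurs = spur_dbc(locked_edge_errors(cells, t_ref), f_out)
cells[11] *= 1.015                   # hypothetical: one cell 1.5 % slow
mismatch_spurs = spur_dbc(locked_edge_errors(cells, t_ref), f_out)
print(mismatch_spurs.round(1))       # levels at 1..7 MHz offsets from f_out
```

The DFT bins of the 16-edge error pattern map directly to offsets of m·f_ref, because the pattern repeats once per reference period while edges arrive at f_out. With matched cells every level falls to numerical noise.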
Several studies have reported spur reduction techniques for DLL-based clock multipliers [7, 8, 9]. In [9], a current-splitting charge pump is adopted to reduce the reference spur caused by the static phase offset, achieving reference spur levels below −39.2, −36.8, and −26.2 dBc for ×2, ×3, and ×6 clock multiplication, respectively. In this paper, we introduce a delay-line scrambling technique to reduce the spur levels caused by delay-cell mismatches. The rest of this paper is organized as follows. Section 2 describes the proposed DLL-based clock multiplier and its circuit implementation. Sections 3 and 4 present the implementation results and the conclusion, respectively.
Proposed architecture
The block diagram of the proposed clock multiplier is shown in Fig. 1. The reference clock (f_ref) is injected into the rotational digitally controlled delay line (DCDL) to generate multiple clock edges. A DCDL decoder driven by a pseudo-random number generator (PRNG) randomly selects the delay line in the rotational DCDL to randomize the delay-cell mismatches. A matrix switch reorders the output phases for the edge combiner, which finally generates the multiplied clock, f_out. For ×16 clock multiplication, 16 output phases (Φ_o1 to Φ_o16) are applied to the edge combiner. The phase output Φ_o17 has the same phase as Φ_o1 and is used by the phase detector (PD). The PD [10] generates a 1b up/down signal for the digital loop filter (DLF). This digital DLL implementation eliminates the charge pump and thus minimizes the spur caused by static phase offset.
Rotational DCDL and matrix switch
The proposed rotational DCDL consists of a ring-shaped rotational delay line of 18 delay cells, 18 branch delay lines of 4 delay cells each, and control switches, as shown in Fig. 2(a). Each delay cell is controlled by the control voltage from a 12b digital-to-analog converter (DAC), described in Section 2.3. The switch control signal (Sel_DCDL) from the DCDL decoder determines the f_ref injection point and selects one of the 18 branch delay lines. For example, if f_ref is applied to the input of the 3rd delay cell (d_3), the 15th branch delay line (d_15,1 to d_15,4) is selected to generate a total of 17 output clock edges (Fig. 2(b)). If the DCDL decoder selects the 10th delay cell (d_10) as the f_ref injection point, the 4th branch delay line (d_4,1 to d_4,4) is used for output phase generation. Therefore, a total of 18 delay-line configurations are possible with the rotational DCDL. Note that unselected branch lines are turned off to save power.
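The mapping from injection point to branch line and tap order can be sketched as below. The topology is inferred rather than stated: branch line j is assumed to hang off the output of ring cell d_j, and the selected path is assumed to run through the injected cell, the next 12 ring cells, and then the 4 cells of the branch. This inference reproduces both examples in the text (d_3 → branch 15, d_10 → branch 4), the Φ_o1/Φ_o17 taps used by the matrix switch, and the 17 active delay cells per selection. The function and constant names are hypothetical.

```python
N_RING = 18      # ring cells d_1 .. d_18
RING_SPAN = 12   # ring cells after the injected cell before the branch tap (inferred)
BRANCH_LEN = 4   # cells per branch line


def dcdl_config(inj):
    """Return (branch index, ordered taps) for f_ref injected at ring cell d_inj (1-indexed).

    Taps are the nodes routed as phi_o1 .. phi_o17: ("d", k) is the output of ring
    cell d_k and ("b", j, i) is the output of cell i of branch line j.
    """
    if not 1 <= inj <= N_RING:
        raise ValueError("injection point must be in 1..18")
    # d_inj, d_inj+1, ..., d_inj+12 with wrap-around on the 18-cell ring
    ring = [(inj - 1 + s) % N_RING + 1 for s in range(RING_SPAN + 1)]
    branch = ring[-1]                      # branch line hangs off the last ring cell used
    taps = [("d", k) for k in ring] + [("b", branch, i) for i in range(1, BRANCH_LEN + 1)]
    return branch, taps


print(dcdl_config(10)[0], dcdl_config(10)[1][0], dcdl_config(10)[1][-1])
```

Each of the 18 injection points selects a different branch, so the decoder needs exactly 18 codes, matching the 18-level PRNG output.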
Random selection of one of the 18 delay lines scrambles the delay-cell mismatches and consequently reduces the spur level at the output. If the 12th delay cell (d_12) has a
The outputs of all delay cells in the DCDL are connected to the matrix switch, because the nodes at which the clock edges are tapped differ for each delay-line selection. Under the control of the matrix decoder, the matrix switch rearranges these nodes and outputs 17 phases for each selection. For example, in Fig. 2(b) the outputs of d_3 and d_15,4 are used for Φ_o1 and Φ_o17, respectively, whereas in Fig. 2(c) the outputs of d_10 and d_4,4 are used for Φ_o1 and Φ_o17, respectively.
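Why scrambling helps can be shown with a simplified model. Under the small-angle approximation and an idealized i.i.d. uniform selection each reference cycle (the real PRNG is an LFSR plus DSM, so this is an assumption), the discrete spur lines are set by the average error pattern over all 18 configurations, while the cycle-to-cycle fluctuation spreads into a noise-like floor. The sketch reuses the inferred topology (12 ring cells plus 4 branch cells between Φ_o1 and Φ_o17, 0-indexed here) and assumes the loop locks the average line delay to one reference period; the 1.5% Gaussian mismatch and all names are hypothetical.

```python
import numpy as np

N_RING, SPAN, BRANCH_LEN, N_PH = 18, 12, 4, 16   # 12 ring + 4 branch cells between phi_o1 and phi_o17


def spacing_cells(ring, branch, inj):
    """Delays of the 16 cells between phi_o1 and phi_o17 when f_ref enters ring cell inj (0-indexed)."""
    ring_part = [ring[(inj + s) % N_RING] for s in range(1, SPAN + 1)]
    branch_part = list(branch[(inj + SPAN) % N_RING])
    return np.array(ring_part + branch_part)


def edge_errors(cells, t_unit):
    """Error of edges phi_o1..phi_o16 against ideal t_unit spacing; phi_o1 is the time origin."""
    actual = np.concatenate(([0.0], np.cumsum(cells)[:-1]))
    return actual - np.arange(cells.size) * t_unit


def spur_dbc(err, f_out):
    """Spurs at f_out +/- m*f_ref, m = 1..7, small-angle PM estimate."""
    peak_dev = 2 * np.abs(np.fft.fft(2 * np.pi * f_out * err))[1:N_PH // 2] / N_PH
    with np.errstate(divide="ignore"):
        return 20 * np.log10(peak_dev / 2)


rng = np.random.default_rng(7)
t_ref, f_out = 1e-6, 16e6
t_unit = t_ref / N_PH
ring = t_unit * (1 + 0.015 * rng.standard_normal(N_RING))          # hypothetical 1.5 % sigma mismatch
branch = t_unit * (1 + 0.015 * rng.standard_normal((N_RING, BRANCH_LEN)))

# Assumption: the loop locks the average line delay to one reference period.
scale = t_ref / np.mean([spacing_cells(ring, branch, c).sum() for c in range(N_RING)])
ring, branch = ring * scale, branch * scale

patterns = np.array([edge_errors(spacing_cells(ring, branch, c), t_unit) for c in range(N_RING)])
fixed = np.array([spur_dbc(p, f_out) for p in patterns])   # one line used permanently
scrambled = spur_dbc(patterns.mean(axis=0), f_out)         # random selection: lines set by the mean
print("worst fixed line:", fixed.max().round(1), "dBc; scrambled:", scrambled.max().round(1), "dBc")
```

Because the DFT is linear, each scrambled spur can never exceed the worst single-line spur at the same offset (triangle inequality); how much lower it lands depends on how well the mismatches average out across configurations.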
PRNG
The PRNG consists of an 8b linear feedback shift register (LFSR), which provides a sufficiently long pseudo-random sequence, and a delta-sigma modulator (DSM) for output bit conditioning and further randomization (Fig. 4). The PRNG output (OUT_PRNG) is a 5b digital code that selects one of the 18 delay lines. Because only 18 of the 32 possible 5b binary codes are valid, an 18-level quantizer (right-side table in Fig. 4) is adopted in the DSM.
DLF and DAC
Fig. 5(a) shows the DLF, which consists of a 12b integrator, a binary-to-thermometer code converter (B2T), and a DSM. After the 1b PD output is integrated with a 1 MHz (= f_ref) clock, the 4 most significant bits (MSBs) of the 12b integrator output are applied directly to the DAC for coarse tuning. The next 4 bits are converted to a thermometer code to improve the overall linearity of the DAC and are used for fine tuning. The 4 least significant bits (LSBs) are converted to a 1b stream by the DSM, clocked at 16 MHz (= f_out), to further improve the DAC resolution. The circuit diagram of the DAC is shown in Fig. 5(b).
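The DLF word handling described above can be sketched at the bit level. The 12b width, the 4/4/4 split, the B2T conversion, and the 16 MHz DSM clock come from the text; the saturating ±1 integrator update and the first-order overflow-accumulator DSM are assumptions, since the paper does not give the integrator step or the DSM order. All function names are hypothetical.

```python
from typing import List, Tuple

WIDTH = 12  # 12b integrator word


def integrate(word: int, up: bool) -> int:
    """One f_ref-rate integrator update from the 1b PD output (assumed +/-1, saturating)."""
    word += 1 if up else -1
    return max(0, min((1 << WIDTH) - 1, word))


def split_word(word: int) -> Tuple[int, List[int], int]:
    """Split the 12b word into 4b coarse binary, 15-element thermometer fine code, and 4b LSBs."""
    coarse = (word >> 8) & 0xF
    mid = (word >> 4) & 0xF
    thermo = [1 if i < mid else 0 for i in range(15)]   # B2T: 'mid' unit elements switched on
    lsb = word & 0xF
    return coarse, thermo, lsb


def dsm_bits(lsb: int, n_clocks: int, acc: int = 0) -> List[int]:
    """First-order DSM (overflow accumulator) turning the 4 LSBs into a 1b stream at f_out."""
    bits = []
    for _ in range(n_clocks):
        acc += lsb
        if acc >= 16:          # accumulator overflow emits a 1
            bits.append(1)
            acc -= 16
        else:
            bits.append(0)
    return bits


coarse, thermo, lsb = split_word(0xABC)
print(coarse, sum(thermo), lsb, dsm_bits(lsb, 16))
```

With f_out/f_ref = 16 and a 4b LSB field, every window of 16 DSM clocks contains exactly `lsb` ones in this sketch, so the averaged DAC input within one reference period resolves the full 12b word.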
Post-layout results
The proposed clock multiplier was designed in a 0.18 µm standard CMOS process, and post-layout verification was performed after RC parasitic extraction. The output frequency is 16 MHz with a 1 MHz reference clock, targeting low-power sensor SoC applications. Fig. 6 shows the chip layout, which has a core area of 950 µm × 640 µm (= 0.608 mm²). The total power consumption is 169.4 µW. Fig. 7 shows the output spectrum of the proposed clock multiplier. For performance comparison, an intentional 1.5% mismatch is applied to the delay cells, which generates a worst-case spur of −31.8 dBc. To verify the proposed scheme more accurately, the DCDL noise component obtained from phase noise simulation (Fig. 8) is also included. The results show that the spur levels due to delay-cell mismatch are reduced by 5 to 11 dB and now approach the spur levels without mismatch, which are limited by coupling through the device parasitic capacitance of the multiplexer switches used for f_ref injection. The worst-case spur level is improved from −31.8 dBc to −38 dBc. The RMS and peak-to-peak jitter are 1.1 ns and 21.5 ns, respectively. Applying the proposed technique to a conventional edge-combining DLL-based clock multiplier additionally requires the PRNG, the decoders, the matrix switch, and the branch delay lines. However, the additional power consumption is only 14.78 µW (8.94 µW from the matrix switch and 5.84 µW from the PRNG and decoders). Since only 17 delay cells are turned on for each delay-line selection, the branch delay lines do not consume any additional power. The layout areas of the matrix switch, the PRNG and decoders, and the branch delay lines are 0.05 mm², 0.03 mm², and 0.182 mm², respectively. Table I summarizes the performance comparison with state-of-the-art edge-combining DLLs.
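The overhead figures above can be cross-checked with a few lines of arithmetic. This sketch only recombines numbers reported in this section; the variable names are illustrative.

```python
core_area_mm2 = 0.950 * 0.640                        # 950 um x 640 um core
extra_area_mm2 = {"matrix switch": 0.05, "PRNG & decoders": 0.03, "branch delay lines": 0.182}
total_power_uw = 169.4
extra_power_uw = {"matrix switch": 8.94, "PRNG & decoders": 5.84}
worst_spur_dbc = (-31.8, -38.0)                      # before / after scrambling

area_overhead = sum(extra_area_mm2.values()) / core_area_mm2
power_overhead = sum(extra_power_uw.values()) / total_power_uw
spur_gain_db = worst_spur_dbc[0] - worst_spur_dbc[1]

print(f"core area {core_area_mm2:.3f} mm^2, scrambling hardware {area_overhead:.1%} of it")
print(f"extra power {sum(extra_power_uw.values()):.2f} uW = {power_overhead:.1%} of total")
print(f"worst-case spur improvement {spur_gain_db:.1f} dB")
```

By these numbers, the scrambling hardware takes roughly 43% of the core area, dominated by the branch delay lines, but only about 8.7% of the total power, and the worst-case spur improves by 6.2 dB.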
Conclusion
This paper has presented a novel DLL-based clock multiplier architecture for spur reduction. A rotational DCDL and a PRNG-based randomized delay-line selection technique are proposed to reduce the spurs caused by delay-cell mismatches. In post-layout verification, the proposed technique achieves a 5 to 11 dB spur reduction, including a 6.2 dB improvement of the worst-case spur. The proposed clock multiplier is designed in a 0.18 µm CMOS process, occupies 0.608 mm², and consumes 169.4 µW for 16 MHz generation.
