An all-digital phase locked loop (ADPLL) with cascaded dynamic phase average (DPA) loops for wide multiplication range applications is presented in this paper. The multiplication factor can range from 4 to 65025 (255 x 255). The proposed architecture requires minimal hardware and reduces the noise and jitter associated with the input reference. The DPA loop control, which employs digital phase estimators (DPE), enhances frequency detection resolution and loop stability. A (Q.R) vector counter and an additional state counter serve as the phase estimators. The proposed ADPLL comprises two cascaded DPA loops: the first stage is a low-frequency loop and the second stage is a high-frequency loop. A prototype chip has been implemented in a 0.18 µm 1P6M CMOS process; its output can operate from 2 MHz to 500 MHz, and the input frequency ranges from 5 kHz to 50 MHz. The proposed design thus not only reduces the cost and design complexity of the ADPLL, but also offers particular advantages for wide multiplication range applications.
INTRODUCTION
Numerous applications, such as video graphics cards and telecommunication systems, require frequency synthesis with a high multiplication ratio. Quartz oscillators operating at low frequency often require such conversion. Several methods exist for realizing frequency multiplication, including the analog phase locked loop (PLL) [1] [2] and the all-digital phase locked loop (ADPLL) [3] [4]. The major components of a PLL circuit, as illustrated in Fig. 1, are a phase-frequency detector (PFD), a charge pump, a loop filter (generally a 2nd-order RC filter), a programmable divider, and a voltage controlled oscillator (VCO). While the PLL approach offers flexible frequency multiplication, it requires a complex sampled feedforward filter network and a multistage inverse-linear programmable current mirror to keep the loop dynamics constant and independent of the multiplication factor, which ranges from 1 to 4096 in [1]. Moreover, the locking time of an analog PLL using a PFD and a charge pump is limited by the large time constant of the analog filter. In contrast, an ADPLL can provide fast locking owing to its binary search algorithm, as indicated in [3] [4]. However, the specific transistor sizing of the digitally controlled oscillator (DCO) in [3] must be redone whenever the design specifications change.
*This work was supported by the National Science Council of Taiwan, R.O.C., under Grant NSC93-2220-E-009-033.
Thus, the effort required at the physical design level remains an unsolved problem. A complete clock generator designed with standard cells only, as a portable IP block [5] [6] [7], can partially solve the problem. A portable clock multiplier generator using digital CMOS standard cells and based on a delay locked loop is presented in [5].
However, its multiplication factor is limited to between 4 and 20. Additionally, three large register files are required to store the history of the previous 256 cycles. To generate a low-jitter clock output, two identical DCOs are utilized in [6] [7], as shown in Fig. 2, which leads to high power consumption and large silicon area. This work therefore proposes an all-digital phase locked loop with a simple structure and a novel frequency control algorithm.

A typical all-digital phase locked loop can be divided into five main parts: PFD, loop controller, loop filter, DCO, and programmable divider (see Fig. 2). The function of the programmable divider is simply to slow the DCO output frequency for comparison. When the multiplication factor is high, the programmable divider becomes very long and induces noise on the DCO output. The loop controller generates the digital commands that steer the DCO output clock based on the results from the PFD. Two extra digital pulse amplifier circuits are required to minimize the dead zone of the PFD, as indicated in [7]. An averaging loop filter is necessary to filter out ripple and produce a smoother digital control word with fewer jumps. These requirements lead to a highly complex and expensive design. A simple way to solve the problem is to base the operation on power-of-2 integers; this not only simplifies the phase calculation but also significantly reduces circuit complexity.
This work proposes a cascaded DPA loop control algorithm for frequency search that reduces hardware cost and enhances frequency detection resolution. The proposed method reduces the noise and jitter associated with the input reference by dynamic phase averaging. Rather than using a PFD, a programmable divider, and a loop filter as in conventional approaches, a DPA loop controller and two digital phase estimators are applied. Figure 3 shows the proposed approach, based on phase-domain operation with a newly proposed DPA loop controller [8] [9]. The result of the digital arithmetic comparator is used to speed up or slow down the DCO clock output. No separate loop filter is needed in the proposed structure because the DPA loop controller provides similar functionality. The proposed DPE and DPA algorithms for the all-digital phase locked loop have been verified in a 0.18-µm CMOS process with a frequency range of 2 MHz to 500 MHz at 1.8 V, which demonstrates the effectiveness of the proposed mechanism.

The remainder of this paper is organized as follows. Section 2 describes the proposed architecture of the cascaded DPA loop controller, together with the proposed DPE and cascaded DPA algorithms. Implementation and chip simulation results of the all-digital phase locked loop are presented in Section 4. Finally, Section 5 offers a summary and conclusions.

Figure 3 shows the proposed block diagram of the DPA loop control. It consists of four main functional units: a state counter, a (Q.R) vector counter, a DPA loop controller, and a DCO. This approach does not require a programmable divider because the (Q.R) vector counter serves both as a phase estimator of the DCO output frequency and as a programmable divider. Similarly, the state counter serves as a phase estimator of the input reference clock.
The DPA loop control algorithm performs adaptive bandwidth control based on the average phase error, as illustrated in Fig. 4. During frequency acquisition, adaptive loop gain control with binary search is applied to achieve fast locking. The state counter operates at the speed of the input reference clock: starting from zero, it counts up at every rising edge of the input reference clock. Similarly, the (Q.R) vector counter operates at the speed of the DCO output clock. The value of the (Q.R) vector counter is compared with the input multiplication control word whenever the value of the state counter is a power-of-2 integer. The phase sampling period is thus a power-of-2 number of input reference clock cycles. If the phase comparison result remains unchanged, both counters continue phase accumulation.
ARCHITECTURE AND ALGORITHM

Architecture of Proposed Cascaded DPA Loop
Otherwise, the phase error signal is translated into a change of the current DCO control word (DCW) so that the DCO output frequency is adjusted. Meanwhile, both the state counter and the (Q.R) vector counter are reset (i.e., zero phase). Therefore, the zero phase and the averaging phase of both counters move according to the phase sampling period and the result of the phase comparison.

Fig. 4. The proposed DPA loop control algorithm.

Figure 5 illustrates the structure of the DPA loop controller; the structure of the (Q.R) vector counter is discussed in the following section. The decision unit performs the digital arithmetic comparisons and generates the control signals for updating the DCW. It compares the (Q.R) vector counter value at phase sampling periods of a power-of-2 number of input reference clock cycles. The decision unit also controls the frequency acquisition process and the fine-tuning process.
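The loop behaviour described above can be illustrated with a small behavioural sketch. It is not the circuit of this paper: the linear DCO model (BASE_PS, STEP_PS), the function names, and the reading of (Q.R) as the quotient and remainder of the DCO cycle count divided by the power-of-2 state count are all assumptions made for illustration only. The sketch samples the phase whenever the state counter reaches a power of 2, keeps accumulating while the comparison holds, and otherwise updates the DCW with a binary-search step and resets both counters.

```python
# Hypothetical behavioural sketch of a single DPA loop.
# The DCO model and all names below are assumptions, not taken from the chip.

BASE_PS = 1000   # assumed DCO period at DCW = 0, in picoseconds
STEP_PS = 1      # assumed period increase per DCW step, in picoseconds
DCW_BITS = 16    # 7 coarse bits + 9 fine bits


def dco_period_ps(dcw):
    """Toy linear DCO: a larger control word gives a longer period."""
    return BASE_PS + STEP_PS * dcw


def dpa_lock(n, t_ref_ps, max_log2=8, max_updates=64):
    """Search for the DCW whose DCO clock is n times the reference clock.

    Returns (dcw, locked, updates).
    """
    dcw = 1 << (DCW_BITS - 1)    # start in the middle of the DCW range
    gain = 1 << (DCW_BITS - 2)   # binary-search step, halved after each update
    k = 0                        # phase is sampled when state counter == 2**k
    updates = 0
    while k <= max_log2 and updates < max_updates:
        state = 1 << k
        # DCO cycles accumulated since zero phase (ideal, noise-free clocks)
        count = (state * t_ref_ps) // dco_period_ps(dcw)
        # Average cycles per reference period: shift for Q, mask for R
        q, r = count >> k, count & (state - 1)
        if (q, r) == (n, 0):
            k += 1               # comparison holds: keep accumulating phase
            continue
        # DCO too fast -> lengthen the period; too slow -> shorten it
        dcw += gain if (q, r) > (n, 0) else -gain
        dcw = min(max(dcw, 0), (1 << DCW_BITS) - 1)
        gain = max(1, gain >> 1)
        k = 0                    # reset both counters to zero phase
        updates += 1
    return dcw, k > max_log2, updates
```

Under these assumptions, dpa_lock(80, 250_000) (a 4 MHz reference and a multiplication factor of 80) settles on a control word whose modelled period is 3125 ps. In this toy model the longer power-of-2 windows expose period errors of a few picoseconds that a single-cycle comparison cannot see, which is the resolution benefit attributed to phase averaging.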
Structure of proposed DPA loop controller
Structure of (Q.R) Vector Counter
The length L of the proposed (Q.R) vector counter is related to the multiplication control word (namely N1, N2) and the maximum count of the state counter in each separate loop. If the maximum input multiplication control word is P, L is formulated as
where $T_{DCO\_clk}$ denotes the cycle time of the DCO-generated clock.

Fig. 7. Structure of (Q.R) vector counter.
Structure of Digitally-Controlled Oscillator (DCO)
A high-resolution DCO is the key component of a low-jitter frequency multiplier. To obtain high resolution in a portable delay cell design, a novel digitally-controlled varactor (DCV) using NAND gates is proposed in [10]. It uses the gate capacitance difference of NOR gates under different digital control inputs to establish a digitally-controlled varactor. The DCO is implemented with a standard 0.18-µm 1P6M CMOS cell library. It is separated into two stages: a coarse-tuning stage and a fine-tuning stage. The upper 7 bits of the control code drive the coarse-tuning stage, while the lower 9 bits drive the fine-tuning stage. The coarse-tuning stage includes 128 buffer stages for delay-chain selection, and the number of delay cells in the path is chosen through a 128-to-1 path selector. This selector is implemented using multistage tri-state buffers to reduce the loading effects of the coarse-tuning buffers. The coarse decoder of the DCO decodes the 7-bit ($= \log_2 128$) control code into 128 control signals. This architecture enables the operating frequency of the DCO to be easily modified to meet different specifications. The delay $T_{PHL} + T_{PLH}$ ($= T_{buffer}$) of one coarse delay cell is around 135 ps in the 0.18-µm 1P6M CMOS standard cell library.
To improve the frequency resolution of the DCO, 512 digitally-controlled varactors (DCVs) with capacitance difference ∆C are added following the coarse-tuning stage. Different types of NAND gates are used as the varactors. The proposed NAND gate varactors in the fine-tuning stage can therefore improve the delay resolution by 512 times compared with a simple buffer design.
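The split of the control code into coarse and fine fields can be sketched as follows. The bit widths (7 and 9) and the 135 ps buffer delay come from the description above; the function names, the one-hot form of the path-select signals, the fine step of 135/512 ps, and the additive delay model are assumptions for illustration and are not measured values.

```python
# Hypothetical sketch of decoding a 16-bit DCO control word (DCW).
COARSE_BITS = 7                    # upper bits: coarse-tuning stage
FINE_BITS = 9                      # lower bits: fine-tuning stage
T_BUFFER_PS = 135.0                # delay of one coarse cell (from the text)
FINE_STEP_PS = T_BUFFER_PS / 512   # assumption: 512x finer than one buffer


def split_dcw(dcw):
    """Return (coarse, fine) fields of a 16-bit control word."""
    if not 0 <= dcw < 1 << (COARSE_BITS + FINE_BITS):
        raise ValueError("DCW must fit in 16 bits")
    coarse = dcw >> FINE_BITS
    fine = dcw & ((1 << FINE_BITS) - 1)
    return coarse, fine


def coarse_select(coarse):
    """Assumed one-hot select word for the 128-to-1 path selector."""
    return 1 << coarse


def tunable_delay_ps(dcw):
    """Assumed model: (coarse + 1) buffers in the path plus the fine offset."""
    coarse, fine = split_dcw(dcw)
    return (coarse + 1) * T_BUFFER_PS + fine * FINE_STEP_PS
```

For instance, split_dcw(1541) gives the fields (3, 5): coarse code 3 selects the fourth tap of the delay chain in this model, and fine code 5 adds five assumed fine steps on top of it.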
SIMULATION RESULTS AND CHIP IMPLEMENTATION
Post-layout simulations of the proposed all-digital PLL are performed at 1.8 V in 0.18 µm 1P6M CMOS technology. Simulations have been performed successfully under different multiplication factors. For example, with an input reference frequency of 4 MHz and a target frequency of 320 MHz (period 3.125 ns), the multiplication factor is 80 (N = N1 x N2 = 8 x 10). A prototype of the all-digital PLL with cascaded DPA loops for wide multiplication range has been implemented with the above architecture and algorithm. The chip is designed in a 0.18 µm 1P6M CMOS process as a cell-based design for fast tape-out to verify the proposed architecture and algorithm. Figure 9 shows the layout of the proposed low-cost clock multiplier. The chip size is 935 x 935 µm² (core: 340 x 340 µm²); it is I/O pad limited, and the power consumption is 12 mW at 400 MHz.
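The arithmetic of the example can be checked with a short sketch. It assumes that N1 belongs to the first (low-frequency) loop and N2 to the second (high-frequency) loop, and that each factor is capped at 255, which is inferred from the stated maximum of 255 x 255; the function name is hypothetical.

```python
# Hypothetical helper for the cascaded multiplication N = N1 x N2.
MAX_FACTOR = 255   # inferred from the stated maximum factor 255 x 255


def cascade_plan(f_ref_hz, n1, n2):
    """Return (intermediate frequency in Hz, output frequency in Hz,
    output period in ns) for two cascaded multiplying loops."""
    for n in (n1, n2):
        if not 1 <= n <= MAX_FACTOR:
            raise ValueError("each stage factor must be in 1..255")
    f_mid_hz = f_ref_hz * n1     # assumed output of the first loop
    f_out_hz = f_mid_hz * n2     # output of the second loop
    period_ns = 1e9 / f_out_hz
    return f_mid_hz, f_out_hz, period_ns
```

With cascade_plan(4_000_000, 8, 10), the intermediate clock in this model is 32 MHz and the output is 320 MHz with a 3.125 ns period, matching the figures quoted in the text.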
CONCLUSION
An all-digital phase locked loop with cascaded dynamic phase average loops for wide multiplication range applications has been implemented in a 0.18 µm 1P6M CMOS standard cell library. The proposed mechanism can be implemented with two (Q.R) vector counters, two state counters, two DPA loop controllers, and two DCOs. Conventional frequency comparison is replaced with a digital arithmetic comparator in the proposed algorithm. The length of the (Q.R) vector counter is determined by the maximum multiplication control word and the number of states in each separate loop. The proposed approach does not need a phase-frequency detector, loop filter, or programmable divider. Since all of the circuits are all-digital and cell-based, the design has better stability and portability across different processes than conventional approaches. A prototype chip was also designed. Its output frequency ranges from 2 MHz to 500 MHz at 1.8 V, and the input frequency ranges from 5 kHz to 50 MHz. The multiplication factor can range from 2 to 65025 (255 x 255). The proposed all-digital phase locked loop can be treated as soft IP to accelerate turnaround time. Therefore, it is very suitable for system-on-chip applications.
