Abstract-A new portable clock generator with full pull-in range and fast acquisition is presented in this paper. It can be developed in a hardware description language (HDL) to shorten the design cycle and to improve system-level integration simulation. In the proposed design, frequency tracking is performed by the "Prune-and-Search" algorithm, and the digital-controlled ring oscillator is constructed from CMOS standard cells. In order to reduce the propagation delay of the loop divider, a novel structure is developed that provides a constant delay at any divider setting. In addition, input jitter is isolated by digital processing to avoid coupling, so the generated clock output becomes cleaner and more robust. Based on the proposed methodology, a test chip has been designed and verified in a 0.6-μm CMOS process with a frequency range of 360-800 MHz at 3.3 V and a peak-to-peak jitter of less than 60 ps at 800 MHz/3.3 V.
I. INTRODUCTION
WITH THE increasing performance and decreasing cost of digital VLSI chips, many more requirements and constraints, such as I/O characteristics, off-chip signaling, PCB complexity, and process migration, must be considered in system-level designs. With advances in digital VLSI process and design technology, all-digital approaches have in recent years become capable of meeting these requirements for computer and communication applications. In modern digital ASICs, an on-chip high-speed clock synthesizer is important, and its design issues can be identified as: 1) wide operating frequency and voltage ranges; 2) low jitter characteristics; and 3) short system turnaround time. Hence, the characteristics of a clock generator must not be limited to a target process technology and should be flexible enough to meet different system requirements, such as various operating frequencies, process changes, and integrated simulations. In this paper, a portable solution for a frequency synthesizer-based clock generator with both large pull-in range and fast acquisition is presented. Portable design is an important issue in current digital VLSI because it provides foundry independence and reusability. For synchronous VLSI designs, a phase-locked loop (PLL) circuit is very often exploited to provide an on-chip high-speed clock. Generally speaking, the key characteristics of these PLL-based frequency synthesizers are: 1) pull-in range and 2) output jitter.
Manuscript received January 2000; revised March 2001. This work was supported by the National Science Council of Taiwan, R.O.C., under Grant NSC89-2215-E-009-053. This paper was recommended by Associate Editor S. Sriram.
The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: cylee@cc.nctu.edu.tw).
Publisher Item Identifier S 1057-7130(01)05230-2.
Many PLL/DPLL-based designs were developed in the past [1]-[4], mainly based on analog loop filters (LF) and current-controlled/CSA ring oscillators to improve the performance of the PLL. Intensive circuit simulation and layout design to overcome process, voltage, and temperature (PVT) variations are needed before an acceptable performance is reached. This results in a long design cycle whenever the design moves to a different foundry process or the design specifications change, even though the remaining digital parts are easily portable. Recently, an all-digital PLL (ADPLL) has been proposed [5], where the oscillator is controlled by digital commands. The rest of the controller parts can be designed at the HDL level. However, intensive simulation still has to be conducted to ensure that the target frequency band remains achievable under PVT variations. Then, based on circuit simulation results, the corresponding behavioral model of the oscillator can be described at the HDL level for the subsequent design procedure. Specific transistor sizing and layout design of the oscillator are required whenever the design specification or process changes. Thus, the effort at the physical design level remains. To cope with this issue, a novel approach is developed in this paper that makes use of the advantages of digital VLSI to obtain a truly portable and cost-effective ADPLL-based clock generator. It differs from conventional approaches in its tracking and locking mechanism. In order to operate at a higher speed, a matrix structure for a digital-controlled ring oscillator based on standard CMOS inverters is proposed to ensure that the clock generator can operate at up to several hundred megahertz. This approach also belongs to IP-based designs and has been verified on silicon using an in-house 0.6-μm SPTM CMOS cell library through synthesis tools.
According to measured results, it can generate any number of frequencies with the output range from 360 MHz to 800 MHz under 1-ps resolution with an output jitter of ps and ps @ 800 MHz/3.3 V. The supplied-voltage range is V. The rest of this paper is organized as follows. The algorithm for the clock generator is first addressed in Section II. The proposed architecture for portable design is presented in Section III. Circuit designs and analysis of the cell-based ring oscillator are given in Section IV. Implementation and measurement results of the proposed solution are then described and discussed in Section V.
II. THE PROPOSED ALGORITHM
The objective of an all-digital frequency synthesizer-based clock generator is to improve turnaround time as well as integration simulation for system-level IC designs. In this way, a
1057-7130/01$10.00 ©2001 IEEE
A. Frequency-Search Algorithm
In all-digital designs, the frequency-search process resembles a search problem in a database. It is assumed that all available frequencies form a set and that the target frequency lies within that set. To improve search efficiency, the Prune-and-Search algorithm [6] is used to find a solution for the target frequency. To trade off the miss probability against the efficiency of the search process, a searching window for the frequency error is created to dynamically control the available searching range in the Prune-and-Search algorithm. Hence, the searching range depends not only on the algorithm but also on the frequency error. In other words, if the frequency error is greater than the searching window, the searching range must be increased to make the search more efficient and to ensure that frequency acquisition can be completed. When the frequency error is small or the output frequency is approaching the target frequency, the searching range must be limited so that the search does not miss its target. Note that a large searching range implies higher searching efficiency, but the miss rate of the frequency search may increase.
It is assumed that a default clock is generated by the clock generator and is equal to . The target clock is denoted by within and the reference clock is where and are programmable constants. If the difference between and is detected, then the status, fast/slow, of a default clock is known. In the next searching cycle, the new frequency of the default clock is changed to and the same searching procedure will be performed again. So the target frequency is equal to (1) Finally, frequency-search schemes are classified into two parts: a coarse search to reduce searching complexity and a fine search to improve searching stability. Both of them belong to the Prune-and-Search algorithm. The worst-case time complexity of a frequency-search algorithm is equal to (2) where is the total number of steps in a coarse search, is the complexity of the last step, and and is the complexity of Prune-and-Search algorithm for fine and coarse searches, respectively. The overall time complexity for the condition of within is also equal to (2) . So the lock-in time is equal to , where is a constant for frequency-search loss.
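The search procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: a plain binary search stands in for the Prune-and-Search step, the DCO is assumed to be monotonic in its control code, and the names `prune_and_search`, `freq_of`, and `lock_window`, as well as the linear code-to-frequency map used in the usage note, are hypothetical.

```python
def prune_and_search(freq_of, n_codes, target, lock_window):
    """Search the control codes 0..n_codes-1 for the target frequency.

    Assumption: freq_of(code) increases monotonically with code, so the
    fast/slow status of one trial prunes half of the remaining codes.
    Returns (best_code, number_of_search_cycles).
    """
    lo, hi = 0, n_codes - 1
    best_code, best_err = None, float("inf")
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        err = freq_of(mid) - target      # sign gives the fast/slow status
        if abs(err) < best_err:
            best_code, best_err = mid, abs(err)
        if abs(err) <= lock_window:      # inside the locking window: done
            break
        if err > 0:
            hi = mid - 1                 # too fast: prune the upper half
        else:
            lo = mid + 1                 # too slow: prune the lower half
    return best_code, steps
```

With a hypothetical map `lambda c: 360.0 + 0.5 * c` over 881 codes (360 to 800 MHz in 0.5-MHz steps), a 500.3-MHz target, and a 0.25-MHz locking window, the search needs at most about log2(881), that is, ten cycles, which reflects the logarithmic complexity that makes fast acquisition possible.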
B. Pull-In Algorithm
Unlike conventional approaches, the proposed design includes neither a phase detector nor a loop filter. To make the target frequency more accurate and robust in applications, a novel scheme is developed to improve both the stability and the accuracy of the output frequency by introducing a locking window that meets the design requirements. If the detected frequency difference is smaller than the locking window, the target frequency is considered found. In noisy environments, such as those with high input jitter, the position of the locking window shifts dynamically, which affects the ability of the searching process to determine the correct status, as shown in Fig. 1. To cope with this imperfection, the size of the locking window must be large enough to tolerate reference variations. Generally speaking, the size of the locking window is proportional to the jitter range. This implies that a large locking window is needed in a high-jitter environment to maintain stability; however, the accuracy of the target frequency is then degraded. A possible way to trade off jitter against accuracy is to control the searching procedure so that the target frequency is located at the center of the locking window.
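The trade-off between jitter tolerance and accuracy can be made concrete with a small sketch. It assumes a simplified model that is not taken from the paper: the detected difference equals a residual error plus a reference deviation bounded by half the peak-to-peak jitter. The function names and units (picoseconds) are hypothetical.

```python
def is_locked(diff, window):
    """Lock decision: the detected difference lies inside the locking window."""
    return abs(diff) <= window


def lock_is_stable(residual_error, jitter_pp, window):
    """Check whether lock holds for every reference deviation.

    Assumption: the reference deviates by at most +/- jitter_pp / 2, so the
    worst-case detected difference is |residual_error| + jitter_pp / 2.
    """
    worst_case = abs(residual_error) + jitter_pp / 2.0
    return is_locked(worst_case, window)
```

Under this model, a perfectly centered target (zero residual error) tolerates a peak-to-peak jitter of twice the window size, while any residual error reduces the tolerated jitter by the same amount. This is one way to read the recommendation to keep the target frequency at the center of the locking window.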
C. Propagation-Delay Issue
If the target frequency is much higher than the reference clock, the propagation delay of loop divider must be considered carefully to minimize frequency errors. The default clock is rewritten as (3) where , , and are constants, and is a frequency error. If , is equal to . From (3), it can be found that a long wordlength counter is now replaced by three short wordlengths ( , and ) counters. As a result, the propagation delay can be reduced to an acceptable range through decomposition.
III. THE PROPOSED ARCHITECTURE
A block diagram of a double-ring oscillator-based clock generator is shown in Fig. 2 . Its structure is similar to a traditional frequency synthesizer, where an extra path is added to generate the target clock. However, a completely different searching mechanism is developed here. Four major functional modules are allocated in the design:
(1) a clock pair for frequency and output generation; (2) a frequency-search unit used to search for the target frequency and determine the estimation status; (3) a clock controller used to adjust control timing; (4) an offset estimation unit used to minimize mismatch in the clock pair. The "divider N" is used not only to slow down the reference frequency but also to generate control signals for the search processes, as shown in Fig. 2. Overall operations are classified into two processes: 1) an active process for cyclically searching for the target frequency and 2) a preset process for restoring commands. Because of the physical constraints of setup time and hold time, decisions on the controlled status must be made carefully. Hence, a "divider M" is used to slow down the target frequency as well as to enlarge the frequency error. The decisions then satisfy the physical constraints, and the threshold (the size of the locking window) can be determined accurately without timing violations. In order to compensate for temperature and supply voltage variations, the searching procedure is activated continuously (recycled) whether or not the target frequency has been found. So, in Fig. 2, ring oscillator #2 is cyclically restarted to search for the locked commands of the target frequency, while ring oscillator #1 continuously outputs the target clock based on the search results from ring oscillator #2. Ring oscillator #2, which operates in burst mode to keep the searching procedure accurate and to find the searching commands, cannot itself generate a continuous target clock. As a result, the output of the clock generator remains more stable during the frequency search.
The state transition diagram of the frequency search is shown in Fig. 3(a) and its structure is shown in Fig. 3(b). Ring oscillator #1 changes its frequency only when the setting for the target frequency has been found by ring oscillator #2. Because the proposed architecture contains two nominally identical ring oscillators, an offset estimation is essential to measure the mismatch between them caused by fabrication. An offset calibration is therefore performed before the search for the target frequency begins, in order to reduce the frequency error between the two oscillators. The locked commands are then equal to the sum of the offset and the searching commands (ideally, the searching commands equal the locked commands), as shown in Fig. 3(c). If the temperature offset is taken into account, the offset calibration must be conducted to ensure a successful locked status.
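The interaction between the two oscillators and the offset calibration can be sketched as a small behavioral model. This is only an illustration under stated assumptions: control codes are plain integers, the offset is the code difference that makes both oscillators produce the same frequency, and the class and method names (`ClockPair`, `calibrate`, `search_cycle`) are hypothetical rather than taken from the design.

```python
class ClockPair:
    """Behavioral sketch of the clock pair (hypothetical names and codes)."""

    def __init__(self, n_codes, initial_code=0):
        self.n_codes = n_codes
        self.offset = 0
        self.output_code = initial_code  # ring oscillator #1, continuous output

    def calibrate(self, code_osc1, code_osc2):
        """Assumption: the two codes yield the same frequency on their
        respective oscillators, so their difference models the mismatch."""
        self.offset = code_osc1 - code_osc2

    def search_cycle(self, search_code, locked):
        """One burst of ring oscillator #2 with trial command search_code.

        Ring oscillator #1 is updated only when the search reports lock;
        the locked command is the sum of offset and searching command,
        clamped to the valid code range.
        """
        if locked:
            command = search_code + self.offset
            self.output_code = max(0, min(self.n_codes - 1, command))
        return self.output_code
```

In this sketch the output code stays unchanged through every unlocked search cycle, which mirrors the statement that the generated clock remains stable while the search is in progress.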
Other issues, such as compensating for process variations and preventing discontinuities in the oscillator frequency, are handled during frequency band allocation. As shown in Fig. 4, all subbands of the ring oscillators overlap. The degree of overlap, which can be set at the HDL level, depends on process parameters. For example, if the process variation is about 10%, the extended region is set to twice this variation (20%). Although this frequency-allocation scheme increases the miss probability (and hence the lock-in time), it can overcome process variations. Hence, the overall bandwidth of the ring oscillators is always continuous, and a wide frequency band can be achieved.
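The band-allocation rule can be sketched as follows. The interpretation is an assumption: the extended region is taken as twice the process variation, measured relative to the nominal subband width and split equally between the two edges. The function names and the choice of eight subbands are hypothetical.

```python
def allocate_subbands(f_min, f_max, n_bands, variation):
    """Split [f_min, f_max] into n_bands overlapping subbands.

    Assumption: each subband is widened by 2 * variation of its nominal
    width, half on each side, so adjacent subbands overlap by that amount.
    """
    width = (f_max - f_min) / n_bands
    extension = 2.0 * variation * width
    bands = []
    for i in range(n_bands):
        low = f_min + i * width - extension / 2.0
        high = f_min + (i + 1) * width + extension / 2.0
        bands.append((low, high))
    return bands


def is_continuous(bands):
    """True if every subband reaches at least the start of the next one."""
    return all(bands[i][1] >= bands[i + 1][0] for i in range(len(bands) - 1))
```

For a 360 to 800 MHz range with eight subbands and 10% variation, each nominal 55-MHz subband overlaps its neighbor by about 11 MHz, so moderate shifts of the band edges do not open a gap in the overall frequency range.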
IV. CIRCUIT DESIGNS

A. Digital-Controlled Ring Oscillator
A cell-based digital-controlled oscillator is shown in Fig. 5 . It is a matrix structure based on CMOS inverters (inverter matrix), whose characteristics are determined by both the combination and number of inverters. Note that the dynamic control range is determined by the combination of inverter matrix, and the frequency resolution is dependent of scales of the parallel-inverter structure (delay bank). Both the inverter matrix and delay bank are allocated to perform coarse and fine searches, and the resolution of the clock generator is determined by the minimum scale of the delay bank. By means of operating different numbers of inverters in parallel-inverter structures (delay banks), the proposed ring oscillator can generate different frequencies. The delay bank in Fig. 5 is analyzed by an RC model [7] , whose delay scale for the mth delay path can be expressed by (4) where is the output resistance, is the gate capacitance, and are the MOSFET width and channel length, respectively, is one of delay path, and is the total number of paths in the delay bank. Hence, the delay matrix owns paths to pass signals with different delays. By properly assigning and in (4), a linear delay model can be obtained as shown in Fig. 6 . To ensure the proposed architecture can generate any target frequency, both temperature and voltage variations have to be taken into account. Equation (4) also implies that DCO resolution is proportional to a factor of inverter size . By SPICE circuit simulation, the proposed DCO with delay bank, , can provide 1-ps resolution for an in-house 0.6-m SPTM CMOS cell library. From [7] , it is shown that the output frequency of inverter chains is dependent of and of inverters. However, inverter delays are functions of temperature, supply voltage, loading capacitance, etc. For simplification without losing generality, the proposed ring oscillator is equivalent to the inverter-chain structure [8] as shown in Fig. 6 , and it is assumed that and for . 
In addition, a certain frequency error, , is allowed in the design specifications. Hence, the output frequency is allocated within , and the size of a locked window must be less than frequency error to produce target frequency. The output frequency of the cell-based digital controlled oscillator is then modeled as (5) (6) and is the total number of cascaded inverters, is the model of inverter bank and represents the delay of the th path in delay paths in (4). Based on (4) and (6), we can build a corresponding HDL model of the DCO to perform system-level integration simulation. Note that (6) must satisfy the following inequality of frequency errors: (7) In (7), for any given and , can always be found to satisfy the above constraints. So, the bandwidth of digital-controlled ring oscillators is proportional to , and its resolution is determined by . Both the and parameters are controlled by frequency-tracking module to search proper assignments (solutions) based on coarse and fine searches.
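A behavioral model of the kind mentioned above might look like the following sketch. It does not reproduce (4) to (6); it relies only on the standard ring-oscillator relation that one period equals twice the total loop delay, and it assumes a linear delay bank. The per-inverter delay, the delay step, and the function name are hypothetical values chosen for illustration.

```python
def dco_frequency_mhz(n_inverters, path_index, t_inv_ps=125.0, step_ps=1.0):
    """Output frequency in MHz of a simplified cell-based DCO model.

    Assumptions (not from the paper): every cascaded inverter contributes
    t_inv_ps of delay (coarse setting), and the selected delay-bank path
    adds path_index * step_ps (fine setting, linear delay model).
    """
    loop_delay_ps = n_inverters * t_inv_ps + path_index * step_ps
    period_ps = 2.0 * loop_delay_ps  # a ring oscillator needs two loop passes
    return 1e6 / period_ps           # 1 / ps scaled to MHz
```

In this model the number of cascaded inverters sets the coarse frequency range and the delay-bank path sets the fine step, which matches the roles the text assigns to the coarse and fine searches.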
B. Loop Divider
The propagation delay of the loop divider must be reduced to enhance frequency-search accuracy and to achieve higher clock rates. This is because the multiplication factor is proportional to the total length of the loop divider. Here this delay-reduction process is done in the time domain. Based on (3), a large " " is now replaced by " ," that is, three short-wordlength counters instead of one long-wordlength counter. For implementation, the combination in (3) is easy to realize, and its structure is shown in Fig. 7. A counter " " is activated after counting " " which are not autorestarted after counting C. Hence, the overall propagation delay is equal to count " ," instead of " ." For example, if " " is 8000, " " can be and " " will be 20. Generally, the range of " " is to prevent timing violation of "Masker" in Fig. 7. In addition, this scheme does not need any special circuit to achieve a small propagation delay. By properly selecting " ," and " ," the propagation delay can always be reduced to an acceptable level.
V. IMPLEMENTATION AND DISCUSSION
A prototype chip for a cell-based clock generator has been designed and fabricated in a 0.6-μm CMOS SPTM process. First, the controller is described in Verilog HDL, and the digital-controlled ring oscillators are designed at the gate level with timing information from an in-house standard cell library. A behavioral model of the oscillator is then generated to complete the whole simulation with the other blocks. Next, the source code is synthesized to generate gate-level netlists and schematics for further simulation and verification. These gate-level netlists are verified against the original code to check their behavior and timing. In addition, the area and timing of the whole design must be carefully considered to meet the requirements at this step. After timing and function have been verified, an automatic placement and routing (APR) tool is used to complete the physical layout.
Measured results show that the operating frequency of this test chip ranges from 360 to 800 MHz at 3.3 V. The working supply voltage range is V. Fig. 8 Fig. 9(a) and (b) . Because the proposed DCO and PADs are operated at full swing, its spectrum contains some high-frequency components. The histogram of maximum output frequency at 800 MHz/3.3 V( , 200 MHz output test) is shown in Fig. 10 , whose jitter is 60 ps. A chip microphoto of the proposed design is shown in Fig. 11 and its summary is listed in Table I . Note that power dissipation is mainly from the DCO, which consumes more than 85% of the total power consumption. Simulation shows that more power dissipation can be saved if minimum-size inverters (which are not available in our current cell library version) are exploited for the DCO.
This research shows that the portable clock generator can be divided into two parts, namely the controller and the DCO. Frequency detection and acquisition are performed in the controller, which is synchronized to the input reference clock. As a result, the controller can be described in synthesizable HDL code and becomes a soft IP. The oscillation frequency is produced by the DCO, whose maximum frequency is determined by the target cell library. If the target clock rate is higher than the DCO maximum frequency, only the DCO cell needs to be redesigned, as in [5]. An HDL behavioral model can then be set up and combined with the controller IP for HDL-level simulation. Thus, the design cycle for an on-chip high-speed clock generator, which is needed in many ASIC designs, can be shortened considerably. Moreover, an HDL simulation model of the portable clock generator can be combined with other system modules for system-level integrated simulation, making it very suitable for system-on-a-chip design.
VI. CONCLUSION
To exploit advances in digital VLSI, a portable high-speed on-chip clock generator with large pull-in range and low output jitter has been presented in this paper. Based on an in-house 0.6-μm CMOS SPTM standard cell library, a test chip has been designed, fabricated, and measured. Results show that the output frequency of the proposed solution can reach 800 MHz, which is sufficient for many ASIC designs fabricated in a 0.6-μm CMOS process. In addition, the design cycle can be reduced significantly and the system turnaround time can be improved during technology migration. As a result, a completely portable, IP-based, and fully integrated clock generator solution becomes available for digital VLSI system designs.
