A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines
Yeon-Jae Jung, Seung-Wook Lee, Daeyun Shim, Wonchan Kim, Changhyun Kim, Member, IEEE, and Soo-In Cho
Abstract-This paper describes a dual-loop delay-locked loop (DLL) that overcomes the problem of a limited delay range by using multiple voltage-controlled delay lines (VCDLs). A reference loop generates quadrature clocks, which are then delayed by controllable amounts by four VCDLs and multiplexed to generate the output clock in a main loop. This architecture enables the DLL to emulate an infinite-length VCDL with multiple finite-length VCDLs. The DLL incorporates a replica biasing circuit for low-jitter characteristics and a duty cycle corrector immune to prevalent process mismatches. A test chip has been fabricated in a 0.25-µm CMOS process. At 400 MHz, the peak-to-peak jitter with a quiet 2.5-V supply is 54 ps, and the supply-noise sensitivity is 0.32 ps/mV.
Index Terms-Clock synchronization, delay-locked loop, duty cycle corrector, replica biasing, voltage-controlled delay lines.
I. INTRODUCTION

For high-performance microprocessors and memory ICs, phase-locked loops (PLLs) or delay-locked loops (DLLs) are essential to minimize the negative effects of clock skew and jitter. In applications that do not require frequency multiplication, a DLL is a natural choice, since it is free from the jitter accumulation problem of an oscillator-based PLL. Conventional DLLs, however, suffer from a limited delay range, since they adjust only the phase, not the frequency.
We propose a new dual-loop DLL architecture that allows an unlimited delay range by using multiple voltage-controlled delay lines (VCDLs). In our architecture, the reference loop generates four evenly spaced clocks, which are then delayed by controllable amounts by four VCDLs and multiplexed to generate the output clock in the main loop. Selection and delay control in the main loop permit the DLL to emulate an infinite delay range with multiple finite-length VCDLs. Moreover, a fully analog control technique can be applied to exploit the established benefits of conventional DLLs, such as low skew and low jitter. To further reduce supply-noise sensitivity, a new low-jitter scheme is employed in the replica biasing circuit, which compensates the delay variation of a delay line against injected supply noise. Finally, a duty cycle corrector immune to process mismatches is used. This paper is organized as follows. In Section II, following a brief overview of conventional DLLs, the proposed architecture is described along with its design concepts and building blocks. Section III describes the circuits for the low-jitter scheme and duty cycle correction. Section IV discusses the prototype chip implementation and presents experimental results. Section V concludes the paper with a summary.
Publisher Item Identifier S 0018-9200(01)03028-1.
II. ARCHITECTURE
A. Limited Range Problem of Conventional DLLs
A simplified block diagram of a conventional DLL [1] and its lock-failure cases are outlined in Fig. 1. Under normal conditions, the DLL forces the output clock to align with the input reference clock through a negative feedback loop comprising a voltage-controlled delay line, a phase detector, a charge pump, and a loop filter. The clock buffer (CLK-BUF) is inserted to provide the chip-wide clock. Although this simple architecture offers considerable design flexibility, the main problem of the conventional DLL in Fig. 1(a) is that the VCDL delay has a minimum and a maximum bound. Therefore, the DLL has states in which it cannot operate, as shown in Fig. 1(b). When the VCDL is at its maximum delay and the output clock leads the reference clock, DN pulses are generated, but the VCDL cannot produce any more delay. Conversely, when the VCDL is at its minimum delay and the output clock lags the reference clock, UP pulses are generated, but the VCDL cannot reduce its delay any further. These lock-failure cases arise because the VCDL delay range is limited and its initial value is unknown at loop startup. Additional loop-startup control circuitry may solve this problem and allow the DLL to acquire lock. Unfortunately, the delay of the clock buffer and the subsequent clock distribution tree deviates from its simulated value with temperature and voltage variations [2]. When this variation is excessive, the DLL loses lock and falls into the lock-failure cases of Fig. 1(b).
0018-9200/01$10.00 © 2001 IEEE
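To make the lock-failure mechanism concrete, the following is a minimal discrete-time sketch of a conventional DLL with a bounded VCDL. It is not the circuit of Fig. 1: it assumes a linear phase detector whose error is wrapped to one clock period, a proportional delay update once per cycle, and hypothetical delays (a 2.5-ns period, a clock-buffer delay of 1 ns or 0.2 ns, and a VCDL range of 0.5 to 2 ns). The names `wrap` and `run_dll` are illustrative only.

```python
import math


def wrap(x, period):
    """Wrap x into the interval (-period/2, period/2]."""
    return x - period * math.ceil(x / period - 0.5)


def run_dll(t_init, t_buf, period, t_min, t_max, gain=0.25, steps=400):
    """Return (final VCDL delay, locked?) for a bounded-VCDL DLL model (ps)."""
    t = t_init
    for _ in range(steps):
        # Offset of the output edge from the nearest reference edge;
        # a negative value means the output leads, so more delay is requested.
        e = wrap(t + t_buf, period)
        # Proportional update, clamped to the VCDL's finite delay range.
        t = min(max(t - gain * e, t_min), t_max)
    return t, abs(wrap(t + t_buf, period)) < 1.0


if __name__ == "__main__":
    print(run_dll(600, 1000, 2500, 500, 2000))   # settles near 1500 ps, locked
    print(run_dll(1500, 200, 2500, 500, 2000))   # pinned at 2000 ps, unlocked
```

With a 1-ns buffer delay, the model settles at a VCDL delay of 1.5 ns and locks. If the buffer delay drifts to 0.2 ns, the required delay (2.3 ns) lies outside the range, the phase detector keeps requesting more delay, and the VCDL stays pinned at its 2-ns maximum, which corresponds to the first failure case of Fig. 1(b).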
A DLL relying on quadrature phase mixing [3] has been proposed to overcome the limited range problem of the conventional DLL. Phase mixing of quadrature clocks provides unlimited phase-shift capability. However, phase mixing requires two clocks with slow slew rates to obtain linear results, so this approach suffers from increased sensitivity to dynamic noise and higher jitter. In the semidigital DLL [4], a digitally controlled phase interpolator uses 30°-spaced clocks generated internally by a dual-DLL architecture. Although the noise sensitivity of phase interpolation can be alleviated by smaller interpolation intervals, the inherently digital nature of this approach causes dithering around zero phase error because the control bits are updated continuously. A digital DLL architecture with infinite phase capture range [5] suffers from the same dithering problem and also requires a large chip area for fine delay control.
B. Proposed Dual-Loop DLL
Fig. 2 shows a block diagram of the proposed dual-loop DLL architecture [6]. This architecture is based on two loops: the reference loop and the main loop. The reference loop locks at a 180° phase shift using a conventional DLL architecture. Since the reference-loop VCDL consists of four main delay cells, each delay cell produces a 45° phase shift in the locked condition. All delay cells, including the delay buffers, are differential elements controlled in common by the output of the charge pump. The delay cell labeled "3" denotes three parallel-connected delay cells, which preserve the load balance between the 0° and 180° clocks. The reference loop provides two differential clocks spaced by 90° to the main loop. To cover the entire 360° phase range, the clocks from the reference loop are selectively inverted and fed to the four VCDLs of the main loop. Each main-loop VCDL consists of three delay cells and generates one of four low-swing internal clocks.
The delays of these clocks are controlled in analog fashion by four control voltages of two kinds, generated by the two main-loop charge pumps. The multiplexer selects one of the four clocks, and the selected clock feeds the clock buffer, which converts the low-swing signal to full CMOS levels and provides the chip-wide output clock. The output clock drives the phase detector, which compares it with the reference clock. The output of the phase detector is used by two charge pumps and four loop filters to control the delay of each main-loop VCDL. Four-to-one clock switching is implemented by the window finder and the state decoder. The window finder monitors the boundary at which the selected clock must be switched and, at each switching event, forces the state decoder to update the two-bit selection code. The selection code not only controls the clock selection at the multiplexer but also changes the configuration of the two charge pumps and four loop filters to accommodate the clock switching. Duty cycle correction (DCC) is employed to remove the duty cycle imperfections of the input clock and the output clock. Finally, although the input clock and the reference clock can be merged into a single clock input, a lower-jitter clock source is preferred as the input clock, if possible, since it determines the jitter characteristics of the whole DLL.
In this architecture, the clock selection scheme enables the output clock to cover the entire phase range (modulo 2π). Furthermore, seamless clock switching is made possible by optimizing the delay control scheme of the main-loop VCDLs. Moreover, phase locking is achieved by fully analog control in all loops, so the low-skew and low-jitter techniques established for conventional DLLs can be applied.
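As a static illustration of why four quadrature clocks plus a bounded VCDL can reach any phase modulo 360°, the sketch below splits a target output phase into a two-bit selection code and a residual VCDL offset. It assumes, hypothetically, that code k selects the clock nominally at k·90° and that the unselected clocks sit at their nominal phases. In the actual loop the other clocks also move, which is why the main-loop design allows the selected VCDL an excursion of ±67.5° rather than ±45°. The function name `decompose_phase` is illustrative.

```python
def decompose_phase(target_deg):
    """Split a target phase into (2-bit selection code, VCDL offset in deg)."""
    t = target_deg % 360.0
    k = round(t / 90.0)            # nearest quadrature clock (0..4)
    offset = t - 90.0 * k          # residual delay the selected VCDL must supply
    return format(k % 4, "02b"), offset


if __name__ == "__main__":
    for phase in (0, 180, 350, 540, -90):
        print(phase, decompose_phase(phase))
```

Under this mapping, a target near 180° decomposes to code "10" with a small offset, matching the final state of the Fig. 7 example, and no target ever needs more than 45° of residual delay.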
C. Reference Loop Design
The objective of the reference loop is to provide quadrature clocks to the main loop. Since the main loop uses these multiphase clocks as references, their phase distribution must be protected against harmonic lock. The reference-loop phase detector depicted in Fig. 3(a) can detect and escape harmonic lock up to the second harmonic. It consists of two level-sensitive AND/NAND logic gates, which require the 45° and 90° clocks in addition to the 0° and 180° clocks. The clocks and the UP/DN output waveforms at one-period lock are shown in Fig. 3(b). Using the 45° clock, the phase detector asserts its UP and DN outputs for equal durations to avoid a dead-zone problem, although any phase offset of the reference loop has only a negligible effect on the offset of the main-loop output clock. At the second harmonic lock, shown in Fig. 3(c), the phase detector uses the 90° clock to detect that the loop is in harmonic lock and asserts only the UP output to escape it. Harmonic lock at the third or higher harmonics is precluded by limiting the delay range of the delay line, which is possible because the reference loop consists only of delay cells, with no additional delay elements such as the clock buffer.
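The tap arithmetic of the reference loop can be checked with a few lines of Python. This sketch assumes ideal, identical delay cells and uses the 400-MHz operating point reported in Section IV; the helper names are hypothetical. It computes the per-cell delay when four cells span 180°, the resulting tap phases, and the four quadrature phases obtained by taking each of the two 90°-spaced differential clocks in both polarities, which is one way to read the selective inversion applied at the main-loop inputs.

```python
def reference_loop_taps(freq_hz, n_cells=4, locked_phase_deg=180.0):
    """Per-cell delay (ps) and tap phases (deg) of an ideal locked VCDL."""
    period_ps = 1e12 / freq_hz
    cell_delay_ps = period_ps * (locked_phase_deg / 360.0) / n_cells
    taps = [k * locked_phase_deg / n_cells for k in range(n_cells + 1)]
    return cell_delay_ps, taps


def quadrature_phases(diff_clock_phases=(0.0, 90.0)):
    """Each differential clock supplies its true and inverted (+180 deg) phase."""
    phases = []
    for p in diff_clock_phases:
        phases += [p % 360.0, (p + 180.0) % 360.0]
    return sorted(phases)


if __name__ == "__main__":
    print(reference_loop_taps(400e6))  # (312.5, [0.0, 45.0, 90.0, 135.0, 180.0])
    print(quadrature_phases())         # [0.0, 90.0, 180.0, 270.0]
```

At 400 MHz, each cell therefore contributes about 312.5 ps, and the 45° and 90° taps used by the harmonic-lock detector fall out directly from the same computation.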
D. Main Loop Design
The main loop design focuses on the selection control and delay control of the main-loop VCDLs to achieve an infinite delay range with four finite-length VCDLs. Fig. 4(a) shows the conceptual timing diagram of the main-loop VCDL selection control. Assuming one clock is selected as the multiplexer output, the selected clock moves within its movable range according to the output of the main-loop phase detector, while the other clocks remain at their initial phase relationship, spaced by 90°. When the rising edge of the selected clock coincides with that of the next (or previous) clock, "select up" (or "select down") is generated and the selection changes to that clock, which then acts as the new selected clock within a right-shifted (or left-shifted) movable range. By repeating clock switching at the quadrant boundaries in this manner, the entire phase range is covered. Fig. 4(b) shows a block diagram of the selection control logic. Since the selected clock passes through the MUX stage, a MUX replica is required to match the delay between the selected clock and all internal clocks, so that the clock waveforms in Fig. 4(a) remain valid. In the window finder, each inverter-NAND pair forms a window bounded by the rising edges of its two input clocks, so four windows are generated. Sampling these windows with the selected clock lets the window finder determine which window the selected clock belongs to. If the found window is the "select up" or "select down" region, an UP or DN signal, respectively, is generated, and the state decoder then updates the two-bit selection code to change the selected clock within one clock cycle. Although clock switching occurs immediately after the switching event, a small delay difference may appear in the output clock, since the rising edge of the old selected clock may not be at exactly the same time position as that of the new one after switching. This delay difference appears as switching jitter in the locked state.
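The selection logic can be summarized behaviorally: a neighbor's rising edge coinciding with (or overtaking) the selected edge triggers "select up" or "select down", and the state decoder steps the two-bit code modulo 4. The sketch below is an abstraction, not the gate-level inverter-NAND window finder of Fig. 4(b). It assumes that clock k is nominally at k·90° and represents each internal clock by its phase offset from that nominal position; the function names are hypothetical.

```python
def select_event(offsets_deg, sel):
    """Return +1 ("select up"), -1 ("select down"), or 0 (no switching).

    offsets_deg[k] is the phase offset (deg) of internal clock k from k*90 deg.
    """
    up, dn = (sel + 1) % 4, (sel - 1) % 4
    gap_up = 90.0 + offsets_deg[up] - offsets_deg[sel]  # distance to next edge
    gap_dn = 90.0 + offsets_deg[sel] - offsets_deg[dn]  # distance to previous edge
    if gap_up <= 0.0:
        return 1
    if gap_dn <= 0.0:
        return -1
    return 0


def state_decoder(sel, event):
    """Step the selected clock index modulo 4 and return it with its 2-bit code."""
    new_sel = (sel + event) % 4
    return new_sel, format(new_sel, "02b")


if __name__ == "__main__":
    offsets = [67.5, -22.5, -22.5, -22.5]  # selected clock 0 has met clock 1
    event = select_event(offsets, 0)
    print(event, state_decoder(0, event))  # 1 (1, '01')
```

Modeling the boundary test as a gap of zero or less keeps the sketch independent of how the hardware samples the windows; a real window finder resolves this with the sampled window bits instead.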
The delay control of the main-loop VCDLs must balance two conflicting requirements: delay range and power consumption. More delay cells provide a larger delay range, but power consumption is proportional to the number of delay cells. Furthermore, a larger delay causes larger jitter. The most intuitive approach is the single control scheme shown in Fig. 5(a), where only the selected clock rotates and the other clocks remain fixed in phase space, so clock switching occurs at the quadrant boundaries. Unfortunately, since the required delay range is from -90° to 90°, this scheme needs as many delay cells per VCDL as the reference loop. To reduce the number of required delay cells, a differential delay control scheme is employed: when the selected clock rotates counterclockwise, all other clocks rotate clockwise with their mutual phase relationship fixed. If all clocks move at the same speed, the required delay range is from -45° to 45°, as shown in Fig. 5(b). However, if the selected clock must rotate in the opposite direction after switching, owing to delay fluctuation of the reference clock or the clock buffer, the lock is lost because the delay range of the VCDL has already been exhausted. In Fig. 5(c), we therefore adopt differential delay control with a 3× speed difference, in which the selected clock moves three times faster than the other clocks, so that 3/4 of the delay cells of the single control case cover the required delay range of -67.5° to 67.5°. Since the 3× speed difference provides a shared region within the available delay ranges of two neighboring clocks, seamless clock switching is possible in either direction without losing lock, with three delay cells per VCDL. Fig. 6 shows the configuration of the main-loop phase detector, charge pumps, and loop filters. The outputs of the phase detector are connected directly to charge pump 1 (CP1) and, with inversion, to charge pump 2 (CP2).
Thus, when CP1 generates an increasing control voltage for the VCDL that produces the selected clock, CP2 generates a decreasing control voltage for all other VCDLs, so the differential delay control scheme uses two substantially identical charge pumps. The 3× speed difference arises because CP1 drives one loop filter, whereas CP2 drives three loop filters. When clocks are switched, the selection code changes the connections between the charge pumps and the loop filters. Consequently, charge redistribution occurs among the three loop filters other than the one for the newly selected clock. This redistribution proceeds rapidly, since two different voltages converge to a single value, and the resulting fast change of the VCDL control voltages prevents dithering around the clock-switching phase. Fig. 7 shows an example of the main-loop VCDL control procedure starting from the unlocked state. Assume that the output clock must be near 180° in phase space to acquire lock. Initially, with selection code "00," the corresponding clock is selected. The selected clock rotates counterclockwise in phase space according to the outputs of the phase detector, while all other clocks rotate clockwise at one-third of its speed. Before the VCDL generating the selected clock reaches the limit of its delay range, the selection switches to the next clock, and the selection code becomes "01." Through charge redistribution in the loop filters, all clocks except the newly selected one settle near their original phase positions with 90° spacing. After switching, the selected clock continues to move counterclockwise until it is switched again, to the "10" state. Since this state is near the lock point, the DLL can acquire lock with a minor delay adjustment. Now assume that the delay of the output clock must decrease because of delay fluctuation of the reference clock or the clock buffer. As in the delay-increase case, before a VCDL delay range is exhausted, the selection returns to the previous clock, that is, the "01" state. As a result, by optimizing the control schemes of the multiple VCDLs, the proposed DLL covers the entire phase range and remains locked when switching in either direction. Because this architecture emulates an infinite-length VCDL with multiple finite-length VCDLs, the DLL overcomes the limitations of conventional DLLs, namely the limited delay range and the constraint on the initial phase relationship.
Fig. 7. Example of the main loop VCDL control procedure.
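The delay-range argument and the switching procedure can be reproduced with a small behavioral model. If the selected clock moves s times faster than the others in the opposite direction, the 90° gap to a neighbor closes after the selected VCDL has moved 90·s/(s+1) degrees: 45° for s = 1, 67.5° for s = 3, and 90° for single control, that is, 2, 3, or 4 cells of 45°. The sketch below also shows why CP2 moves its VCDLs at one-third the rate (the same pump current into three filters) and simulates repeated switching. It assumes that each loop filter is a single equal-valued capacitor, that VCDL delay is linear in control voltage (so redistribution averages the phase offsets), and that the phase-detector decisions are supplied as a list; all names are hypothetical.

```python
from fractions import Fraction


def selected_range_deg(speed_ratio=None):
    """Excursion (deg) of the selected VCDL before it meets a neighbor.

    speed_ratio=None models single control (other clocks fixed); otherwise
    the selected clock moves speed_ratio times faster than the others, in the
    opposite direction, so the 90-degree gap closes at (speed_ratio + 1) units.
    """
    if speed_ratio is None:
        return Fraction(90)
    s = Fraction(speed_ratio)
    return 90 * s / (s + 1)


def control_slew(i_cp, c_filter, n_filters):
    """dV/dt when one charge pump drives n identical capacitive loop filters."""
    return i_cp / (n_filters * c_filter)


def simulate(moves, d=0.5):
    """Behavioral main-loop model; moves is a sequence of +1/-1 decisions.

    offsets[k] is the phase offset (deg) of internal clock k from k*90 deg.
    Returns the selection codes visited and the peak |offset| reached.
    """
    offsets, sel, codes, peak = [0.0] * 4, 0, ["00"], 0.0
    for direction in moves:
        # Differential control with 3x speed difference.
        new = [o - direction * d for o in offsets]
        new[sel] = offsets[sel] + 3 * direction * d
        offsets = new
        peak = max(peak, max(abs(o) for o in offsets))
        up, dn = (sel + 1) % 4, (sel - 1) % 4
        event = 0
        if 90.0 + offsets[up] - offsets[sel] <= 0.0:    # select up
            event = 1
        elif 90.0 + offsets[sel] - offsets[dn] <= 0.0:  # select down
            event = -1
        if event:
            sel = (sel + event) % 4
            # Charge redistribution: the three filters now tied to CP2 share
            # charge, so their offsets settle to the mean (equal capacitors).
            others = [k for k in range(4) if k != sel]
            mean = sum(offsets[k] for k in others) / 3
            for k in others:
                offsets[k] = mean
            codes.append(format(sel, "02b"))
    return codes, peak


if __name__ == "__main__":
    print(selected_range_deg(3))               # 135/2, i.e. +/-67.5 deg
    print(simulate([1] * 225))                 # full rotation, peak 67.5 deg
    print(simulate([1] * 45 + [-1] * 30))      # reversal right after switching
```

In this model, continuous "up" decisions step the code through "00," "01," "10," "11," and back to "00," and a reversal immediately after a switch returns to the previous code. In both cases no VCDL offset exceeds ±67.5°, consistent with three delay cells per VCDL.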
III. LOW JITTER SCHEME AND DCC
A. Low-Jitter Scheme
The jitter performance of the DLL is degraded by various noise sources, typically supply and substrate noise in high-speed, highly integrated circuits. To reduce jitter, the loop bandwidth should be set as high as possible, but it has an upper limit imposed by stability. Thus, low-jitter DLL design strongly depends on the delay characteristics of the delay line under supply-noise injection. To design a delay line with low supply-noise sensitivity, the replica biasing used for delay control must be considered in a noisy environment. The replica biasing circuit, which consists of a half-replica of a differential delay cell and an operational amplifier (op-amp), sets the low swing level of the delay cell to the reference voltage $V_{ref}$. In conventional replica biasing, $V_{ref}$ tracks the supply variation by the same amount. Unfortunately, this is not the optimal solution, because variations of the op-amp gain and of the tail-current source distort the delay characteristics of the delay cell. The delay in this case is described by

$$t_d = \frac{C_L V_{SW}}{I_{SS}} \qquad (1)$$

where $t_d$ is the delay time of a delay cell, $C_L$ the load capacitance, $V_{SW}$ the swing voltage of the delay cell, and $I_{SS}$ the current of the tail-current source. For a positive supply variation $\Delta V_{DD}$, since the change in $I_{SS}$ is positive and the change in $V_{SW}$ is negative, $t_d$ greatly decreases. In the design depicted in Fig. 8(a), an additional reference voltage generator is attached to the replica biasing circuit. The reference voltage generator is composed of one transistor and two resistors and generates the reference voltage $V_{ref}$ under the nominal supply condition. When the supply varies by $\Delta V_{DD}$, the generator produces a predetermined variation $\Delta V_{ref}$ that is smaller than $\Delta V_{DD}$, as shown in Fig. 8(b). This reduced variation compensates the delay variation caused by the supply-noise-induced effects described above, so supply-noise sensitivity is minimized.
For a given supply noise, the desired $\Delta V_{ref}$ is a function of the voltage across the transistor of the reference voltage generator, as expressed in (2). The sensitivity of $\Delta V_{ref}/\Delta V_{DD}$ to process variations should be analyzed to guarantee reliable operation. For example, the sensitivity to the threshold-voltage variation of this transistor is given by (3). For a 100-mV supply variation, this sensitivity is small, and similar analyses with other process parameters also show that the predetermined $\Delta V_{ref}$ remains nearly constant under moderate process variations. This replica biasing circuit is applied to all VCDLs of both the reference loop and the main loop to achieve low-jitter characteristics throughout the DLL.
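A worked example of (1) shows how a reference-voltage shift smaller than the supply shift can restore the delay. All numbers are hypothetical: $C_L$ = 50 fF, $V_{SW}$ = 0.5 V, $I_{SS}$ = 100 µA, and a tail current that rises by 5 µA for a +100-mV supply step. The sketch further assumes that the swing is $V_{DD} - V_{ref}$ and, for simplicity, ignores the op-amp-induced swing change of the conventional case. It illustrates the compensation principle only, not the circuit of Fig. 8(a), and the function names are illustrative.

```python
def cell_delay(c_load, v_swing, i_tail):
    """Delay of a delay cell per Eq. (1): t_d = C_L * V_SW / I_SS."""
    return c_load * v_swing / i_tail


def required_vref_shift(v_swing, i_tail, d_vdd, d_itail):
    """V_ref change that keeps t_d constant when V_DD and I_SS change.

    Keeping V_SW / I_SS fixed needs V_SW' = V_SW * (I_SS + dI) / I_SS; with
    V_SW = V_DD - V_ref this gives dV_ref = dV_DD - (V_SW' - V_SW).
    """
    v_swing_new = v_swing * (i_tail + d_itail) / i_tail
    return d_vdd - (v_swing_new - v_swing)


if __name__ == "__main__":
    C_L, V_SW, I_SS = 50e-15, 0.5, 100e-6     # hypothetical values
    dV_DD, dI_SS = 0.1, 5e-6                  # hypothetical +100-mV step
    print(cell_delay(C_L, V_SW, I_SS))           # about 250 ps nominal
    print(cell_delay(C_L, V_SW, I_SS + dI_SS))   # about 238 ps if V_ref tracks V_DD 1:1
    print(required_vref_shift(V_SW, I_SS, dV_DD, dI_SS))  # about 0.075 V, less than 0.1 V
```

Under these assumptions, a 1:1 tracking reference lets the delay fall by roughly 5%, whereas a reference that moves only about 75 mV for the 100-mV supply step widens the swing just enough to hold $t_d$ constant.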
B. Duty Cycle Corrector
The duty cycle of clock signals within the DLL deviates from its ideal value of 50% because of various asymmetries in the signal paths and voltage offsets in an off-chip reference clock. For applications in which the timing of both clock edges is critical, a duty cycle corrector (DCC) is required to maximize timing margins. Fig. 9(a) shows a DCC [3]. As the clock frequency increases, tighter bounds are placed on the performance of the DCC. Even worse, process mismatches between transistors become a serious error source in the DCC, especially in deep-submicron technologies. Although process mismatches plague all devices, special care must be taken with the duty detection stage, since near-ideal performance of this stage can remove the duty cycle distortion caused by the mismatches of all other nonideal blocks. The proposed duty detection block is based on a configuration of two stacked source-coupled pairs, as shown in Fig. 9(b). The source-coupled pair is immune to device mismatches owing to its current-steering capability: for sufficiently large input signals, it conducts the current set by the tail-current source through only one branch, so various transistor mismatch effects are hidden. The common-mode problem of this approach is solved by the transistors in the boxed area, which implement a self-biasing technique [7] that dynamically adjusts the output common mode according to the input clocks. Two transistors with source and drain tied together are added to eliminate the load imbalance caused by the self-biasing circuit. Fig. 10 shows the simulated mismatch sensitivity of the DCC with the proposed duty detection stage over typical process mismatch parameters [8]. With a 50% duty cycle at the input clock, the duty cycle error is less than 2 ps, which guarantees robust operation against process mismatches.
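The argument that the duty detector sets the final accuracy can be illustrated with a behavioral feedback loop. The sketch assumes a generic integrating DCC, in which the detector output is proportional to the duty-cycle error and the correction is accumulated each cycle. It does not model the stacked source-coupled pairs of Fig. 9(b), and the function name and loop gain are hypothetical. Distortion of the incoming clock is removed by the loop, while any detector offset, for example from transistor mismatch, appears one-for-one as residual duty error.

```python
def dcc_loop(duty_in, detector_offset=0.0, gain=0.5, steps=100):
    """Return the settled output duty cycle (fraction) of an integrating DCC."""
    correction = 0.0
    for _ in range(steps):
        duty_out = duty_in + correction
        # Ideal detector reports duty_out - 0.5; mismatch adds a fixed offset.
        sensed_error = (duty_out - 0.5) + detector_offset
        correction -= gain * sensed_error   # integrate the sensed error
    return duty_in + correction


if __name__ == "__main__":
    period_ps = 2500.0                       # 400-MHz clock
    for offset in (0.0, 0.001):
        duty = dcc_loop(0.45, detector_offset=offset)
        print(offset, duty, (duty - 0.5) * period_ps)  # residual error in ps
```

In this model, a detector offset of only 0.1% of the period already leaves about 2.5 ps of pulse-width error at 400 MHz, which is why the detection stage, rather than the correction path, must be made insensitive to mismatch.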
IV. EXPERIMENTAL RESULTS
The test chip was fabricated in a 0.25-µm five-metal CMOS process. The threshold voltages in this process are 0.57 V (nMOS) and 0.55 V (pMOS), and the gate-oxide thickness is 5.8 nm. Fig. 11 shows the layout of the prototype chip. The active area of the DLL is 0.13 mm².
The waveforms in Fig. 12 show the two-bit selection code with the reference clock input grounded while the input clock runs at its nominal frequency of 400 MHz. In this configuration, the main-loop phase detector always asserts DN signals, so the selection code is continuously updated in the sequence "00," "01," "10," and "11." This indicates that the output clock rotates indefinitely through the full 0° to 360° range. Fig. 13(a) and (b) shows the jitter histograms of the DLL output clock at 400 MHz. Fig. 13(a) shows 6.7-ps RMS and 54-ps peak-to-peak jitter with a quiet power supply. With 300-mV, 2.5-MHz square-wave supply noise, the peak-to-peak jitter increases to 150 ps, as shown in Fig. 13(b). The ratio of peak-to-peak to RMS jitter is well maintained despite the supply-noise injection. The supply-noise sensitivity is measured to be 0.32 ps/mV. Table I summarizes the DLL performance. The DLL operates over a 150- to 600-MHz frequency range with a 2.5-V supply. The static phase error between the reference clock and the DLL output clock is less than 20 ps. Operating at 400 MHz, the DLL dissipates 60 mW.
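The reported supply-noise sensitivity is consistent with dividing the increase in peak-to-peak jitter by the amplitude of the injected supply noise: (150 - 54) ps / 300 mV = 0.32 ps/mV. The short sketch below reproduces that arithmetic and the peak-to-peak-to-RMS ratio of the quiet-supply histogram. This reading of how the figure was computed is an assumption, and the function name is hypothetical.

```python
def supply_noise_sensitivity(pp_noisy_ps, pp_quiet_ps, noise_mv):
    """Added peak-to-peak jitter per millivolt of injected supply noise."""
    return (pp_noisy_ps - pp_quiet_ps) / noise_mv


if __name__ == "__main__":
    print(supply_noise_sensitivity(150.0, 54.0, 300.0))  # 0.32 ps/mV
    print(54.0 / 6.7)  # peak-to-peak / RMS ratio with a quiet supply, about 8.1
```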
V. CONCLUSION
We have described a dual-loop DLL architecture that allows an unlimited delay range by using multiple VCDLs. The reference loop generates four evenly spaced clocks without the risk of harmonic lock. Clock selection in the main loop enables the DLL to cover the entire phase range, and seamless clock switching is achieved by optimizing the delay range control of the main-loop VCDLs. Thus, this architecture can emulate an infinite-length VCDL with multiple finite-length VCDLs. To obtain low supply-noise sensitivity, the low-jitter scheme generates a reference-voltage variation smaller than the supply variation to compensate the delay of the delay line. Finally, the duty cycle corrector exhibits high immunity to process mismatches owing to its configuration of two stacked source-coupled pairs. A prototype fabricated in 0.25-µm CMOS technology achieves 54-ps peak-to-peak jitter and a supply-noise sensitivity of 0.32 ps/mV.
