Abstract-An ultra wide-range delay-locked loop (DLL) has been fabricated in 65nm CMOS technology. The proposed leakage delay unit (LDU) readily generates a large propagation delay, which eases the design of the high-speed digital counter in the cycle-controlled delay unit (CCDU) for very low-frequency operation. The proposed DLL operates from 500 kHz to 1 GHz and consumes 1.8 mW at 1 GHz in a very small active area (0.01 mm²).
I. INTRODUCTION
Delay-locked loops (DLLs) and phase-locked loops (PLLs) are widely used in high-speed microprocessors and memory interfaces to eliminate clock skew. To meet the specifications of different applications, DLLs need a wide frequency range, especially in low-power system-on-a-chip (SoC) designs with dynamic voltage and frequency scaling (DVFS). Traditionally, DLLs are designed with a charge pump-based architecture [1], [5]-[7]. However, charge pump-based DLLs suffer from a serious leakage current problem in the 65nm CMOS process, and their jitter performance becomes unacceptable. As a result, a low-leakage CMOS process is often needed when implementing charge pump-based DLLs in 65nm CMOS, but using a low-leakage process also degrades circuit performance. All-digital DLLs [2, 4], which use a robust digital control code to control the digital-controlled delay line (DCDL), avoid the leakage current problem and are therefore becoming increasingly popular.
The low supply voltage in the 65nm CMOS process also makes it difficult to design a wide-range delay line. The analog DLL [5] therefore uses a multi-band voltage-controlled delay line (VCDL) to cover wide-range operation. However, this analog DLL needs extra I/O pins to specify the desired frequency band and the required ratio of charge-pump current. The mixed-mode DLL [7] proposes a two-stage delay line to achieve wide-range operation. Its coarse-tuning delay stage, which uses a path selector with delay cells, provides a large delay for wide-range operation, and high resolution is achieved by adding a voltage-controlled delay line after the coarse-tuning delay stage. However, since many delay cells are used in the coarse-tuning delay line, the area and power consumption also increase.
The cycle-controlled delay line architecture is proposed in the digital DLL [4] to save area when designing a wide-range DLL. The cycle-controlled delay line uses a ring oscillator architecture to generate a large delay. However, since the next-stage coarse-tuning delay unit must have a delay controllable range larger than the delay step of the preceding cycle-controlled delay line unit, the ring oscillator in the cycle-controlled delay line must operate at a very high speed. As a result, it is very difficult to design the high-speed counter in the cycle-controlled delay line, especially as the number of bits increases. Thus, if an ultra-wide operation range is required, it is difficult to use the cycle-controlled delay line architecture to provide the required delay in low-frequency operation.
To overcome these problems in the 65nm CMOS process, this paper presents a novel delay cell that uses the transistor leakage current to generate an extremely large delay. The proposed delay cell reduces the number of bits required in the cycle-controlled delay line stage and thus makes it possible to build a DLL with an ultra wide operation range, from kHz to GHz, with lower power consumption and smaller chip area. As a result, the proposed DLL is very suitable for wide-range clock deskew applications in the SoC era. This paper is organized as follows. Section II describes the overall circuit operation of the proposed DLL. The detailed circuit implementation is discussed in Section III. Section IV shows the experimental results and the performance comparisons. Finally, Section V concludes with a summary.
II. OVERALL CIRCUIT DESCRIPTION
The block diagram of the proposed ultra wide-range all-digital delay-locked loop (ADDLL) is shown in Fig. 1. It is composed of the phase detector (PD), the controller and the digital-controlled delay line (DCDL). The DCDL is composed of four delay units: the leakage delay unit (LDU), the cycle-controlled delay unit (CCDU), the coarse delay unit (CDU) and the fine delay unit (FDU). The reference clock (Ref_clk) passes through the delay line and is output as Out_clk. The phase detector detects the phase relation between the reference clock and the output clock, and then outputs up and down control signals to the DLL controller. The DLL controller changes the control code of the DCDL according to the PD's output to eliminate the phase error between the reference clock and the output clock. When this phase error is eliminated, the DLL is locked. When the DLL is used in clock deskew applications, the clock-tree buffer delay is added after the output clock and before the phase detector so that the clock skew can be cancelled.
In the proposed DLL architecture, the leakage delay unit (LDU) provides a large delay in the DCDL, so the operating range of the DLL can be extended to a very low frequency. In the conventional shift-register-controlled DLL, a sequential search is often used to find the proper control code, which results in a long lock-in time. The sequential search scheme is not suitable for a wide-range DLL, and therefore a binary search scheme is used in the DLL controller to shorten the lock-in time.
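The lock-in time argument can be illustrated with a minimal sketch. It assumes an idealized delay line whose delay grows monotonically with an integer control code and a phase detector that only reports whether the delay is still too short; the function names and the 10-bit width are hypothetical and are not taken from the paper.

```python
def sequential_search(target_code, n_bits):
    """Increase the code by one per step until the target is reached.

    Returns (final_code, steps). Models a shift-register style search.
    """
    code, steps = 0, 0
    max_code = (1 << n_bits) - 1
    while code < target_code and code < max_code:
        code += 1
        steps += 1
    return code, steps


def binary_search(target_code, n_bits):
    """Decide one bit per step, from MSB to LSB (successive approximation).

    Returns (final_code, steps). A trial bit is kept when the resulting
    delay does not overshoot the target.
    """
    code, steps = 0, 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        steps += 1
        if trial <= target_code:
            code = trial
    return code, steps


if __name__ == "__main__":
    print(sequential_search(1000, 10))  # (1000, 1000)
    print(binary_search(1000, 10))      # (1000, 10)
```

In this model the sequential search needs as many steps as the target code, while the binary search needs one step per control bit, which is why the gap widens as the delay range grows.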
III. CIRCUIT IMPLEMENTATION
When the DLL is in high-speed operation, the LDU is not needed, so the reference clock is bypassed to the output of the LDU. In the LDU, each leakage delay cell has two inverters with one-bit control. When the control signal "FAST[n]" is high, the n-th leakage delay cell is in fast mode; when "FAST[n]" is low, the n-th leakage delay cell is in slow mode. Fig. 2(b) shows the schematic of the proposed leakage delay cell, and Fig. 2(c) shows its timing diagram. The proposed leakage delay cell uses the transistor leakage current in the 65nm CMOS process to generate an extremely large delay. In Fig. 2(b), when "FAST" is high, the leakage delay cell behaves like two cascaded inverters, and the delay from "IN" to "OUT" is very small. When "FAST" is low and "IN" has a rising transition, the delay from the "IN" rising transition to the "A.INT_OUT" falling transition is still very small. However, the delay from the "A.OUT" falling transition to the "B.INT_OUT" rising transition is very large, because both the pull-up and pull-down networks are switched off. The node "B.INT_OUT" is then charged only by the leakage current of the always-off PMOS transistor, and it reaches the high level after a long delay. Similarly, when "FAST" is low and "IN" has a falling transition, the delay from the "IN" falling transition to the "A.INT_OUT" rising transition is very large, and the delay from the "A.OUT" rising transition to the "B.INT_OUT" falling transition is very small. As a result, when "FAST" is low, the delay from "IN" to "OUT" is very large.
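The fast and slow modes described above can be captured in a small behavioral sketch. The 67.2 ns slow-mode delay is the per-cell figure reported in Section IV and the 32 cells follow from the FAST[31:0] control word; the fast-mode delay of 0.05 ns is only an assumed placeholder, and the function names are hypothetical.

```python
SLOW_DELAY_NS = 67.2   # per-cell slow-mode delay reported in Section IV
FAST_DELAY_NS = 0.05   # ASSUMED placeholder for two cascaded inverters
NUM_CELLS = 32         # one control bit per cell: FAST[31:0]


def slow_stage(fast, edge):
    """Return which inverter stage dominates the delay for a given input edge.

    In slow mode a rising "IN" edge is limited by the second stage (node
    "B.INT_OUT"), a falling edge by the first stage (node "A.INT_OUT").
    In fast mode no stage is leakage-limited, so None is returned.
    """
    if fast:
        return None
    return "B" if edge == "rise" else "A"


def ldu_delay_ns(fast_word):
    """Total LDU delay for a 32-bit FAST word (bit n = 1 puts cell n in fast mode)."""
    if not 0 <= fast_word < (1 << NUM_CELLS):
        raise ValueError("FAST word must fit in 32 bits")
    fast_cells = bin(fast_word).count("1")
    slow_cells = NUM_CELLS - fast_cells
    return slow_cells * SLOW_DELAY_NS + fast_cells * FAST_DELAY_NS


if __name__ == "__main__":
    print(ldu_delay_ns(0x0000_0000))  # all cells slow: maximum delay
    print(ldu_delay_ns(0xFFFF_FFFF))  # all cells fast: minimum delay
```

The model ignores the LDU bypass path used in high-speed operation and any PVT dependence of the leakage current, both of which affect the real circuit.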
The charging speed can be tuned by adjusting the width of the always-off PMOS transistor. The charging speed is also influenced by process, voltage, and temperature (PVT) variations. In the proposed leakage delay unit (LDU), the maximum delay is generated when FAST[31:0] is 32'h0 and the minimum delay is generated when FAST[31:0] is 32'hFFFF_FFFF.
Fig. 3(a) shows the architecture of the proposed cycle-controlled delay unit (CCDU), and Fig. 3(b) shows the detailed circuit of the positive-edge-triggered cycle-controlled delay line. The proposed CCDU is composed of a positive-edge-triggered cycle-controlled delay line, a negative-edge-triggered cycle-controlled delay line, an SR latch and a 2-to-1 multiplexer.
The input signal LDL_out triggers these two edge-triggered cycle-controlled delay lines at its positive and negative edges, respectively. When the trigger arrives, the inner counter starts counting upward until it reaches the count value supplied by the DLL controller. When the inner counter matches the count value, the signals "S" and "R" are generated by the two edge-triggered cycle-controlled delay lines. These two signals drive an SR latch to generate an output clock with 50% duty cycle, as shown in Fig. 3(c). When the DLL is in high-frequency operation, the DLL controller sends a zero count value to the CCDU; the signal "count_is_0" is then pulled high, and the input signal "LDL_out" is bypassed to the output of the CCDU. In the proposed CCDU, the ring oscillator with the digital counter can generate a delay large enough to cover one delay step of the preceding leakage delay unit under PVT variations.
Fig. 4 shows the circuit of the proposed coarse delay unit (CDU). It is composed of 31 delay cells, 31 AND logic gates and 32 multiplexers. The proposed circuit can generate 32 different delays, and the minimum delay is one multiplexer propagation delay. Unused delay cells can be turned off to reduce power consumption. The total delay controllable range of the CDU should cover the delay step of the preceding cycle-controlled delay unit under PVT variations.
Fig. 5 shows the proposed fine delay unit (FDU). It is composed of N cascaded buffers and (N-1) digital-controlled varactor (DCV) cells [3]. DCV cells are used in the fine-tuning stage for better resolution and linearity of the delay line. The total delay controllable range of the FDU should cover the delay step of the preceding coarse delay unit under PVT variations.
Fig. 6 shows the timing diagram of the proposed DLL. The proposed DLL takes one clock cycle to initialize the internal circuit.
After that, a binary search scheme is used in the DLL controller to find the proper control codes for each of the four delay units of the proposed delay line. The phase detector detects whether the output clock (Out_clk) leads or lags the reference clock (Ref_clk), and then outputs the up or down signal to the DLL controller to update the control code of the delay line. After the phase error between the reference clock and the output clock is cancelled, the DLL is locked.
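The four-unit search can be sketched as a coarse-to-fine binary search, one unit at a time. This is only an illustration under stated assumptions: the LDU step (67.2 ns) and the 32 settings of the LDU and CDU come from the paper, while the CCDU, CDU and FDU step sizes, the CCDU and FDU code ranges, and the names `STAGES`, `coverage_ok` and `lock` are hypothetical. Intrinsic (zero-code) delays are ignored, and delays are kept in integer picoseconds to avoid rounding effects.

```python
# (name, step in ps, maximum code). Only the LDU step and the 32-setting
# sizes of the LDU and CDU come from the paper; all other numbers are ASSUMED.
STAGES = [
    ("LDU", 67_200, 32),   # code = number of cells in slow mode
    ("CCDU", 1_000, 127),  # assumed ring-oscillator period and counter range
    ("CDU", 40, 31),       # 32 selectable delays, assumed 40 ps step
    ("FDU", 5, 15),        # assumed 5 ps step and 16 settings
]


def coverage_ok(stages=STAGES):
    """Check that each finer unit's range covers the step of the unit before it."""
    return all(
        fine_step * fine_max >= coarse_step
        for (_, coarse_step, _), (_, fine_step, fine_max) in zip(stages, stages[1:])
    )


def lock(period_ps, stages=STAGES):
    """Binary-search each unit, coarse to fine, without overshooting the period.

    Returns (codes, total_delay_ps). The comparison stands in for the phase
    detector deciding between an up and a down signal.
    """
    codes, delay = {}, 0
    for name, step, max_code in stages:
        lo, hi = 0, max_code
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if delay + mid * step <= period_ps:  # still early: keep the larger code
                lo = mid
            else:                                # overshoot: try a smaller code
                hi = mid - 1
        codes[name] = lo
        delay += lo * step
    return codes, delay


if __name__ == "__main__":
    print(coverage_ok())
    print(lock(2_000_000))  # 500 kHz reference, period 2 us
    print(lock(1_234))
```

Because every finer unit covers the step of the unit before it, the residual error after the last unit stays below one FDU step in this model, which mirrors the coverage requirement stated for the CCDU, CDU and FDU.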
IV. EXPERIMENTAL RESULT
The proposed ultra wide-range DLL is designed and fabricated in a standard performance (SP) 65nm CMOS technology. Fig. 7 shows the layout of the proposed DLL; its active area is 0.01 mm². The proposed DLL operates from 500 kHz to 1 GHz in the typical case. With a 1.0 V supply voltage, the power consumption is 1.8 mW at 1 GHz and 0.11 mW at 500 kHz. Each leakage delay cell provides a 67.2 ns delay in slow mode, so the number of bits in the digital counter of the cycle-controlled delay unit can be reduced. As a result, the operating frequency of the DLL can easily be extended to a very low frequency.
The performance comparisons with recent wide-range DLLs are given in Table I. The analog DLL [6] is not suitable for low-frequency operation since its delay line has a very narrow delay controllable range. The analog DLL [5] can be used in wide-range operation, but it needs extra I/O pins to select the desired frequency band and the ratio of charge-pump current. If the mixed-mode DLL [7] is applied to very low-frequency operation, the path selector requires too many delay cells, so this architecture cannot be used for ultra wide-range operation. The digital DLL [4] with a cycle-controlled delay unit can be used in low-frequency operation, but if an ultra wide operation range is required, it is difficult to extend the delay controllable range of the cycle-controlled delay line to very low frequencies. In contrast, the proposed DLL with leakage delay cells works well even in ultra wide frequency range operation.
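The reported figures can be cross-checked with a short calculation. The 500 kHz lower bound, the 67.2 ns slow-mode delay and the 32 leakage delay cells come from the paper; the 1 ns ring-oscillator period used for the counter-width estimate is purely an assumption for illustration, as is the helper name `counter_bits`.

```python
import math

F_MIN_HZ = 500e3        # lowest reported operating frequency
SLOW_DELAY_NS = 67.2    # per-cell slow-mode delay reported above
NUM_CELLS = 32          # leakage delay cells, from FAST[31:0]
RING_PERIOD_NS = 1.0    # ASSUMED ring-oscillator period in the CCDU

period_ns = 1e9 / F_MIN_HZ               # clock period at 500 kHz
ldu_max_ns = NUM_CELLS * SLOW_DELAY_NS   # all cells in slow mode


def counter_bits(range_ns, ring_period_ns=RING_PERIOD_NS):
    """Bits needed for a counter that spans range_ns in ring-oscillator cycles."""
    counts = math.ceil(range_ns / ring_period_ns)
    return max(1, counts.bit_length())


if __name__ == "__main__":
    print(period_ns)                    # 2000.0 ns
    print(ldu_max_ns)                   # about 2150.4 ns
    print(counter_bits(period_ns))      # counter spanning the whole period
    print(counter_bits(SLOW_DELAY_NS))  # counter spanning one LDU step
```

Under these assumptions the 32 slow-mode cells span roughly 2150 ns, slightly more than the 2000 ns period at 500 kHz, and the counter only has to cover one 67.2 ns LDU step (7 bits here) instead of the full period (11 bits here). The exact bit counts depend on the real ring-oscillator period and on PVT margins, which are not modeled.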
V. CONCLUSION
In this paper, a novel leakage delay cell implemented in 65nm CMOS technology is presented. Combined with the cycle-controlled delay unit, the proposed leakage delay cell readily achieves ultra wide frequency range operation. The operating frequency range of the proposed DLL is from 500 kHz to 1 GHz. It also achieves smaller chip area and lower power dissipation than previous wide-range DLLs [4]-[7]. As a result, it is very suitable for wide-range clock deskew applications in the SoC era.
