Abstract-A novel three-stage programmable digital delay line (DDL) architecture with picosecond resolution, a 1 µs range, and sub-picosecond jitter performance is proposed. Through circuit simulation, a dynamic range of 1 µs is obtained in the first stage using 10-bit counters operating at a frequency of 1 GHz.
I. INTRODUCTION
Delay lines are circuits used to delay signals by a pre-selected time constant. The best-performing delay lines are optical and are capable of providing delay resolution on the order of femtoseconds. Although the range of a single optical delay line is limited, it can be extended by cascading multiple units, at the cost of a complex setup [1, 2]. Solid-state delay lines such as CMOS delay lines, on the other hand, have the upper hand in terms of system simplicity. However, when multiple CMOS delay lines such as inverter chains are cascaded, linearity and resolution issues arise due to device mismatch, delay offset, and delay uncertainty at the output of each stage, which in turn degrade the linearity and the jitter performance [3].
Delay lines find many applications in industry and science. The most common application is the digitization of short time intervals in Time Interval Measurement (TIM) circuits [4, 5]. In addition, delay lines find application in range imaging [6]. In the computer industry, a digital Tapped Delay Line (TDL) is used as a shift register to move, delay, and store data at precise time windows [7]. Moreover, CMOS delay lines are used in on-chip time measurements and in the synchronization of a CPU with its interfaces [5, 8].
There is always a trade-off between the delay resolution and the delay range of CMOS delay lines, which limits their application [3]. To overcome this limitation, several CMOS delay line architectures using one or two stages have been proposed [4, 5, 9-11]. The range is extended by means of a coarse counter that counts reference clock periods between two electrical pulses, usually referred to as the START and STOP pulses. The user changes the coarse delay by programming these two pulses. Higher-resolution delay steps are then achieved using an interpolator, which subdivides the fractional part of the clock period into smaller time windows. Even though several researchers have shown significant improvements in delay resolution [9, 10, 12-15], dynamic range [4, 12, 14, 16, 17], or jitter performance [9, 18, 19], none of these works has proposed a delay line that combines a long delay range, a picosecond time step, and sub-picosecond jitter performance in a single circuit.
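The coarse-counter and interpolator split described above can be illustrated with a minimal sketch. It assumes a 1000 ps clock period (matching the 1 GHz clock mentioned in the abstract); the function name `split_delay` and its parameters are hypothetical and are not taken from any of the cited designs.

```python
def split_delay(delay_ps: int, clock_period_ps: int = 1000) -> tuple[int, int]:
    """Split a requested delay into whole clock periods and a remainder.

    The whole periods would be handled by a coarse counter; the remainder
    is the fraction of a clock period left for an interpolator to resolve.
    The 1000 ps default is an assumption (1 GHz reference clock).
    """
    if delay_ps < 0:
        raise ValueError("delay must be non-negative")
    coarse_counts, residual_ps = divmod(delay_ps, clock_period_ps)
    return coarse_counts, residual_ps


# Example: 123,456 ps splits into 123 clock periods plus 456 ps.
coarse, residual = split_delay(123_456)
print(coarse, residual)  # 123 456
```

The range is set by how many periods the counter can hold, while the resolution is set only by how finely the remainder can be subdivided, which is why the two can be improved independently.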
Motivated by this research gap, this paper demonstrates a new digital delay line architecture that meets all three requirements in a single CMOS circuit. The proposed design is based on a three-stage architecture that is explained in the subsequent section. The results and discussion are presented in Section III. Finally, Section IV summarizes and concludes this paper.
[...] word. This word corresponds to the amount of delay required.
When the count of the counter in the delay generator equals the value at the Tdc input, a STOP pulse is generated. The duration between the START and STOP pulses is equal to the desired coarse delay. The inverted output of the delay generator is used to trigger a pulse width generator. Using the same mechanism as the delay generator, the pulse width generator produces a pulse whose width is equal to the value programmed through the Tw pins. The specifications of the three-stage digital delay line are listed in Table I. The proposed architecture can be integrated into any System-on-Chip (SoC), as the estimated layout area for all three delay stages is only 300 × 500 µm².
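The count-and-compare mechanism described above can be sketched as a simple behavioural model. This is an illustration only, not the authors' circuit: it assumes a 1 ns clock period and 10-bit words (both taken from the abstract), a counter that starts from zero at the START pulse, and no additional logic latency. All names (`count_until_match`, `coarse_stage`, `PulseTiming`) are hypothetical.

```python
from dataclasses import dataclass

CLOCK_PERIOD_NS = 1   # assumption: 1 GHz reference clock
COUNTER_BITS = 10     # assumption: 10-bit counters


@dataclass
class PulseTiming:
    stop_ns: int       # time of the STOP pulse, measured from START
    pulse_end_ns: int  # time at which the output pulse ends


def count_until_match(word: int) -> int:
    """Return the number of clock cycles until the counter equals `word`."""
    if not 0 <= word < 2 ** COUNTER_BITS:
        raise ValueError("word does not fit in the counter")
    count = 0
    cycles = 0
    while count != word:   # comparator: stop when count equals the word
        count += 1         # one increment per clock cycle
        cycles += 1
    return cycles


def coarse_stage(tdc_word: int, tw_word: int) -> PulseTiming:
    """Model the delay generator followed by the pulse width generator."""
    delay_cycles = count_until_match(tdc_word)   # START -> STOP
    width_cycles = count_until_match(tw_word)    # triggered by STOP
    stop_ns = delay_cycles * CLOCK_PERIOD_NS
    return PulseTiming(stop_ns, stop_ns + width_cycles * CLOCK_PERIOD_NS)


print(coarse_stage(tdc_word=997, tw_word=2))
```

Under these assumptions a 10-bit word addresses up to 1023 clock periods, which is consistent with a coarse range of about 1 µs at 1 GHz.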
III. SIMULATION RESULTS AND DISCUSSION
The proposed three-stage CMOS delay line is designed using a 0.13 µm CMOS process. The power supply voltage is 1.2 V.
978-1-4799-1731-0/15/$31.00 ©2015 IEEE
The maximum delay, Tdemax, for a 2 ns input pulse is approximately 997 ns, as shown in Fig. 6.
In the technology used, the finest resolution achievable with a simple minimum-size inverter is approximately 23 ps. However, cascading inverters in a chain does not produce a linear delay step between successive inverter stages, owing to the complex parasitic capacitance network at the input of an inverter chain structure [21]. Nevertheless, 46 inverters are used in the MRDL stage to obtain approximately 975 ps of delay. Fig. 7 shows the simulated maximum delay generated by the MRDL stage. Ideally, the MRDL stage should generate a maximum delay of 1000 ps. Although this exact value was not attainable despite many circuit modifications, the 25 ps shortfall is compensated in the subsequent fine delay stage.
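The MRDL figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (46 inverters, 23 ps nominal inverter delay, 975 ps simulated maximum, 1000 ps target, and the 27 ps fine-stage range reported in the results); the naive linear estimate is an illustrative assumption, not a model used by the authors, and the function name `mrdl_budget` is hypothetical.

```python
INVERTER_COUNT = 46
NOMINAL_INVERTER_DELAY_PS = 23   # single minimum-size inverter
MRDL_SIMULATED_MAX_PS = 975      # simulated maximum of the MRDL stage
MRDL_TARGET_MAX_PS = 1000        # ideal maximum of the MRDL stage
FRDL_RANGE_PS = 27               # fine-stage range reported in the results


def mrdl_budget() -> dict:
    """Compare the simulated MRDL range with a naive linear estimate."""
    # Naive estimate: every inverter contributes the nominal delay.
    linear_estimate_ps = INVERTER_COUNT * NOMINAL_INVERTER_DELAY_PS
    # Average delay per inverter actually observed in simulation.
    average_step_ps = MRDL_SIMULATED_MAX_PS / INVERTER_COUNT
    # Part of the 1000 ps target that the MRDL stage does not reach.
    shortfall_ps = MRDL_TARGET_MAX_PS - MRDL_SIMULATED_MAX_PS
    return {
        "linear_estimate_ps": linear_estimate_ps,
        "average_step_ps": round(average_step_ps, 1),
        "shortfall_ps": shortfall_ps,
        "covered_by_fine_stage": shortfall_ps <= FRDL_RANGE_PS,
    }


print(mrdl_budget())
```

The gap between the naive 1058 ps estimate and the simulated 975 ps is consistent with the non-linear step behaviour noted above, and the 25 ps shortfall fits within the 27 ps fine-stage range.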
The finest resolution is attained using the FRDL stage. This stage is designed using a custom-made DLL, which is controlled by analog signals in order to generate a 1 ps step. To achieve sub-picosecond jitter performance and extend the delay range linearly to 27 ps, two NAND-gate-based delay elements were incorporated into the DLL. The delay is controlled from zero to 27 ps by varying Vbp (see Fig. 5) from 0.04 V to 1 V. As shown in Fig. 8, the achieved maximum delay range of the FRDL stage is 27 ps. The lock time of the DLL is only 10 cycles. Fig. 8(a) shows that the output pulse has a delay of 122 ps when Vbp is 1 V. As Vbp is decreased to 0.985 V, the delay changes to 121 ps, as shown in Fig. 8(b). Fig. 8(c) shows that a maximum delay of 27 ps is generated when Vbp = 0.04 V. Using an ideal input reference clock, the peak-to-peak and RMS jitter values obtained in simulation are 0.8 ps and 0.015 ps, respectively. Linearity analysis is also considered for the three delay stages, as illustrated in Fig. 9. Since the estimated layout area for all three stages is only 300 × 500 µm², this design can be integrated with other SoC circuits that require these delay specifications.
