Abstract-In most digital-to-time converter (DTC) based applications, the system must achieve a wide frequency translation range in addition to maintaining low integral non-linearity (INL). To meet both requirements, we present a dual-phase direct digital synthesizer (DDS) based DTC with a phase-lookahead mechanism. The proposed variable phase-advancement technique enhances the frequency translation range without excessive power consumption. A 5-GHz digital phase-locked loop (DPLL) with a switched loop, incorporating this DDS based DTC, is implemented in CMOS 65 nm-LL technology. The proposed DDS based DTC performs fractional shifts of up to ±80 MHz with a 100 MHz reference clock, using 3 mW of power from a 1.2 V supply. A simple look-up table based foreground calibration of the phase-to-amplitude converter (PAC) in the DDS improves the peak INL of the DTC to 0.25 ps. Hence, with the proposed DTC and a proportional-integral-derivative (PID) controller based loop, we achieve a low-jitter fractional-N DPLL with a settling time of 1 µs, the fastest reported to date for fractional-N PLLs.
I. INTRODUCTION
In current-generation digital phase-locked loop (DPLL) architectures, digital-to-time converters (DTCs) are extensively used for efficient fractional division. The DTC, as a standalone building block, generates the desired delay within a finite delay range, based on the programmed control word. However, fractional-N PLLs [1]-[4] operate with the DTC emulating an infinite delay range over time, generating the desired frequency offset from the continuously incrementing output of a phase accumulator. Figure 1 shows a DTC+Accumulator generating a constantly rotating phasor, which leads to the required frequency translation of the DTC input (clk_in). In the DTC+Accumulator implementation, as time progresses, the accumulator output (dtc_word) increments by the programmed frequency control word (FCW) once per sampling clock period (T_s), thus generating an incremental delay (τ_d) in the DTC input signal. When the accumulator output overflows, the DTC signal also undergoes a phase wraparound, as shown in Fig. 1(b). The DPLL performs better in terms of spur rejection if the DTC possesses an inherent phase-wraparound feature, which effectively provides a seamless infinite-delay-range emulation [5]. Figure 2 highlights that a DTC has perfect phase wraparound as an inherent feature if the gain, or equivalently the endpoints, of the transfer characteristic is well defined in the system. The transfer characteristics in Fig. 2 show that a DTC implemented with two or more phases of the input signal has to be calibrated only for nonlinearities, since the gain is deterministic with known endpoints of the transfer curve. On the contrary, a DTC receiving a single phase of the incoming signal as its input has to be calibrated for both gain and nonlinearities in the transfer characteristic. The drawback is that DTCs requiring frequency-dependent gain calibration have convergence times on the order of tens of microseconds, which slows down the system.
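The DTC+Accumulator behaviour described above can be illustrated with a minimal sketch. The function names, the 3-bit word width, and the assumption that the DTC full-scale range equals exactly one input period are illustrative choices, not details taken from the implemented design.

```python
def dtc_words(fcw, n_bits, n_cycles):
    """Return the accumulator output (dtc_word) at each sampling clock cycle."""
    modulus = 1 << n_bits
    acc = 0
    words = []
    for _ in range(n_cycles):
        words.append(acc)
        # An accumulator overflow corresponds to a phase wraparound of the DTC.
        acc = (acc + fcw) % modulus
    return words


def delays(words, n_bits, t_in):
    """Map each word to a delay tau_d.

    Assumption: the DTC gain is ideal, so full scale spans one input period t_in.
    """
    return [w / (1 << n_bits) * t_in for w in words]


words = dtc_words(fcw=3, n_bits=3, n_cycles=6)
print(words)  # [0, 3, 6, 1, 4, 7]
```

With a well-defined gain, the step from word 6 to word 1 is the same phase increment as every other step, which is the seamless wraparound property discussed in the text.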
The perfect phase-wraparound feature of a DTC can help overcome the above-mentioned problem and achieve instantaneous frequency switching. A direct digital synthesizer (DDS) based DTC, relying on this calibration-free infinite-delay-range property, acts as a simple plug-n-play fractional divider in a DPLL, as shown in Fig. 3. We have shown a lock-time enhanced DPLL architecture incorporating a DDS based DTC in [6], which uses a switched loop with variable phase detection and a proportional-integral-derivative (PID) controller based finite state machine (FSM) [7] to achieve a record-low settling time.
Fig. 3: DPLL block diagram with DDS based DTC as a "plug-n-play" block for fractional division [8].
For attaining low jitter, most of the DDS based DTCs incorporate harmonic-rejection, polyphase mixing and/or antialiasing filtering [9] , which result in high power dissipation. Even with these filtering techniques, the DTC jitter does not reach sub-picosecond range, and the frequency translation range remains limited to the Nyquist rate.
In this work, we propose a dual-phase DDS based DTC architecture with a phase look-ahead mechanism to achieve a wide frequency-translation range. Instead of power-consuming high-order filters, the proposed architecture uses a time- and phase-synchronized DDS array to reject undesired spurs. While the use of a DDS array for uncorrelated noise rejection has been discussed in [10], this work uses the concept of a multi-phase DDS with phase-advanced information for additional attenuation of highly correlated spurs. In addition, by carefully choosing the output sampling edge, we achieve a DTC with very low INL without using any bandpass or polyphase filtering techniques.
Towards developing a DDS based DTC system with wide-range frequency modulation and low jitter, we discuss suitable DTC variants and the concept of the multi-phase DDS approach in Section II and Section III, respectively. The frequency-range enhancement techniques for the dual-phase DDS based DTC system are discussed in Section IV. Section V analyzes the shift in alias frequency with the dual-DDS array, and Section VI highlights the origins of spurs in the DTC implementation. Section VII sets forth the basis of the switched-loop DPLL system, which achieves the fastest reported settling time with its novel PID controller. Section VIII presents the performance of the proposed DDS based DTC as a fractional divider in the aforementioned 5-GHz DPLL, implemented in CMOS 65 nm-LL technology. Section IX shows further improvement in the DTC linearity with a calibrated pre-distortion applied to the DDS look-up table (LUT). The performance comparison of DTC variants in Section X shows that the proposed dual-phase DDS based architecture operates with a state-of-the-art INL of 0.25 ps at optimal power consumption, without the need for background calibration.
II. CHOICE OF DTC ARCHITECTURE
DTCs are generally implemented using either a digitally controlled delay line (DCDL) or a phase interpolator (PI), as shown in Fig. 4(a). A DCDL, which uses a single input phase for delay generation, has to be calibrated for both gain and non-linearities due to PVT mismatches and random variations. A phase interpolator, on the other hand, uses two phases of the input signal to generate an intermediate phase. Therefore, the gain of a phase interpolator is fixed and only the nonlinearities in the transfer characteristic need to be calibrated. Owing to its deterministic gain, the perfect phase-wraparound property inherent to a phase interpolator easily fulfills the infinite-delay-range requirement on the DTC. A DCDL, on the contrary, needs continuous feedback from the system for background calibration, especially to avoid a drastic step change during wraparound from the maximum to the minimum DTC control word value. In PLLs involving DCDL based DTCs, the convergence of the DTC-gain calibration loop also affects the settling response of the main loop. The convergence time of the calibration loop deteriorates further when the DPLL has to lock to a near-integer channel (i.e., a small fractional control word), since the correlation loop and the main loop disturb each other [11]. Though [11] proposes a variable-preconditioned least-mean-square (LMS) algorithm for fast calibration in a DTC, the convergence time still remains on the order of 40 µs. Thus, for a fast-settling DPLL design, using a phase-interpolator based DTC as the fractional divider turns out to be the easier option. The most popular variants of phase interpolators are (i) the contention-based PI and (ii) the current-weighted PI, as shown in Fig. 4(b). A contention-based PI involves multiple inverters sharing a common output, which leads to an additional short-circuit error that degrades the INL [1].
In spite of the short-circuit error suppression technique shown in [1], the INL of the DTC still remained limited to 1.4 ps. Another trade-off of concern is that the edge rate, degraded in a contention-based PI for better linearity (to avoid the effect of time-varying nonlinear resistance), results in lower noise immunity [12]. The pipelined PI in [1], for instance, highlights that if additional interpolator stages are added for finer resolution, (i) the intrinsic delay increases and (ii) the INL degrades, since any phase imbalance is propagated and could be amplified in subsequent stages. In other PI variants as well, for instance the polyphase-filtering based PI in [13], the trade-off between power and linearity is visible.
This work explores a fractional-divider architecture based on a current-mode PI, with the aim of achieving low-jitter, low-power and instantaneous fractional frequency generation in the DPLL feedback path. Equation (1) [14] shows that the non-linearities in the employed DTC are directly reflected as fractional spurs at the DPLL output. Therefore, improving the INL of the proposed DTC is of paramount importance.
where L is the in-band spur level.
The linearization techniques of (i) nonlinear code mapping in the PI (for instance, the octagonal rotator in [15]) and (ii) the dual-referenced PI [12], shown in Fig. 4(c), are widely used to improve DTC INL performance. In the implemented design, the LUT of the dual-phase DDS is pre-distorted to equalize the nonlinearities in the DTC. With this linearization technique, the proposed DTC based fractional divider achieves a low rms jitter of 0.19 ps without impacting the settling response of the DPLL that employs it.
III. DDS BASED DTC SYSTEM OVERVIEW
In an offset DPLL system, the DDS generates the fractional frequency (ω_frac), which the DTC, acting as a simple phase rotator, uses to shift the frequency (ω_LO) of the incoming oscillator signal. This section highlights the characteristics of single-phase DDS and DDS-array based DTCs, which can be used as standalone phase rotators, in contrast to DCDL based DTCs that need calibration feedback from the external system.
A. Single-phase DDS based DTC
A conventional DDS based DTC architecture, shown in Fig. 5, consists of (i) a phase accumulator, (ii) a phase-to-amplitude converter (PAC) implemented with a read-only memory (ROM), followed by (iii) a digital-to-analog converter (DAC) and (iv) a mixer, and acts as a fractional frequency divider in the PLL. In Fig. 5, based on the programmed frequency control word (FCW) and sampling clock (f_ref), the DDS based DTC system modulates the incoming quadrature digitally controlled oscillator (QDCO) signal to remove the fractional frequency (f_frac) component from the PLL feedback path. A single-phase DDS based DTC architecture has restricted performance in terms of (i) a limited fractional frequency range due to the aliased component, and (ii) increased output jitter due to harmonics generated by the large quantization step.
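The accumulator-plus-ROM chain described above can be sketched as follows. The 10-bit ROM address and 8-bit amplitude follow figures quoted elsewhere in this paper; the 16-bit accumulator width, the full-period sine table, and all function names are assumptions made only for illustration.

```python
import math


def build_pac_rom(addr_bits, amp_bits):
    """Phase-to-amplitude ROM: one full sine period as signed amplitude codes."""
    size = 1 << addr_bits
    full_scale = (1 << (amp_bits - 1)) - 1
    return [round(full_scale * math.sin(2 * math.pi * k / size)) for k in range(size)]


def dds_output(fcw, acc_bits, rom, n_samples):
    """Step the phase accumulator by FCW and look up the amplitude each clock."""
    addr_bits = len(rom).bit_length() - 1
    shift = acc_bits - addr_bits          # only the accumulator MSBs address the ROM
    mask = (1 << acc_bits) - 1
    acc = 0
    out = []
    for _ in range(n_samples):
        out.append(rom[acc >> shift])
        acc = (acc + fcw) & mask
    return out


def synthesized_frequency(fcw, acc_bits, f_ref):
    """Fractional frequency produced by the DDS: f_frac = FCW * f_ref / 2**acc_bits."""
    return fcw * f_ref / (1 << acc_bits)


rom = build_pac_rom(addr_bits=10, amp_bits=8)
print(dds_output(fcw=1 << 14, acc_bits=16, rom=rom, n_samples=4))  # [0, 127, 0, -127]
print(synthesized_frequency(1 << 14, 16, 100e6))                   # 25000000.0
```

The example uses only four samples per output period, which makes the large quantization step, and hence the harmonic content mentioned in the text, easy to see.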
B. Multi-phase DDS based DTC
To mitigate the harmonics and the aliased component that restrict the fractional frequency range, we proposed a multi-phase DDS based DTC architecture in [8] as an improvement over the conventional DDS architecture. The architecture in Fig. 6 uses multiple DDSs, each with its ROM phase-advanced and its sampling clock delayed, to generate an interpolated waveform analogous to a second-order hold response. The summed output of the multi-DDS arrangement reduces harmonics and aliased components by avoiding steep transitions, using the phase-advanced information in the system. Though the phase-shifted DDS array extends the frequency modulation range beyond the Nyquist rate of the sampling clock, the layout complexity increases when trying to avoid the noise generated by cross-talk and delay imbalance at the outputs of the multiple DDSs. To circumvent the need for a matched layout in a multi-phase DDS based DTC, this work derives a frequency-modulation range enhancement technique for a dual-phase DDS based DTC system. A brute-force approach using a fixed phase shift in the ROM and clock of a DTC system, analogous to Fig. 6 with the DDS3 cell removed, is still limited in its achievable fractional frequency range. The reason is that a fixed phase advancement in the DDS-ROM does not lead to interpolation at the correct intermediate phase for all values in the output frequency range.
IV. PERFORMANCE ENHANCEMENT OF DUAL-PHASE DDS BASED DTC ARCHITECTURE
A dual-phase DDS based DTC can cover the maximum frequency modulation range if the phase look-ahead based waveform interpolation is applied at the correct time instant in the DDS-generated waveform. However, for different ranges of DDS-generated frequencies, the ROM requires different amounts of phase advancement for correct look-ahead interpolation. Thus, instead of a fixed phase advancement, the dual-phase DDS based DTC needs a variable phase shift in the additional DDS, with the phase-shift value depending on the programmed FCW.
With the aim of introducing correct interpolated points in the DDS-generated waveform for the required frequency range, this work proposes advancing the FCW input to the DDS2-ROM shown in Fig. 7(a), rather than hard-coding fixed phase-advanced values in the ROM. In the proposed modification to the dual-phase DDS based DTC, the input address word of the DDS2-ROM is advanced by FCW/2, and the DDS2 clock (clkb) is also phase-shifted by 180°. With this operating principle, the outputs of the two DDSs are exactly time- and phase-interleaved, as shown in Fig. 7(b). The in-phase relation between the two DDSs causes the outputs of the corresponding mixers to be additive. The remaining drawback of the architecture in Fig. 7 is that the outputs of the two DDSs switch alternately, causing an instantaneous phase change between them. Figure 8 shows that at each sampling instant of the DDS clock (clk), the output of one DDS switches its phase from lagging to leading with respect to the other. This phenomenon results in increased output jitter (≈3.4 ps) for fractional frequencies near the Nyquist rate (> 45 MHz with f_clk = 100 MHz). As a low-jitter technique for frequency modulation range enhancement, this work proposes the variable phase-advanced dual-phase DDS based DTC architecture in Fig. 9, with a multiplexer employed for fixed time-interleaving between the two DDSs. The multiplexer, with the DDS clock (clk) as the select signal, combines the outputs of the two DDSs with an exact time interleaving of T_s/2 (where T_s is the DDS clock period). Thus, time-interleaving the phase-shifted DDS outputs with a multiplexer avoids the instability inherent to the current-mode summation of the DDS outputs.
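The effect of the FCW/2 address advance combined with T_s/2 multiplexing can be illustrated with a short sketch. It assumes an even FCW (so that FCW/2 is an integer) and that DDS1 is selected during the first half of each clock period; neither detail is stated in the text, and the function name is hypothetical.

```python
def dual_phase_addresses(fcw, acc_bits, n_cycles):
    """Phase words seen at the multiplexer output, two per DDS clock period."""
    mask = (1 << acc_bits) - 1
    acc = 0
    stream = []
    for _ in range(n_cycles):
        dds1 = acc                        # assumed selected in the first half period
        dds2 = (acc + fcw // 2) & mask    # address advanced by FCW/2, clocked on clkb
        stream += [dds1, dds2]
        acc = (acc + fcw) & mask
    return stream


print(dual_phase_addresses(fcw=6, acc_bits=4, n_cycles=4))
# [0, 3, 6, 9, 12, 15, 2, 5]
```

The multiplexed stream is identical to that of a single accumulator stepping by FCW/2 at twice the clock rate, which is why the interpolated point lands at the correct intermediate phase for any programmed FCW rather than for one fixed frequency.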
V. ALIAS-FREQUENCY SHIFT WITH THE PROPOSED DTC
The dual-phase DDS based DTC in Fig. 9 [16] .
Let the ideal signal to be generated by a DDS be represented as x(t), and the actual sampled output signal of the DDS be denoted by x_i[n], where i refers to the path index in the DDS array and T_s is the DDS clock period. The sampled outputs generated by DDS1 and DDS2 can be written as,
The Fourier transform of (2) is given by
(3)
Equation (3) is incomplete because DDS2 not only operates on a phase-shifted clock but also has its PAC-ROM phase-advanced by FCW/2. This phase advancement translates into a T_s/2 advancement in the time domain; thus, a modified Fourier transform for DDS2 can be represented as
(4)
Summing the Fourier transforms of the dual-phase DDS outputs gives
where,
Equation (5) suggests that the odd-ordered image replica components get cancelled with the proposed DTC implementation, which can also be observed from Fig. 10 and simulated response in Fig. 12 . Hence with a first-order aliased component rejection, the frequency modulation range of DDS based DTC extends beyond the Nyquist rate.
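The cancellation of the first image can be checked numerically with a small behavioural model. The sketch below treats the multiplexed dual-phase output as an ideal zero-order-hold waveform updated twice per clock period and compares it against a single-phase DDS; the 30 MHz test frequency, the simulation grid, and the function names are assumptions, and no circuit non-ideality is modeled.

```python
import cmath
import math

F_CLK = 100e6    # DDS clock, as in the text
F_FRAC = 30e6    # example fractional frequency (assumption)
M = 16           # simulation points per clock period (assumption)
N_CLK = 100      # simulated clock periods; spans an integer number of F_FRAC cycles


def held_waveform(phases_per_clk):
    """Zero-order-hold sine with `phases_per_clk` equally spaced updates per clock."""
    hold = M // phases_per_clk
    step = F_FRAC / (F_CLK * phases_per_clk)   # signal cycles advanced per update
    return [math.sin(2 * math.pi * step * (k // hold)) for k in range(M * N_CLK)]


def tone_amplitude(x, freq_hz):
    """Amplitude of one spectral line, from a single-bin DFT."""
    n = len(x)
    bin_index = round(freq_hz * n / (M * F_CLK))
    acc = sum(v * cmath.exp(-2j * math.pi * bin_index * i / n) for i, v in enumerate(x))
    return 2 * abs(acc) / n


single = held_waveform(1)   # conventional single-phase DDS
dual = held_waveform(2)     # DDS1 and DDS2 interleaved by T_s/2 with FCW/2 advance

for f in (F_FRAC, F_CLK - F_FRAC, 2 * F_CLK - F_FRAC):
    print(f / 1e6, tone_amplitude(single, f), tone_amplitude(dual, f))
```

In this idealized model the image at f_clk − f_frac (70 MHz here) is present for the single-phase DDS and vanishes to numerical precision for the dual-phase case, while the nearest remaining image sits at 2f_clk − f_frac.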
VI. DUAL-PHASE DDS BASED DTC IMPLEMENTATION
The dual-phase DDS based DTC implementation in Fig. 11 uses a cascaded arrangement with current-mode signaling for the DAC, the mixer and the succeeding current-mode logic (CML) divider in the DPLL feedback path. The current-mode cascaded arrangement prevents the jitter increase caused by transconductance non-linearities and reduces power consumption through current reuse [5]. The 8-bit current-steering DAC is implemented using a segmented architecture with a lower 4-bit binary-weighted DAC and an upper 4-bit thermometer-weighted DAC. The DDS uses a sign bit to complement the DDS output, generating frequency modulation in both the (f_LO + f_frac) and (f_LO − f_frac) ranges, where LO is the input from the QDCO.
A major concern in a DDS based DTC implementation is the existence of spurious tones due to input-phase mismatches and non-linearities in the mixer input transistors. For instance, (6) highlights that unequal gate-drain capacitance (C_GD) of the mixer switches results in feedthrough of the oscillator input (LO) signal, thus generating a spur of magnitude V_x at the fractional frequency (f_frac) offset from the desired DTC frequency.
where C_mix+ is the total node capacitance at the drain of the mixer switching transistor. Equation (7) shows spur generation at a 2f_frac offset from the DTC frequency. This spur occurs due to phase mismatches (ε) in the oscillator input (LO), leading to incomplete rejection of the image-component signal.
Equations (6)-(7) highlight that interconnect matching is crucial to avoid in-band spur generation at the DTC output. Apart from interconnect and device mismatches, the transconductance non-linearities of the mixer switches generate spurious tones at a 4f_frac offset with respect to the output frequency. Figure 12 highlights the presence of a spurious tone at a 4f_frac offset from the desired signal. The spur power level also governs the INL shape and magnitude, as observed from Fig. 13.
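The 4+4 segmentation of the 8-bit DAC can be illustrated as follows. This is a generic sketch of thermometer/binary segmentation with ideal unit weights; the function names and the LSB-first bit ordering are assumptions rather than details of the implemented circuit.

```python
def segment_code(code):
    """Split an 8-bit code into 15 thermometer cell controls and 4 binary bits."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in 8 bits")
    upper, lower = code >> 4, code & 0xF
    thermometer = [1 if i < upper else 0 for i in range(15)]   # upper 4 bits
    binary = [(lower >> b) & 1 for b in range(4)]              # lower 4 bits, LSB first
    return thermometer, binary


def dac_output_lsb(thermometer, binary):
    """Ideal output in LSB units: each thermometer cell carries 16 LSBs."""
    return 16 * sum(thermometer) + sum(bit << b for b, bit in enumerate(binary))


therm, binary = segment_code(0x35)
print(sum(therm), binary)             # 3 [1, 0, 1, 0]
print(dac_output_lsb(therm, binary))  # 53
```

Thermometer coding of the upper bits means a carry out of the binary segment switches only one additional unit cell, which limits the glitch at major code transitions.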
VII. SWITCHED-LOOP DPLL OVERVIEW
The DPLL [6] in Fig. 14 targets a low lock-time-jitter product by employing loop-gain switching in the feedforward path and look-ahead based phase interpolation in the feedback path. This architecture switches between different subsystems based on the phase-error state-dependent switching rule shown in Fig. 15. Figure 16(a) shows that, starting from a large phase error (φ_err) magnitude, the loop first activates a linear phase frequency detector (PFD) with a DCO clock counter and then switches to an inverter based delay line. The dead zone of a single inverter (φ_err2) is avoided by activating bang-bang phase detection (BBPD). To improve the settling time, the BBPD is initially activated with an FSM emulating an additional PID controller in the loop, as shown in Fig. 16(b). With the linear PFD, the system remains linear time-invariant (LTI); it becomes non-linear time-variant (NLTV) or non-linear time-invariant (NLTI) when switching to the BBPD with or without the FSM, respectively.
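The PID-emulating FSM introduced above, whose gain schedule is detailed in the remainder of this section, can be sketched behaviourally. The class name, the fixed decrement `kd_step`, the polarity of the correction, and the reset of the extra integrator on a sign reversal are all assumptions made for illustration; the text specifies only that the derivative gain starts high, shrinks on each sign reversal, and that the FSM leaves the loop once the gain reaches zero.

```python
class BangBangFsm:
    """Behavioural sketch of an FSM adding derivative/integral action to a BBPD loop."""

    def __init__(self, kd_init, kd_step, ki_fsm):
        self.kd = kd_init          # derivative gain, starts at K_D_init
        self.kd_step = kd_step     # hypothetical reduction per sign reversal
        self.ki = ki_fsm           # gain of the additional integrator (K_I_FSM)
        self.integrator = 0
        self.prev_sign = None

    @property
    def active(self):
        # The FSM is removed from the loop once the derivative gain reaches 0.
        return self.kd > 0

    def update(self, sign):
        """Take the BBPD decision (+1 or -1) and return the extra loop correction."""
        if not self.active:
            return 0
        correction = 0
        if self.prev_sign is not None and sign != self.prev_sign:
            # Sign reversal: apply a derivative kick, then reduce the gain.
            correction = self.kd * sign
            self.kd = max(0, self.kd - self.kd_step)
            self.integrator = 0
        elif self.prev_sign is not None:
            # Same sign in consecutive cycles: integrate for fast frequency tracking.
            self.integrator += self.ki * sign
            correction = self.integrator
        self.prev_sign = sign
        return correction


fsm = BangBangFsm(kd_init=4, kd_step=2, ki_fsm=1)
print([fsm.update(s) for s in (1, 1, 1, -1, 1, -1)])  # [0, 1, 2, -4, 2, 0]
```

In the example run the integrator ramps while the sign repeats, the two reversals produce kicks of decreasing size, and the final decision yields no correction because the FSM has dropped out of the loop.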
When the hybrid phase detector enters bang-bang phase detection mode, the FSM applies a high initial derivative gain (K_D_init) as a correction on phase-error sign reversal. This derivative correction should be large enough to reduce the phase error below the value corresponding to one inverter delay (φ_err2). If the BBPD asserts the same phase-error sign in consecutive cycles, the FSM activates another integrator (K_I_FSM) in the loop to achieve fast frequency tracking until the phase-error sign changes. At every phase-error sign reversal, the FSM activates the derivative gain for immediate phase alignment of the reference and feedback clocks. The derivative gain (K_D) is reduced with each phase-error sign reversal, on the assumption that the loop is settling. When the derivative gain reaches 0, the FSM is removed from the loop to avoid chattering in the settled state. The fast-locking features in the loop's feedforward path are ineffective in improving the settling response if the system must still spend time calibrating the fractional divider between frequency switches. To address this, the DPLL employs the proposed DDS based DTC, which offers calibration-free operation and instantaneous frequency switching.
VIII. MEASUREMENT RESULTS
The switched-loop DPLL incorporating the proposed DTC is implemented in CMOS 65 nm-LL technology, with the chip micrograph shown in Fig. 17. The DPLL has a frequency range of 4.8-5 GHz, operating with a 100 MHz reference clock and a 2 MHz loop bandwidth in the settled state. The DTC, operating directly at the 5 GHz DPLL output, consumes 3 mW, with 2 mW consumed by the current-weighted PI and 1 mW by the dual-phase DDS array. The measured spectrum of the DPLL output in Fig. 18 highlights the presence of spurious tones at f_frac, 2f_frac and 4f_frac offsets, as discussed in Section VI.
The magnitude of the spurs generated in the DDS based DTC system is amplified by the loop bandwidth of the PLL. The DTC spur level, based on (1), impacts its INL response and thus the DPLL output jitter. The proposed DDS based DTC with the phase-lookahead mechanism achieves frequency modulation in the range of ±80 MHz with an INL of 1.6 ps, as shown in Fig. 19. Figure 20 presents the jitter histogram of the DPLL output, with an RMS jitter of 1 ps. The DPLL jitter is in the range of 0.8 ps-2 ps for the DDS output range of ±80 MHz, depending on whether the generated spurious tone is located in-band or out-of-band. In addition, being free from the calibration-loop convergence requirement, the DPLL with the proposed DTC achieves a fast lock time of 1 µs, as shown in Fig. 21.
IX. INL IMPROVEMENT WITH CALIBRATION
To improve the INL at the DTC output, it is essential to cancel the sinusoidal variation resulting from spur generation at the f_frac, 2f_frac and 4f_frac offsets. For this purpose, a foreground calibration technique involving a strategic pre-distortion is applied to the DDS-ROM, which has a 10-bit phase word length (address lines).
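One way to picture such a pre-distortion is a first-order inverse applied to the phase that each ROM address represents. The sketch below is a simplified model and not the calibration procedure used in this work: the sinusoidal INL model with harmonics 1, 2 and 4, the coefficient values, and all function names are hypothetical, chosen only to mirror the spur offsets named in the text.

```python
import math

ADDR_BITS = 10   # ROM phase word length quoted in the text


def inl_model(phase, coeffs):
    """Hypothetical phase error (rad) versus ideal phase, as a sum of sinusoids."""
    return sum(a * math.sin(h * phase + p) for h, (a, p) in coeffs.items())


def predistorted_phases(coeffs):
    """Per-address phase to store in the ROM: the ideal phase minus the estimated error."""
    size = 1 << ADDR_BITS
    return [2 * math.pi * k / size - inl_model(2 * math.pi * k / size, coeffs)
            for k in range(size)]


def peak_inl(coeffs):
    """Peak error over all addresses without pre-distortion."""
    size = 1 << ADDR_BITS
    return max(abs(inl_model(2 * math.pi * k / size, coeffs)) for k in range(size))


def residual_inl(coeffs):
    """Peak error after pre-distortion, with the same error model acting on the new phase."""
    size = 1 << ADDR_BITS
    worst = 0.0
    for k, p in enumerate(predistorted_phases(coeffs)):
        ideal = 2 * math.pi * k / size
        worst = max(worst, abs(p + inl_model(p, coeffs) - ideal))
    return worst


# Harmonic index -> (amplitude in rad, phase in rad); illustrative values only.
coeffs = {1: (0.02, 0.0), 2: (0.01, 0.5), 4: (0.005, 1.0)}
print(peak_inl(coeffs), residual_inl(coeffs))
```

Because the correction is only a first-order inverse, a small residual remains; in this model it scales with the product of the error and its slope, so it shrinks quickly as the raw INL gets smaller.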
To verify the effect of the foreground calibration applied to the DDS-ROM, the measured INL in Fig. 19 is regenerated in simulation by modeling C_GD mismatches in the mixer switches and phase variations in the oscillator signal. The resultant INL after DDS-ROM pre-distortion in Fig. 22 highlights the effectiveness of the foreground-calibration technique. The peak INL of the DTC improves from 1.6 ps to 0.25 ps, which is also reflected as a peak-to-peak jitter improvement from 1.2 ps to 0.5 ps at the DTC output. (In the implemented fractional-N DPLL, only one stable edge out of N edges at the DTC output is sampled by the system as the feedback clock. Therefore, the DTC jitter is measured by observing the most stable edge out of the N edges being repetitively overlapped.)
X. PERFORMANCE COMPARISON
Table I shows the performance comparison of different variants of DTC architecture. The calibrated dual-phase DDS based DTC system achieves a competitive INL of 0.25 ps with optimal power consumption when compared to the GHz-domain phase interpolators that allow an infinite delay range over time. As illustrated in this work, the proposed DDS+PI system is suited to fractional-N DPLLs, which do not require all the edges of the PI output to be stable and sample only one edge out of the N edges of the DTC output. For such applications, conventional GHz-domain interpolators are overkill, as they include power-consuming filtering blocks to reduce jitter at all the output edges.
While DCDL based DTCs achieve low INL with low power consumption, this performance comes with the limitations of (i) a low operational frequency range and (ii) time-consuming background calibration that slows down the response of the system employing them. Thus, for applications demanding frequency translation with a fast settling response, the proposed DDS+PI system is the preferred solution over accumulator+DCDLs with limited range. For instance, the DPLL incorporating the proposed fractional divider has a settling time of 1 µs, while DCDL based DPLLs need a convergence time of tens of microseconds for the initial decision of the coefficients used in the calibration technique.
XI. CONCLUSION
A dual-phase DDS based DTC system with a phase-lookahead mechanism has been presented in this work. The proposed system achieves an extended frequency translation range, beyond the Nyquist rate of the DDS sampling clock. This DDS based DTC architecture, employed for fractional frequency shift in the feedback path of a 5 GHz DPLL, is implemented in CMOS 65 nm-LL technology with a power consumption of 3 mW. Because this infinite-delay-range DTC requires no background calibration, the DPLL achieves a fast settling response of 1 µs. Further linearization of the DTC has been shown with pre-distortion of the DDS-ROM based on the estimated nonlinearity. With this foreground calibration technique, the succeeding phase interpolator achieves the best reported INL of 0.25 ps, thus improving the jitter performance of the DPLL employing this system.
