I. INTRODUCTION
The growing market of wireless sensor networks (WSN) and the internet-of-things (IoT) calls for solutions that drastically reduce cost and extend battery life. The RF PLL is an essential block of an IoT node, but it consumes a large portion of the overall power (20-35%) [1]. With technology scaling, digitally intensive approaches such as the all-digital PLL (ADPLL) are now becoming a reality [2]. The key to a true ultra-low power ADPLL (i.e., at the sub-mW level) is to cut down the high power dissipation of the time-to-digital converter (TDC), which usually dominates the power of the ADPLL next to the digitally controlled oscillator (DCO).
The finite time resolution of the TDC contributes mainly to the in-band phase noise of the ADPLL. The system specification of Bluetooth Smart indicates that the intrinsic delay of transistors in nano-scale CMOS (e.g., around 10-30 ps in 40-nm CMOS) is sufficient to achieve the required time resolution of the TDC. Thus, the most important remaining problem is to improve the power efficiency of the TDC, which is the main purpose of this work.
The high power dissipation of a TDC comes from: (1) the required dynamic range (DR) of the TDC, which has to cover at least one period of the DCO clock; (2) the high operation rate of the TDC, which receives the DCO clock.
In this paper, a detailed implementation of a digital-to-time converter (DTC) that assists the TDC in an ultra-low power ADPLL is presented and validated by chip measurements. The DTC is used to reduce the required DR of the TDC. The power dissipation is further minimized by snapshotting, which leads to a power consumption that is only a fraction of that of present state-of-the-art TDCs.
II. ULTRA-LOW POWER ADPLL
Fig. 1 shows the system diagram of an ultra-low power (ULP) ADPLL, where the dashed box highlights the focus of this paper, i.e., a DTC-assisted TDC. The DTC is a critical block for the realization of a fractional phase detector in a ULP ADPLL. In Fig. 1, the output frequency of the PLL, termed the variable clock (CKV), is set by the product of the frequency command word (FCW) and the frequency reference clock (FREF). A divide-by-2 [2] is adopted to reduce the rate of the feedback path clock at the cost of doubling the required fractional phase detection range, that is, the period of CKV/2. The fractional phase difference of CKV/2 and FREF is largely deterministic and thus predictable in steady-state operation [3]. This is exploited by introducing a DTC that delays the rising edge of FREF to maximally align the delayed reference signal FREFDLY with the next rising edge of CKVD/2. Thus the TDC only needs to cover phase noise (typically a few ps) plus the time residue due to the quantization error and non-linearity of the DTC, rather than the full clock period of CKVD/2. The reduced DR of the TDC helps to save substantial power. Moreover, the DTC is free of sampling cells (e.g., D-flip-flops), which makes it more power efficient than a TDC with an equivalent DR and time resolution. The 4-bit TDC is implemented based on a pseudo-differential structure. Compared to a DTC-only bang-bang phase detector, the combination of DTC and TDC offers faster settling and better phase noise performance.
III. DIGITAL TO TIME CONVERTER
The dynamic range of the DTC is designed to cover the whole period of CKV/2 over process, voltage and temperature (PVT) variations, and its resolution is determined by the in-band phase noise requirement of the ADPLL. Hence, a 6-bit DTC is designed. The system diagram of the DTC is depicted in Fig. 2(a), where the core DTC is composed of 64 delay stages. The FREF clock is fed to all stages via clock feeders (CF) and is gated by the decoder output bits EK0-63, while the signals EB0-63 are applied to enable the delay elements (DE). The delay control bits determine the effective delay of the rising edge of FREF from its input to the DTC output, FREFDLY. Inside each delay stage (Fig. 2(a)), two switches (controlled by EKi and EBi) decide whether this stage will be a clock feed-in point of FREF or will be connected to the previous stage. For example, in Fig. 2(b), when the i-th stage is chosen as the feed-in point, the CFs of the succeeding stages are bypassed and only their unit delays (DE) are counted. The effective delay of the DTC therefore equals the sum of the identical offset (Doffset) introduced by the CF, the delay element of the feed-in stage, and the delay elements of all succeeding stages. The structure from [4] is employed as the CF to either pass the signal from the preceding stage (EKi=0) or feed in the FREF clock (EKi=1). Pc is on when FREF is low as a preset state, while Nc serves as a pull-down switch when EKi and FREF are both high. For the delay elements, two cascaded inverters are utilized and gated so that they are switched off when they are not on the active propagation path. Doing so avoids unnecessary power loss.
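The feed-in selection described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: stages are numbered from 0 (farthest from the output) to 63 (closest), the unit delay is set to the measured 22.3 ps resolution, and the CF offset value of 30 ps is a hypothetical placeholder. The function names are illustrative and do not come from the design.

```python
def dtc_delay_ps(feed_in_stage, n_stages=64, unit_delay_ps=22.3, cf_offset_ps=30.0):
    """Behavioral FREF-to-FREFDLY delay for a chosen feed-in stage.

    Assumption: stage 0 is farthest from the output, so choosing a lower
    index puts more delay elements (DE) on the propagation path.
    cf_offset_ps is a hypothetical value for Doffset.
    """
    if not 0 <= feed_in_stage < n_stages:
        raise ValueError("feed_in_stage out of range")
    # DEs on the path: the feed-in stage's own DE plus all succeeding stages.
    active_des = n_stages - feed_in_stage
    return cf_offset_ps + active_des * unit_delay_ps


def enable_bits(feed_in_stage, n_stages=64):
    """One-hot EK (feed-in select) and thermometer EB (DE enable) vectors.

    Assumption: only DEs on the active path are enabled, matching the
    gating described in the text.
    """
    ek = [1 if i == feed_in_stage else 0 for i in range(n_stages)]
    eb = [1 if i >= feed_in_stage else 0 for i in range(n_stages)]
    return ek, eb
```

In this model, moving the feed-in point by one stage changes the delay by exactly one unit delay, while Doffset is common to every code and therefore only shifts the transfer curve.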
IV. DIGITAL CALIBRATION ALGORITHM
The DTC control bits (DTCctrl) are calculated based on the conversion gain of the DTC (KDTC) and the fractional part (PHRF) of the accumulated FCW (see Fig. 3(a)). Their relationship is illustrated in (1), where KDTC is the ratio of the resolution (Dt) of the DTC to the time period of CKV/2. The quantization residue of the DTC is removed from the TDC result and has no impact on the phase noise of the ADPLL. However, the resolution of the DTC is sensitive to PVT variations, and an inaccurate KDTC makes the predicted reference phase (FREFDLY) either lag or lead the ideal predicted case; the resulting deviation of the phase error (PHE) is termed PE. Hence, least-mean-squared (LMS) calibration based on the phase error [3] is applied, as shown in Fig. 3(a). The assumption made here is that the calibration scheme works in the steady state of a type-II PLL, which features a statistically zero-mean PHE between the variable oscillator signal (CKV) and the reference signal (FREF). Any deviation caused by the 1/KDTC error is observed via PHE or the fractional part (PHF) of PHE.
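A minimal sketch of how a control word could be derived from PHRF and 1/KDTC is given here. It assumes the code is simply the rounded product of the two, clamped to the 6-bit range; the exact form of (1) may differ, for example by using the complement of PHRF or a fixed offset. The function names and the example numbers are hypothetical.

```python
def phrf_sequence(fcw, n_cycles):
    """Fractional part of the accumulated FCW at each reference cycle."""
    acc = 0.0
    out = []
    for _ in range(n_cycles):
        acc += fcw
        out.append(acc % 1.0)
    return out


def dtc_ctrl(phrf, inv_kdtc, n_bits=6):
    """Assumed mapping: code = round(PHRF * 1/KDTC), clamped to n_bits.

    inv_kdtc is the number of DTC steps per CKV/2 period, i.e. the
    reciprocal of KDTC = Dt / T(CKV/2).
    """
    code = int(round(phrf * inv_kdtc))
    return max(0, min(code, 2 ** n_bits - 1))


def dtc_residue_lsb(phrf, inv_kdtc, n_bits=6):
    """Quantization residue (in DTC LSBs) left for the TDC to resolve."""
    return phrf * inv_kdtc - dtc_ctrl(phrf, inv_kdtc, n_bits)
```

For a fractional FCW of 0.25, PHRF steps through a four-point sawtooth, so the control word repeats every four reference cycles. The residue stays within half an LSB as long as the code is not clamped.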
Equation (3) indicates that finding the minimum mean-square error amounts to finding the correlation between e and the variable PHRF, and this is implemented in Fig. 3(a) and (b). The background calibration adaptively estimates the variation of the reciprocal of the DTC conversion gain, 1/KDTC, and minimizes the error e (in (2)) related to the fractional phase error detection. The error e that an inaccurate 1/KDTC correlates to PHE is minimum when the derivative of the mean-squared error E(e^2) with respect to 1/KDTC is zero. The diagram of the 1/KDTC calibration is depicted in Fig. 3(b), comprising an error estimation block, an IIR filter and an accumulator. The phase predicted by the DTC conversion gain is a saw-tooth waveform, whose slope is sharper than that of the ideal phase if the DTC conversion gain is underestimated, and vice versa when it is overestimated. Furthermore, in order to simplify the hardware implementation, the sign of e is utilized instead of e itself, as well as the sign of the fractional value (PHEF) of PHE. The IIR filter is good for fast convergence and the accumulator implements the correlation function.
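The sign-based correlation loop described above can be sketched behaviorally. This is not the implemented RTL: it is a simplified model that assumes the type-II loop has already removed the mean of the phase error (so the saw-tooth is centered at zero), ignores DTC quantization and noise, and uses hypothetical values for the step size `mu`, the IIR coefficient `alpha` and the fractional FCW.

```python
def sign(x):
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)


def calibrate_inv_kdtc(inv_k_true, inv_k_init, fcw_frac=0.137,
                       n_cycles=2000, mu=0.01, alpha=0.25):
    """Sign-sign LMS estimate of 1/KDTC (behavioral sketch).

    inv_k_true is the actual number of DTC steps per CKV/2 period;
    inv_k is the running estimate used to compute the DTC code.
    """
    inv_k = inv_k_init
    phase = 0.0
    filt = 0.0                      # first-order IIR filter state
    history = []
    for _ in range(n_cycles):
        phase = (phase + fcw_frac) % 1.0
        centered = phase - 0.5      # zero-mean saw-tooth (type-II assumption)
        # Residual error caused by the gain mismatch, in fractions of a period.
        e = centered * (1.0 - inv_k / inv_k_true)
        corr = sign(e) * sign(centered)   # sign-sign correlation
        filt += alpha * (corr - filt)     # IIR smoothing
        inv_k += mu * filt                # accumulator drives the estimate
        history.append(inv_k)
    return inv_k, history
```

Starting from an estimate that is too low, the correlation is positive on every cycle, so the accumulator ramps the estimate up until the error changes sign; it then dithers around the true value with an amplitude set by `mu`.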
This algorithm is fully arithmetic and is verified at the RTL level together with the whole system of an implemented ADPLL, as depicted in Fig. 3(c). The multiplier (1/KDTC) is thus calibrated and converges to the desired value. PHF, which is taken to monitor the calibration, is consequently minimized, indicating the locked loop at the same time.
Fig. 3(a). DTC and TDC with phase error detection as part of the DTC conversion gain (KDTC) calibration loop.
V. EXPERIMENTAL VERIFICATION
The DTC and TDC circuits have been fabricated in a 40 nm CMOS process, together with the remaining components of an ultra-low power ADPLL. Thanks to the custom layout of the DTC and the TDC, parasitic capacitances are minimized to decrease their impact on the timing performance. The compact layout also helps to reduce the power consumption and mismatch. The die photo of the chip is shown in Fig. 4, where the DTC and TDC together occupy a chip area of 0.0034 mm^2.
The measured results of the DTC reveal a time resolution of 22.3 ps, a peak DNL of 2.2 LSB and a peak INL of 1.7 LSB, all without calibration (see Fig. 5). The relatively high DTC non-linearity is attributed to the small device geometry chosen to minimize power consumption. The TDC resolution can be estimated from the in-band closed-loop phase noise floor in the locked state. To avoid any non-ideal effects of the DTC and to suppress the noise contribution from the DCO, the phase noise is measured at an integer channel and the loop bandwidth is widened to more than 1 MHz. In Fig. 6, a measured in-band phase noise of -95.3 dBc/Hz corresponds to a worst-case TDC resolution of 22 ps.
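The link between in-band phase noise and TDC resolution can be sketched with the commonly used quantization-noise expression for ADPLLs, L = ((2*pi)^2 / 12) * (dt / T_V)^2 / f_R, where dt is the TDC resolution, T_V the period of the variable clock and f_R the reference frequency. The variable-clock and reference frequencies used as defaults below (2.4 GHz and 32 MHz) are assumptions for illustration only; the text does not state them, and the formula attributes the entire noise floor to TDC quantization, which is why it yields a worst-case figure.

```python
import math


def inband_pn_dbc_hz(t_res_s, f_v_hz=2.4e9, f_ref_hz=32e6):
    """In-band phase noise (dBc/Hz) from TDC quantization alone.

    f_v_hz and f_ref_hz are assumed example values, not taken from the text.
    """
    lin = (2 * math.pi) ** 2 / 12 * (t_res_s * f_v_hz) ** 2 / f_ref_hz
    return 10 * math.log10(lin)


def tdc_resolution_s(pn_dbc_hz, f_v_hz=2.4e9, f_ref_hz=32e6):
    """Invert the expression above: worst-case TDC resolution in seconds."""
    lin = 10 ** (pn_dbc_hz / 10)
    return math.sqrt(12 * f_ref_hz * lin) / (2 * math.pi * f_v_hz)


# Example: resolution implied by the measured -95.3 dBc/Hz floor,
# under the assumed frequencies.
estimated_res = tdc_resolution_s(-95.3)
```

Because any other in-band noise source also raises the measured floor, the resolution obtained this way is an upper bound on the actual TDC step.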
The post-layout simulation of the DTC is shown in Fig. 7. The peaks of both the DNL and INL curves are due to the different metal path of the middle two stages in the layout. Compared to the simulation results, the time resolution (27 ps) of DTC is the
The measured total power of the DTC and TDC is 43 µW. Post-layout simulations estimate that 1/3 of the power consumption comes from the DTC and 2/3 from the TDC. Table I provides the performance comparison with state-of-the-art TDCs for low-energy radios. A TDC FoM is defined in Table I. Thanks to the assistance of the DTC and the snapshotting technique [2], this work achieves the lowest dissipated power while maintaining the required TDC dynamic range, leading to a state-of-the-art FoM of 0.49 fJ.
VI. CONCLUSION
An ultra-low power DTC with fully digital calibration has been proposed and demonstrated in 40 nm CMOS. The use of phase prediction and DTC assistance is verified to reduce the required dynamic range and power consumption of the TDC. The calibration of the DTC conversion gain is discussed and verified. The designed DTC-assisted TDC was also integrated in an ADPLL, and overall system results have been presented to validate the performance of the DTC/TDC pair.
