Abstract: This study provides an in-depth review of the principles, architectures and design techniques of CMOS time-to-digital converters (TDCs). The classification of TDCs is introduced. It is followed by the examination of the parameters quantifying the performance of TDCs. Sampling TDCs including direct-counter TDCs, tapped delay-line TDCs, pulse-shrinking delay-line TDCs, cyclic pulse-shrinking TDCs, direct-counter TDCs with interpolation, vernier TDCs, flash TDCs, successive approximation TDCs and pipelined TDCs are studied and their pros and cons are compared. Noise-shaping TDCs that reduce in-band noise below technology limit are investigated. These TDCs include gated ring oscillator TDCs, switched ring oscillator TDCs, relaxation oscillator TDCs, ΔΣ TDCs and MASH TDCs. The performance of sampling and noise-shaping TDCs is compared. The direction of future research on TDCs is explored.
Introduction
The advance of CMOS technology has resulted in a sharply increasing time resolution and a rapidly deteriorating voltage resolution. As a result, time-mode circuits, where the information to be processed is represented by time variables (that is, the difference between the times at which two digital events occur) rather than by the nodal voltages or branch currents of electric networks, offer a technology-friendly means to combat challenges such as deteriorating voltage accuracy and shrinking dynamic range encountered in the design of mixed analogue-digital circuits. Time-to-digital converters (TDCs), which map a time variable to a digital code, are the most important building blocks of time-mode circuits. Although the deployment of TDCs in particle and high-energy physics for time-of-flight measurement in nuclear science dates back to the 1970s [1, 2], their applications in digital storage oscillators [3, 4], laser range finders [5], analogue-to-digital converters (ADCs) [6–8], audio signal processing [9], medical imaging [10], positron emission tomography [10], instrumentation [11], infinite impulse response (IIR) and finite impulse response (FIR) filters [12, 13], anti-imaging filters [14], all-digital frequency synthesisers [15–19], multi-Gbps serial links [20], programmable band/channel select filters for software-defined radio [21] and laser-scanner-based perception systems [22], to name a few, emerged only recently. Although various architectures and design techniques have appeared to improve the performance of TDCs, an in-depth examination of the principles and design techniques of CMOS TDCs is not available. The goal of this review paper is to provide readers with a comprehensive treatment of the principles, architectures and design techniques of TDCs, and a critical assessment of the pros and cons of each class of TDCs. The remainder of this paper is organised as follows: Section 2 provides a loose classification of TDCs. The key performance indicators of TDCs are described in Section 3. Section 4 investigates sampling TDCs. Noise-shaping TDCs are dealt with in Section 5. Section 6 explores the direction of future research on TDCs. This paper is concluded in Section 7.
Classification of TDCs
TDCs can be loosely classified into sampling TDCs and noise-shaping TDCs. The former digitise a time variable using a high-frequency reference clock or delay lines, whereas the latter suppress the quantisation noise of TDCs using system-level techniques such as ΔΣ modulation and multi-stage noise shaping (MASH). Sampling TDCs can be further classified into single-shot TDCs and averaging TDCs. The former digitise a time interval in a single measurement, whereas the latter digitise a time interval using the average of a set of successive measurements, which minimises the effect of random measurement error and so achieves a better accuracy. It can be shown that the precision of averaging TDCs improves in proportion to the square root of the number of averaged measurements [23]. A large number of TDCs fall into the category of sampling TDCs. These include direct-counter TDCs, direct-counter TDCs with interpolation, tapped delay-line TDCs, pulse-shrinking delay-line TDCs, cyclic pulse-shrinking delay-line TDCs, vernier TDCs, pulse-stretching TDCs, flash TDCs, successive approximation TDCs (SA-TDCs) and pipelined TDCs. Sampling TDCs have the common characteristic that they digitise time variables directly in an open-loop manner with no suppression of quantisation noise. For each time variable there is a corresponding digital code, so a one-to-one mapping between the input time variable and the output digital code exists. The resolution of these TDCs is lower-bounded by the quantisation error. Noise-shaping TDCs, on the other hand, suppress in-band noise. A number of noise-shaping TDCs, such as gated ring oscillator (GRO) TDCs, switched ring oscillator (SRO) TDCs, relaxation oscillator (RO) TDCs and MASH TDCs, have emerged. Although the in-band noise of noise-shaping TDCs is lower than the quantisation noise, there is no one-to-one mapping between input time variables and digital output codes.
Characterisation of TDCs
The performance of TDCs is characterised by a number of parameters, of which resolution, precision, linearity, voltage and temperature sensitivities, conversion time and range are the most important [24].
The resolution of a TDC is the minimum time interval that the TDC can quantise ideally. In reality, the imperfections of TDCs such as non-linearity and clock jitter lower the resolution.
The precision of a TDC, often known as single-shot precision, is quantified by the standard deviation of the measurement errors $\Delta_1$ and $\Delta_2$ when measuring a constant time interval. When a time variable specified by START and STOP pulses is measured using a reference clock and a counter, as shown in Fig. 1, the reference clock is asynchronous with START and STOP, so the single-shot measurement errors $\Delta_1$ and $\Delta_2$ exist and are uniformly distributed in $[0, T_c]$, where $T_c$ is the period of the reference clock. Their standard deviation thus provides an effective measure of the measurement error in a statistical sense. Averaging improves the precision: the standard deviation of the average of a set of independent measurements is smaller than the single-shot value by the square root of the number of averaged measurements [23].
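The two statistical facts used above can be checked numerically. The following is a minimal sketch, assuming a single error term uniformly distributed over one clock period and statistically independent repeated measurements; the function names and the 1 GHz clock are illustrative choices, not taken from [23].

```python
import math


def single_shot_sigma(t_c: float) -> float:
    """Standard deviation of an error uniformly distributed in [0, t_c]."""
    return t_c / math.sqrt(12.0)


def averaged_sigma(sigma: float, n: int) -> float:
    """Standard deviation of the mean of n independent measurements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)


t_c = 1000.0  # hypothetical reference clock period in ps (1 GHz clock)
sigma = single_shot_sigma(t_c)
print(f"single-shot sigma: {sigma:.1f} ps")
print(f"sigma after averaging 100 shots: {averaged_sigma(sigma, 100):.1f} ps")
```

Averaging 100 measurements therefore buys one decade of precision, at the cost of a hundredfold longer measurement.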
The non-linearity of a TDC is the deviation of the time-to-digital transfer characteristic of the TDC from that of an ideal TDC. For delay-line TDCs, it is caused by the difference between the delay of delay stages arising from process variation. The non-linearity of the delay-line TDC is usually quantified by differential non-linearity (DNL) quantifying the mismatch of the delay of adjacent delay stages and integral non-linearity (INL) quantifying the accumulative effect of the mismatch of the delay of the delay stages over the entire delay line. Since the resolution of the TDC is measured using least significant bit (LSB), both DNL and INL are typically quantified in LSB. Alternatively, the effect of the non-linearity of the TDC can be depicted using signal-to-noise-plus-distortion ratio (SNDR) over a specific frequency range. SNDR is obtained from the frequency response of the TDC by computing the ratio of the power of the signal to that of the noise and harmonic tones over a specific frequency range. SNDR is widely favoured over signal-to-noise ratio when quantifying the performance of TDCs as the lower bound of the dynamic range of TDCs is often dictated by in-band harmonic tones rather than in-band noise.
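As an illustration of how DNL and INL follow from stage-delay mismatch, a minimal sketch is given below. It assumes that the LSB is taken as the mean stage delay (so that the INL returns to zero at the end of the line) and that INL is the running sum of DNL; the delay values are hypothetical.

```python
from itertools import accumulate


def dnl_inl(stage_delays):
    """Return (dnl, inl), both in LSB, for a list of measured stage delays."""
    if not stage_delays:
        raise ValueError("at least one stage delay is required")
    lsb = sum(stage_delays) / len(stage_delays)  # assumed LSB: mean stage delay
    dnl = [(d - lsb) / lsb for d in stage_delays]
    inl = list(accumulate(dnl))  # INL as the running sum of DNL
    return dnl, inl


delays_ps = [10.0, 12.0, 8.0, 10.0]  # hypothetical stage delays in ps
dnl, inl = dnl_inl(delays_ps)
print("DNL (LSB):", [round(x, 3) for x in dnl])
print("INL (LSB):", [round(x, 3) for x in inl])
print("peak |INL| (LSB):", round(max(abs(x) for x in inl), 3))
```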
The voltage and temperature sensitivities of TDCs quantify the effect of supply voltage fluctuation and temperature variation on the time accuracy of the TDCs, in units of ps/V and ps/°C, respectively. They are typically obtained by varying the supply voltage by ±10% of its nominal value and the temperature over the range −40 to 100°C, and measuring the resultant change of the time delay. Typical voltage sensitivity is 10–100 ps/V and typical temperature sensitivity is 0.01–0.1 ps/°C.
The conversion time of a TDC is the amount of time that the TDC needs to complete the digitisation of a time variable. For a delay-locked loop (DLL)-stabilised delay-line TDC, the conversion time consists of one period of the reference clock, the lock time of the DLL, the propagation delay of the D flip-flop (DFF) samplers and the time of thermometer-to-binary conversion. An alternative measure of the conversion time is throughput, measured as the number of samples that the TDC digitises per second. For applications such as laser range measurement, conversion time is typically not of concern. However, for applications such as an all-digital phase-locked loop (ADPLL), conversion time directly affects the time constant of the TDC-based phase-frequency detector and, subsequently, the loop stability of the phase-locked loop (PLL).
The range of a TDC is lower-bounded by the resolution and upper-bounded by the maximum time interval that it can digitise. For a tapped delay-line TDC, the lower bound is the delay $t$ of one delay stage and the upper bound is $Nt$, where $N$ is the number of delay stages of the delay line. Since INL deteriorates as the number of delay stages increases, the value of $N$ is set by the maximum allowable INL.
Although power consumption is not an explicit performance indicator of TDCs, it is often a determining factor that affects the architecture, resolution, conversion time and dynamic range of TDCs. Quite often, trade-offs between performance and power consumption are made.
Sampling TDCs
Sampling TDCs digitise time variables with no suppression of quantisation noise. The resolution of these TDCs is lower bound by quantisation noise. Noise-shaping TDCs whose in-band noise is lower than quantisation noise are investigated in Section 5.
Direct counter TDCs
A direct-counter TDC quantises a time variable $T_{in}$ by counting the number of cycles of a high-frequency reference clock within the duration of $T_{in}$, as shown in Fig. 1. The counter is started by START, synchronised with the reference clock and stopped by STOP. The random assertion of START and STOP results in quantisation errors $\Delta_1$ and $\Delta_2$ that are uniformly distributed in $[0, T_c]$. Direct-counter TDCs feature a large dynamic range, lower-bounded by $T_c$ and upper-bounded by the size of the counter. In addition, they enjoy a superior linearity, as the linearity is determined only by the stability of the frequency of the reference clock [2]. The resolution of direct-counter TDCs can be increased if $\Delta_1$ and $\Delta_2$ are further quantised using interpolation, as will be seen shortly. Since the quantisation errors $\Delta_1$ and $\Delta_2$ are caused by the reference clock being asynchronous with START and STOP, they can be reduced using the delay-line TDCs studied in the next section. It should also be noted that when $T_{in}$ is small, the timing jitter of the reference clock cannot be neglected, as both $\Delta_1$ and $\Delta_2$ are affected by it [25].
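The counting operation and the two quantisation errors can be illustrated with a minimal behavioural sketch. It assumes an ideal, jitter-free reference clock with rising edges at integer multiples of the clock period, and it takes $\Delta_1$ and $\Delta_2$ as the times from START and STOP to the next clock edge; this convention, the function name and the numbers are assumptions made for illustration and may differ from Fig. 1. Times are integers (for example picoseconds) so that the arithmetic is exact.

```python
def direct_counter(t_start: int, t_in: int, t_c: int):
    """Behavioural model of a direct-counter TDC.

    Returns (count, delta1, delta2), where count is the number of clock
    edges in [t_start, t_start + t_in).
    """
    if t_c <= 0 or t_in < 0:
        raise ValueError("t_c must be positive and t_in non-negative")
    t_stop = t_start + t_in
    delta1 = (-t_start) % t_c  # START to the next clock edge
    delta2 = (-t_stop) % t_c   # STOP to the next clock edge
    # ceil(t_stop / t_c) - ceil(t_start / t_c), using integer arithmetic
    count = -((-t_stop) // t_c) + ((-t_start) // t_c)
    return count, delta1, delta2


t_c = 1000  # hypothetical 1 GHz reference clock, period in ps
count, d1, d2 = direct_counter(t_start=300, t_in=2500, t_c=t_c)
print("counter output:", count, "-> estimate", count * t_c, "ps")
print("delta1 =", d1, "ps, delta2 =", d2, "ps")
print("reconstructed T_in =", count * t_c + d1 - d2, "ps")
```

Under this convention the input satisfies $T_{in} = kT_c + \Delta_1 - \Delta_2$, where $k$ is the counter output, which is why digitising $\Delta_1$ and $\Delta_2$ separately recovers the resolution lost by the counter.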
Tapped delay-line TDCs
Tapped delay-line TDCs, shown in Fig. 2a, in which each delay stage has the same propagation delay, improve the resolution from $T_c$ of direct-counter TDCs to one buffer delay $t$ [26–28]. Their operation is as follows: the START signal propagates through the delay line, and the D flip-flops sample the outputs of the delay stages at the rising edge of STOP. In the example shown in Fig. 2b, the sampled outputs form the thermometer code $X_N \ldots X_2 X_1 = 0 \ldots 0111111$. A thermometer-to-binary converter is often needed to convert $D_N \ldots D_2 D_1$ to a binary code. The dynamic range of a tapped delay-line TDC is lower-bounded by one buffer delay $t$ and upper-bounded by the total delay $Nt$ of the delay line. The non-linearity of the delay-line TDC is determined by the mismatch of the delays of the delay stages arising from process spread. Since the INL of a tapped delay-line TDC deteriorates as the number of delay stages increases, delay-line TDCs with a long delay line should be avoided. In addition to non-linearity, the delay of delay lines is also affected by supply voltage fluctuation and temperature variation. Delay-locked loops, used by Rahkonen and Kostamovaara [28] two decades ago to stabilise the delay of the delay stages and shown in Fig. 2, are now a standard technique to minimise these effects. Since in this case START and $X_N$ are phase-aligned in the lock state, we have $t = T_c/N$, where $T_c$ is the period of START and $N$ is the number of delay stages. It becomes evident that, in the lock state, $t$ is independent of the intrinsic delay of the delay stages and is determined only by the period of the input and the number of delay stages of the delay line. TDCs with DLLs locked to a reference have also emerged [29]. Since STOP is asynchronous with START, the sampling of the outputs of the delay stages might occur while the DFFs are in a metastable state, resulting in a long propagation delay and, subsequently, a long TDC conversion time [30].
Fig. 2 Delay-line TDCs [26, 28]. a Configuration. b Operation: $X_N$, …, $X_2$ and $X_1$ are sampled at the rising edge of STOP, and the result is given by $X_N \ldots X_2 X_1 = 0 \ldots 0111111$.
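The sampling and thermometer-to-binary steps of a tapped delay-line TDC can be sketched as follows. This is a minimal behavioural model, assuming identical stage delays, ideal flip-flops without metastability and a bubble-free thermometer code; the function names and the numerical values are hypothetical.

```python
def sample_delay_line(t_in: float, t_stage: float, n_stages: int):
    """Bits X_1..X_N latched at the rising edge of STOP.

    START reaches the output of stage k at time k * t_stage, so X_k is 1
    if the START edge has arrived there by the time STOP rises.
    """
    return [1 if k * t_stage <= t_in else 0 for k in range(1, n_stages + 1)]


def thermometer_to_binary(bits) -> int:
    """Binary value of an ideal thermometer code: the number of ones."""
    return sum(bits)


t_c, n = 80.0, 8    # hypothetical START period (ps) and number of stages
t_stage = t_c / n   # DLL-locked stage delay, t = T_c / N
bits = sample_delay_line(65.0, t_stage, n)
word = "".join(str(b) for b in reversed(bits))  # written as X_N ... X_1
print("thermometer code:", word)
print("binary output:", thermometer_to_binary(bits))
```

With a 65 ps input and a 10 ps stage delay, six stages have switched when STOP arrives, so the sketch reproduces the six-ones pattern of the example in Fig. 2b.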
Tapped delay-line TDCs are also known as flash TDCs because of their identical operation principle. A flash TDC digitises a time variable by comparing the edge of STOP to the time-displaced edges of START, as shown in Fig. 3a [31, 32]. In tapped delay-line TDCs, the displaced edges of START are connected to the data node of the arbiters, whereas in delay-line flash TDCs, they are routed to the clock node of the arbiters. The sampling of tapped delay-line TDCs takes place only at the rising edge of STOP, whereas delay-line flash TDCs sample STOP at each rising edge of the outputs of the delay stages. To improve resolution, vernier flash TDCs shown in Fig. 3b, similar to vernier TDCs, can be employed. The resolution of flash TDCs can be improved if the delay line is removed, as shown in Fig. 3c [33]. These TDCs operate based on the time offset of the arbiters caused by device mismatches and are therefore termed sampling offset TDCs. Since the offset time of the arbiters differs, typically by 2–30 ps [31], these TDCs need to be calibrated prior to their operation [32]. The preceding flash TDCs employ balanced arbiters, that is, arbiters with a zero offset time between their two inputs. Alternatively, a flash TDC can be constructed with unbalanced arbiters with a gradually increased offset time, as shown in Fig. 3d [34, 35]. Flash TDCs with parallel delay elements, shown in Fig. 3e, have also emerged recently [36, 37].
It was shown that the resolution of tapped delay-line TDCs is one buffer delay. Since the buffer is typically realised using two cascaded static inverters, one buffer delay is equal to two gate delays. The resolution of delay-line TDCs can be improved to one gate delay using a pseudo-differential architecture [38]. To further improve the resolution to below one buffer delay, interpolation between the rising edges of adjacent delay stages can be utilised. The active interpolation approach proposed in [39] (Fig. 4a) uses the weighted sum of the differential output voltages $v_1$ and $v_2$ of adjacent stages. If $w = 1$, that is, M1 is ON and M2 is OFF, $v_o$ is determined by $v_1$ only. Similarly, when $w = 0$, that is, M1 is OFF and M2 is ON, $v_o$ is determined by $v_2$ only. When both M1 and M2 are ON, their currents are determined by $w$ and $v_o$ is a weighted sum of $v_1$ and $v_2$. This active interpolation consumes static power. Interpolation can also be implemented using passive networks such as resistor networks (Fig. 4b), with the drawback of non-negligible static power consumption, as the resistance of the resistors needs to be small in order to meet speed requirements [40]. The phase interpolation method proposed in [41] (Fig. 4c) uses hierarchical trees to increase time resolution without static power consumption. Another technique to achieve a sub-gate resolution is to use an array of DLLs [42]. The time variable is measured by a primary delay-line TDC. The gate delay of each delay stage of the primary DLL is further measured by a secondary DLL to obtain a sub-gate resolution. This approach, however, is less attractive in terms of power and silicon consumption because of the need for multiple DLLs.
Pulse-shrinking delay-line TDCs
It was shown earlier that the resolution of tapped delay-line TDCs is one buffer delay unless inter-stage interpolation is used at the cost of additional silicon and power consumption. Rahkonen and Kostamovaara showed that the resolution can be made smaller than one buffer delay without interpolation if the tapped delay line is replaced with a pulse-shrinking delay line, in which the width of the propagating pulse decreases uniformly across the stages, as shown in Fig. 5 [28, 43]. Note that the outputs of the delay stages are sampled by RS flip-flops, rather than the D flip-flops of the delay-line TDCs studied earlier. In addition, these RS flip-flops are reset by RESET at the beginning of the measurement. Fig. 5 shows the simplified schematic of a pulse-shrinking delay stage consisting of a current-starving inverter and a generic static inverter. When a pulse of width $T_{in}$ passes through the pulse-shrinking delay stage, the pulse width is reduced because of the controlled slow discharge of the load capacitor. The amount of time shrinkage can be adjusted by varying the starving current $J$. The per-stage shrinkage is determined as follows: a pulse of known width $T_c$ is applied to the TDC. The starving current $J$ is tuned by a DLL in such a way that the pulse just vanishes when it reaches the output of the last delay stage. We therefore have $T_c = N\Delta T$, where $\Delta T$ is the shrinkage of the pulse width per stage and $N$ is the number of delay stages. If $N$ is sufficiently large, $\Delta T$ can be made adequately small. Since the RS flip-flops will be triggered as long as the pulse width of $X_j$, where $j = 1, 2, \ldots, N$, is not zero and the amount of pulse shrinkage per stage is $\Delta T$, the resolution of the calibrated pulse-shrinking delay-line TDC is given by $\Delta T$, which is much smaller than the average propagation delay $t = (t_{PHL} + t_{PLH})/2$. The calibrated pulse-shrinking delay-line TDC can then be used to digitise an input time variable in the same way as conventional delay-line TDCs, with the digitised output taken from the flip-flop samplers. It should be noted that since a large $N$ is needed for a better resolution, the mismatch between the delays of the pulse-shrinking delay stages will affect the non-linearity of the TDCs in the same way as in delay-line TDCs.
Fig. 3 (caption partially recovered) e Flash TDCs with parallel delay elements [36, 37]
Fig. 4 Inter-stage interpolation using a Weighted sum of the voltages of adjacent stages [39] b Resistor networks [40] c Hierarchical trees [41]
Fig. 5 Pulse-shrinking delay-line TDCs [28, 43]. Pulse width is reduced by $\Delta T$ in each stage. To calibrate the TDC, Cal Req is asserted and the capacitor is fully charged. A calibration clock of known width $T_c$ is routed to the delay line by the MUX on the arrival of Cal En. If the width of the pulse $X_n$ is not zero, the capacitor will be discharged via the NMOS transistor. This process ends when the width of $X_n$ becomes zero.
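The calibration relation $T_c = N\Delta T$ and the subsequent digitisation can be illustrated by a minimal behavioural sketch. It assumes an ideal calibration loop, identical shrinkage in every stage and flip-flops that trigger on any pulse of non-zero width; the function name and the numbers are hypothetical.

```python
def pulse_shrinking_code(t_in: float, t_c: float, n_stages: int) -> int:
    """Number of stage outputs at which the pulse still has non-zero width."""
    delta_t = t_c / n_stages  # calibrated shrinkage per stage: T_c = N * dT
    code = 0
    for j in range(1, n_stages + 1):
        if t_in - j * delta_t > 0:  # pulse survives stage j and sets its flip-flop
            code += 1
        else:
            break                   # pulse has vanished, later stages stay reset
    return code


t_c, n = 1000.0, 100  # hypothetical calibration pulse width (ps) and stage count
print("resolution:", t_c / n, "ps")
print("code for a 45 ps pulse:", pulse_shrinking_code(45.0, t_c, n))
```

Doubling the number of stages in this model halves $\Delta T$, which is the sense in which a large $N$ yields a fine resolution.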
Cyclic pulse-shrinking TDCs
Cyclic pulse-shrinking TDCs can be considered as a special class of direct-counter TDCs and pulse-shrinking delay-line TDCs. They use skewed delay stages, that is, delay stages that have a larger propagation delay than the remaining stages of a delay line, as shown in Fig. 6, to reduce the width of the propagating pulse [44–46]. In a cyclic pulse-shrinking delay-line TDC, a counter is used to record the number of round trips that an input pulse of width $T_{in}$ makes before it vanishes. The content of the counter when the pulse vanishes provides the digital code of the width of the input time variable. Since the amount of cycle-to-cycle time shrinkage remains unchanged, these TDCs exhibit a perfect linearity. One drawback of the cyclic pulse-shrinking delay-line TDC is that an input pulse can be applied only after the previous one has vanished completely [40]. Since only one skewed delay stage was used in the design in [44], the pulse-width shrinkage per round trip is rather small, resulting in a long conversion time. Since the effect of temperature on the delay can be as high as ±25% over 0–100°C and the shrinking pulse width prevents techniques such as DLLs from being used to minimise PVT effects [4], thermal compensation that minimises the effect of temperature on the delay is needed [5]. Pulse-shrinking TDCs are effective only if the pulse width is sufficiently large compared with the pulse shrinkage per round trip.
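The trade-off between resolution and conversion time in the cyclic scheme can be seen from a minimal behavioural sketch. It assumes a constant shrinkage per round trip and a fixed loop delay, with all times given as integers; whether a practical counter registers the final, vanishing trip depends on the implementation, so the count convention here is an assumption.

```python
def cyclic_pulse_shrinking(t_in: int, shrink_per_trip: int, loop_delay: int):
    """Return (count, conversion_time) for an idealised cyclic pulse-shrinking TDC."""
    if shrink_per_trip <= 0:
        raise ValueError("shrink_per_trip must be positive")
    width, count = t_in, 0
    while width > 0:
        width -= shrink_per_trip  # each round trip removes a fixed amount
        count += 1
    return count, count * loop_delay


# Hypothetical values: 100 ps pulse, 7 ps shrinkage per trip, 500 ps loop delay
count, t_conv = cyclic_pulse_shrinking(100, 7, 500)
print("counter output:", count)
print("conversion time:", t_conv, "ps")
```

A smaller shrinkage per trip gives a finer resolution but increases the number of trips, and hence the conversion time, in inverse proportion.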
Direct counter TDCs with interpolation
The resolution of direct-counter TDCs can be increased if $\Delta_1$ and $\Delta_2$ are further quantised using interpolation. In this section, we examine interpolation techniques for digitising $\Delta_1$ and $\Delta_2$.
1) Direct-counter TDCs with tapped delay-line interpolation: It was shown earlier that the resolution of direct-counter TDCs is lower-bounded by the quantisation errors $\Delta_1$ and $\Delta_2$, which are distributed uniformly over $[0, T_c]$. $\Delta_1$ and $\Delta_2$ can be digitised directly using tapped delay-line TDCs to improve the resolution from $T_c$ of direct-counter TDCs to one buffer delay. To cover one clock period with a fine resolution, long delay lines are needed. The length of tapped delay-line TDCs, however, is upper-bounded by INL, which deteriorates as the number of delay stages increases. The achievable resolution of direct-counter TDCs with tapped delay-line interpolation is therefore rather limited.
2) Direct-counter TDCs with two-step tapped delay-line interpolation: The resolution of tapped delay-line interpolation can be improved using two-step interpolation without employing a long delay line. In the approach proposed in [47], $\Delta_1$ and $\Delta_2$ are first interpolated using two DLL-stabilised tapped delay lines of $M$ stages that give a resolution of $T_c/M$, where $T_c$ is the period of the clock. Each pair of consecutive stages of the tapped delay lines is then interpolated using $N$-tap parallel interpolators to achieve a total of $MN$ interpolation steps per clock period. The resolution is now $T_c/(MN)$ instead of $T_c/M$ of one-step interpolation. In [30], the first-level interpolation is performed using multi-phase sampling, whereas the second-level interpolation is carried out using vernier delay lines to achieve 24 ps resolution with a 160 MHz reference clock. A similar approach was used in [48].
3) Direct-counter TDCs with time-stretching and interpolation: The asynchronisation of START and STOP with the reference clock might result in a small $\Delta_1$ or $\Delta_2$. In this case, tapped delay-line interpolation techniques will yield a poor result. To overcome this drawback, $\Delta_1$ and $\Delta_2$ can be first stretched using a time stretcher and then digitised. Both analogue interpolation, which uses a time-to-amplitude converter (TAC) to convert $\Delta_1$ and $\Delta_2$ to a large voltage variation and then digitises the voltage variation using ADCs [23, 24, 49–51], and digital interpolation, which first stretches $\Delta_1$ and $\Delta_2$ to $K\Delta_1$ and $K\Delta_2$ with $K \gg 1$ using the dual-slope approach of Nutt [52] and then digitises $K\Delta_1$ and $K\Delta_2$ using tapped delay-line TDCs [53, 54], exist. A TAC is realised by discharging a precharged capacitor with a constant current from the start of $T_{in}$ to the end of $T_{in}$. The voltage drop of the capacitor at the end of $T_{in}$ is directly proportional to $T_{in}$ and is digitised using an ADC, typically a flash ADC in order to meet time constraints [10, 50, 54, 55]. As pointed out in [50], analogue time stretching provides a good single-shot precision of a few picoseconds but suffers from poor stability, typically 10–30 ps/°C. This method also suffers from high power consumption because of the use of flash ADCs. The following methods have been proposed for pulse stretching:
1. Charge-pump pulse stretching: Raisanen-Ruotsalainen et al. showed that the quantisation errors $\Delta_1$ and $\Delta_2$ can be first stretched and then digitised, yielding a reduced quantisation error, as shown in Fig. 7 [55]. This approach was later used by many other researchers [4, 56, 57]. Pulses $T_1$ and $T_2$ are generated at the rising edges of START and STOP, respectively, with their falling edges aligned with the next rising edge of the reference clock. Pulse stretching starts with the assertion of the reset (RST) command that precharges $C_1$ and $C_2$ to $V_{DD}$. The discharge of $C_1$ and $C_2$ is controlled by $J_1$ and $J_2$, respectively. Since $J_1 = NJ_2$ and $C_2 = MC_1$ with $M, N > 1$, the discharge of $C_1$ is faster than that of $C_2$. Discharge is initiated by $T_1$. $v_o$ is set to HIGH and will remain HIGH until $v_{c2} = v_{c1}$. Since $v_{c2}$ drops more slowly, it will take $k$ cycles of CLK to establish $v_{c1} = v_{c2}$. The number of cycles is recorded by the counter. The content of the counter provides the digital code of the quantisation error $\Delta_1$. The same process is followed when quantising $\Delta_2$. To determine $k$, from $\Delta v_{c1} = \Delta v_{c2}$, where $\Delta v_{c1}$ and $\Delta v_{c2}$ are the voltage drops of $C_1$ and $C_2$ from $V_{DD}$, respectively, and noting that $\Delta v_{c1} = (J_1/C_1)T_1$ and $\Delta v_{c2} = (J_2/C_2)T_c k$, we arrive at $k = MN(T_1/T_c)$ or, equivalently, $\hat{T}_1 = kT_c = MNT_1$, where $\hat{T}_1$ is the stretched version of $T_1$. It is evident that $T_1$ is stretched by a factor of $MN$. A notable advantage of the dual-slope pulse stretching approach is the reduced effect of process, voltage and temperature (PVT) variations. This approach suffers from a speed penalty because of the slow discharge of $C_2$. The need for a voltage comparator and two constant current sources also undermines its compatibility with technology scaling.
2. Regeneration pulse stretching: Abas et al. utilised the regenerative mechanism of SR-latches to stretch narrow pulses, as shown in Fig. 8a [58–61]. A key advantage of this approach is its fast response and its ability to amplify a small time variable.
Since the gain of the amplifier is set by the characteristics of the latch, it is strongly subject to the effect of PVT. Other drawbacks include a small input range and the poor linearity of the gain. Lee and Abidi [62] improved the input range by inserting two delay units to generate an unbalanced regeneration mechanism, as shown in Fig. 8b. A drawback of the Lee-Abidi time amplifier is that the delay mismatch of the buffers might be significant if the input time difference to be amplified is small, resulting in a non-negligible error. This drawback can be removed by using the unbalanced active charge pump loads proposed in [63].
3. Delay-locked loop pulse stretching: The DLL-based time amplifier proposed by Rashidzadeh et al. and shown in Fig. 9 uses a closed-loop approach to amplify time while minimising the effect of PVT [64, 65]. Two inputs are fed to two delay lines with the same number of delay stages but different stage delays. The waveforms at nodes A and B are phase-aligned. Treating each stage delay as a phase lag, $\varphi_A = \varphi_{in1} - 2\pi(t_1/T)$ and $\varphi_B = \varphi_{in2} - 2\pi(t_2/T)$, so that $\varphi_A = \varphi_B$ gives $\varphi_{in1} - \varphi_{in2} = (2\pi/T)(t_1 - t_2)$. Since the overall propagation delay of delay line 1 is $Nt_1$ while that of delay line 2 is $Nt_2$, we have $T_{out} = (N - 1)(t_1 - t_2)$. The DLL must be locked in order to provide precision amplification, limiting the use of this technique in high-speed applications. In addition, although the rest of the delay cells are replicas of the first delay line, PVT does affect the delay of the delay cells.
4. Nakura pulse stretching: In [66–68], a closed-loop time amplification scheme, shown in Fig. 10, was proposed. The two pulses whose time difference is to be amplified propagate in opposite directions in two separate delay lines having the same number of delay cells. The delay of the delay cells can be toggled between $t_1$ and $t_2$ by the control signal $X$: $t = t_2$ if $X = 1$ (M5 and M6 are OFF and the delay $t_2$ is set by the control voltages $V_p$ and $V_n$) and $t = t_1$ if $X = 0$ (M5 and M6 are ON), with $t_2 = nt_1$ where $n$ is an integer. The ratio of $t_1$ to $t_2$ is controlled by the DLL to minimise the effect of PVT. Before $v_{in1}$ and $v_{in2}$ meet, the delay of the cells ahead of $v_{in1}$ and $v_{in2}$ is $t_1$. Once $v_{in1}$ and $v_{in2}$ collide, the delay of the cells ahead of $v_{in1}$ and $v_{in2}$ becomes $t_2$. It can be shown that $t_{o1} = Mt_1 + Nt_2$ and $t_{o2} = Nt_1 + Mt_2$. As a result, $T_{out} = t_{o1} - t_{o2} = (M - N)t_1 - (M - N)t_2 = (M - N)(t_1 - t_2)$. The location where $v_{in1}$ and $v_{in2}$ meet is determined by $T_{in}$ and $t_1$, specifically, $T_{in} = (M - N)t_1$.
The amplifier completes time amplification when $v_{o1}$ and $v_{o2}$ reach the end of the delay lines. The minimum time interval between two time amplifications is given by $Mt_1 + Nt_2 + T_{in}$ or $Nt_1 + Mt_2$. In addition, the time gain is directly proportional to $t_2 - t_1$ and the length of the delay lines.
5. Dual-slope pulse stretching: Lee et al. [69] showed that a large time gain can be obtained by cascading multiple time amplifiers, each with a gain of 2, as shown in Fig. 11. The capacitors at nodes 1 and 2 are pre-charged prior to the arrival of inputs A and B. When A arrives while B = 0, $C_1$ is discharged by two pull-down paths, one provided by M1 and the other by M3, while the charge of $C_2$ is retained. When B arrives, since M4 has been switched off by the drop of $V_1$, only one pull-down path, provided by M2, exists to discharge $C_2$. $V_2$ therefore drops at approximately half the rate of $V_1$, and the gain of the time amplifier is approximately 2, provided that all transistors have the same dimensions, as shown in Fig. 11. Since the proper operation of the time amplifier requires that $V_1$ drops below the threshold voltage of M4 so that M4 is switched off prior to the arrival of B, the application of this amplifier to a small $T_{in}$ is rather difficult, simply because $C_1$ cannot be discharged sufficiently in time. In addition, the gain of the amplifier is subject to PVT effects.
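The stretching factor of the charge-pump method in item 1 can be checked numerically. The sketch below is a minimal model that assumes ideal constant current sources, ideal capacitors and an ideal comparator, and it computes the number of clock cycles directly from the two voltage drops rather than from the closed-form result; the function name and all component values are hypothetical.

```python
import math


def charge_pump_cycles(t1: float, t_c: float, j2: float, c1: float,
                       m: float, n: float) -> float:
    """Clock cycles k needed for C_2 to drop as far as C_1 dropped during T_1."""
    j1 = n * j2  # J_1 = N * J_2
    c2 = m * c1  # C_2 = M * C_1
    dv_c1 = (j1 / c1) * t1             # voltage drop of C_1 during T_1
    dv_c2_per_cycle = (j2 / c2) * t_c  # voltage drop of C_2 in one clock cycle
    return dv_c1 / dv_c2_per_cycle


# Hypothetical values: T_1 = 50 ps, T_c = 1 ns, J_2 = 1 uA, C_1 = 1 pF, M = N = 10
k = charge_pump_cycles(50e-12, 1e-9, 1e-6, 1e-12, m=10, n=10)
print("cycles k:", round(k, 6))
print("stretched pulse k*T_c (s):", k * 1e-9)
print("closed form M*N*T_1 (s):", 10 * 10 * 50e-12)
```

The agreement between $kT_c$ and $MNT_1$ reflects the fact that the stretching factor depends only on the current and capacitance ratios, not on their absolute values.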
Vernier TDCs
TDCs with a sub-gate resolution can also be obtained using vernier delay lines, where START and STOP propagate in two separate delay lines with the same number of delay stages but different per-stage delays $t_1$ and $t_2$, as shown in Fig. 12 [70–72]. Since $t_1 > t_2$, the STOP signal in the STOP line will catch the START signal in the START line provided that the lines are sufficiently long and $\Delta t = t_1 - t_2$ is not overly large. The time at which the catch-up takes place is determined from $T_{catch} = Nt_1 = Nt_2 + T_{in}$. For a given $T_{in}$, $N = T_{in}/(t_1 - t_2)$. Clearly, the dynamic range of vernier TDCs is upper-bounded by the length of the lines and lower-bounded by $t_1 - t_2$. To minimise the effect of PVT on the delay, the DLL-stabilised vernier TDCs shown in Fig. 12 can be deployed.
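The catch-up condition can be illustrated with a minimal behavioural sketch of a vernier delay line. It assumes ideal, mismatch-free stages and ideal arbiters, with times given as integers so that the comparison is exact; the function name and the delays are hypothetical.

```python
def vernier_tdc(t_in: int, t1: int, t2: int, n_stages: int):
    """Return the first stage at which STOP has caught START, or None."""
    if t1 <= t2:
        raise ValueError("the START line must be the slower one: t1 > t2")
    for n in range(1, n_stages + 1):
        start_arrival = n * t1        # START edge at the output of stage n
        stop_arrival = t_in + n * t2  # STOP edge at the output of stage n
        if stop_arrival <= start_arrival:
            return n
    return None  # the lines are too short for this input


t1, t2 = 55, 50  # hypothetical stage delays in ps, resolution t1 - t2 = 5 ps
n = vernier_tdc(23, t1, t2, n_stages=16)
print("catch-up stage:", n)
print("estimate:", n * (t1 - t2), "ps for a 23 ps input")
print("4-stage line:", vernier_tdc(23, t1, t2, n_stages=4))
```

The returned stage index is $T_{in}/(t_1 - t_2)$ rounded up, and an input that exceeds the product of the line length and $t_1 - t_2$ is never caught, which is the range limitation noted above.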
To increase the dynamic range of vernier TDCs, two-level vernier TDCs consisting of a coarse vernier line and a fine vernier line, with the former having a large delay difference $\Delta t_c$ and the latter a small delay difference $\Delta t_f$ with $\Delta t_f \ll \Delta t_c$, can be utilised [73]. Alternatively, cyclic vernier TDCs, shown in Fig. 13, can be deployed [29]. Two delay lines having $N_1$ and $N_2$ stages are employed, with their stage delays $t_1$ and $t_2$ set by two DLLs: $t_1 = 1/(M_1 N_1 f_o)$ and $t_2 = 1/(M_2 N_2 f_o)$, respectively, where $f_o$ is the frequency of the reference clock and $M_1$ and $M_2$ are integers. START and STOP are fed to two cyclic loops, each consisting of a delay stage and a NAND2 gate. The delay of the delay stage is $t_1$ in the START loop and $t_2$ in the STOP loop, so the loop delays are $t_1 + t_{NAND2}$ and $t_2 + t_{NAND2}$. The loops are enabled on the arrival of START and STOP, and $Y_f$ samples $X_f$ at each cycle. A phase coincidence is detected when the sampled value changes from 1 to 0. Once this occurs, $Y_f$ has caught $X_f$. Clearly, the cyclic operation of the two delay loops removes the need for two long delay lines. In addition, although $t_{NAND2}$ is comparable with $t_1$ and $t_2$, since we are only concerned with the difference between the delays of the START and STOP loops, $t_{NAND2}$ plays no role in the accuracy of the TDC.
Fig. 10 Time amplifier proposed by Nakura et al. [66–68]. Top: configuration. Bottom left: waveforms. Middle left: DLL to precisely control the ratio of $t_2$ to $t_1$. The number of delay cells in the two paths of the ratio control DLL is set such that the delay of the two delay lines is the same.
Fig. 11 2x-time amplifier [69]
This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)
Successive approximation TDCs
Successive approximation is an effective means to perform analogue-to-digital conversion with low power consumption. Successive approximation TDCs (SA-TDCs) typically consist of a time comparator, a successive approximation register (SAR) and a digital-to-time converter (DTC), as shown in Fig. 14a [74]. The DTC maps a digital code D_out to a pulse of width T_f, with the pulse width ideally proportional to the value of D_out. The widths of T_in and T_f are then compared by the time comparator. When their difference ΔT is sufficiently small, time-to-digital conversion is completed and the output of the TDC is given by the SAR. The key component of SA-TDCs is the DTC, which can be implemented using either a capacitor array (Fig. 14b) [74] [75] [76] [77] or a delay line (Fig. 14c) [78, 79]. In delay-line DTCs, the input code selects the location of the output using a multiplexer, whereas in capacitor-array DTCs, the load capacitor of the current-starving inverter is digitally tuned by the output of the SAR. Delay-line DTCs enjoy a good linearity and low power consumption, but suffer from a small range. Capacitor-array DTCs, on the other hand, are silicon and power greedy, especially when the number of bits is large.
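The interaction between the SAR, the DTC and the time comparator can be sketched as a binary search. The model below is an illustration under stated assumptions: the DTC is ideal and linear (T_f = code × LSB), the comparator is noiseless, and the function name `sa_tdc` and the numbers are hypothetical.

```python
def sa_tdc(t_in, lsb, n_bits):
    """Behavioural successive approximation TDC.

    t_in   : input pulse width (same unit as lsb)
    lsb    : DTC step, i.e. pulse width per code increment
    n_bits : number of SAR bits
    Returns the largest code whose DTC pulse width does not exceed t_in,
    saturating at 2**n_bits - 1.
    """
    code = 0
    for bit in reversed(range(n_bits)):   # MSB first
        trial = code | (1 << bit)         # SAR tentatively sets the bit
        t_f = trial * lsb                 # ideal DTC output width
        if t_in >= t_f:                   # time comparator decision
            code = trial                  # keep the bit
    return code


# Illustrative 5-bit conversion with a 10 ps LSB.
code = sa_tdc(t_in=137, lsb=10, n_bits=5)
```

The conversion needs one comparison per bit, so a 5-bit code is resolved in five steps; for the assumed 137 ps input the trial codes are 16, 8, 12, 14 and 13, and the final code is 13.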
Pipelined TDCs
Pipelined TDCs that provide a better throughput have also been proposed [80] [81] [82]. As compared with other TDCs, pipelined TDCs typically require more hardware simply because of their pipelined operation. They are therefore less attractive for applications such as bio-electronics, where power consumption is a critical concern, or wireless communications, where a stringent constraint is typically imposed on SNDR.
Performance comparison of sampling TDCs
Table 1 tabulates the key performance indicators of sampling TDCs. It is seen that direct-counter TDCs enjoy a large dynamic range, but suffer from high quantisation noise and a low conversion speed. Direct-counter TDCs with interpolation have a much better resolution than direct-counter TDCs, but suffer from a low conversion speed and high power consumption. Delay-line TDCs provide a better resolution and enjoy low power consumption, but suffer from a small dynamic range. Pulse-stretching TDCs enjoy low power consumption, but suffer from a long conversion time. The same holds for pulse-shrinking TDCs. Vernier TDCs offer a good resolution at the cost of higher silicon and power consumption. Cyclic vernier TDCs remove the drawbacks of conventional vernier TDCs. Pipelined TDCs offer a superior resolution, a high conversion rate and low power consumption, but their range is rather small. Flash TDCs exhibit performance comparable to that of delay-line TDCs. Sampling offset flash TDCs provide the best resolution among all flash TDCs. SA-TDCs offer a good resolution and low power consumption. As the choice of TDCs is largely dictated by applications, familiarity with the advantages and limitations of sampling TDCs will enable designers to make a better informed decision when choosing TDCs for a specific application.
Noise-shaping TDCs
It was shown that sampling TDCs quantise time variables directly, with their resolution lower-bounded by the quantisation noise that is distributed uniformly over the entire spectrum of the TDCs. Increasing the sampling frequency lowers the quantisation noise uniformly over the entire spectrum rather than over a specific frequency range of the TDCs. As a result, high power consumption is inevitable. In many applications, such as ADPLLs where TDCs function as a phase detector, only the performance of TDCs over the loop bandwidth of the ADPLLs is of interest. System-level approaches such as noise-shaping obtained from ΔΣ operations can therefore be utilised to reduce the quantisation noise of TDCs well below that of sampling TDCs over a specific frequency range [86]. These TDCs are termed noise-shaping TDCs [6] [87] [88] [89] [90]. Recent advances in TDCs utilise the intrinsic advantages of both resolution-enhancing techniques such as interpolation in sampling TDCs and frequency-dependent noise-suppressing techniques such as the ΔΣ operation of noise-shaping TDCs simultaneously to improve the resolution of TDCs [91] [92] [93]. In this section, we investigate noise-shaping TDCs.
Gated ring oscillator TDCs
The TDC shown in Fig. 15 digitises time variables using a gated ring oscillator (GRO) [6]. The operation of the GRO is briefly described as follows: at the assertion of START, the gating signal EN is set to logic-1 and the ring oscillator is activated. The number of the cycles of the oscillator within the duration of START = 1 is recorded by the counter. When STOP is asserted, EN is reset and oscillation is terminated. The phase of the oscillator remains unchanged while the oscillator is gated off. Since the frequency of the oscillator is constant during T_in, the number of the oscillation cycles of the oscillator during T_in is proportional to T_in and provides the digital representation of T_in. Furthermore, at the end of the (k − 1)th sampling period, the phase of the oscillator is held unchanged while T_in = 0 (neglecting leakage and disturbances coupled to the output nodes of the oscillator), allowing the residual phase of the (k − 1)th sampling period, denoted by e_f(k − 1), to be carried over to the kth sampling period and become the initial phase of the kth sampling period, denoted by e_i(k), that is, e_i(k) = e_f(k − 1). The net phase accumulation in the kth sampling period is thereby given by
where K_vco is the voltage-to-phase gain of the oscillator. If we let e(k) be the residue phase (quantisation noise) in phase k, that is, e(k) = e_f(k), we have e_i(k) = e_f(k − 1) = e(k − 1). As a result
The first-order noise-shaping of quantisation noise is evident from (1). It can be shown that 0 ≤ e(k) ≤ 2π. To reduce the quantisation noise from 2π to 2π/N, where N is the number of the stages of the oscillator, the transition of the output of each delay stage can be utilised, as shown in Fig. 15 [6]. To avoid the counting error caused by sampling the outputs of the oscillator at transitions, the cross-coupled buffers shown in Fig. 15 are typically used at the output of the delay stages.
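The carry-over of the residual phase can be illustrated with a behavioural sketch. It rests on assumptions that go beyond the text: time is measured in integer units, one count corresponds to `t_lsb` time units of oscillation (one stage transition), leakage is ignored, and the quantised output is taken to obey count(k) × t_lsb = T_in(k) + e(k − 1) − e(k), which is one form consistent with e_i(k) = e_f(k − 1). The name `gro_tdc` is hypothetical.

```python
def gro_tdc(t_in_seq, t_lsb):
    """Behavioural gated ring oscillator TDC.

    t_in_seq : sequence of input intervals (integer time units)
    t_lsb    : oscillation time per count (integer time units)
    Returns (codes, residue), where residue is the phase, expressed in time
    units, held by the oscillator after the last sample.
    """
    residue = 0                           # e(k-1), held while gated off
    codes = []
    for t_in in t_in_seq:
        total = residue + t_in            # phase resumes from held state
        codes.append(total // t_lsb)      # completed counts in this period
        residue = total % t_lsb           # e(k), carried to next period
    return codes, residue


# A constant 37-unit input with a 10-unit LSB.
codes, residue = gro_tdc([37] * 10, t_lsb=10)
average = sum(codes) / len(codes)
```

Each individual code is 3 or 4, but because the residue is never discarded the running sum of the codes tracks the running sum of the inputs to within one LSB, so the average over the ten samples is 3.7. A sampling TDC that restarts from zero phase each time would return 3 for every sample.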
Since the output nodes of the oscillator during the OFF-state (T_in = 0) are floating, noise-shaping that originates from the continuity of the phase of the oscillator during the OFF-state is sensitive to the leakage of the pn-junctions at the output nodes and to charge-sharing during the turn-off of the gating transistors [87]. Although the effect of charge sharing can be mitigated if the gating transistors are small in dimension, this is at the cost of limiting the oscillation frequency [88]. The floating nature of the holding states also makes the phase of the oscillator vulnerable to disturbances such as switching noise and cross-talk [89]. GRO-TDCs also suffer from count-missing caused by the premature reset of the counter by STOP while the edge-detection and state-to-phase logic are still active [89]. This problem can be mitigated by inserting a delay block between STOP and the counter, as shown in Fig. 15. The power consumption of GRO-TDCs can be reduced if asynchronous counters are used, as demonstrated in [89].
Switched ring oscillator TDCs
It was shown in the preceding section that GRO-TDCs are sensitive to charge leakage, charge injection and switching noise. To eliminate these drawbacks, Konishi et al. showed that freezing the state of the oscillator in the off-state can be replaced by introducing another oscillation state of the oscillator. The oscillator switches between two different oscillation states and becomes a switched ring oscillator (SRO), as shown in Fig. 16 [87, 90]. Since the charge of the capacitors at the output nodes of the oscillator at the end of the (k − 1)th phase is carried over in its entirety to the kth phase, the first-order noise-shaping characteristic intrinsic to GROs is preserved in SROs. Unlike in a GRO, the frequency of the oscillator in each of the two oscillation states is well defined and there is no floating node. As a result, the issues associated with floating nodes such as leakage, switching noise and charge injection encountered in GRO-TDCs vanish.
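A behavioural sketch can show why switching between two oscillation states keeps the phase continuous. The model is an assumption-laden illustration rather than the circuit of Fig. 16: the oscillator is taken to advance at an integer rate `rate_hi` while the input pulse is high and `rate_lo` otherwise, counts are read once per sampling period `t_s`, and the names and numbers are hypothetical.

```python
def sro_tdc(t_in_seq, t_s, rate_lo, rate_hi, q):
    """Behavioural switched ring oscillator TDC with integer arithmetic.

    t_in_seq : input pulse widths, each within one sampling period
    t_s      : sampling period (time units)
    rate_lo  : phase units per time unit in the slow state
    rate_hi  : phase units per time unit in the fast state
    q        : phase units per count
    Returns (codes, residue); the phase is never frozen, only its rate changes.
    """
    residue = 0
    codes = []
    for t_in in t_in_seq:
        if not 0 <= t_in <= t_s:
            raise ValueError("t_in must lie within one sampling period")
        phase = residue + rate_hi * t_in + rate_lo * (t_s - t_in)
        codes.append(phase // q)          # counts completed in this period
        residue = phase % q               # carried over, as in a GRO
    return codes, residue


codes, residue = sro_tdc([37, 12, 80, 55], t_s=100, rate_lo=1, rate_hi=3, q=10)
# Digital offset removal: subtract the slow-state contribution rate_lo * t_s.
estimates = [(c * 10 - 1 * 100) / (3 - 1) for c in codes]
```

The total counted phase plus the final residue equals the total accumulated phase exactly, so the quantisation error is first-order shaped just as in the GRO model, while the input is recovered from the count after subtracting the known slow-state offset.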
Relaxation oscillator TDCs
Cao et al. [95] [96] [97] showed that the first-order noise-shaping characteristic of GRO-TDCs is also possessed by relaxation oscillator (RO) TDCs, as shown in Fig. 17. To illustrate this, assume C_1 is being charged while C_2 is discharged at t_1. v_c1 rises linearly with time while v_c2 = 0. At t_2, where v_c1 = V_H, Q = 1 and Q̄ = 0. C_1 starts to discharge, with v_c1 = 0 when fully discharged, while C_2 starts to be charged. When v_c2 = V_H, Q = 0 and Q̄ = 1. This process repeats. Since the charge of the capacitors is held unchanged when T_in = 0, the residual phase of the oscillator in the (k − 1)th sampling period becomes the initial phase of the kth sampling period, yielding first-order noise-shaping. Since a distinct characteristic of ROs is their low sensitivity to PVT effects [98], RO TDCs open a door for the realisation of ultra-low-power TDCs for applications such as passive wireless microsystems [99].
Fig. 15 Gated ring oscillator TDCs. a GRO TDCs [6, 94]. b Waveforms
MASH TDCs
MASH is an effective means to obtain high-order noise-shaping without sacrificing stability [86]. Fig. 18 shows the 1-1 MASH-TDC proposed by Cao et al. [97]. T_in2 is generated with its START being the first rising edge of the counter clock Q_1 and its STOP being the same as that of T_in1. This quantisation error propagation method was introduced by Konishi et al. in [100] and further developed in their subsequent work [88, 101]. Quantisation error propagation is realised using the resettable DFF shown in Fig. 18. Routing v_o1 to the clock input of the DFF ensures that the DFF is triggered at the first rising edge of v_o1, while connecting T_in1 to the DFF ensures that T_in2 has the same falling edge as T_in1. To minimise the effect of the metastability of DFFs, two DFFs can be cascaded, as shown in Fig. 18 [101]. Cascading three DFFs was also used to further reduce quantisation noise [88].
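The way a 1-1 MASH cancels the first-stage error can be shown with a generic behavioural model. This is a sketch of the textbook MASH error-cancellation logic, not of the circuit in Fig. 18: each stage is modelled as an ideal first-order noise-shaping quantiser with residue carry-over, the second stage is assumed to receive the first-stage residue directly, and all names and values are hypothetical.

```python
from itertools import accumulate


def first_order_stage(inputs, lsb):
    """Ideal first-order stage: lsb*code(k) = x(k) + e(k-1) - e(k)."""
    residue = 0
    codes, residues = [], []
    for x in inputs:
        total = residue + x
        codes.append(total // lsb)
        residue = total % lsb
        residues.append(residue)
    return codes, residues


def mash_1_1(t_in_seq, lsb1, lsb2):
    """Combine two first-order stages into a second-order shaped output."""
    c1, e1 = first_order_stage(t_in_seq, lsb1)   # stage 1 quantises T_in
    c2, _ = first_order_stage(e1, lsb2)          # stage 2 quantises e1
    out, prev_c2 = [], 0
    for a, b in zip(c1, c2):
        # y(k) = y1(k) + (1 - z^-1) y2(k), expressed in time units
        out.append(a * lsb1 + (b - prev_c2) * lsb2)
        prev_c2 = b
    return out


t_in = [37, 52, 41, 68, 45, 59, 33, 47]
y = mash_1_1(t_in, lsb1=10, lsb2=3)
error = [yk - xk for yk, xk in zip(y, t_in)]
double_sum = list(accumulate(accumulate(error)))
```

In this model the first-stage residue cancels exactly and the remaining error is the second difference of the second-stage residue, so accumulating the error twice leaves a sequence bounded by the second-stage LSB, which is the signature of second-order noise-shaping.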
ΔΣ TDCs
Although ΔΣ configurations are effective in achieving a superior SNDR, their time-mode realisation is rather difficult because of the lack of time-mode integrators needed to achieve high-order ΔΣ modulators. As a result, ΔΣ TDCs are often realised using a partial time-mode, partial voltage-mode approach; more specifically, integrators are voltage-mode while quantisers are voltage-controlled oscillator (VCO) quantisers [102]. Recently, time-difference accumulators that function as time integrators emerged [19]. The core of time-difference accumulators is a time register that holds an input time variable indefinitely and releases the held time variable on the arrival of a triggering signal, as shown in Figs. 19a and b [103]. The operation of the time register can be briefly described as follows: assume C_1 is fully charged initially. When v_in1 arrives, v_c1 starts to drop. When v_in2 arrives, the gated delay cell enters its hold state and v_c1 remains unchanged indefinitely if there is no leakage. When T_r is asserted, the gated delay cell is re-activated and v_c1 starts to drop again. If we assume that the two gated delay cells are identical, then the disrupted discharge process of C_1 is the same as that without disruption. It follows that T_o = T_in, that is, the time variable T_in is stored by the time register and read out on the arrival of the triggering signal T_r. The preceding time register can be utilised to construct a time adder, as shown in Fig. 19c [103].
By reversing the order of the inputs of the second time register, we arrive at T_o = T_in1 + T_in2. The preceding time adder was utilised in [19] to construct a time integrator to form a first-order time-mode ΔΣ modulator, as shown in Fig. 20. The time accumulator consists of two back-to-back connected time adders to perform time accumulation. ΔT = T_in − T_f is performed using a time adder. ΔT is integrated over the sampling period by the accumulator. The output of the accumulator is digitised by the TDC and the output of the TDC is converted to the time variable T_f using a DTC. Since the time accumulator shown in Fig. 20 acts as a first-order integrator, it provides 20 dB/dec noise-shaping, as confirmed by the measurement results in [19].
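The loop formed by the time adder, the time accumulator, the TDC and the DTC can be sketched behaviourally. The model below assumes ideal blocks, integer time units, a floor quantiser and a one-sample delay in the DTC feedback; the function name `delta_sigma_tdc` and the values are hypothetical and are not taken from [19].

```python
def delta_sigma_tdc(t_in_seq, lsb):
    """Behavioural first-order time-mode delta-sigma TDC.

    Each period: dT = T_in - T_f (time adder), acc += dT (time accumulator),
    code = quantised acc (TDC), T_f = code * lsb for the next period (DTC).
    """
    acc = 0          # state of the time accumulator
    code = 0         # previous TDC output driving the DTC
    codes = []
    for t_in in t_in_seq:
        t_f = code * lsb           # ideal DTC feedback
        acc += t_in - t_f          # integrate the time difference
        code = acc // lsb          # coarse quantisation of the accumulator
        codes.append(code)
    return codes


t_in = [37] * 10
codes = delta_sigma_tdc(t_in, lsb=10)
tracking_error = sum(t_in) - 10 * sum(codes)
```

Because the accumulator retains the part of the input that has not yet been fed back, the sum of the output codes stays within one LSB of the sum of the inputs, so the quantisation error is pushed to high frequencies at 20 dB/dec as expected from a first-order loop.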
Comparison of noise-shaping TDCs
Since the resolution of noise-shaping TDCs varies with the order of noise-shaping, the performance metrics used for sampling TDCs such as resolution and precision are generally not used to characterise noise-shaping TDCs. The following figure-of-merit (FOM), widely used to quantify the performance of noise-shaping ADCs on the basis of the amount of power per conversion step [104] [105] [106] [107] [108], is used to quantify the performance of noise-shaping TDCs
where BW is the bandwidth of the signal to be digitised, N is the effective number of bits and P is the power consumption of the TDC. N is obtained from N = (SNDR − 1.76)/6.02. Table 2 compares the performance of recently reported noise-shaping TDCs.
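As a worked example of the quantities named above, the sketch below computes N from SNDR and then a power-per-conversion-step FOM. The form FOM = P/(2 × BW × 2^N) is an assumption: it is the Walden-style expression commonly used for this purpose and is consistent with the symbols BW, N and P defined in the text. The numerical values are illustrative only and are not taken from Table 2.

```python
import math


def effective_bits(sndr_db):
    """Effective number of bits, N = (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def fom_joules_per_step(power_w, bw_hz, sndr_db):
    """Assumed Walden-style FOM: P / (2 * BW * 2**N), in J/conversion-step."""
    n = effective_bits(sndr_db)
    return power_w / (2.0 * bw_hz * 2.0 ** n)


# Illustrative numbers: 1 mW, 1 MHz bandwidth, 61.96 dB SNDR (N = 10 bits).
n_bits = effective_bits(61.96)
fom = fom_joules_per_step(power_w=1e-3, bw_hz=1e6, sndr_db=61.96)
```

For these assumed values N is 10 bits and the FOM is about 0.49 pJ per conversion step; a lower FOM indicates that less energy is spent per resolved level.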
Comparing Table 2 with Table 1 shows that the power consumption of sampling TDCs is generally higher than that of noise-shaping TDCs. This is because the former achieve a high resolution using power-greedy complex configurations such as vernier delay lines or multi-level interpolation, whereas the latter lower in-band noise and distortion using feedback with only a moderate increase in power consumption. Noise-shaping TDCs are therefore more attractive for low-power applications. The input frequency of noise-shaping TDCs is generally much lower than that of sampling TDCs because of oversampling constraints. To digitise high-frequency time variables, sampling TDCs are the preferred choice and are more attractive for high-speed applications.
Future research of TDCs
TDCs are one of the most studied mixed-mode systems and numerous architectures have been proposed to improve resolution, precision and conversion time and to reduce power consumption. A number of stiff challenges, however, are yet to be overcome in order to improve their performance. To improve the resolution of TDCs, advanced design techniques such as cyclic vernier delay lines, multi-level interpolation and time amplification that are effective in reducing quantisation noise can be deployed simultaneously with noise-shaping techniques such as GRO, SRO or ΔΣ. The former lower the quantisation noise by spreading it over the entire frequency spectrum of the TDCs, whereas the latter reduce in-band quantisation noise and distortion through noise-shaping and oversampling. To achieve high-order noise-shaping, time-mode ΔΣ modulators are the choice. The recent emergence of time-mode accumulators has fuelled the interest in searching for high-order time-mode integrators needed for high-order time-mode ΔΣ modulators without deploying MASH, whose performance is degraded because of mismatches. Since VCO-based quantisers are multi-bit quantisers, the feedback of time-mode ΔΣ TDCs is a multi-bit DTC, and its non-linearity critically affects the performance of the TDCs, manifesting as both in-band harmonic tones and a rising noise floor. DTCs with a superior linearity or time-mode dynamic element matching techniques, especially those with noise-shaping characteristics, are therefore to be developed. As the deployment of TDCs in mixed-mode signal processing is to combat the performance degradation caused by technology scaling, mixed-mode building blocks such as voltage comparators, voltage-mode integrators and charge pumps, whose performance scales poorly with technology, should be avoided.
Conclusions
A comprehensive treatment of the principles, architectures and design techniques of CMOS TDCs has been provided. Key performance metrics of TDCs such as resolution, precision, non-linearity, voltage and temperature sensitivities, conversion time and conversion range have been examined. Sampling TDCs that digitise time variables have been studied with an emphasis on their advantages and limitations. It has been shown that the resolution of sampling TDCs is lower-bounded by quantisation noise, as their quantisation noise is uniformly distributed over the entire spectrum of the TDCs. The resolution of direct-counter TDCs is the period of the sampling clock. Resolution can be improved to one buffer delay if tapped delay-line interpolation is used to further digitise the quantisation errors of direct-counter TDCs. Although the longer the tapped delay line, the better the interpolation resolution, the non-linearity of tapped delay lines deteriorates with the increase in the length of the delay lines. To overcome the poor linearity of long delay lines, two-level delay-line interpolation can be used. Resolution can be further improved to below one buffer delay by using pulse-shrinking delay lines, vernier delay lines, pulse-stretching TDCs or offset-sampling flash TDCs. Cyclic configurations are also favoured in improving linearity and minimising silicon cost. Noise-shaping TDCs that achieve a better resolution over a specific frequency range have also been investigated. Unlike sampling TDCs, GRO-TDCs offer an intrinsic characteristic of first-order noise-shaping. The noise-shaping of these TDCs, however, is undermined by charge leakage and cross-talk in the phase-holding state. Switched ring oscillator TDCs outperform GRO-TDCs by eliminating the effect of leakage, charge sharing and switching noise in the phase-holding state. RO TDCs offer the key advantage of low sensitivity to PVT effects. Time-mode ΔΣ TDCs that utilise time accumulators as time integrators remove the need for voltage-mode integrators whose performance scales poorly with technology. High-order time integrators, time quantisers with low quantisation noise and DTCs with a superior linearity critical to noise-shaping TDCs are yet to be developed.
Acknowledgment
The author is grateful to reviewers for their invaluable comments and suggestions. This paper could not be in its present form without the comments and suggestions of the reviewers.
References

