Abstract-A time-to-digital converter (TDC) often consists of sophisticated, multilevel, sub-gate delay structures, when time intervals need to be measured precisely. The resolution improvement is rewarding until integral nonlinearity (INL) and random jitter begin to limit the measurement performance. INL can then be minimized with calibration techniques and result postprocessing. A TDC architecture based on a counter and timing signal interpolation (the Nutt method) makes it possible to measure long time intervals precisely. It also offers an effective means of improving the precision by averaging. Traditional averaging, however, demands several successive measurements, which increases the measurement time and power consumption. It is shown here that by using several interpolators that are sampled homogeneously over the clock period, the effects of limited resolution, interpolation nonlinearities and random noise can be markedly reduced. The designed CMOS TDC utilizing internal systematic sampling technique achieves 3.0ps rms single-shot precision without any additional calibration or nonlinearity correction.
I. INTRODUCTION
A time-to-digital converter (TDC) measures the time interval between two or more timing signals and presents the result in digital form. For the sake of simplicity, the timing signals are often called start and stop signals. High-precision TDCs are used in many applications such as laser distance measurement [1], [2], high energy physics [3], [4], timing parameter verification of high-speed circuits and components [5], [6], medical imaging [7], [8], single-photon detectors [9], [10] and Raman spectroscopy [11], [12]. The use of time-to-digital converter techniques is increasing as traditional analogue signal processing is challenged by modern scaled IC technologies, which favour signal processing in the time domain. The critical analogue circuit blocks can often be replaced with a TDC-based architecture in all-digital PLLs [13], [14] and in analogue-to-digital conversion (ADC) [15], [16].
Many techniques have been developed for time interval digitization, perhaps the most common of which, as presented in Fig. 1(a), uses the constant propagation delays of identical successive delay elements [17]. The start signal propagates in the delay line and the stop signal registers the state of the delay line, which reveals the number of LSBs between the start and stop. The resolution, τ1 in this case, is limited by the gate delay, which is the main technology-dependent parameter. A Vernier delay line, Fig. 1(b), uses slow and fast delay elements, τs and τf respectively, in order to reach sub-gate delay resolution [17], [18]. The stop edge propagates in the faster delay line and gains on the slower start edge by τs-τf in every element, and this difference is the resolution (LSB) of the Vernier method. Again, the register result reveals the number of LSBs between the start and stop. A sub-gate delay resolution can also be achieved by dividing the propagation delay by means of passive resistors, as described in Fig. 1(c) [19]. Another approach, presented in Fig. 1(d), connects the delay elements in parallel and creates differences between their delays (τ1-τ7) with unit capacitor scaling [20]. Several other efficient measurement methods and combinations of these have recently been proposed for time digitizing and have been shown to achieve picosecond-level measurement performance [21]-[28].
As in ADCs, there are several error sources in TDCs that give rise to measurement uncertainty. A high-resolution conversion structure is necessary in order to minimize the quantization error. Relatively small variations in the delays of the delay elements (differential nonlinearity, DNL) can accumulate to a high measurement error (integral nonlinearity, INL) when the timing signals propagate in a long delay line, for example. Thermal noise and noise in the supply and delay-adjusting voltages also create random jitter in the critical time measurement signals. The effect of the error sources becomes more serious as the dynamic range of the TDC increases. For this reason, the direct conversion architectures shown in Fig. 1, with only start and stop timing signals, are not typically used in wide-range TDCs aiming at high (ps-level) precision.
Instead of digitizing the time interval between the start and stop signals directly, the edges of a precise reference clock can be exploited. A simple counter can count the reference clock edges between the timing signals, providing a wide total measurement range at a "low price". In order to achieve subclock period resolution, an interpolator digitizes the time interval between the timing signal and the next/previous reference clock edge. Hence, two high-resolution interpolators are needed for the start and stop signals, but the dynamic range of the interpolators needs to cover only one reference clock cycle. The low jitter and stable reference clock can also be used for stabilization of the TDC under different operating conditions in this measurement technique, often called the Nutt method [29] .
This paper first reviews how the interpolation errors arise within the interpolation cycle and how the systematic errors repeat themselves identically in different clock cycles. It is shown that the measurement error for a certain time interval depends on the location of the start signal within the reference clock cycle. This makes averaging with varying start locations an effective method for reducing the measurement error. This is not always possible, however, due to the single-shot character of the measurement or an otherwise limited measurement time.
In general, ps-level measurement resolution is achieved with modern high-speed technology and sophisticated, sometimes multi-level, interpolation structures. The nonlinearities (delay element mismatch) often limit the measurement range and performance, and force the use of lookup tables for INL compensation and of calibration steps to ensure sufficient accuracy in the measurement. A totally different approach to high-performance time digitizing is, however, presented in this study. A stable, wide-range, moderate-resolution measurement architecture is created in which the interpolation channel can be easily duplicated. In the proposed approach a set of parallel measurement channels measures, in flash mode, the same time interval between the start and stop signals with delayed sampling that covers the full clock period of the TDC (or several clock periods). Thus the proposed TDC can employ internal systematic averaging to minimize the effect of measurement errors, even in a single measurement shot.
Section II goes through the operation and characteristics of a Nutt-based TDC and explains how averaging can be used to enhance the performance of the method. Section III presents the TDC developed here on the basis of internal systematic averaging. The full measurement results prove the effectiveness of the concept demonstrating 3.0ps single shot measurement precision. 
II. TIME INTERVAL DIGITIZATION BASED ON A COUNTER AND TIMING SIGNAL INTERPOLATION

A. Operation
The use of a counter together with timing signal interpolation, also known as the Nutt method, is a well-known method for long-range, high-resolution time interval measurement [29], [30]. The counter counts the periods of a reference clock of known frequency between the timing signals start and stop. The measurement range can be extended easily by increasing the width of the counter. The interpolators resolve the time intervals between the timing signals and the nearest reference clock edges (Tst and Tsp in Fig. 2) with high resolution (τlsb). Hence, the dynamic range of the interpolator needs to be only one reference clock cycle (τref). The estimate of the time interval, Tm, is formed by combining the result of the counter, C, and the interpolator results, Rst and Rsp,
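The combination of the counter and interpolator results described above can be sketched as follows. This is only an illustration under an assumed sign convention, which the surviving text does not confirm: each interpolator code is taken to measure the delay from its timing signal to the next reference clock edge, and the counter is taken to count the clock periods between those two edges. The function name `nutt_estimate` and its parameters are hypothetical.

```python
def nutt_estimate(count, r_start, r_stop, tau_ref_ps, n_slots):
    """Combine counter and interpolator codes into a time estimate in ps.

    Assumed convention (not confirmed by the text): r_start and r_stop are
    the interpolator codes for the delay from each timing signal to the
    NEXT reference clock edge, and count is the number of reference clock
    periods between those two edges.
    """
    tau_lsb = tau_ref_ps / n_slots          # interpolator resolution
    return count * tau_ref_ps + (r_start - r_stop) * tau_lsb


if __name__ == "__main__":
    tau_ref_ps = 1e12 / 220e6               # 220 MHz reference clock
    print(nutt_estimate(3, 10, 20, tau_ref_ps, 64))
```

With the opposite convention (codes measured from the previous clock edge) the sign of the interpolator term would be reversed; either way the two interpolator results enter as a difference.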
B. Realization
A Nutt-based TDC in which the interpolation is based on a delay line is a common approach on account of its stability, simplicity and effectiveness. The example structure, as presented in Fig. 3, relies on the constant propagation delays of matched delay-adjustable digital cells connected in series. The reference clock signal propagates in the delay line and its rising edge creates time samples Φ0…Φ7 for the interpolation. The edge of the reference clock arrives at the beginning of the delay line at the same time as the previous edge leaves the chain. A phase detector compares these two signals and controls a charge pump, which adjusts the delay line control voltage Vctrl if the delay drifts due to a change in temperature or supply voltage, for example. This delay-locked loop (DLL) structure locks the delay of the delay line to the reference clock cycle time τref and forces the interpolation resolution τlsb to a known fraction of τref. The start and stop timing signals, connected to the register clock inputs, store the state of the delay line at the moment of their arrival, and the interpolation result can be decoded from the registers. The counter in Fig. 3 counts the number of full reference clock cycles between the timing signals, as stated above. Its counting, however, needs to be synchronized to the results of the interpolator so that the results are compatible in all cases [30]-[32].
C. Measurement uncertainty
Several error sources create measurement uncertainty in Nutt-based time digitization. Quantization error results in finite measurement resolution when an analogue quantity is converted to a discrete value. In the presented architecture, Fig. 3, the resolution corresponds to the delay element propagation delay, τlsb, which is highly technology dependent.
Differential nonlinearity (DNL) describes the deviations of the quantization steps (resolution) from the ideal value of 1 LSB. Nonhomogeneity in the silicon process parameters, random variations in the layout, and noise sources which systematically interfere with signal propagation, such as systematic crosstalk or supply voltage noise, create static delay differences in the delay elements, seen as non-homogeneous measurement resolution.
Integral nonlinearity (INL) is a consequence of the accumulation of the errors in the resolution (DNL). When the reference signal propagates through the delay line in Fig. 3, the delay deviations of the individual elements sum and cause nonlinearity in the interpolation. The total delay of the delay line, including the DNLs, is nevertheless locked to the reference clock cycle time by the delay-locked loop (DLL). Hence the sum of the DNLs over the delay line is 0, which also sets the INL after the last element to 0. The delay line in Fig. 3 is common to both interpolators, and hence their INLs are quite similar. The differences between the start and stop interpolator INLs result mainly from variations in the register thresholds.
Jitter in the timing signals and interpolation phases causes random result variation. The jitter is caused by thermal noise, substrate noise and noise in the control and supply voltages. In the DLL-based architecture the effect of the reference clock jitter is low, usually below 1 ps [33]. The jitter of the delay elements, however, accumulates during the signal propagation, and the maximum jitter is expected at the end of the delay line (in Φ7 in Fig. 3).
The systematic interpolation errors, i.e. the quantization error and the interpolator INL, repeat themselves identically in different clock cycles. The random delay line jitter also repeats its accumulation in every τref. Usually, the timing signals are asynchronous with respect to the reference clock, which means that the start has an equal probability of arriving at any location within τref. Hence, all the errors in the start and stop interpolations vary when the same time interval is measured several times. It is important to note that in the Nutt-based TDC, due to the asynchronous character of the timing signals, the systematic interpolation errors also take on a random-like nature. Hence, the architecture is linear by nature (the expected value of the linearity error is zero) [30], [34].
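The relation between DNL and INL in a DLL-locked delay line can be illustrated with a small numerical sketch. It assumes Gaussian element mismatch and models the DLL simply as a scaling of all element delays so that their sum equals τref; the function names, the mismatch value and the choice of 64 elements are illustrative assumptions rather than a model of the actual circuit.

```python
import random


def locked_delay_line(n_elements, tau_ref_ps, mismatch_sigma, seed=0):
    """Element delays with random mismatch, scaled so the total equals tau_ref.

    The scaling is a simplified stand-in for the DLL, which adjusts the
    control voltage until the whole line spans one reference clock period.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(1.0, mismatch_sigma) for _ in range(n_elements)]
    scale = tau_ref_ps / sum(raw)
    return [d * scale for d in raw]


def dnl_and_inl(delays):
    """DNL is each step's deviation from the mean step; INL is its running sum."""
    tau_lsb = sum(delays) / len(delays)
    dnl = [d - tau_lsb for d in delays]
    inl, acc = [], 0.0
    for step in dnl:
        acc += step
        inl.append(acc)
    return dnl, inl


if __name__ == "__main__":
    delays = locked_delay_line(64, 1e12 / 220e6, mismatch_sigma=0.05)
    dnl, inl = dnl_and_inl(delays)
    print(f"max |INL| = {max(abs(v) for v in inl):.1f} ps, "
          f"INL after last element = {inl[-1]:.2e} ps")
```

Because the DNL values sum to zero by construction, the INL returns to zero after the last element, while it can be considerably larger than any single DNL value in the middle of the line.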
The measured values, Tm, vary around the mean with a certain statistical variation that can be described by the standard deviation σ, usually called the precision. The σ-value varies when the time interval changes, and hence the root-mean-square (rms) value, σrms, also known as the single-shot precision, can be used to indicate the precision within a certain measurement range. In the Nutt-based measurement architecture,

$$\sigma_{rms} = \sqrt{\sigma_q^2 + \sigma_{inl\text{-}st}^2 + \sigma_{inl\text{-}sp}^2 + \sigma_{jitter}^2}, \tag{2}$$

where σq=τlsb/√6 defines the rms effect of quantization, σinl-st and σinl-sp are the standard deviations of the INLs in the start and stop interpolators and σjitter is the rms effect of jitter [35].
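As a rough numerical check of this error budget, the sketch below assumes that (2) is the root-sum-square of the four independent contributions named above. It uses the values reported for the prototype in Section III (τlsb≈71ps, σjitter≈9.8ps, σrms≈42.0ps without an INL lookup table). The per-interpolator INL figure it derives is an inference that additionally assumes equal start and stop contributions; it is not a number stated in the paper.

```python
import math


def rms_precision(tau_lsb, sigma_inl_st, sigma_inl_sp, sigma_jitter):
    """Root-sum-square of quantization, start/stop INL and jitter (all in ps)."""
    sigma_q = tau_lsb / math.sqrt(6)
    return math.sqrt(sigma_q**2 + sigma_inl_st**2
                     + sigma_inl_sp**2 + sigma_jitter**2)


# INL removed (lookup table applied): only quantization and jitter remain.
with_lut = rms_precision(71.0, 0.0, 0.0, 9.8)

# INL contribution implied by the reported 42.0 ps without a lookup table,
# split evenly between the two interpolators (an assumption).
inl_total = math.sqrt(42.0**2 - with_lut**2)
inl_per_interpolator = inl_total / math.sqrt(2)

if __name__ == "__main__":
    print(f"with LUT: {with_lut:.1f} ps")
    print(f"implied INL per interpolator: {inl_per_interpolator:.1f} ps")
```

Under these assumptions the quantization and jitter terms alone reproduce the reported σrms-LUT≈30.6ps, and the implied INL per interpolator falls inside the 17.5ps to 24.8ps range reported for the individual channels.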
D. Averaging
In the Nutt based TDC the results, Tm, vary around the mean, expected value, when the same time interval is measured several times. The same time interval can be measured A times, for example, and an average value Tm-ave can be calculated from the results. The variation between the averaged results becomes smaller, which means that the averaging improves the precision of the measurement.
The A samples can be collected simply by repeating the measurement A times (simple random sampling, SRS), which improves the precision by a factor of √A. In this method the time position of the start hit is random within τref in each of the separate measurements. Hence, the collected samples may overlap or cluster in a certain part of the interpolator, which limits the precision improvement. A more representative sample of the errors within τref is obtained if the A samples are collected from the whole nonlinear region not just with equal probability but also evenly. This, however, demands systematic sampling methods.
Averaging by repeating the same measurement many times multiplies the measurement time, demands more resources for calculating the result, increases the power consumption and may prove impossible in many cases. Hence, a TDC architecture that provides high precision in a single shot is still needed.
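The difference between random and systematic sampling can be illustrated with a small Monte Carlo sketch. It is a deliberately idealized model, not the paper's measurement: the interpolators are ideal quantizers (no INL, no jitter), so only the quantization error is averaged, and the sampling step `STEP` is a hypothetical value of about 200 ps chosen so that the 128 sampling phases fall evenly within one LSB. All names are illustrative.

```python
import math
import random
import statistics

TAU_LSB = 71.0   # interpolator resolution in ps (from the text)
A = 128          # number of averaged samples (from the text)
# Hypothetical sampling step of about 200 ps, tuned so that the A sampling
# phases are spread evenly over one LSB. A real buffer delay is not calibrated.
STEP = TAU_LSB * (2 + 105 / 128)


def single_shot(interval, start):
    """Ideal interpolators: both timestamps are rounded down to the LSB grid."""
    stop = start + interval
    return (math.floor(stop / TAU_LSB) - math.floor(start / TAU_LSB)) * TAU_LSB


def one_measurement(interval, rng):
    """A single measurement with a random (asynchronous) start phase."""
    return single_shot(interval, rng.uniform(0.0, TAU_LSB))


def srs_average(interval, rng):
    """Simple random sampling: A repeated measurements, independent phases."""
    return statistics.fmean(one_measurement(interval, rng) for _ in range(A))


def systematic_average(interval, rng):
    """One shot: A channel pairs sample the same signals, each shifted by STEP."""
    start = rng.uniform(0.0, TAU_LSB)
    return statistics.fmean(
        single_shot(interval, start + k * STEP) for k in range(A))


def precision(measure, interval, trials=1000, seed=1):
    """Standard deviation (ps) of repeated results for a fixed interval."""
    rng = random.Random(seed)
    return statistics.pstdev([measure(interval, rng) for _ in range(trials)])


if __name__ == "__main__":
    t = 1030.5  # ps, arbitrary test interval
    for name, fn in [("single shot", one_measurement),
                     ("SRS average", srs_average),
                     ("systematic average", systematic_average)]:
        print(f"{name}: {precision(fn, t):.2f} ps")
```

In this model the SRS average improves on the single shot by roughly √A, while the evenly spaced phases of the systematic average cancel most of the quantization error within one shot. A real interpolator with INL and jitter would not cancel as completely.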
III. A TDC BASED ON SYSTEMATIC INTERNAL AVERAGING
This section presents a TDC that uses systematic internal (on-chip) averaging, in which the need for many successive measurements is replaced by the use of many parallel measurement channels, which provide interpolation results from all over the interpolation region. The goal was to minimize the interpolation error by multiple sampling of unrelated errors, which partly compensate each other and thus improve the single-shot precision. The idea is not totally new [23], [26], but here the realization is totally different and the multi-sampling is combined with the linear Nutt-based architecture, which makes ps-level precision possible in a wide measurement range.
The TDC was realized with 0.35µm CMOS technology, and the IC layout of the complete TDC is shown in Fig. 4 . The layout shows the timing core, the measurement registers and the decoding logic which converts the raw measurement data into binary words. The size of the TDC part is 2.6mm×6.6mm, including pads. The power consumption with a 3.3V supply voltage and 300kHz measurement rate is 215mW.
A. Operation & architecture
The TDC developed here, the architecture of which is shown in Fig. 5, uses the DLL delay line interpolation method explained in Section II. The external oscillator, fref=220MHz, provides a stable, low-jitter reference signal for the measurement. The phase detector and charge pump adjust the delay line control voltage Vctrl until the signals at the beginning and the end of the delay line, CLK+, are simultaneous, which stabilizes the delay line against PVT variations.
The delay element, presented in Fig. 6, consists of two parallel delay-adjustable (current-starved) inverters with outputs that are combined with smaller inverters operating in the opposite direction. This structure provides high resolution for the interpolation even though the resolution is based on the gate-delay principle, i.e. τlsb~inverter delay. The differential reference signal propagates through the two parallel inverters in opposite phase, which reduces the nonlinearity and improves the immunity to noise as compared with a single-ended structure. The two small inverters maintain the phase difference between the two propagating signals, while the 64-element delay line creates 64 successive time phases of the reference signal with ~71ps resolution.
The timing core in Fig. 5, including the delay line and a 7-bit counter, is common to all the measurement channels. The counter provides a total measurement range of up to 581ns. The counter output signals ΦCTR and the rapidly changing interpolator time phases, Φ0…Φ63, are wired to a total of 256 parallel and identical measurement channels (interpolation registers). The power consumption is minimized by using AND gates between the delay line and the measurement channels. The 7-bit counter begins counting when the rising edge of the start signal reaches the IC. At the same time, the AND gates pass the interpolation phases to the measurement registers.
The timing signals, which store the state of the delay line, are delayed by τ2 so that the interpolation phases have settled at the register inputs. The counter is disabled and the interpolation phases go back low after the last measurement channel, #256, has registered the delay line state.
The proposed architecture averages internally by providing 128 samples of the time interval between a single start and stop input pair. Half of the measurement channels in Fig. 5, every second one, i.e. 128 in total, store the timing core state when the start signal occurs and the other half are for the stop signal. If all 128 channels registered the timing signal at the same time, the measurement uncertainties would not vary and the averaging would not improve the precision. Hence, a buffer with delay τ3 between every pair of measurement channels shifts the sampling moment along the interpolation region, which provides variation in the interpolation error. The absolute value of the non-calibrated buffer delay τ3 affects the total sampling time. Here the sampling of a single timing signal takes 128×~200ps≈25ns and hence covers more than 5τref, so that the reference clock jitter is also averaged.
The measurement channel, in Fig. 7, consists of 64+7 registers (latches) which store the state of the delay line and the counter value when the clock input (E) goes high. Latches were used instead of flip-flops and their dimensions were minimized in order to minimize the size and power consumption of the TDC. The compatibility of the counter and interpolator results is verified with a dual-edge counter synchronization structure, presented in detail in [31].
The two adjacent start and stop measurement channels in Fig. 5 give one estimate for the time interval, calculated with (1), with a 13-bit dynamic range and τlsb≈71ps resolution. The internally averaged total result can be calculated simply by summing the results of the 128 channel pairs, which creates a 20-bit value with 0.56ps LSB size (LSBAVE=τlsb/128). The two least significant bits of the result do not improve the performance and can hence be removed in order to decrease the width of the data bus, for example.
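The word lengths and ranges quoted in this subsection follow from the reference frequency, the delay line length, the counter width and the number of channel pairs. The sketch below recomputes them; the constant and function names are illustrative, and `internal_average` assumes that the 128 pair results are available as non-negative integer codes in units of τlsb.

```python
F_REF_HZ = 220e6      # reference clock frequency
N_SLOTS = 64          # delay line elements per clock period
COUNTER_BITS = 7      # coarse counter width
N_PAIRS = 128         # start/stop channel pairs

tau_ref_ps = 1e12 / F_REF_HZ                           # reference period
tau_lsb_ps = tau_ref_ps / N_SLOTS                      # interpolator LSB
range_ns = (2 ** COUNTER_BITS) * tau_ref_ps / 1e3      # counter range
pair_bits = COUNTER_BITS + (N_SLOTS - 1).bit_length()  # 7 + 6 bits
sum_bits = pair_bits + (N_PAIRS - 1).bit_length()      # plus 7 bits for the sum
lsb_ave_ps = tau_lsb_ps / N_PAIRS                      # LSB of the summed result
sampling_span_in_periods = N_PAIRS * 200.0 / tau_ref_ps  # with ~200 ps buffers


def internal_average(pair_codes, drop_bits=2):
    """Sum the per-pair codes and drop the least significant bits.

    The sum is the averaged result expressed in units of tau_lsb / len(pair_codes);
    dropping drop_bits bits coarsens that unit by a factor of 2**drop_bits.
    """
    return sum(pair_codes) >> drop_bits


if __name__ == "__main__":
    print(f"tau_ref = {tau_ref_ps:.1f} ps, tau_lsb = {tau_lsb_ps:.2f} ps")
    print(f"range = {range_ns:.1f} ns, pair bits = {pair_bits}, sum bits = {sum_bits}")
    print(f"averaged LSB = {lsb_ave_ps:.3f} ps, "
          f"sampling span = {sampling_span_in_periods:.1f} clock periods")
```

The counter range evaluates to about 581.8ns and the sampling span to a little over five and a half clock periods, in line with the figures given in the text.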
B. Interpolator nonlinearity
The interpolation DNL was estimated by collecting interpolation result histograms for 10M asynchronous measurements. The number of hits gathered into each interpolation slot reveals the DNL. The time samples for which the DNL is positive (the LSB is wider), for example, get more hits than average. The TDC DNL variation for all 256 measurement channels (128 start and 128 stop channels), including the 64 interpolation slots, is shown in Fig. 8. The cross is the mean value of the slot DNL when all 256 channels are averaged. The crosses describe the delay element delay variation around the mean value of τlsb~71ps, which varies very little because the transistor sizes in the delay line are large. The last slots get fewer hits than average, probably due to nonhomogeneous layout. The max, min and σ values for the averaged delay element DNLs are 5.4ps, -41.5ps and 5.7ps, respectively. The line over each cross describes the DNL fluctuation (max-min) between the 256 measurement channels. Different channels can have a totally different DNL even though the interpolation slot is created by the same delay element. The variation between the measurement channels, σ~9ps for every slot, results mostly from variations in the interpolation register thresholds. The register transistor sizes were minimized in order to achieve small size, low input capacitance and low power consumption, which at the same time increases the variation in the time domain operation.
INLs were calculated for each interpolation slot in every interpolation channel based on the DNL data. Again, the crosses in Fig. 9 show the average INL for each interpolation slot and the line over each cross describes how much the INL varies between the 256 interpolation channels (max-min). The INL is at its maximum near the end of the interpolation cycle, because the last slots are shorter than average. The most important parameter from the precision point of view is the INL variation.
σinl-st and σinl-sp in the different measurement channels fluctuated in the range 17.5ps-24.8ps. When the INL in each interpolation slot is known, the INL error can be subtracted from the measurement results. The INL data were stored in an INL lookup table (INL-LUT) for use in connection with the precision measurements. However, as shown below, with the systematic internal averaging approach this compensation is actually not needed for ps-level single-shot precision.
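The histogram-based DNL estimate and the INL derived from it can be sketched as follows. The sketch assumes that asynchronous hits are uniformly distributed over the clock period, so that the hit count of a slot is proportional to its width, and it uses the running sum of the DNL as the INL, which is one common convention and not necessarily the exact definition used for Fig. 9. The function names and the example histogram are illustrative.

```python
def dnl_from_histogram(hits, tau_lsb_ps):
    """DNL per slot in ps: relative excess of hits times the nominal LSB."""
    mean_hits = sum(hits) / len(hits)
    return [(h / mean_hits - 1.0) * tau_lsb_ps for h in hits]


def inl_from_dnl(dnl):
    """INL per slot as the running sum of the DNL values."""
    inl, acc = [], 0.0
    for step in dnl:
        acc += step
        inl.append(acc)
    return inl


if __name__ == "__main__":
    # Made-up four-slot histogram, for illustration only.
    hits = [100, 120, 80, 100]
    dnl = dnl_from_histogram(hits, tau_lsb_ps=71.0)
    print([round(v, 1) for v in dnl])
    print([round(v, 1) for v in inl_from_dnl(dnl)])
```

A slot that collects 20% more hits than average is estimated to be 20% wider than the nominal LSB, and since the hit counts are normalized to their mean, the INL returns to zero after the last slot.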
C. Crosstalk and temperature drift
As explained above, the systematic error sources also become randomized in the Nutt-based TDC architecture, which adds variation to the measurement result but makes the mean (expected) result linear. The crosstalk between timing signals that are close to each other, however, still creates some systematic nonlinearity in the measurement result. The high slew rate timing signals create noise in the supply voltages when they reach the IC. The short-term supply ringing is systematic and does not cause nonlinearity if the ringing due to the start signal has ended before the stop arrives. With short time intervals, however, the noise of both timing signals combines and creates static nonlinearity. The crosstalk error, shown in Fig. 10, was measured by comparing the measurement results with those achieved using another TDC of known nonlinearity. With time intervals between 8ns and 581ns, the nonlinearity was less than the margin of error of the measurement setup, ±4ps.
Another source of varying accuracy errors is temperature change. The delay-adjustable delay line controlled by the DLL keeps the resolution constant, but the reference clock has some temperature drift, and the start and stop signal input paths (input cells, logic, wires and register thresholds) may also have differences which vary with the temperature. When a constant time interval (100ns) was measured while the temperature of the measurement board was changed from -40°C to +60°C, the total drift in the averaged measurement result over the whole temperature range was only 5ps.
D. Precision
The precision was measured by means of a pulse generator, power splitter and coaxial cables of various lengths. The differences in cable length provided a jitter-free time difference between the timing signals when the same pulse was fed to both cables. Fig. 11 shows an example of single-shot result distribution with and without INL-LUT, when the same time interval was measured 12800 times.
The precision measured at different time intervals is shown in Fig. 12 [...] interpolations are the same and get subtracted in the total result calculation (1). The traditional style of using two interpolation channels yielded σrms≈42.0ps, which can also be calculated with (2). The second curve from the top shows the precision variation, also without internal averaging, but when using an INL-LUT, which fixes the INL errors and leaves only the precision variation due to quantization. Hence the rms precision follows τlsb/√6, σrms-LUT≈30.6ps. The rms effect of random jitter, σjitter, in (2), can be calculated to be 9.8ps.
Fig. 12 also shows the precision and its variation when the internal averaging was used. The two curves in the lower part are the precisions with and without an INL-LUT. This architecture collects 128 measurement results for the time interval between a single pair of start and stop signals. In the case of a random start signal location between the measurements used in averaging (SRS method), the expected improvement in precision would be a factor of √128≈11.3, at the expense of a 128 times longer measurement time. In the proposed design, the interpolation samples are collected evenly over the nonlinear interpolator. The systematic internal sampling method improves the precision by a factor of 13.9 in a single measurement shot, so that the rms values for σrms- [...]
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIM.2019.2932156, IEEE Transactions on Instrumentation and Measurement
[...] interpolators, which solve the locations of the timing signals within the reference clock cycle. The counter makes the long measurement range possible and the interpolators define the measurement precision. A remarkable feature is that the interpolation errors vary with asynchronous timing signals.
The quantization error, interpolator INL and random jitter appear as result variation around the mean value, which changes linearly with the time interval. The result variation can be decreased by making many measurements of the same time interval and calculating the average result. The TDC developed here exploits internal averaging in order to achieve picosecond-level single-shot precision with a relatively modest delay line resolution (~71ps). The measurement architecture was kept as simple as possible, and the measurement channels consist of simple registers, so that the number of channels can be easily multiplied. 256 measurement channels were integrated into the same circuit. Half of them sample the time location of the start signal and the other 128 channels similarly that of the stop signal. The sampling time covers several reference clock cycles and hence also averages the jitter of the external clock.
The prototype and the measurement results prove the efficiency of the TDC concept based on internal systematic averaging. An rms single-shot precision of 3.0ps was reached in a wide measurement range without manual calibrations or look-up tables, and this performance was attained with a quite robust 0.35µm standard CMOS technology. The nonlinearities do not set a limit to the performance enhancement of the concept, as is often the case, and even better precision, below 1ps, can be expected just by scaling the technology. More modern technology reduces the result variation in the averaging (especially σq), makes it easier to add even more measurement channels, reduces the circuit size and lowers the power consumption. Calculation of the average result could also be integrated, which would reduce data transmission and increase the feasible measurement rates. The developed architecture is compared with other high-performance integrated wide-range TDCs in Table I. The proposed straightforward concept offers stable operation, wide range, high linearity and high precision without technology limitations, external calibrations or look-up tables.
