

# Chen, Haochang and Li, David Day-Uei (2018) Multi-channel, low nonlinearity time-to-digital converters based on 20nm and 28nm FPGAs. IEEE Transactions on Industrial Electronics. ISSN 0278-0046, http://dx.doi.org/10.1109/TIE.2018.2842787

This version is available at https://strathprints.strath.ac.uk/64051/

**Strathprints** is designed to allow users to access the research output of the University of Strathclyde. Unless otherwise explicitly stated on the manuscript, Copyright © and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Please check the manuscript for details of any other licences that may have been applied. You may not engage in further distribution of the material for any profit-making activities or any commercial gain. You may freely distribute both the url (https://strathprints.strath.ac.uk/) and the content of this paper for research or private study, educational, or not-for-profit purposes without prior permission or charge.

Any correspondence concerning this service should be sent to the Strathprints administrator: strathprints@strath.ac.uk

# Multichannel, Low Nonlinearity Time-to-Digital Converters Based on 20 and 28 nm FPGAs

Haochang Chen and David Day-Uei Li

Abstract—This paper presents low nonlinearity, compact, and multichannel time-to-digital converters (TDC) in Xilinx 28 nm Virtex 7 and 20 nm UltraScale field-programmable gate arrays (FPGAs). The proposed TDCs integrate several innovative methods that we have developed: 1) the subtapped delay line averaging topology; 2) tap timing tests; 3) a direct compensation architecture; and 4) a mixed calibration method. The code density tests show that the proposed TDCs have much better linearity performance than previously reported ones. Our approach is cost-effective in terms of the consumption of logic resources. To demonstrate this, we implemented 96-channel TDCs in both FPGAs, using less than 25% of the logic resources. The achieved least significant bit (LSB) is 10.5 ps for Virtex 7 and 5.0 ps for UltraScale FPGAs. After the compensation and calibration, the differential nonlinearity (DNL) is within [-0.05, 0.08] LSB with $\sigma_{\text{DNL}} = 0.01$ LSB, and the integral nonlinearity (INL) is within [-0.09, 0.11] LSB with $\sigma_{\text{INL}} = 0.04$ LSB for the Virtex 7 FPGA. The DNL is within [-0.12, 0.11] LSB with $\sigma_{\text{DNL}} = 0.03$ LSB, and the INL is within [-0.15, 0.48] LSB with $\sigma_{\text{INL}} = 0.20$ LSB for the UltraScale FPGA.

*Index Terms*—Carry chains, field-programmable gate arrays (FPGA), multichannel TDCs, time-of-flight, time-to-digital converters (TDC).

#### I. INTRODUCTION

TIME-TO-DIGITAL converters (TDCs) are extremely high-precision stopwatches. They are key components in many electronic systems and industrial products such as all-digital phase-locked loops, time-of-flight (ToF) mass spectrometers, and LIDAR or three-dimensional (3-D) ranging devices [1]–[6] used for robotics, self-driving vehicles, and solar photovoltaic deployment optimization. TDCs are also widely applied in space sciences [7], biomedical applications, such as positron emission tomography and fluorescence lifetime imaging microscopy (FLIM) [8]–[12], nuclear and particle physics [13], [14], and quantum communications [15].

Manuscript received December 20, 2017; revised March 3, 2018, April 17, 2018, and May 9, 2018; accepted May 11, 2018. Date of publication June 20, 2018; date of current version November 30, 2018. This work was supported in part by Strathclyde Institute of Pharmacy & Biomedical Sciences and the Royal Society under Grant IE140915, and in part by the Engineering and Physical Sciences Research Council under Grant EPSRC: EP/M506643/1, U.K. All results can be fully reproduced using the methods described in this paper. (*Corresponding author: David Day-Uei Li.*)

The authors are with the Faculty of Science, University of Strathclyde, G4 0RE Glasgow, Scotland (e-mail: haochang.chen@strath.ac.uk; david.li@strath.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIE.2018.2842787

TDCs can be realized through analog or all-digital methods [16]. Recently, all-digital topologies have become very popular, and they can be implemented in application-specific integrated circuits (ASIC) [17], [18] or field programmable gate arrays (FPGA) [14], [19]–[34] with a subnanosecond resolution. ASIC-based TDC is a mature solution because of its better precision and linearity [35]. However, ASIC approaches tend to be more suitable for large-scale application specific or general purpose commercial developments. Compared with ASIC approaches, FPGAs are able to provide greater flexibility with a lower cost and a shorter developing cycle. With the rapid advances of FPGA technologies, powerful design environments, and a wide variety of applications, FPGAs are suitable for design verification, scientific experiments, and high-end instruments, and have become an ideal platform for integrated system design.

The timing resolution is a primary parameter for a TDC. For the simplest counter-based TDCs, the resolution is limited by the frequency of the driving clock [16]. To break the limitation and achieve a picosecond resolution, the Vernier delay line [33] and tapped delay line (TDL) [20], [32], [34] architectures have been presented and widely applied. The TDL has recently become a mainstream method for implementing FPGA-TDCs [14], [19], [36], since it can be easily realized by using the carry chain modules in FPGAs. The resolution of TDL-TDCs is mainly determined by the manufacturing process of FPGAs, and it improved from 200 to 3.9 ps root mean square (rms) between 1997 and 2017 [34], [37].

The wave union method [19], multichain averaging method [26], [38], matrices of counters [32], and two-dimensional (2-D) Vernier structure [31] have been presented to break this "process-related" limitation as well. These methods can achieve a better resolution than raw-TDL TDCs; however, they usually require many more logic resources and have higher system complexity or a much longer dead time.

The nonlinearity of a TDC is another vital parameter, since it can influence the measurement precision directly. The nonlinearity can be characterized by the differential nonlinearity (DNL) and the integral nonlinearity (INL) based on the code density tests [16], [35], expressed as follows:

$$\mathrm{DNL}[k] = \frac{W[k] - Q}{Q}, \tag{1}$$

$$\mathrm{INL}[k] = \sum_{n=0}^{k} \mathrm{DNL}[n], \tag{2}$$

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

respectively, where W[k] is the binwidth of the *k*th bin and *Q* is the ideal code binwidth. By optimizing the circuit design and layouts, ASIC-based TDCs can achieve DNL < 1 LSB (least significant bit) and INL < 1 LSB [17], [35]. Compared with ASIC-TDCs, FPGA-TDCs usually show worse linearity performances. For TDL-TDCs, the clock skews and the poor uniformity of carry chains [16], [24] are the main culprits for the nonlinearity, missing codes, and bubble problems. It is difficult to remove them completely. To reduce the nonlinearity, the dual-phase [28], downsampling [36], TDL reorganization [24], [25] and tuned-TDL [29] methods have been presented recently. A ones-counter encoder was reported to remove the bubbles in FPGA-TDCs [37]. However, these methods still cannot enhance the linearity up to the level comparable to ASIC-TDCs [17].
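As a minimal sketch of how (1) and (2) are typically evaluated from a code density test, the following Python snippet estimates the DNL and INL from a histogram of bin counts. The function name `dnl_inl` and the sample histogram are hypothetical, and the ideal width *Q* is assumed to be the mean count per bin, which holds only when the hits are uniformly distributed over the measurement range.

```python
def dnl_inl(counts):
    """Estimate DNL and INL (in LSB) from a code density histogram.

    counts[k] is the number of random hits recorded in bin k; with
    uniformly distributed hits it is proportional to the bin width W[k].
    """
    total = sum(counts)
    ideal = total / len(counts)  # Q expressed in counts (assumption: mean count)
    dnl = [(c - ideal) / ideal for c in counts]  # Eq. (1)
    inl = []
    running = 0.0
    for d in dnl:  # Eq. (2): running sum of the DNL
        running += d
        inl.append(running)
    return dnl, inl


# Hypothetical 4-bin histogram, for illustration only.
dnl, inl = dnl_inl([100, 150, 50, 100])
print(dnl)  # [0.0, 0.5, -0.5, 0.0]
print(inl)  # [0.0, 0.5, 0.0, 0.0]
```

A zero-width bin never collects a hit, so it appears here as a count of 0 and a DNL of -1.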

The demand for multichannel TDCs has been growing strongly, especially for applications that require real-time acquisition, such as ToF measurements for 2-D and 3-D ranging, LIDAR, time-resolved spectroscopy, and fast-FLIM [11], [12], [39]–[41]. ASIC-based multichannel designs are reliable and competitive in terms of resolution, linearity, and power consumption. The latest FPGAs also have great potential for implementing multichannel TDCs, as they provide a massive amount of logic and IO ports with fast and flexible development tools. Many multichannel TDCs have been reported based on both ASIC and FPGA devices in the last few years. Most ASIC multichannel TDCs achieve tens of channels [8], [42], [43]. Several designs with hundreds or even a thousand channels were reported specifically for fast FLIM and 3-D ranging applications [6], [39], [44]. However, these TDCs do not target high linearity; they are constrained by system requirements such as low power consumption and a higher fill factor. FPGA-based multichannel TDCs [21], [25], [45]–[47] are able to implement more than ten or even hundreds of channels within a single FPGA; however, their linearity cannot compete with ASIC-TDCs. Most multichannel TDCs published earlier are not able to achieve a large channel number, a high resolution, and high linearity simultaneously.

Various procedures are required for calibrating process, voltage, temperature (PVT) variations and nonlinearities [48], [49]. In an FPGA, the influence of voltage jitters can be negligible since the power noise has been effectively restrained [23]. The temperature variations will influence the delay speed of a TDL resulting in LSB variations. Several methods were reported [28], [38], [45] to compensate them by using look-up table methods or correcting temperature coefficients. The static nonlinearities are mainly caused by the nonuniformity of TDLs and clock distributions. The bin-by-bin calibration techniques [14], [21], [45], [49] can be used for correcting nonlinearity offsets, whereas the binwidth calibration techniques [22], [30] are for reducing the nonlinearity of the binwidth.

Fig. 1. Block diagram of the sub-TDL TDC in a Virtex 7 FPGA.

Chen *et al.* [30] presented a missing-code free FPGA-TDC through combining the direct-histogram architecture and the tuned-TDL method. Although this method improved the linearity greatly, it is not suitable for multichannel applications due to the larger consumption of resources for implementing histogram counters. To achieve a TDC with 1) *high linearity*, 2) a long measurement range, and 3) low consumption of digital resources, we present several new methods (and implement multichannel TDCs) listed as follows.

- 1) A sub-TDL averaging topology is presented to achieve fast removals of the bubbles and zero-width bins and preliminary suppression of the nonlinearity.
- 2) A unique tap timing test based on the sub-TDL topology is proposed to calculate the actual tap timings in a TDL.
- 3) A new direct histogram compensation architecture and a mixed calibration method are developed to boost linearity with minimum logic resource cost.
- 4) To demonstrate our approaches, we implemented 96-channel TDCs in both the Virtex 7 XC7V690T and the UltraScale XCKU040 FPGAs.

#### II. DESIGN AND ARCHITECTURE

Since the Virtex 7 FPGAs differ from the UltraScale FPGAs in the arrangement of logic modules, we describe the proposed approaches together with the different configurations and methods selected for implementing our multichannel TDCs on each device. We demonstrate how the proposed methods improve the linearity by comparing them with traditional topologies.

#### A. TDL-TDC and Sub-TDL Averaging Topology

Fig. 2. Block diagram of the carry chain and the TDL implemented in (a) Virtex 7 and (b) UltraScale FPGAs.

As shown in Figs. 1 and 2, TDLs were implemented with the cascaded carry chain modules, CARRY4 in Virtex 7 and CARRY8 in UltraScale shown in Fig. 2(a) and (b), with the carry output ports as the taps and a column of sampling D-flip-flops (D-FFs). The input port of the TDL can be connected to photon sensors such as single-photon avalanche diodes (SPAD) and photomultiplier tubes. When a new photon event is detected, the hit signal with a 0-to-1 or 1-to-0 transition will propagate along the TDL. The states of the hit signal in a TDL are registered by the D-FFs at the taps sampled by the clock. The states are assembled and represented as thermometer codes (1111000... or 0000111...), before being converted to one-hot codes (0001000...) by the thermometer code edge detectors (T2OH) shown in Fig. 1. The one-hot codes will then be converted to normal binary codes as the fine codes by the OH2BIN converters. To construct the histogram, the fine codes are used as the addresses of the memory. With coarse and fine code structures, the FPGA-TDC is able to achieve a longer measurement range.
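The encoding chain described above (thermometer code, one-hot code, binary fine code) can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names are hypothetical, the thermometer code is assumed to be bubble-free and of the 1111000... form, and the fine code is taken to be the number of taps the hit has passed. The actual T2OH and OH2BIN blocks are hardware logic and may use a different convention.

```python
def thermometer_to_one_hot(bits):
    """Mark the single 1-to-0 edge of a bubble-free thermometer code.

    bits is a list such as [1, 1, 1, 0, 0, 0]; the result has a 1 at the
    position of the last leading 1 and 0 everywhere else.
    """
    one_hot = []
    for i, b in enumerate(bits):
        nxt = bits[i + 1] if i + 1 < len(bits) else 0  # treat the end as 0
        one_hot.append(1 if (b == 1 and nxt == 0) else 0)
    return one_hot


def one_hot_to_binary(one_hot):
    """Return the position of the set bit plus one (taps passed)."""
    for i, b in enumerate(one_hot):
        if b:
            return i + 1
    return 0  # the hit has not reached the first tap


code = [1, 1, 1, 0, 0, 0]
oh = thermometer_to_one_hot(code)
print(oh)                     # [0, 0, 1, 0, 0, 0]
print(one_hot_to_binary(oh))  # 3
```

With a bubble such as `[1, 1, 0, 1, 0, 0]`, the edge detector would mark two positions, which is why bubbles have to be removed before encoding.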

The carry chain module contains a series of multiplexers (MUXs) as the basic delay elements of a TDL, as shown in Fig. 2. The CARRY4 module contains four MUXs in Xilinx 7 series (both Virtex 7 and Kintex 7) FPGAs, and the CARRY8 contains eight MUXs in the newer 20 nm UltraScale and 16 nm UltraScale+ FPGAs. Traditional TDL-TDCs splice all carry outputs into a single thermometer code directly. However, the dedicated fast lookahead carry logic architecture in the CARRY modules contributes to significant nonlinearity, missing codes, and serious bubble problems due to the mismatch in the propagation delay along the delay lines [27], [37]. The shorter the tap interval becomes, the more serious the bubble and missing code problems become. To solve this problem, we propose a sub-TDL averaging topology that rearranges and regroups the carry outputs into several subsections, each with a shorter thermometer code, as shown in Fig. 1. For a Virtex 7 FPGA, a TDL is separated into four sub-TDLs. The fine codes of the four sub-TDLs are then summed to form the averaged TDL. This method is applied similarly to an UltraScale FPGA, but with the TDL divided into eight sub-TDLs. Using the sub-TDL topology is equivalent to using a less advanced process by elongating the tap interval, which removes the bubbles. The LSB or the bin size of a TDL is equal to the full-scale range divided by the number of taps. From Fig. 1, for a raw TDL (4n taps) and a sub-TDL (n taps) built by n CARRY4s in a Virtex 7 FPGA, the LSB of the raw TDL is as follows [50]:

$$\text{LSB}_{\text{raw}} = \left(\sum_{i=0}^{n-1} \sum_{j=0}^{3} \Delta t_{j,i}\right) / 4n = 4n \cdot \Delta t_{\text{AVE}} / 4n = \Delta t_{\text{AVE}} \tag{3}$$



Fig. 3. Code density test results of the (a) four sub-TDLs and (b) eight sub-TDLs in Virtex 7 and UltraScale FPGAs, respectively.

where $\Delta t_{j,i}$ is the propagation delay of the *j*th tap in the *i*th CARRY4 module and $\Delta t_{\text{AVE}}$ is the average propagation delay of a tap (the exact delay model should include all delays on the routes and buffers to D-FFs, but here we only adopt a simple model). Also from Fig. 1, the LSB of the sub-TDL (the total delay divided by *n*) is approximately (note that the sampling instants of the sub-TDLs are different, but the delays are similar)

$$\text{LSB}_{\text{sub}} \approx \left(\sum_{i=0}^{n-2} \sum_{j=0}^{3} \Delta t_{j,i}\right) / n \approx [4(n-1) \cdot \Delta t_{\text{AVE}}] / n. \tag{4}$$

The sub-TDL averaging method, which uses four sub-TDLs together to obtain the averaged TDL, can be considered a new bubble-free version of the multichain-TDL technique [26]. The original multichain-TDL technique used multiple TDLs to obtain a TDL with a smaller binwidth, but it still requires extra logic circuits to remove bubbles. Similar to [26, Eq. (4)], the LSB of the averaged TDL is

$$\text{LSB}_{\text{Ave}} \approx [4(n-1) \cdot \Delta t_{\text{AVE}}]/4n \approx \text{LSB}_{\text{sub}}/4. \tag{5}$$

When $n \gg 1$, we have $\text{LSB}_{\text{Ave}} \approx \text{LSB}_{\text{raw}}$.
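To make the relation between (3), (4), and (5) concrete, the following sketch evaluates the three LSB expressions for a hypothetical Virtex 7 style chain. The chain length (n = 100 CARRY4s) and the uniform 10 ps tap delay are illustrative assumptions, not measured values, and the function names are introduced only for this example.

```python
def lsb_raw(delays):
    """Eq. (3): total delay of all 4n taps divided by 4n."""
    n = len(delays)
    return sum(sum(row) for row in delays) / (4 * n)


def lsb_sub(delays):
    """Eq. (4): delay spanned by one sub-TDL (n taps) divided by n."""
    n = len(delays)
    return sum(sum(row) for row in delays[:n - 1]) / n


def lsb_ave(delays):
    """Eq. (5): summing four sub-TDL codes gives four times as many codes."""
    return lsb_sub(delays) / 4


# delays[i][j] is the delay of tap j in CARRY4 i, in ps (hypothetical values).
delays = [[10.0] * 4 for _ in range(100)]
print(round(lsb_raw(delays), 3))  # 10.0
print(round(lsb_sub(delays), 3))  # 39.6
print(round(lsb_ave(delays), 3))  # 9.9
```

In this simple model the ratio of the averaged LSB to the raw LSB is (n - 1)/n, which is 0.99 here and approaches 1 as n grows.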

The code density test results of the individual sub-TDLs for both Virtex-7 and UltraScale are shown in Fig. 3. The advantage of the sub-TDL averaging approach is that, because the equivalent binwidth of the sub-TDLs has been multiplied, bubbles no longer occur in the sub-TDLs. The averaged TDL has no zero-width bins (DNL = -1), and the number of missing codes (DNL $\leq -0.9$) [50] is reduced effectively, as shown in Fig. 5. Since missing codes still exist in the averaged TDL, additional methods are required to improve linearity.

#### B. Tuned-TDL and Tap Timing Test

The TDLs in the Virtex 7 and UltraScale FPGAs are shown in Fig. 2(a) and (b). Each delay element "MUX" has two types of outputs (*CO* and *O*). These two types of output signals have different delays [29]. The tuned-TDL method selects one of the two output types to improve the linearity. In our work, the tuned-TDL method is used with the sub-TDL topology. For the CARRY4 module in a Virtex 7 FPGA, the *CO* and *O* output ports are mutually exclusive, whereas in the CARRY8 module in an UltraScale FPGA, the *CO* and *O* ports are all registered within the same CLB module. Therefore, each CARRY8 is able to generate 16 carry outputs. In 2016, a dual-sampling method using all 16 carry outputs was presented with a bin size of 2.25 ps [27]. However, the bubble problems and the nonlinearity are exacerbated with the reduced binwidth.

Fig. 4. Timing diagram based on the tap timing tests of the 16 taps in the UltraScale FPGA.

The actual timings of the TDL taps are needed for investigating the uniformity of the carry chains and the clock skews. Since the circuits of the CARRY chains are fixed in FPGAs, the binwidths and the locations of missing codes are static and therefore predictable. We therefore propose tap timing tests to quantitatively analyze the time intervals between taps and select the taps with better intervals based on the sub-TDL topology. Similar to code density tests, a large number of random hit signals are fed into the TDC, and the 16 binary codes ($B_n$, n = 0, ..., 15) converted by the OH2BIN converters from all 16 sub-TDLs (CARRY8) are read out directly and collected. This set of binary codes is generated after every measurement. The timing differences between taps, $D_n$, can be calculated by the following equation:

$$D_n = \frac{\sum_{m=0}^{L-1} (B_{n,m} - B_{n+1,m})}{L}, \quad n = 0, 1, \dots, 14 \tag{6}$$

where *L* is the number of the measurements and $B_{n,m}$ is the *n*th binary code for the *m*th measurement. From (6), a set of timing differences from $D_0$ to $D_{14}$ can be quantified. Fig. 4 illustrates the results of the tap timing tests. It shows the ideal and the actual bin timings. The actual bin timings show that the widest bin is about 2.3 LSB ($CO_1$ to $CO_5$) and the narrowest bin is less than 0.1 LSB ($CO_7$ to $CO_4$). The highlighted sub-TDLs (in red) indicate how the mismatched timings of the bin boundaries contribute to the nonlinearity of FPGA-TDCs. The number of TDL taps used depends on the requirements of the application, and the taps should be selected so that their time intervals are relatively uniform. There is a tradeoff between the resolution and the linearity achieved. In this case, we selected 8 out of the 16 taps in a CARRY8 with an average bin size of 5 ps (LSB).

TABLE I LINEARITY PERFORMANCE BETWEEN A RAW-TDL AND THE AVERAGED-TDL

| Unit: LSB            | Virtex 7 (28 nm) Raw-TDL | Virtex 7 (28 nm) Ave-TDL | UltraScale (20 nm) Raw-TDL | UltraScale (20 nm) Ave-TDL |
|----------------------|--------------------------|--------------------------|----------------------------|----------------------------|
| DNL                  | [-1, 3.78]               | [-0.95, 1.77]            | [-1, 8.09]                 | [-0.96, 2.56]              |
| DNL<sub>pk-pk</sub>  | 4.78                     | 2.73                     | 9.09                       | 3.53                       |
| $\sigma_{\text{DNL}}$ | 1.15                    | 0.52                     | 1.79                       | 0.74                       |
| RMS bin-width        | 1.53                     | 1.13                     | 1.85                       | 1.12                       |
| INL                  | [-0.88, 5.90]            | [-2.54, 2.61]            | [-7.44, 13.88]             | [-0.90, 5.38]              |
| INL<sub>pk-pk</sub>  | 6.78                     | 5.14                     | 21.32                      | 6.28                       |
| $\sigma_{\text{INL}}$ | 1.10                    | 1.03                     | 4.58                       | 1.21                       |
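The tap timing calculation in (6) amounts to averaging the difference between adjacent sub-TDL codes over many measurements. A minimal sketch follows; the function name and the tiny data set (3 sub-TDLs, 4 measurements) are hypothetical, and the result is expressed in code steps of a sub-TDL rather than in picoseconds.

```python
def tap_timing_differences(codes):
    """Eq. (6): mean difference between adjacent sub-TDL binary codes.

    codes[m][n] is B_{n,m}, the binary code of sub-TDL n in measurement m.
    Returns [D_0, ..., D_{N-2}] for N sub-TDLs.
    """
    num_meas = len(codes)  # L in Eq. (6)
    num_sub = len(codes[0])
    return [
        sum(row[n] - row[n + 1] for row in codes) / num_meas
        for n in range(num_sub - 1)
    ]


# Hypothetical readout: each row is one measurement, each column one sub-TDL.
codes = [
    [5, 5, 4],
    [7, 6, 6],
    [3, 3, 3],
    [9, 8, 8],
]
print(tap_timing_differences(codes))  # [0.5, 0.25]
```

For a CARRY8 with 16 sub-TDLs, each row would hold 16 codes and the function would return the 15 values $D_0$ to $D_{14}$.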

Table I and Fig. 5(a) and (b) show the code density test results and the binwidth distributions of the raw TDLs and averaged TDLs in both FPGAs. The DNL is reduced from [-1, 3.78] to [-0.95, 1.77] LSB, and the $\sigma_{\text{DNL}}$ is reduced from 1.15 to 0.52 LSB for the Virtex 7 device. For the UltraScale device, the DNL is reduced from [-1, 8.09] to [-0.96, 2.56] LSB and the $\sigma_{\text{DNL}}$ is reduced from 1.79 to 0.74 LSB. The binwidth distributions, see Fig. 5(c) and (d), show that the zero-width bins (DNL = -1) are totally removed from both FPGAs, and the widths of the widest bins are well controlled such that DNL < 2 LSB. The root mean square (rms) binwidth is improved from 1.53 to 1.13 LSB for the Virtex 7 FPGA and from 1.85 to 1.12 LSB for the UltraScale FPGA.

#### C. Compensated Histogram and Mixed Calibration Method

Fig. 5. DNL results of the raw and Averaged TDCs implemented for (a) Virtex-7 and (b) UltraScale FPGAs, and binwidth distributions of the raw and Calibrated TDCs for (c) Virtex-7 and (d) UltraScale FPGAs.

Fig. 6. Block diagram of the histogram compensation with mixed calibration in Virtex-7 and UltraScale FPGAs.

The high consumption of FPGA resources makes the previous design [30] poorly suited to multichannel designs. To simultaneously achieve the optimized linearity, fast calibration, and low resource consumption, we propose a direct histogram compensation and a mixed calibration method, see Fig. 6.

Fig. 7. Concept of the histogram compensation method.

Fig. 8. (a) DNL plot and (b) binwidth distributions of the compensated TDCs for Virtex-7. (c) DNL plot and (d) binwidth distributions of the compensated TDCs for UltraScale FPGAs.

The measured events are expressed by fine codes and counted in the corresponding bins of the histogram. In a raw TDC, large quantization errors are generated since the time intervals between two adjacent TDL taps are largely nonuniform, and only one binary code is processed in each measurement. A compensation approach was introduced in 2016 to solve this problem, but it was only used for postprocessing, which adds considerable processing time, especially in multichannel applications [46]. To achieve fast and direct histogram compensation, we reassign the fine code to a main bin calibration factor (BCF<sub>m</sub>) and a compensation bin calibration factor (BCF<sub>c</sub>) when a hit signal is measured. These two factors (BCF<sub>m</sub>, BCF<sub>c</sub>) are the fine code outputs of the compensated TDC. To calculate BCF<sub>m</sub> and BCF<sub>c</sub>, the binwidths of the raw TDC need to be estimated by performing code density tests first. The *k*th code transition level T[k] is needed for calculating the main and compensated binary codes:

$$T[k] = \sum_{n=0}^{k-1} W[n] = \sum_{n=0}^{k-1} \{ \text{LSB} \times (\text{DNL}[n] + 1) \} \tag{7}$$

where W[n] is the code binwidth of the *n*th bin. BCF<sub>m</sub> and BCF<sub>c</sub> are calculated accordingly. For raw bins located within a single ideal normalized bin (highlighted in blue in Fig. 7), only BCF<sub>m</sub> is valid for readdressing the measured result. For raw bins that span more than one ideal bin (highlighted in red), both BCF<sub>m</sub> and BCF<sub>c</sub> are generated to address two bins at once. This process can be summarized by the following pseudocode:

$$\begin{array}{l}
\textbf{if } (T_{\text{actual}}[k] < T_{\text{ideal}}[k]) \\
\quad \textbf{if } (T_{\text{actual}}[k+1] < T_{\text{ideal}}[k]) \\
\qquad \text{BCF}_m = k - 1 \\
\qquad \text{BCF}_c = \text{void} \\
\quad \textbf{else if } (T_{\text{ideal}}[k] < T_{\text{actual}}[k+1]) \\
\qquad \text{BCF}_m = k - 1 \\
\qquad \text{BCF}_c = k \\
\quad \dots
\end{array}$$
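The readdressing idea can also be sketched in software. The snippet below is not the authors' implementation and does not reproduce the elided branches of the pseudocode; it is a simplified variant that assumes every raw bin overlaps at most two ideal bins, takes the ideal bin containing the lower edge of the raw bin as the main bin, and uses a hypothetical marker `VOID = -1` for an unused compensation entry. Bin widths are given in units of LSB, so the running sum corresponds to the transition levels in (7).

```python
import math

VOID = -1  # marker for "no compensation bin" (an assumption of this sketch)


def bin_calibration_factors(widths_lsb):
    """Map each raw bin to a main and, if needed, a compensation bin.

    widths_lsb[k] is the measured width W[k] of raw bin k in units of LSB
    (DNL[k] + 1), so the running sum gives T[k] / LSB as in Eq. (7).
    """
    bcf_m, bcf_c = [], []
    lower = 0.0  # T[k]
    for w in widths_lsb:
        upper = lower + w  # T[k + 1]
        main = math.floor(lower)  # ideal bin that holds the lower edge
        bcf_m.append(main)
        # The raw bin spills into the next ideal bin when its upper edge
        # lies beyond the end of the main ideal bin.
        bcf_c.append(main + 1 if upper > main + 1 else VOID)
        lower = upper
    return bcf_m, bcf_c


# Hypothetical raw bin widths in LSB (they sum to 4 ideal bins).
m, c = bin_calibration_factors([0.5, 0.5, 1.5, 0.5, 1.0])
print(m)  # [0, 0, 1, 2, 3]
print(c)  # [-1, -1, 2, -1, -1]
```

In this example only the third raw bin, which covers 1.0 to 2.5 LSB, addresses two histogram bins at once. A raw bin wider than 2 LSB could overlap three ideal bins, which this sketch does not handle.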

The histogram compensation method corrects the measurement bias by readdressing the fine codes, and the missing codes are compensated as well. As shown in Fig. 8(a) and (c) and Table II, the linearity of the compensated TDC is further improved. The DNL is reduced to [-0.73, 0.79] LSB, and $\sigma_{\text{DNL}}$ is reduced to 0.29 LSB for the Virtex 7 FPGA. The DNL is reduced to [-0.75, 0.86] LSB, and $\sigma_{\text{DNL}}$ is reduced to 0.35 LSB for the UltraScale FPGA. Fig. 8(b) and (d) show that the distributions of the binwidths (for both TDCs) are well-shaped, showing no missing codes.

TABLE II LINEARITY PERFORMANCE BETWEEN THE COMPENSATED TDC AND THE CALIBRATED TDC

|                             | Virtex 7 (28 nm) Compensated TDC | Virtex 7 (28 nm) Calibrated TDC | UltraScale (20 nm) Compensated TDC | UltraScale (20 nm) Calibrated TDC |
|-----------------------------|----------------------------------|---------------------------------|------------------------------------|-----------------------------------|
| LSB (ps)                    | 10.54                            | 10.54                           | 5.02                               | 5.02                              |
| DNL (LSB)                   | [-0.73, 0.79]                    | [-0.05, 0.08]                   | [-0.75, 0.86]                      | [-0.12, 0.11]                     |
| DNL<sub>pk-pk</sub> (LSB)   | 1.52                             | 0.13                            | 1.61                               | 0.23                              |
| $\sigma_{\text{DNL}}$ (LSB) | 0.29                             | 0.01                            | 0.35                               | 0.03                              |
| INL (LSB)                   | [-1.91, 1.30]                    | [-0.09, 0.11]                   | [-1.38, 1.98]                      | [-0.18, 0.46]                     |
| INL<sub>pk-pk</sub> (LSB)   | 3.21                             | 0.20                            | 3.36                               | 0.65                              |
| $\sigma_{\text{INL}}$ (LSB) | 0.63                             | 0.04                            | 0.64                               | 0.16                              |
| $w_{\text{eq}}$ (ps)        | 11.73                            | 10.55                           | 5.85                               | 5.03                              |
| $\sigma_{\text{eq}}$ (ps)   | 3.39                             | 3.04                            | 7.24                               | 1.45                              |

Fig. 9. Flow diagram of the TDC measuring events in the Virtex-7 FPGA.

Since the minimum DNL has been improved to be better than -0.8 LSB after the compensation, the binwidth calibration is feasible for enhancing the linearity further. The code density test needs to be re-executed after BCF<sub>m</sub> and BCF<sub>c</sub> are loaded. The binwidth calibration factor set of the calibrated TDC contains two vectors, WCF<sub>m</sub> and WCF<sub>c</sub>, for the main and compensation histograms, respectively. The results of the second code density test can be used to calculate WCF<sub>m</sub> and WCF<sub>c</sub>:

$$\begin{aligned}
\text{WCF}_{m}[k] &= \frac{1}{\text{DNL}\{\text{BCF}_{m}[k]\} + 1}, \\
\text{WCF}_{c}[k] &= \frac{1}{\text{DNL}\{\text{BCF}_{c}[k]\} + 1}.
\end{aligned} \tag{8}$$
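A short sketch of (8) is given below. The function name, the `void` marker, and the choice of a zero weight for a void compensation entry are assumptions made for this example; the DNL values are hypothetical and would in practice come from the second code density test.

```python
def width_calibration_factors(dnl, bcf_m, bcf_c, void=-1):
    """Eq. (8): WCF = 1 / (DNL + 1), looked up through the BCF tables.

    dnl[j] is the DNL (in LSB) of compensated bin j; bcf_m[k] and bcf_c[k]
    are the histogram bins addressed by raw fine code k.
    """
    def factor(j):
        return 1.0 / (dnl[j] + 1.0)

    wcf_m = [factor(j) for j in bcf_m]
    # A void compensation entry gets weight 0.0 (assumption of this sketch).
    wcf_c = [factor(j) if j != void else 0.0 for j in bcf_c]
    return wcf_m, wcf_c


# Hypothetical DNL of three compensated bins and BCF tables for three codes.
wcf_m, wcf_c = width_calibration_factors(
    dnl=[0.25, -0.5, 0.0], bcf_m=[0, 1, 2], bcf_c=[-1, 2, -1]
)
print(wcf_m)  # [0.8, 2.0, 1.0]
print(wcf_c)  # [0.0, 1.0, 0.0]
```

A bin that is wider than ideal (DNL > 0) receives a weight below 1, and a narrow bin receives a weight above 1. Keeping the minimum DNL better than -0.8 LSB bounds the largest factor at 5.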



Fig. 10. (a) DNL and (b) INL plots of the compensated and calibrated TDCs for the Virtex-7 FPGA, and (c) DNL and (d) INL plots of the compensated and calibrated TDCs for the UltraScale FPGA.

BCF<sub>m</sub>, BCF<sub>c</sub>, WCF<sub>m</sub>, and WCF<sub>c</sub> can be calculated offline (e.g., in MATLAB) or on the fly (on-chip processing). A probability profile of the code distribution is obtained through code density tests; the histogram compensation then reassigns codes based on this profile, and the binwidth calibration corrects the remaining nonuniformity of the distribution. To save resources, we splice the two BCFs and the two WCFs into a mixed-calibration factor set, which is stored in the calibration block random-access memory (BRAM). The calibration BRAM dispatches the stored calibration factors to the histogram BRAMs when the fine codes of the averaged TDL are valid at the address ports. The flow diagram of the proposed TDC in the Virtex 7 FPGA is shown in Fig. 9. For different applications, either a true dual-port BRAM or two BRAMs working in simple dual-port mode can be selected for histogramming.

#### III. EXPERIMENTS AND RESULTS

The results of the experiments and tests were used to evaluate the performances of the proposed calibrated TDCs. Two independent low-jitter crystal oscillators (DSC1103) were used as the signal sources for the code density tests. The temperature and operating voltage on the FPGA chip were maintained within a stable range.



Fig. 11. Time interval measurement results and rms resolutions of the calibrated TDCs for (a) Virtex-7 and (b) UltraScale FPGAs.

TABLE III LOGIC RESOURCES UTILIZATION

| Device     | Resource | Available | Single channel: Used | Single channel: Utilization % | 96 channels: Used | 96 channels: Utilization % |
|------------|----------|-----------|----------------------|-------------------------------|-------------------|----------------------------|
| Virtex 7   | Slice    | 108300    | 712                  | 0.65                          | 24637             | 22.74                      |
| Virtex 7   | LUTs     | 433200    | 1145                 | 0.26                          | 55790             | 12.87                      |
| Virtex 7   | FFs      | 866400    | 1916                 | 0.22                          | 91968             | 10.61                      |
| Virtex 7   | BRAM     | 1470      | 1.5                  | 0.20                          | 144               | 9.79                       |
| UltraScale | CARRY    | 30300     | 80                   | 0.26                          | 7680              | 25.35                      |
| UltraScale | LUTs     | 242400    | 703                  | 0.29                          | 68357             | 28.2                       |
| UltraScale | FFs      | 484800    | 1195                 | 0.24                          | 114761            | 23.67                      |
| UltraScale | BRAM     | 600       | 1.5                  | 0.25                          | 144               | 24                         |

#### A. Linearity Test Results of Calibrated TDCs

The DNL, the INL, and their standard deviations ($\sigma_{\text{DNL}}$ and $\sigma_{\text{INL}}$) are the main parameters for evaluating the linearity. When the compensated TDC is compared with the calibrated TDC, both the DNL and INL are improved significantly. The results are summarized and illustrated in Table II and Fig. 10. After the calibration, DNL<sub>pk-pk</sub> (peak to peak of the DNL) and INL<sub>pk-pk</sub> (peak to peak of the INL) are improved by more than 11-fold and 16-fold for the Virtex 7 FPGA, respectively. For the UltraScale FPGA, DNL<sub>pk-pk</sub> and INL<sub>pk-pk</sub> are improved by about 7-fold and 5-fold, respectively. The standard deviations, $\sigma_{\text{DNL}}$ and $\sigma_{\text{INL}}$, are improved by about 29-fold and 15-fold for the Virtex 7 FPGA, respectively. For the UltraScale FPGA, $\sigma_{\text{DNL}}$ and $\sigma_{\text{INL}}$ are reduced by 11-fold and 4-fold, respectively. The equivalent binwidth $w_{\text{eq}}$ and the equivalent standard deviation $\sigma_{\text{eq}}$ were proposed by Wu for assessing the linearity performance of TDCs [51]. They are defined as follows:

$$\sigma_{\text{eq}}^2 = \sum_i \left( \frac{W[i]^2}{12} \times \frac{W[i]}{W_{\text{total}}} \right), \quad \text{where } W_{\text{total}} = \sum_i W[i], \tag{9}$$

$$w_{\text{eq}} = \sigma_{\text{eq}} \sqrt{12} = \sqrt{\sum_i \left(\frac{W[i]^3}{W_{\text{total}}}\right)}. \tag{10}$$
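As an illustration of (9) and (10), the sketch below computes the equivalent binwidth and equivalent standard deviation from a list of bin widths. The function name and the sample widths are hypothetical. Each term weights the quantization variance of a bin, $W[i]^2/12$, by the probability $W[i]/W_{\text{total}}$ that a uniformly distributed hit lands in that bin.

```python
import math


def equivalent_bin_width(widths):
    """Return (w_eq, sigma_eq) for bin widths W[i], following (9) and (10)."""
    total = sum(widths)  # W_total
    variance = sum((w ** 2 / 12.0) * (w / total) for w in widths)  # Eq. (9)
    sigma_eq = math.sqrt(variance)
    return sigma_eq * math.sqrt(12.0), sigma_eq  # Eq. (10)


# Perfectly uniform bins: the equivalent width equals the bin width.
w_eq, sigma_eq = equivalent_bin_width([10.0] * 8)
print(round(w_eq, 2), round(sigma_eq, 2))  # 10.0 2.89

# Two unequal bins with the same mean width (hypothetical values, in ps).
w_eq, sigma_eq = equivalent_bin_width([5.0, 15.0])
print(round(w_eq, 2), round(sigma_eq, 2))  # 13.23 3.82
```

The second case shows why nonuniform bins are penalized: the mean width is still 10 ps, but the wide bin dominates and the equivalent width grows to about 13.2 ps.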

#### B. Time Interval Measurements

Fig. 12. Place and routing results of the 96-channel TDCs in Virtex-7 (left) and UltraScale (right) FPGAs.

To verify the measurement error and the rms resolution of the proposed TDC, programmable delay generators (such as IDELAYE2 and IDELAYE3) are used to generate known time intervals between an original signal and a delayed signal. The time intervals are measured by the presented calibrated TDCs and an oscilloscope (Teledyne LeCroy WaveRunner 640Zi) at the same time. Both the original signal and the delayed signal are output via two SMA connectors. The external jitter is minimized, since the time intervals are generated in the FPGA chip and sent to the TDC directly. The IDELAYE2 and IDELAYE3 are continuously calibrated by an IDELAYCTRL module based on a low-jitter reference clock to counteract PVT variations. The delay can be dynamically controlled with steps of 39 and 4.6 ps in the IDELAYE2 and IDELAYE3 modules, respectively. With this arrangement, different time intervals were generated to cover the entire TDLs of the TDC. Each experiment captured 80 000 samples, and the time intervals were calculated based on the histogram. The measurement results and the rms resolution are shown in Fig. 11. The average rms resolution is 14.59 ps with $\sigma = 0.84$ ps for the Virtex 7 FPGA and 7.80 ps with $\sigma = 0.45$ ps for the UltraScale FPGA. The standard deviations of the time intervals measured by the oscilloscope are 14.86 ps for the Virtex 7 FPGA and 8.55 ps for the UltraScale FPGA. The standard deviations of the differences between the results measured by the TDC and those measured by the oscilloscope are 4.04 and 5.37 ps for the Virtex 7 and the UltraScale FPGAs, respectively.
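The text states that the time intervals were calculated from the histogram but does not give the exact procedure. One plausible way to do this, shown here purely as an assumption, is to take the count-weighted mean and standard deviation of the bin centres and scale them by the LSB. The function name and the sample histogram are hypothetical, and every bin is treated as exactly one LSB wide.

```python
import math


def interval_stats(hist, lsb_ps):
    """Mean and rms spread (in ps) of a histogram of TDC output codes.

    hist[k] is the number of samples recorded in bin k; each bin is taken
    to be one LSB wide and is represented by its centre, (k + 0.5) * LSB.
    """
    total = sum(hist)
    mean_code = sum((k + 0.5) * c for k, c in enumerate(hist)) / total
    var_code = sum(
        c * (k + 0.5 - mean_code) ** 2 for k, c in enumerate(hist)
    ) / total
    return mean_code * lsb_ps, math.sqrt(var_code) * lsb_ps


# Hypothetical histogram of one repeated time interval, LSB = 10 ps.
mean_ps, rms_ps = interval_stats([0, 1, 2, 1], lsb_ps=10.0)
print(mean_ps)           # 25.0
print(round(rms_ps, 2))  # 7.07
```

Repeating this for each programmed delay step would give one mean and one rms value per step, comparable in form to the curves described for Fig. 11.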

#### C. Configurations and Multichannel TDC Design

The multichannel TDC can be configured in various ways. In the Virtex 7 FPGA, each clock region contains 50 rows of CARRY4s. In the UltraScale FPGA, each clock region contains 60 rows of CARRY8s. To reduce the large nonlinearity contributed by the clock distribution, the TDLs are placed in the two center clock regions in the Virtex 7 FPGA and within a single clock region in the UltraScale FPGA. The single or dual sampling phase [28] can be selected according to the length of the TDL and the frequency of the sampling clock.

In this paper, we implemented 96-channel calibrated TDCs in both Virtex 7 and UltraScale FPGAs. According to the postimplementation utilization report shown in Table III, each channel costs around 700 LUTs and 1200 registers. The BRAM usage depends on the configuration and the designated measurement range; the minimum is 1.5 BRAMs per channel in the dual-BRAM mode. However, the number of channels is limited not only by the resource usage: the timing requirements, routing congestion level, and system expandability should also be considered. Therefore, sufficient space needs to be reserved between adjacent TDC channels. The previous work [30] presented a high-linearity, low-dead-time FPGA TDC, but its high logic consumption makes it unsuitable for a multichannel design. This work targets multichannel applications and achieves both high linearity and low resource consumption. Fig. 12 shows the place-and-route results for the 96-channel TDCs in the Virtex 7 (left) and UltraScale (right) FPGAs. To demonstrate the uniformity of the proposed multichannel TDC, we present the code density test results for 16 of the 96 channels in both FPGAs. These 16 channels are evenly distributed across the used chip area. According to the test results shown in Table IV, the linearity performances of the TDC channels in different locations show good uniformity.

 TABLE IV

 SUMMARY OF LINEARITY PERFORMANCES OF \*16-CHANNEL TDCS (OUT OF 96-CHANNEL TDCS)

|            | Channel              | 0    | 6    | 12   | 18   | 24   | 30   | 36   | 42   | 48   | 54   | 60   | 66   | 72   | 78   | 84   | 90   | Ave  |
|------------|----------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| Virtex 7   | DNL <sub>pk-pk</sub> | 0.17 | 0.20 | 0.14 | 0.15 | 0.17 | 0.12 | 0.22 | 0.12 | 0.12 | 0.15 | 0.18 | 0.14 | 0.15 | 0.13 | 0.18 | 0.15 | 0.15 |
|            | σ(DNL)               | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
|            | INL <sub>pk-pk</sub> | 0.32 | 0.32 | 0.36 | 0.35 | 0.38 | 0.37 | 0.32 | 0.34 | 0.38 | 0.33 | 0.45 | 0.27 | 0.29 | 0.29 | 0.45 | 0.43 | 0.35 |
|            | σ(INL)               | 0.08 | 0.06 | 0.09 | 0.06 | 0.08 | 0.09 | 0.07 | 0.08 | 0.09 | 0.07 | 0.10 | 0.05 | 0.05 | 0.06 | 0.10 | 0.10 | 0.08 |
| UltraScale | DNL <sub>pk-pk</sub> | 0.30 | 0.27 | 0.27 | 0.30 | 0.27 | 0.27 | 0.31 | 0.27 | 0.26 | 0.28 | 0.27 | 0.25 | 0.31 | 0.23 | 0.25 | 0.22 | 0.27 |
|            | σ(DNL)               | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.03 | 0.04 |
|            | INL <sub>pk-pk</sub> | 0.81 | 0.69 | 0.48 | 0.69 | 0.75 | 0.45 | 0.69 | 0.60 | 0.57 | 0.41 | 0.55 | 0.60 | 0.64 | 0.49 | 0.62 | 0.37 | 0.59 |
|            | σ(INL)               | 0.18 | 0.15 | 0.10 | 0.18 | 0.17 | 0.11 | 0.17 | 0.13 | 0.13 | 0.08 | 0.10 | 0.12 | 0.16 | 0.10 | 0.12 | 0.07 | 0.13 |

\*These channels are evenly distributed in the FPGAs; the other channels can be optimized similarly.

 TABLE V

 COMPARISON OF RECENT FPGA-BASED AND CUSTOM DESIGN CMOS TDCS

| Authors           | Methods                                                                 | Year | Devices         | LSB<br>(ps) | RMS<br>resolution (ps) | Measurement<br>uncertainty (ps) | DNL (LSB)              | INL (LSB)              |
|-------------------|-------------------------------------------------------------------------|------|-----------------|-------------|------------------------|---------------------------------|------------------------|------------------------|
| M. Fishburn [21]  | TDL, multi channels                                                     | 2013 | Virtex 6        | 10          | 18.5                   | 19.60                           | [-1.00, 1.50]          | [-2.25, 1.61]          |
| N. Dutton [22]    | TDL, direct histogram                                                   | 2014 | Virtex 5        | 16.3        | NS                     | NS                              | [-0.90, 3.00]          | [1.50, 5.00]           |
| Q. Shen [26]      | TDL, multichain<br>averaging                                            | 2015 | Virtex-6        | 1.7         | 4.2                    | NS                              | [-0.70, 0.80]          | [-1.00, 0.70]          |
| Y. Wang [24]      | Bin realignment,<br>decimation, TDL                                     | 2015 | Kintex-7        | 17.6        | 12.7                   | NS                              | [-1.00, 0.84]          | [-0.81, 0.87]          |
| J. Won [28]       | Dual-phase, TDL                                                         | 2016 | Virtex 6        | 10.0        | 10.0                   | 11.03                           | [-1.00, 1.91]          | [-2.20, 3.93]          |
|                   |                                                                         |      | Kintex-7        | 10.6        | NS                     | 8.13                            | [-1.00, 1.45]          | [-1.23, 4.30]          |
| J. Won [29]       | Tuned-TDL                                                               | 2016 | Virtex-6        | 10.1        | NS                     | 9.82                            | [-1.00, 1.18]          | [-3.03, 2.46]          |
|                   |                                                                         |      | Spartan-6       | 16.7        | NS                     | 12.75                           | [-1.00, 1.22]          | [-0.70, 2.54]          |
| Y. Wang [27]      | Bin realignment,<br>Dual-sampling                                       | 2016 | UltraScale      | 2.25        | 3.9                    | NS                              | NS                     | NS                     |
| D. Tamborini [42] | Vernier delay loop,<br>8 channel TDCs                                   | 2016 | 0.35 μm<br>CMOS | 10          | 21                     | 9 (rms)                         | 0.08 (peak)            | <0.8 ps (rms)          |
| J.-C. Lai [18]    | Time-Residue<br>Feedback                                                | 2017 | 65 nm<br>CMOS   | 0.98        | NS                     | NS                              | [-0.8, 0.8]            | [-2.2, 2.2]            |
| M. Zhang [32]     | Matrix of counters                                                      | 2017 | Virtex 5        | 7.4         | NS                     | 6.8                             | [-0.74, 0.74]          | [-1.52, 1.57]          |
| P. Chen [31]      | 2D Vernier                                                              | 2017 | Stratix IV      | 2.5         | 6.72                   | NS                              | [-0.56, 0.46]          | [-2.98, 3.23]          |
| X. Qin [38]       | multi-chain integrated<br>TDL                                           | 2017 | Virtex-7        | 1.15        | 3.5                    | NS                              | [-0.98, 3.5]           | [-5.9, 3.1]            |
| This work         | Sub-TDL averaging<br>Histogram<br>compensation<br>Bin-width calibration | 2018 | Virtex-7        | 10.54       | 14.59                  | 4.04                            | [-0.05, 0.08]<br>0.15* | [-0.09, 0.11]          |
|                   |                                                                         | 2018 | UltraScale      | 5.02        | 7.8                    | 5.37                            | [-0.12, 0.11]<br>0.27* | [-0.18, 0.46]<br>0.59* |

\* Averaged peak-to-peak DNL and INL results of the multichannel TDCs.
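The peak-to-peak and σ figures for DNL and INL are derived from code density tests. The sketch below shows one common way to obtain them from a histogram of hits per bin. It assumes uniformly distributed, uncorrelated hits, an LSB equal to the average bin width, and INL taken as the running sum of DNL. The function names and hit counts are hypothetical, and this is not necessarily the exact procedure used for the tables here.

```python
import statistics


def dnl_inl(counts):
    """DNL and INL (in LSB) from the code-density hit counts of each bin."""
    mean_count = sum(counts) / len(counts)  # hits expected in an ideal 1-LSB bin
    dnl = [c / mean_count - 1.0 for c in counts]
    inl, running = [], 0.0
    for d in dnl:
        running += d  # INL is accumulated DNL
        inl.append(running)
    return dnl, inl


def peak_to_peak_and_sigma(values):
    """Peak-to-peak span and population standard deviation of a DNL or INL curve."""
    return max(values) - min(values), statistics.pstdev(values)


# Hypothetical hit counts for a six-bin TDC.
hits_per_bin = [100, 110, 90, 100, 95, 105]
dnl, inl = dnl_inl(hits_per_bin)
dnl_pkpk, dnl_sigma = peak_to_peak_and_sigma(dnl)
inl_pkpk, inl_sigma = peak_to_peak_and_sigma(inl)

print(f"DNL pk-pk = {dnl_pkpk:.2f} LSB, sigma = {dnl_sigma:.3f} LSB")
print(f"INL pk-pk = {inl_pkpk:.2f} LSB, sigma = {inl_sigma:.3f} LSB")
```

Running the same computation for each channel and comparing the resulting values is one way to quantify channel-to-channel uniformity.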

#### **IV. CONCLUSION**

In this paper, we proposed and evaluated the following:

- 1) a new sub-TDL averaging TDL topology;
- 2) an innovative tap timing test;
- 3) new hardware-embedded histogram compensation and mixed calibration methods.

The sub-TDL averaging is able to remove bubbles and zero-width bins without consuming additional resources or extra processing time. The novel tap timing test is able to quantify the actual timing of TDLs. The histogram compensation and mixed calibration methods correct the conversion bias and the bin-width deviation directly with limited resource consumption. By integrating these methods, high-linearity and low-cost FPGA-TDCs were implemented and tested in the Virtex 7 and UltraScale FPGAs. The bin size (LSB) is 10.5 and 5.0 ps, with an rms resolution of 14.59 and 7.80 ps, for the Virtex 7 and the UltraScale FPGAs, respectively. Compared with the previously published works listed in Table V, the linearity has been significantly improved. The 96-channel TDCs were also implemented and tested in both FPGAs, and the test results show good uniformity across channels. The proposed TDCs also have potential in fast 3-D ranging and time-resolved imaging applications (such as Raman spectroscopy, agricultural research, and wind farms) that previously relied on other techniques.

#### REFERENCES

- [1] C.-R. Ho and M. S.-W. Chen, "10.5 A digital PLL with feedforward multi-tone spur cancelation loop achieving <-73 dBc fractional spur and <-110 dBc Reference Spur in 65 nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2016, pp. 190–191.
- [2] S. K. Kao, Y. H. Hsieh, and H. C. Cheng, "An all-digital DLL with duty-cycle correction using reusable TDC," *Int. J. Circuit Theory Appl.*, vol. 44, no. 5, pp. 1055–1070, 2016.
- [3] H. Akita, I. Takai, K. Azuma, T. Hata, and N. Ozaki, "An imager using 2-D single-photon avalanche diode array in 0.18-µm CMOS for automotive LIDAR application," in *Proc. Symp. VLSI Circuits*, 2017, pp. C290–C291.
- [4] N. Arora *et al.*, "Perovskite solar cells with CuSCN hole extraction layers yield stabilized efficiencies greater than 20%," *Science*, vol. 358, pp. 768–771, 2017.
- [5] C. Lelii *et al.*, "Enhanced photovoltaic performance with co-sensitization of quantum dots and an organic dye in dye-sensitized solar cells," *J. Mater. Chem. A*, vol. 2, no. 43, pp. 18375–18382, 2014.
- [6] G. Gariepy, F. Tonolini, R. Henderson, J. Leach, and D. Faccio, "Detection and tracking of moving objects hidden from view," *Nature Photon.*, vol. 10, pp. 23–26, 2015.
- [7] J. F. Cavanaugh *et al.*, "The mercury laser altimeter instrument for the MESSENGER mission," *Space Sci. Rev.*, vol. 131, pp. 451–479, 2007.
- [8] N. Ollivier-Henry *et al.*, "Design and characteristics of a multichannel front-end ASIC using current-mode CSA for small-animal PET imaging," *IEEE Trans. Biomed. Circuits Syst.*, vol. 5, no. 1, pp. 90–99, Feb. 2011.
- [9] D. D.-U. Li *et al.*, "Video-rate fluorescence lifetime imaging camera with CMOS single-photon avalanche diode arrays and high-speed imaging algorithm," *J. Biomed. Opt.*, vol. 16, no. 9, 2011, Art. no. 096012.
- [10] Z. Cheng, M. J. Deen, and H. Peng, "A low-power gateable vernier ring oscillator time-to-digital converter for biomedical imaging applications," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 2, pp. 445–454, Apr. 2016.
- [11] I. Gyongy *et al.*, "256 × 256, 100 kfps, 61% fill-factor time-resolved SPAD image sensor for microscopy applications," in *Proc. IEEE Int. Electron Devices Meeting*, 2016, pp. 8.2.1–8.2.4.
- [12] F. M. Della Rocca *et al.*, "Real-time fluorescence lifetime actuation for cell sorting using a CMOS SPAD silicon photomultiplier," *Opt. Lett.*, vol. 41, no. 4, pp. 673–676, 2016.

- [13] K. Akiba *et al.*, "The timepix telescope for high performance particle tracking," *Nucl. Instrum. Methods A*, vol. 723, pp. 47–54, 2013.
- [14] J. Wu, Z. Shi, and I. Y. Wang, "Firmware-only implementation of time-to-digital converter (TDC) in field-programmable gate array (FPGA)," in *Proc. IEEE Nucl. Sci. Symp. Conf. Rec.*, Portland, OR, USA, 2003, pp. 177–181.
- [15] M. Förtsch *et al.*, "A versatile source of single photons for quantum information processing," *Nature Commun.*, vol. 4, 2013, Art. no. 1818.
- [16] J. Kalisz, "Review of methods for time interval measurements with picosecond resolution," *Metrologia*, vol. 41, no. 1, pp. 17–32, 2003.
- [17] Z. Cheng, X. Zheng, M. J. Deen, and H. Peng, "Recent developments and design challenges of high-performance ring oscillator CMOS time-to-digital converters," *IEEE Trans. Electron Devices*, vol. 63, no. 1, pp. 235–251, Jan. 2016.
- [18] J.-C. Lai and T.-Y. Hsu, "Cost-effective time-to-digital converter using time-residue feedback," *IEEE Trans. Ind. Electron.*, vol. 64, no. 6, pp. 4690–4700, Jun. 2017.
- [19] J. Wu and Z. Shi, "The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay," in *Proc. IEEE Nucl. Sci. Symp. Conf. Rec.*, Dresden, Germany, 2008, pp. 3440–3446.
- [20] M.-A. Daigneault and J. P. David, "A high-resolution time-to-digital converter on FPGA using dynamic reconfiguration," *IEEE Trans. Instrum. Meas.*, vol. 60, no. 6, pp. 2070–2079, Jun. 2011.
- [21] M. W. Fishburn, L. H. Menninga, C. Favi, and E. Charbon, "A 19.6 ps, FPGA-based TDC with multiple channels for open source applications," *IEEE Trans. Nucl. Sci.*, vol. 60, no. 3, pp. 2203–2208, Jun. 2013.
- [22] N. Dutton *et al.*, "Multiple-event direct to histogram TDC in 65 nm FPGA technology," in *Proc. 10th Conf. Ph.D. Res. Microelectron. Electron.*, Grenoble, France, 2014, pp. 1–5.
- [23] W. Pan, G. Gong, and J. Li, "A 20-ps time-to-digital converter (TDC) implemented in field-programmable gate array (FPGA) with automatic temperature correction," *IEEE Trans. Nucl. Sci.*, vol. 61, no. 3, pp. 1468–1473, Jun. 2014.
- [24] Y. Wang and C. Liu, "A nonlinearity minimization-oriented resource-saving time-to-digital converter implemented in a 28 nm Xilinx FPGA," *IEEE Trans. Nucl. Sci.*, vol. 62, no. 5, pp. 2003–2009, Oct. 2015.
- [25] C. Liu and Y. Wang, "A 128-channel, 710 M samples/second, and less than 10 ps rms resolution time-to-digital converter implemented in a Kintex-7 FPGA," *IEEE Trans. Nucl. Sci.*, vol. 62, no. 3, pp. 773–783, Jun. 2015.
- [26] Q. Shen *et al.*, "A 1.7 ps equivalent bin size and 4.2 ps rms FPGA TDC based on multichain measurements averaging method," *IEEE Trans. Nucl. Sci.*, vol. 62, no. 3, pp. 947–954, Jun. 2015.
- [27] Y. Wang and C. Liu, "A 3.9 ps time-interval rms precision time-to-digital converter using a dual-sampling method in an ultrascale FPGA," *IEEE Trans. Nucl. Sci.*, vol. 63, no. 5, pp. 2617–2621, Oct. 2016.
- [28] J. Y. Won, S. I. Kwon, H. S. Yoon, G. B. Ko, J.-W. Son, and J. S. Lee, "Dual-phase tapped-delay-line time-to-digital converter with on-the-fly calibration implemented in 40 nm FPGA," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 1, pp. 231–242, Feb. 2016.
- [29] J. Y. Won and J. S. Lee, "Time-to-digital converter using a tuned-delay line evaluated in 28-, 40-, and 45-nm FPGAs," *IEEE Trans. Instrum. Meas.*, vol. 65, no. 7, pp. 1678–1689, Jul. 2016.
- [30] H. Chen, Y. Zhang, and D. D.-U. Li, "A low nonlinearity, missing-code free time-to-digital converter based on 28-nm FPGAs with embedded bin-width calibrations," *IEEE Trans. Instrum. Meas.*, vol. 66, no. 7, pp. 1912–1921, Jul. 2017.
- [31] P. Chen, Y.-Y. Hsiao, Y.-S. Chung, W. X. Tsai, and J.-M. Lin, "A 2.5-ps bin size and 6.7-ps resolution FPGA time-to-digital converter based on delay wrapping and averaging," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 25, no. 1, pp. 114–124, Jan. 2017.
- [32] M. Zhang, H. Wang, and Y. Liu, "A 7.4 ps FPGA-based TDC with a 1024-unit measurement matrix," *Sensors*, vol. 17, no. 4, 2017, Art. no. E865.
- [33] R. Szplet, J. Kalisz, and R. Szymanowski, "Interpolating time counter with 100 ps resolution on a single FPGA device," *IEEE Trans. Instrum. Meas.*, vol. 49, no. 4, pp. 879–883, Aug. 2000.
- [34] J. Kalisz, R. Szplet, J. Pasierbinski, and A. Poniecki, "Field-programmable-gate-array-based time-to-digital converter with 200-ps resolution," *IEEE Trans. Instrum. Meas.*, vol. 46, no. 1, pp. 51–55, Feb. 1997.
- [35] P. Napolitano, A. Moschitta, and P. Carbone, "A survey on time interval measurement techniques and testing methods," in *Proc. IEEE Instrum. Meas. Technol. Conf.*, Austin, TX, USA, 2010, pp. 181–186.
- [36] C. Favi and E. Charbon, "A 17 ps time-to-digital converter implemented in 65 nm FPGA technology," in *Proc. ACM/SIGDA Int. Symp. Field Programmable Gate Arrays*, Monterey, CA, USA, 2009, pp. 113–120.

- [37] Y. Wang, J. Kuang, C. Liu, and Q. Cao, "A 3.9 ps rms precision time-to-digital converter using ones counter encoding scheme in a Kintex-7 FPGA," *IEEE Trans. Nucl. Sci.*, vol. 64, no. 10, pp. 2713–2718, Oct. 2017.
- [38] X. Qin, L. Wang, D. Liu, Y. Zhao, X. Rong, and J. Du, "A 1.15 ps bin size and 3.5 ps single-shot precision time-to-digital-converter with on-board offset correction in an FPGA," *IEEE Trans. Nucl. Sci.*, vol. 64, no. 12, pp. 2951–2957, Dec. 2017.
- [39] S. Mandai and E. Charbon, "A 128-channel, 8.9-ps LSB, column-parallel two-stage TDC based on time difference amplification for time-resolved imaging," *IEEE Trans. Nucl. Sci.*, vol. 59, no. 5, pp. 2463–2470, Oct. 2012.
- [40] C. Veerappan *et al.*, "A 160 × 128 single-photon image sensor with on-pixel 55 ps 10b time-to-digital converter," in *Proc. IEEE Int. Solid-State Circuits Conf.*, San Francisco, CA, USA, 2011, pp. 312–314.
- [41] H. K. Chandrasekharan *et al.*, "Multiplexed single-mode wavelength-to-time mapping of multimode light," *Nature Commun.*, vol. 8, 2017, Art. no. 14080.
- [42] D. Tamborini, D. Portaluppi, F. Villa, and F. Zappa, "Eight-channel 21 ps precision range time-to-digital converter module," *IEEE Trans. Instrum. Meas.*, vol. 65, no. 2, pp. 423–430, Feb. 2016.
- [43] J. P. Jansson, V. Koskinen, A. Mantyniemi, and J. Kostamovaara, "A multichannel high-precision CMOS time-to-digital converter for laser-scanner-based perception systems," *IEEE Trans. Instrum. Meas.*, vol. 61, no. 9, pp. 2581–2590, Sep. 2012.
- [44] J. Richardson *et al.*, "A 32 × 32 50ps resolution 10 bit time to digital converter array in 130nm CMOS for time correlated imaging," in *Proc. IEEE Custom Integr. Circuits Conf.*, 2009, pp. 77–80.
- [45] E. Bayer and M. Traxler, "A high-resolution (<10 ps RMS) 48-channel time-to-digital converter (TDC) implemented in a field programmable gate array (FPGA)," *IEEE Trans. Nucl. Sci.*, vol. 58, no. 4, pp. 1547–1552, Aug. 2011.
- [46] S. Burri, H. Homulle, C. Bruschini, and E. Charbon, "LinoSPAD: A time-resolved 256 × 1 CMOS SPAD line sensor system featuring 64 FPGA-based TDC channels running at up to 8.5 giga-events per second," *Proc. SPIE*, vol. 9899, 2016, Art. no. 98990D.
- [47] J. Torres *et al.*, "Time-to-digital converter based on FPGA with multiple channel capability," *IEEE Trans. Nucl. Sci.*, vol. 61, no. 1, pp. 107–114, Feb. 2014.
- [48] J. Wu, "Several key issues on implementing delay line based TDCs using FPGAs," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 3, pp. 1543–1548, Jun. 2010.
- [49] J. Wang, S. Liu, Q. Shen, H. Li, and Q. An, "A fully fledged TDC implemented in field-programmable gate arrays," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 2, pp. 446–450, Apr. 2010.
- [50] IEEE Standard for Terminology and Test Methods for Analog-to-Digital Converters, IEEE Standard 1241-2010, Jan. 14, 2011. [Online]. Available: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5692956
- [51] J. Wu, "Uneven bin width digitization and a timing calibration method using cascaded PLL," in *Proc. 19th IEEE-NPSS Real Time Conf.*, Nara, Japan, 2014, pp. 1–4.



**Haochang Chen** was born in Shaanxi, China, in 1990. He received the B.Sc. degree in electronic design automation from the University of Central Lancashire, Preston, U.K., in 2012, the B.Eng. degree in electronic and information engineering from the North China University of Technology, Beijing, China, in 2012, and the M.S. degree in embedded digital systems from the University of Sussex, Brighton, U.K., in 2013. Since October 2014, he has been working toward the Ph.D. degree, funded by EPSRC, at the University of Strathclyde, Glasgow, U.K.

His current research interests include FPGA-based high-precision time metrology systems for ranging and biomedical imaging applications.



**David Day-Uei Li** received the Ph.D. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, in 2001.

He then joined the Industrial Technology Research Institute, Taiwan, working on CMOS optical and wireless communication chipsets. From 2007 to 2011, he worked at the University of Edinburgh on two European projects focusing on CMOS single-photon avalanche diode sensors and systems. He took the lectureship in biomedical engineering, and in 2014 he joined the University of Strathclyde, Glasgow, U.K., as a Senior Lecturer. He has authored more than 70 journal and conference papers and holds 12 patents. His research interests include CMOS sensors and systems, mixed signal circuits, embedded systems, optical communications, FLIM systems and analysis, and field-programmable gate array/graphics processing unit computing. His research exploits advanced sensor technologies to reveal low-light but fast biological phenomena.