Contents lists available at ScienceDirect

# Measurement

journal homepage: www.elsevier.com/locate/measurement

# Time-to-digital converters and histogram builders in SPAD arrays for pulsed-LiDAR

Vincenzo Sesta<sup>\*</sup>, Alfonso Incoronato, Francesca Madonini, Federica Villa

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milano, Italy

#### ARTICLE INFO

#### Keywords: Light Detection And Ranging (LiDAR); Time-Of-Flight (TOF); SPAD array; Time-to-Digital Converter (TDC); Data processing; Histogram; Centroid computation

#### ABSTRACT

Light Detection and Ranging (LiDAR) is a 3D imaging technique widely used in many applications such as augmented reality, automotive, machine vision, spacecraft navigation and landing. Pulsed-LiDAR is one of the most widespread LiDAR techniques; it relies on measuring the round-trip travel time of an optical pulse back-scattered from a distant target. Besides the light source and the detector, Time-to-Digital Converters (TDCs) are fundamental components in pulsed-LiDAR systems, since they measure the arrival times of back-scattered photons and their performance directly impacts LiDAR system requirements (i.e., range, precision, and measurement rate). In this work, we present a review of recent TDC architectures suitable for integration in SPAD-based CMOS arrays and a review of data processing solutions to derive the TOF information. Furthermore, the main TDC parameters and processing techniques are described and analyzed considering pulsed-LiDAR requirements.

# 1. Introduction

Light Detection and Ranging (LiDAR) is an optical technology used to evaluate distances and to allow 3D representations of the surveyed environment. The LiDAR market is rapidly maturing and expanding in many applications, including automotive, UAV/drones, Earth mapping, and virtual reality, with ever-increasing performance and cost needs [1–4].

Time-Of-Flight (TOF) is the key enabling technique for long-range and real-time LiDAR. TOF determines the distance between a sensor and an object, either with time-resolved (*direct*-TOF) or phase-resolved (*indirect*-TOF) measurements [5]. In *direct*-TOF, the distance D between the target object and the detector is computed with:

$$D = \frac{1}{2} \cdot c \cdot TOF \tag{1}$$

where c is the light velocity in the considered medium and TOF is the round-trip travel time of an optical pulse that reaches the target object and returns to the detector after being back-scattered. This technique, also known as *Pulsed*-LiDAR, typically employs single-photon detectors combined with Time-to-Digital Converters (TDCs) to timestamp the arrival times of returning photons. *Pulsed*-LiDAR ensures high-precision measurements (when short pulses and large-bandwidth timing electronics are provided) over wide distances (kilometer ranges), since pulsed lasers with high peak power are able to reach far distances while maintaining an eye-safety-compliant average power [6]. In contrast to pulsed-LiDAR, continuous-wave (CW)-LiDAR illuminates the target with a continuous light signal, which can be either Amplitude Modulated (AMCW) or Frequency Modulated (FMCW). AMCW-LiDAR measures the phase difference between an amplitude-modulated light source and the back-scattered signal to retrieve the target distance (indirect-TOF) [7,8]. FMCW-LiDAR calculates the object distance by exploiting the interference between a linearly chirped emitted laser wavelength and the return echo signal, properly demodulated [8]. FMCW-LiDAR gives the highest precision among LiDAR systems, requiring only square-law detectors and low-frequency electronics, and it also allows direct computation of the target object velocity by exploiting the Doppler effect [7]. Recently, promising results have been shown that achieve long range in FMCW-LiDAR, exceeding the limitation that the laser coherence length imposes on the Full-Scale Range (FSR) [9].

In *pulsed*-LiDAR, the TOF between emission and reception of the echo pulse is measured, either with timing electronics (e.g., TDCs) or with counting electronics within short gate windows progressively spanning across the FSR. High sensitivity and good timing resolution are then crucial requirements for LiDAR receiver detectors [10–13]. Among the various detectors, Single Photon Avalanche Diodes (SPADs) have acquired an important role thanks to their single-photon sensitivity, which allows very fine timing resolution to be obtained. Other key advantages are immunity to readout noise, relatively low bias voltage, and easy integration with processing electronics in the same fabrication process. SPADs can be combined either into Silicon PhotoMultipliers (SiPMs), providing a common output, or in SPAD arrays, where each pixel (typically including a SPAD with its front-end circuitry and processing electronics for counting or timing operations) provides spatial information.

https://doi.org/10.1016/j.measurement.2023.112705

Received 23 March 2022; Received in revised form 19 February 2023; Accepted 6 March 2023; Available online 9 March 2023

<sup>\*</sup> Corresponding author. E-mail address: vincenzo.sesta@polimi.it (V. Sesta).

0263-2241/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Fig. 1 shows an example of a SPAD-based *pulsed*-LiDAR system, which includes a pulsed laser, a time-resolved detector, and a post-processing unit for TOF extraction. In order to cope with SPAD non-idealities and background light, several time measurements are accumulated in a histogram and centroid computation is implemented to achieve very precise TOF information.

However, the main drawback of SPAD-based sensors is the so-called "pile-up" distortion, i.e., the fact that the first photons (coming, for instance, from background illumination) mask the detection of later photons because of the detector and TDC dead time, preventing the real echo signal from being retrieved through Time-Correlated Single-Photon Counting (TCSPC) [14,15]. For this reason, TDC multi-hit operation (i.e., the possibility to time-stamp multiple photons within the same laser shot) and short detector dead time are key parameters in SPAD and TDC design for LiDAR. In addition, when the background becomes dominant, various detection techniques such as ROI selection [16], coincidence detection [17], temporal gating [18], optical methods [19] and post-processing algorithms [20] have to be used to reduce the histogram distortion due to the pile-up effect.

The key requirements to precisely measure photon arrival times in *pulsed*-LiDAR are high-energy laser pulses, fast detectors and high-bandwidth timing electronics. Indeed, the resolution of the timing electronics directly affects the obtained distance resolution, according to equation (1).

On the other hand, the timing electronics FSR, together with the laser power and the object reflectivity, defines the achievable spatial FSR. In long range measurements, a fundamental role is also played by the detector sensitivity. In fact, with eye-safe limited laser energy and low object reflectivity, few back-scattered photons return to the detector, down to the single photon level. Also, the detector pixel number impacts final LiDAR performance, by defining the achievable Field-of-View (FOV), angular resolution and measurement speed, depending on the illumination scheme. Indeed, TOF measurements in *pulsed*-LiDAR can be performed with either single spot, blade beam or flood light to illuminate the scene [10]. Single-spot and blade illumination require a scanning system, while with flood illumination, *flash*-LiDAR is achievable. Laser energy, eye-safety, acquisition speed, detector geometry, setup complexity, and specific application requirements are all parameters to be considered and combined in the illumination scheme choice.

Timing resolution, high sensitivity and a large number of pixels are key drivers for the development of SPAD- and SiPM-based LiDAR systems among many companies (e.g., Toyota, ST-Microelectronics, Sony, Panasonic, onsemi, Ford-Argo). Moreover, the on-chip integration of detectors, time measurement electronics and histogram processing constitutes the future-oriented path towards compact, real-time LiDAR solutions with reduced data throughput.

**Fig. 1.** Schematic representation of a pulsed-LiDAR setup, where the round-trip travel time of the laser pulse is computed by time-resolved detectors (e.g., SPAD arrays combined with TDCs) and repetitive time measurements are processed in a histogram processing unit for TOF extraction.

A final challenge for a reliable LiDAR system is immunity to interference among multiple LiDAR devices, particularly in the automotive field. Different solutions can be adopted [16]; for *pulsed*-LiDAR, a notable example is Code-Division Multiple Access (CDMA), which employs laser temporal signatures to illuminate the scene and then to separate valid TOF estimations from interfering ones [21].

The goal of this paper is to describe the requirements of TDCs and data processing for LiDAR and to discuss recent integrated TDC and TOF processing solutions in SPAD arrays for various pulsed-LiDAR applications. This manuscript is organized as follows. Section 2 summarizes the main parameters and architectures of TDCs for pulsed-LiDAR applications, and a discussion on recent TDC architectures designed in LiDAR SPAD arrays is then presented in Section 3. The most common histogram-based data processing algorithms to extract TOF information are presented in Section 4, while already implemented smart histogram processing solutions to define the TOF are presented in Section 5. Finally, Section 6 draws conclusions and future perspectives.

# 2. Time-to-digital converters for pulsed-LiDAR

Time-to-Digital Converters (TDCs) are fundamental blocks in pulsed-LiDAR systems, since they are used to measure time intervals and convert them into digital codes. Usually, a time interval is defined by two electrical pulses: the START pulse indicating its starting point and the STOP pulse indicating its end. In the following sections, we will illustrate the most important TDC parameters and the main TDC architectures, focusing on their integrability in SPAD arrays for pulsed-LiDAR applications.

# 2.1. Main TDC parameters

#### 2.1.1. Full-scale range

The Full-scale Range (FSR) defines the maximum time interval that can be correctly measured. For architectures based on counters, this parameter is defined by the number of bits and the reference clock frequency. In pulsed-LiDAR, the TDC FSR is an essential parameter since it defines the maximum achievable distance range, through equation (1). However, the LiDAR maximum range is ultimately limited by other system parameters such as transmitter energy, optics, and receiver sensitivity. Therefore, the TDC FSR contributes, together with these system parameters, to guaranteeing the required distance range.
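As a minimal sketch of the relation between counter size, clock frequency and range, the snippet below computes the FSR of a counter-based TDC and converts it to a distance with equation (1). The function names are hypothetical and the speed of light in vacuum is assumed for the medium.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # assumption: propagation in vacuum


def counter_fsr_s(n_bits: int, f_clk_hz: float) -> float:
    """FSR of a counter-based TDC: 2**n_bits reference clock periods."""
    return (2 ** n_bits) / f_clk_hz


def fsr_to_range_m(fsr_s: float, c: float = C_VACUUM_M_PER_S) -> float:
    """Equation (1) applied to the FSR: D_max = c * FSR / 2."""
    return 0.5 * c * fsr_s
```

For instance, an 8-bit counter clocked at 40 MHz spans 6400 ns, which corresponds to about 959 m of round-trip-limited range, before transmitter energy, optics and receiver sensitivity are taken into account.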

#### 2.1.2. Time resolution and precision

The time resolution defines the smallest time interval that can be correctly discriminated. This parameter is usually referred to as the Least Significant Bit (LSB) and it is strictly related to the TDC architecture characteristics, i.e., the technology and the noise performance. The time resolution defines the distance resolution (i.e., the smallest distance that can be discriminated) in a LiDAR system, through equation (1).

The time resolution is critical since it affects the overall precision. The rms value of precision can be calculated as follows:

$$\sigma_{TDC} = \sqrt{\sigma_q^2 + \sigma_{INL}^2 + \sigma_{jitter}^2} \tag{2}$$

where $\sigma_q$ is the quantization error, usually obtained as LSB/$\sqrt{12}$, $\sigma_{INL}$ is the TDC Integral Non-Linearity (INL) standard deviation and $\sigma_{jitter}$ is the additional jitter introduced within the TDC, for instance due to cross-talk with other signals [21]. The TDC precision directly impacts the overall single-shot precision of the TOF measurement, which also depends on the laser pulse width and the SPAD time-jitter. In fact, it is defined as:

$$\sigma_{\text{single-shot}} = \sqrt{\sigma_{\text{laser}}^2 + \sigma_{\text{SPAD}}^2 + \sigma_{\text{TDC}}^2} \tag{3}$$

Since the single-shot precision is usually dominated by the laser pulse width used for LiDAR measurements (in many cases even lower than 1 ns FWHM [22]), the TDC LSB has to be small enough to make the TDC contribution negligible with respect to the laser pulse width $\sigma_{laser}$ and the SPAD time-jitter $\sigma_{SPAD}$, thus improving the single-shot precision of the distance measurement.
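Equations (2) and (3) can be evaluated numerically to check whether a given LSB is negligible in the overall budget. The sketch below is illustrative only: the function names are hypothetical, and the FWHM-to-sigma conversion assumes a Gaussian pulse shape, which the text does not state.

```python
import math


def tdc_precision_s(lsb_s: float, sigma_inl_s: float, sigma_jitter_s: float) -> float:
    """Equation (2): quantization (LSB/sqrt(12)), INL and jitter added in quadrature."""
    sigma_q = lsb_s / math.sqrt(12.0)
    return math.sqrt(sigma_q ** 2 + sigma_inl_s ** 2 + sigma_jitter_s ** 2)


def single_shot_precision_s(sigma_laser_s: float, sigma_spad_s: float,
                            sigma_tdc_s: float) -> float:
    """Equation (3): laser, SPAD and TDC contributions added in quadrature."""
    return math.sqrt(sigma_laser_s ** 2 + sigma_spad_s ** 2 + sigma_tdc_s ** 2)


def fwhm_to_sigma(fwhm: float) -> float:
    """Convert FWHM to standard deviation, assuming a Gaussian shape."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
```

With hypothetical values of a 100 ps LSB, 20 ps of INL and 10 ps of jitter, `tdc_precision_s` returns roughly 36.5 ps, well below the standard deviation of a 1 ns FWHM laser pulse (about 425 ps under the Gaussian assumption).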

However, in many pulsed-LiDAR systems, the centroid of the arrival-time histogram around its peak is computed to estimate the TOF, in order to achieve a precision that is not limited by the TDC quantization error, as detailed in Section 4. In addition, the centroid computation also mitigates the other limitations introduced in the final system, mainly related to the nanosecond laser pulse width and the limited SPAD temporal response. As a drawback, the centroid computation requires the accumulation of many TOF measurements to improve the precision, therefore increasing the measurement time (i.e., reducing the frame rate).
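A minimal sketch of a centroid estimate around the histogram peak is given below. It is not a specific algorithm from the reviewed works: the window half-width, the bin-center convention and the absence of background subtraction are all assumptions made for illustration.

```python
def centroid_tof_s(histogram, lsb_s, half_window=3):
    """Estimate the TOF as the centroid of the bins around the histogram peak.

    histogram   : list of counts, one entry per TDC code
    lsb_s       : TDC bin width in seconds
    half_window : bins kept on each side of the peak (hypothetical default)
    """
    if not histogram:
        raise ValueError("empty histogram")
    # Index of the first bin holding the maximum count.
    peak = max(range(len(histogram)), key=histogram.__getitem__)
    lo = max(0, peak - half_window)
    hi = min(len(histogram), peak + half_window + 1)
    total = sum(histogram[lo:hi])
    if total == 0:
        raise ValueError("no counts around the peak")
    centroid_bin = sum(i * histogram[i] for i in range(lo, hi)) / total
    # Assumption: each code represents the center of its time bin.
    return (centroid_bin + 0.5) * lsb_s
```

Because the result is a weighted mean over several bins, it can fall between two TDC codes, which is how the estimate gets below the single-bin quantization step once enough counts are accumulated.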

#### 2.1.3. Offset and non-linearity

The performance of a real TDC is highly influenced by different parameters that describe its non-ideal behavior. One of these parameters is the offset, which represents the vertical shift of the conversion curve or, in other words, the converter output with zero input. In LiDAR systems, the offset causes a fixed shift of all measured distances and it can be easily compensated. Besides the offset, the non-linearity represents another TDC parameter strongly impacting its final performance. The non-linearity can derive from process variations (e.g., temperature gradient and doping) and/or local variations (i.e., mismatches), which occur during the fabrication process. Two types of non-linearity can be defined: Differential Non-Linearity (DNL) and Integral Non-Linearity (INL).

DNL defines the difference between the length of each time step ($LSB_i$) and the mean length of all steps ($\overline{LSB}$) in the conversion characteristics. It is given by:

$$DNL_{i} = \frac{LSB_{i} - \overline{LSB}}{\overline{LSB}} \tag{4}$$

The INL of each step is defined as the cumulative sum of the DNLs of all steps up to and including that step, and it represents the difference between the step's actual output value and the corresponding value on the ideal gain curve. Thus, it is given by:

$$INL_{i} = \sum_{j=1}^{i} \frac{LSB_{j} - \overline{LSB}}{\overline{LSB}} = \sum_{j=1}^{i} DNL_{j} \tag{5}$$

DNL and INL can be indicated with their maximum value (worst case) or by their root mean square (rms) value over all steps. They are usually normalized to be expressed in LSB units.
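Equations (4) and (5) translate directly into a few lines of code. The sketch below, with a hypothetical function name, takes the measured width of each time bin and returns DNL and INL in LSB units.

```python
def dnl_inl(bin_widths):
    """Return (DNL, INL) in LSB units from the measured width of each bin.

    DNL_i follows equation (4); INL_i is the running sum of DNL, equation (5).
    """
    mean_width = sum(bin_widths) / len(bin_widths)
    dnl = [(w - mean_width) / mean_width for w in bin_widths]
    inl = []
    running = 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl
```

The worst-case figures mentioned above are then `max(abs(x) for x in dnl)` and `max(abs(x) for x in inl)`.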

These non-linearities have a serious impact on LiDAR system performance, as they lead to an error in each distance measurement (i.e., DNL), and the errors vary with the measured distances along the range (i.e., INL). To reduce these converter non-idealities, various calibration and linearization techniques can be employed [23]. For instance, the Statistical Code Density Test method allows, for some TDC architectures, each bin width to be measured and the code generated by the converter to be corrected in post-processing [24], while the Sliding-Scale technique directly improves the linearity by introducing a dithering in the measured time interval [25]. In this way, different portions of the START and STOP interpolator ranges are exploited and, by averaging all the results, the non-linearity is reduced. However, in this last case, the improvement in linearity comes at the cost of a more complex architecture and a higher quantization error due to the START and STOP interpolators.
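The idea behind the code density test can be sketched as follows: if hits are uniformly distributed over the FSR, the number of counts collected by each code is proportional to the width of its bin. The snippet is a simplified simulation, not the procedure of [24]; the function names and the simulated uniform source are assumptions.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def simulate_codes(true_widths, n_hits, seed=0):
    """Generate TDC codes for hits uniformly distributed over the FSR."""
    rng = random.Random(seed)
    upper_edges = list(accumulate(true_widths))
    fsr = upper_edges[-1]
    last = len(true_widths) - 1
    # bisect_right returns the index of the bin that contains time t.
    return [min(bisect_right(upper_edges, rng.random() * fsr), last)
            for _ in range(n_hits)]


def code_density_widths(codes, n_bins, fsr):
    """Estimate each bin width as FSR * (counts in bin / total counts)."""
    counts = [0] * n_bins
    for code in codes:
        counts[code] += 1
    return [fsr * n / len(codes) for n in counts]


def calibrated_time(code, widths):
    """Post-processing correction: map a code to the center of its measured bin."""
    return sum(widths[:code]) + widths[code] / 2.0
```

The estimated widths can be fed to a DNL/INL computation, and `calibrated_time` replaces the ideal `code * LSB` mapping when converting codes to time.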

#### 2.1.4. Dead time

The dead time represents the time needed for a TDC to complete the conversion after the STOP pulse and to generate a valid digital code. The inverse of this parameter defines the maximum measurement rate at which the TDC can operate. Modern real-time LiDAR applications demand ever higher measurement rates; therefore, designing TDCs with low dead times is mandatory. Moreover, low dead times allow more useful photons to be detected in each frame and, therefore, reduce the effect of the pile-up distortion in the distance measurement [18,20]. To minimize the impact of the dead time, a multi-hit TDC implementation can be employed in order to perform and store multiple TOF measurements related to the same laser shot.

High conversion rate is a fundamental parameter also when one TDC is shared among multiple pixels, in approaches commonly used for large LiDAR arrays to reduce the overall power consumption [26].

#### 2.1.5. Power consumption

The power consumption of a TDC is mainly related to the implemented architecture and the technology being used. This parameter is important in the implementation of TDCs inside pulsed-LiDAR CMOS SPAD arrays since it contributes to the overall system power consumption. Typically, high-precision LiDAR measurements require a high-resolution TDC based on high-frequency clocks [17]. This requirement leads to a high power consumption for each TDC and, therefore, a smart sharing of TDCs inside a CMOS SPAD array is usually necessary to manage the overall power consumption while maintaining a high precision [16,26].

# 2.2. TDC architectures

Various reviews on ASIC and FPGA-based TDCs are already present in the literature [23,27–30]. FPGA approaches can reach very high performance [31,32] and offer the advantages of reconfigurability [33], fast prototyping thanks to the development of portable IP-Cores [34] and low cost [35]; nevertheless, they are more suitable for small-size and testing setups [36–38]. For this reason, we will focus on ASIC approaches and, in this Section, we provide a brief summary of the main ASIC implementations, with particular attention to those suitable to be integrated inside CMOS SPAD arrays for compact pulsed-LiDAR sensors.

The architecture based on a counter represents the most basic implementation of a TDC. The START pulse enables a counter, which then counts the clock periods until the STOP pulse is received. The simple implementation and low area are the principal advantages of this architecture. However, the maximum achievable resolution is limited by the system clock frequency. Therefore, when a time resolution of a few ns and a long FSR are needed, a simple counter represents the best architecture that can be implemented.

However, when precise time resolution is needed, various architectures have been proposed and they can be divided into: i) *analog* approaches, requiring a Time-to-Amplitude Converter (TAC) [39,40] to convert a time interval to an analog voltage, followed by an Analog-to-Digital Converter (ADC); ii) *digital* approaches, which are based on different techniques using delay lines, time amplifiers or Ring Oscillators (ROs) [23,41,42]. The typical advantage of the analog approach is a better resolution, while the digital one allows for a much easier implementation in standard CMOS integrated circuits. For such reasons, the digital approach represents the preferred choice to integrate multiple TDCs in a single chip, as in the case of LiDAR SPAD arrays.

However, the digital approaches have an FSR limited to a few ns, since extending it would result in performance compromises when high resolution is needed. High area and power consumption, high timing jitter, or long dead time are some examples of such drawbacks, and they strongly depend on the specific implementation.

Therefore, if both high resolution and long FSR are needed, such as in pulsed-LiDAR, these digital approaches are combined with a coarse counter to guarantee the required FSR and the high resolution, implementing the Nutt architecture [42].
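The way a coarse counter and a fine interpolator share the work can be illustrated with an idealized behavioral model. The sketch below assumes that START is aligned with the reference clock, so that only the residue before STOP is interpolated; names and the single-interpolator simplification are hypothetical.

```python
def nutt_convert(interval_s, t_clk_s, coarse_bits, fine_bits):
    """Idealized Nutt conversion: coarse clock count plus fine residue.

    Returns (code, reconstructed_time_s). Assumes START is synchronous with
    the coarse clock, so one interpolator quantizes the residue.
    """
    n_fine = 1 << fine_bits
    lsb_s = t_clk_s / n_fine
    coarse = int(interval_s // t_clk_s)
    if interval_s < 0 or coarse >= (1 << coarse_bits):
        raise ValueError("interval outside the full-scale range")
    residue_s = interval_s - coarse * t_clk_s
    # Clamp guards against floating-point rounding at the bin boundary.
    fine = min(int(residue_s / lsb_s), n_fine - 1)
    code = coarse * n_fine + fine
    return code, code * lsb_s
```

With a 3-bit coarse counter, a 5-bit interpolator and an assumed 2.5 ns clock period, the model gives a 20 ns FSR with a 78.125 ps LSB; a 7.3 ns interval is converted to code 93.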

The Tapped Delay Line (TDL) is the most used digital approach in FPGAs [43–45], but it is also commonly used for CMOS integration in arrays [46–48]. This architecture is simple, as shown in Fig. 2, and can be easily implemented with moderate resolution, defined by the propagation delay of CMOS logic gates. However, when a long FSR is required, a larger number of delay cells is needed, increasing the area and power consumption. Therefore, it is necessary to limit the number of delay cells and thus the Nutt interpolation has to be implemented, using a counter to increase the FSR. Furthermore, a proper synchronism between TDL and counter is needed to guarantee a correct TDC conversion [17]. The dead time is very low since it is only given by the setting time of the register, therefore resulting in a suitable solution for real-time LiDAR applications.
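A behavioral sketch of the TDL principle is shown below: the START edge travels along the line, the STOP edge latches every tap, and the resulting thermometer code is converted to a binary value. Ideal, identical cell delays are assumed and the function names are hypothetical.

```python
def tdl_sample(interval_s, tau_s, n_taps):
    """Tap states latched by STOP: tap k is 1 if START has already passed it."""
    return [1 if (k + 1) * tau_s <= interval_s else 0 for k in range(n_taps)]


def thermometer_to_binary(taps):
    """Count the leading ones of the thermometer code."""
    code = 0
    for bit in taps:
        if not bit:
            break
        code += 1
    return code
```

The model also makes the area cost explicit: measuring an interval of `n_taps * tau_s` requires `n_taps` delay cells and flip-flops, which is why the line is kept short and a counter extends the FSR.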

In order to guarantee a precise and stable resolution against PVT variations, TDL architectures sometimes make use of locked loops (DLLs and PLLs). Some architectures use Delay-Locked Loops (DLLs) to define the propagation delay $\tau_1$ of the cells used in the delay line through which the START signal propagates until the STOP arrives [49,50], while others use the clock phases generated by a DLL or PLL to subdivide the clock counts into smaller time-bins (corresponding to the final resolution) [47,51,52]. In the implementation based on clock phases, the state of such phases is sampled and decoded when the STOP signal is propagated.

To overcome the area limitation of TDL architecture, the *Ring Oscillator* (RO) structure can be employed [53]. The basic structure of the RO is shown in Fig. 3 (top). It consists of a delay line folded to form a loop through which the transition is propagated [54–56]. The time conversion is given by the number of cycles counted by a counter (CTR block in Fig. 3) and the phase of the RO sampled and decoded by a dedicated circuit.

Therefore, the same delay cells are reused for transition propagation until the conversion is completed, which immediately saves area. Moreover, the RO architecture guarantees a simple multi-hit implementation with zero dead time if the RO is never stopped and the STOP pulses are used to sample the states of the RO phases and the counter value.

The oscillation frequency of the RO and the number of delay cells define the TDC time resolution. Considering a RO composed of N stages, each having  $\tau_1$  as propagation delay, the oscillation period is expressed as:

$$T_{RO} = 2N \cdot \tau_1 \tag{6}$$

Given the dependence on the propagation delay $\tau_1$, the time resolution strictly depends on the chosen CMOS technology. Thus, a higher resolution can be obtained by sampling the state of each delay cell, and an LSB down to tens of picoseconds can be easily achieved with scaled technologies [57]. In addition, this high resolution and linearity are obtained without any need for TDC calibration. As a disadvantage, high oscillation frequencies lead to a high power consumption due to the free-running RO, which thus limits the multiple or per-pixel implementation in CMOS SPAD arrays. Furthermore, a longer FSR implies a larger number of clock cycles, which worsens the timing jitter when a long range has to be measured.
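Equation (6) and the combination of cycle count and sampled phase can be sketched as below. The model assumes that the decoder maps the sampled RO state to a phase index between 0 and 2N−1, each step being one cell delay wide; this encoding and the function names are illustrative assumptions.

```python
def ro_period_s(n_stages, tau_s):
    """Equation (6): T_RO = 2 * N * tau_1."""
    return 2 * n_stages * tau_s


def ro_timestamp_s(cycles, phase_index, n_stages, tau_s):
    """Time since START from the counter value and the decoded RO phase.

    phase_index: decoded RO state in [0, 2N - 1], one step per cell delay
    (hypothetical encoding).
    """
    if not 0 <= phase_index < 2 * n_stages:
        raise ValueError("phase index out of range")
    return cycles * ro_period_s(n_stages, tau_s) + phase_index * tau_s
```

For example, a 4-stage RO with 50 ps cells oscillates with a 400 ps period (2.5 GHz), and a sample taken after 3 full cycles at phase index 5 corresponds to 1450 ps.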



Fig. 2. Basic structure of Tapped Delay Line (TDL) TDC.



Fig. 3. Ring Oscillator (RO) (top) and Gated Ring Oscillator (GRO) (bottom) TDC architecture.

To reduce the power consumption, an improved architecture called *Gated Ring Oscillator* (GRO) can be implemented [58–60]. As shown in Fig. 3 (bottom), the ring oscillator is turned ON when the measurement time interval starts and turned OFF at the end of it. In this approach, the state of each delay cell is sampled by the STOP pulse and kept until the successive measurement. Therefore, since the oscillation is stopped, there is no need for a particular synchronism between the counter and the RO phases. Furthermore, in some implementations, GROs are also used to intrinsically add dithering to the time measurement, resulting in a Delta-Sigma-like noise shaping, which improves the conversion linearity [60].

In order to overcome the limitation in resolution due to the propagation delay in non-scaled technologies, a *Vernier Delay Line* architecture can be exploited [61–63]. It is based on two delay lines through which the START and STOP signals propagate, as shown in Fig. 4 (top). The propagation delay of the cells in the START delay line ($\tau_1$) is slightly greater than that of the cells in the STOP delay line ($\tau_2$). As the signals propagate in their respective delay lines, the time difference between START and STOP decreases at each delay stage by a time resolution equal to $\tau_1 - \tau_2$, until the STOP signal overtakes the START. The position at which the overtaking happens carries the temporal information about the START-STOP delay. However, this architecture is very sensitive to process variations and mismatches between cells, eventually impairing the conversion linearity, and it also requires a large number of cells, thus costing a wide silicon area. Therefore, a hybrid architecture of the Vernier Delay Line has been proposed, using ROs or GROs instead of delay lines [64,65]. The basic architecture is shown in Fig. 4 (bottom). The loop structure of this hybrid architecture allows the delay line to be reused many times and thus a longer time measurement range is obtained. However, picosecond resolution and low area occupation are achieved at the expense of a non-negligible dead time needed to cover the limited FSR with higher resolution. As mentioned for the TDL architectures, Vernier architectures make use of DLLs to define precise and stable propagation delays ($\tau_1$, $\tau_2$), and hence the time resolution, against PVT variations.

Fig. 4. Vernier delay line (top) and Vernier gated ring oscillator (bottom) TDC architecture.
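The Vernier principle can be illustrated with a short behavioral model: the output code is the first stage at which the STOP edge, travelling in the faster line, has caught up with the START edge. Ideal and identical cells are assumed, and the function name is hypothetical.

```python
def vernier_convert(interval_s, tau1_s, tau2_s, n_stages):
    """Return the stage at which STOP overtakes START, or None if out of range.

    START enters the slow line (tau1 per cell) at t = 0; STOP enters the
    fast line (tau2 per cell) at t = interval_s, with tau1 > tau2.
    """
    if tau1_s <= tau2_s:
        raise ValueError("tau1 must be greater than tau2")
    for stage in range(1, n_stages + 1):
        start_arrival = stage * tau1_s
        stop_arrival = interval_s + stage * tau2_s
        if stop_arrival <= start_arrival:
            return stage
    return None  # the lines are too short for this interval
```

With 110 ps and 100 ps cells the resolution is 10 ps, so a 35 ps interval resolves at stage 4, while covering 1 ns would need about 100 stages per line, which illustrates the area cost noted above.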

For further improvements in time resolution, dead-time and FSR, a multi-stage interpolation has been proposed, by adding more interpolation stages [52,66,67]. The major disadvantage of this approach stands in the silicon area occupation, which limits multiple TDC implementations in SPAD arrays. Moreover, dedicated circuits to synchronize the interpolation stages are needed to avoid possible conversion errors in the final time interval measurement.

Fig. 5 shows the Venn diagram of the main TDC architectures described before, focusing on the key parameters needed for multiple or in-pixel implementation of TDCs in standard CMOS SPAD arrays (i.e., deadtime, resolution, power and area).

In LiDAR applications, especially when many TDCs are integrated in a SPAD array, low power and small area specifications play a key role, even to the detriment of resolution. For multi-hit operation, fundamental for high dynamic range and background rejection, low dead time is required. For these reasons, as we will show in Section 3, Vernier Delay Lines and Vernier Gated Ring Oscillators are rarely employed in LiDAR, TDLs are limited to relatively small arrays (up to about $32 \times 32$ pixels), whereas RO and GRO approaches are the preferred ones in most SPAD-based LiDAR detectors. The combination of these architectures with a counter in the Nutt interpolation allows TDC performance to be optimized for pulsed-LiDAR applications, in terms of range, precision and power consumption.

# 3. Selected TDC architectures

In this Section, we review recent TDC architectures implemented in SPAD and SiPM arrays for LiDAR. All the selected TDCs are based on Nutt interpolation architectures, exploiting a coarse counter to provide a long FSR and a fine interpolator, either based on locked loops (DLLs or PLLs) or Ring Oscillators (ROs), to reach high time resolution. Finally, we also analyze hybrid architectures called histogramming TDCs (hTDCs), which are TDC architectures specifically conceived to be used in combination with histogram-builder circuits. All the presented TDCs are based on tapped delay line architectures, i.e., the dead time is negligible, limited only by the propagation delays of the logic gates. Table 1 summarizes the main performance of the chosen TDCs, divided by architecture, and it compares them in terms of publication year, technology node, area of a single-channel TDC, LSB (the best achievable is reported), FSR (corresponding to the reported LSB), linearity, power consumption (in many cases it has been deduced as a first approximation by dividing the overall chip power by the number of TDCs, even if not only the TDCs contribute to the overall power consumption), number of events that can be converted (single-hit or multi-hit), and main field of application.

Fig. 5. Venn diagram of the main 'digital' TDC approaches.

# 3.1. Locked-loops architectures

Locked-loops can be used to propagate either the clock signal (to subdivide the main period in multiple and equally spaced phases) or the START signal (to sample its position along the line when the STOP signal arrives).

In Sesta [75], 80 TDCs are shared among 400 SPADs, in order to increase the arrival time statistics of a single-point rangefinder. The 8-bit TDCs achieve a 78 ps resolution over a 20 ns FSR, exploiting a 3-bit coarse counter and a 5-bit interpolator. The counter counts the number of coarse clocks between the START (laser synchronism) and STOP (photon detection) signals. To reduce area occupation, routing complexity and power consumption, the fine interpolators use the information carried by both the rising and falling edges of 16 clock phases with 50% duty cycle, generated by a single DLL and propagated along the entire array. The states of the 16 clock phases are sampled by the trigger signal (START or STOP), providing the 5-bit fine resolution. In this implementation, separate START and STOP interpolators are used in order to implement the sliding-scale technique and improve the TDC linearity. This architecture presents some critical issues in the synchronization between the coarse counter and the fine interpolator, especially when the trigger signal arrives close to the coarse clock edge. In order to overcome these criticalities, Niclass [76], in a $340 \times 96$ pixel array, proposes to use two 9-bit coarse counters, one incremented by the rising edge of the coarse clock and the other incremented by the falling edge. Depending on the phase sampled by the 3-bit interpolator, only the one free from ambiguity is considered in the final converted value. Differently from Sesta [75], the coarse counters are shared among all the pixels and their outputs, distributed along the entire array, are sampled in each of the 32 pixels, together with the multiphase clocks. Since the coarse counters are not stopped by the synchronization signal, multi-hit events can be measured, at the expense of more signals to be propagated throughout the array.
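The dual-counter idea can be sketched with a simple selection rule. The exact rule used in [76] is not detailed here, so the quadrant-based choice below is an assumption made for illustration: the rising-edge counter is trusted when the fine phase is far from the rising clock edge, and the falling-edge counter otherwise.

```python
def resolve_coarse(rising_count, falling_count, fine_phase, n_phases=8):
    """Pick the coarse counter that is stable at the sampled fine phase.

    rising_count  : counter incremented on rising clock edges (phase 0)
    falling_count : counter incremented on falling edges (phase n_phases / 2)
    fine_phase    : sampled phase in [0, n_phases - 1]
    The quadrant boundaries are a hypothetical choice, not taken from [76].
    """
    quarter = n_phases // 4
    if quarter <= fine_phase < n_phases - quarter:
        # Far from the rising edge: the rising-edge counter is settled.
        return rising_count
    if fine_phase < quarter:
        # Just after a rising edge: the falling-edge counter has not moved yet.
        return falling_count
    # Just before the next rising edge: the falling-edge counter is one ahead.
    return falling_count - 1
```

In this model the counter that may be caught mid-transition is never used, so a corrupted value sampled near its own clock edge does not reach the final code.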

The TDC implemented by Zhang [77] employs a coarse counter with a 640 MHz reference clock, whose state is sampled by the trigger signal, and a replica 16-tap delay line that propagates the trigger signal, whose position along the line is sampled at the next coarse clock edge. The replica delay line is fed by a global DLL to make it immune to PVT variations. This architecture allows for multi-hit event time stamping, with a maximum count rate of 640 MHz, which is an important parameter since each TDC is shared among 64 SPADs of a 240 $\times$ 160 SPAD array. With respect to [75] and [76], this architecture presents the advantage of reduced power dissipation and area occupation, since no multi-phase clocks have to be distributed across the array.
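As a quick numerical check of the figures quoted for this architecture, the sketch below derives the LSB from the reference clock and the number of taps, assuming the delay line spans exactly one clock period. The function name is hypothetical.

```python
def interpolated_lsb_s(f_clk_hz, n_taps):
    """LSB when one clock period is split evenly into n_taps fine steps."""
    return 1.0 / (f_clk_hz * n_taps)


lsb_s = interpolated_lsb_s(640e6, 16)      # one sixteenth of a 1.5625 ns period
coarse_periods = round(100e-9 * 640e6)     # clock periods in a 100 ns FSR
```

Under this assumption the LSB is 97.65625 ps, consistent with the value reported in Table 1, and a 100 ns FSR corresponds to 64 coarse clock periods.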

# 3.2. Ring Oscillator (RO) architectures

ROs can be used to generate in-pixel high-frequency clocks for fine interpolators, without the need for propagating high-frequency multiphase clocks throughout the array, which would be detrimental both for the power consumption and for the possible crosstalk with other array signals.

In Perenzoni [78], a reconfigurable TDC for high or low speed LiDAR applications is implemented in each pixel of a  $64 \times 64$  SPAD array. In high-speed mode, an 8-bit coarse counter is fed by a 40 MHz clock, whereas fine resolution of 500 ps is obtained through a GRO (started by the trigger signal and stopped by the next coarse clock edge), whose

#### Table 1

Performance of the selected TDC architectures.

| Ref. | Year | Technology | Number of pixels | Number of TDCs | Channel area (µm²) | LSB (ps) | FSR (ns) | DNL/INL (% LSB) | Power (µW) | Multi-hit | Application |
|------|------|------------|------------------|----------------|--------------------|----------|----------|-----------------|------------|-----------|-------------|
| Locked-loop architectures | | | | | | | | | | | |
| Niclass [76] | 2013 | 180 nm CMOS | 340×96 | 32 | 27.7·10<sup>3</sup> | 208 | 853.5 | 52/73 | N.A. | Yes | 3D imaging (Automotive) |
| Sesta [75] | 2021 | 160 nm BCD | 40×10 | 80 | 6.3·10<sup>3</sup> | 78 | 20 | 0.15/0.48 | 5000<sup>a</sup> | No | Single-point rangefinder |
| Zhang [77] | 2021 | 65 nm/65 nm BSI | 240×160 | 600 | N.A. | 97.65 | 100 | 40/55 | 510<sup>b</sup> | Yes (640 Mcps) | 3D imaging (Mobile phone) |
| Ring Oscillator (RO) architectures | | | | | | | | | | | |
| Perenzoni [78] | 2017 | 150 nm CMOS | 64×64 | 1 | 1.8·10<sup>3</sup> | 250 | 6400 | 50/200 | 12<sup>c</sup> | No | 3D imaging (Navigation) |
| Hutchings [79] | 2019 | 90 nm/40 nm BSI | 256×256 | 4096 | 130 | 38 | 143 | 5/90 | N.A. | No | 3D imaging |
| Zhang [80] | 2019 | 180 nm CMOS | 252×144 | 1728 | 4.2·10<sup>3</sup> | 48.8 | 50 | 50/90 | 300 | No | 3D imaging |
| Ximenes [81] | 2019 | 45 nm/65 nm CSI | 8×16 | 1 | 550 | 60 | 1000 | 70/340 | 100 | Yes | 3D imaging |
| Histogramming TDC (hTDC) | | | | | | | | | | | |
| Hutchings [79] | 2019 | 90 nm/40 nm BSI | 256×256 | 4096 | 150 | 560 | 9 | N.A. | 19<sup>d</sup> | Yes | 3D imaging |
| Seo [84] | 2021 | 110 nm CIS | 1×36 | 36 | 234·10<sup>3</sup> <sup>e</sup> | 156 | 320 | 100/150 | 5000<sup>f</sup> | Yes | 3D imaging (Automotive) |

<sup>a</sup> including the power consumption of multiphase clock drivers, 220 mW overall chip power consumption.

<sup>b</sup> 306 mW overall chip power consumption, including 600 TDCs.

<sup>c</sup> 50 mW overall logic power consumption, including 4096 TDCs.

<sup>d</sup> 77.6 mW overall chip power consumption, including 4096 TDCs.

<sup>e</sup> Including also the histogram circuitry.

<sup>f</sup> 180 mW overall chip power consumption, including 36 hTDCs.

output clock periods are counted by a 7-bit counter. An additional LSB is obtained by sampling the state of the RO output clock, leading to a final 16-bit (250 ps) resolution. In low-speed mode, the two counters are cascaded (providing 15-bit resolution in total) and fed by a 100 MHz clock. An architecture based on GRO is also presented by Hutchings [79]. In this case, in an array of $256 \times 256$ SPADs, groups of $4 \times 4$ SPADs share the same TDC, which can be operated either in TCSPC mode or in histogramming mode. Differently from [78], when the TDC is operated in TCSPC mode, the 11-bit coarse resolution is provided by counting the GRO output clock periods between START and STOP, whereas the 3-bit fine resolution is obtained from the phases of the 4-stage GRO. The architecture presented in Perenzoni [78] is preferable when a long FSR is required (in fact, the GRO is activated only for a short period between the STOP signal and the next coarse clock rising edge), whereas the architecture in Hutchings [79] offers better resolution (38 ps), given by the GRO phases. Zhang [80] describes a SPAD array with 252 $\times$ 144 SPADs and 1728 TDCs, i.e., 6 TDCs are shared among the 126 pixels in a half-column. This architecture combines the main advantages of [78] and [79]. In fact, thanks to a dual-clock architecture, it is possible to overcome the tradeoff between fine resolution, achievable by sampling the GRO phases, and long FSR, achievable by counting the periods of a reference clock. The architecture is based on three stages: a coarse counter counts the periods of a reference clock at 320 MHz, the RO counter counts the number of GRO periods (2.56 GHz) between the trigger and the next reference clock edge, and the GRO phases are used to get the fine resolution of 48.8 ps.
Furthermore, with respect to [79], this approach has the advantage of minimizing the effects of the period mismatches among the GROs of different TDCs, since the overall FSR of the GRO counter is much shorter than the TDC FSR. Minimizing the effect of mismatches among TDCs is particularly important in the SPAD array proposed in Zhang [80], since the 6 half-column TDCs are dynamically reallocated in a daisy-chain approach, thus the same SPAD pixel can be connected to a different TDC in different acquisition slots (i.e., all TDC mismatches translate into a degradation of the distance precision). The main limitation of this architecture is that the STOP frequency must be an integer submultiple of the reference clock frequency (e.g., 80 MHz, 40 MHz...). This limitation could be overcome by introducing a global STOP interpolator, with the additional advantage of improving the linearity (i.e., by implementing the sliding-scale technique).

The presented architectures based on GROs are limited to single-hit operation. In order to perform multi-hit detection, architectures based on free-running ROs, whose outputs are sampled by the trigger signal, can be used. In Ximenes [81], a LiDAR module with one TDC shared among 128 SPADs is described. The 1 GHz voltage-controlled RO (VCO) of the TDC is connected to a 10-bit ripple counter and, when the trigger event happens, the counter output is sampled with the combination of a counter resampler and a TOF sampler with matched delay buffers for the clock signal. The resampler and the TOF sampler are registers made of standard-cell flip-flops (FFs), and they are used to correctly sample the asynchronous counter output. At the same time, the 8-stage pseudo-differential RO is also sampled through Sense-Amplifier FFs (SA-FFs), which are faster than standard cells. Since the RO is never stopped, multi-hit operation is possible. This is a particularly important feature in the presented array, since each TDC is shared among 128 pixels. Padmanabhan [82] presents a TDC architecture similar to that of Ximenes [81], but with the addition of a soft coupling (implemented through transmission gates) with the VCOs of the four neighboring pixels, in order to reduce phase noise and jitter among pixels (i.e., by favoring injection locking, which is a desirable effect in multipixel 3D imagers). Indeed, the RO oscillation frequency variations among the 36 TDCs in the array decrease from 5% (without soft coupling) to 0.18% (with soft coupling).

# 3.3. Histogramming TDCs

Histogramming TDCs (hTDCs) are a class of TDCs specifically conceived to be coupled with histogram-builder circuitry, thus their output is typically a one-hot code, instead of the binary code most commonly used for standalone TDCs [83]. The architecture presented by Hutchings [79] can be used both as a standalone TDC for TCSPC timestamping (as described in Section 3.2), or as an hTDC, in which the Multi-Event TDC (METDC) is composed of a shift register that propagates the photon event signal at each reference clock period, an EXOR network that identifies the position of the event signal along the shift register, and a parallel register that samples the event signal position when the STOP signal arrives, thus providing the METDC output as a one-hot code. The shift register reference clock can be internally generated by the same GRO used in TCSPC mode, or it can be externally provided. The achieved resolution is 560 ps per bin, with a limited total number of bins (i.e., 16 bins, which correspond to 4-bit resolution).

Another hTDC architecture has been proposed by Seo [84] in a 36-channel linear scanning LiDAR sensor. In this case, the TDC (and consequently the histogram) presents a 5-bit coarse block and a 6-bit fine interpolator. The coarse TDC uses a shift register to shift the global START signal at each period of the reference clock (100 MHz) and the trigger signal samples the START position along the shift register. The fine TDC propagates the trigger signal along a replica delay line (controlled by a global DLL) and its position is sampled by the next reference clock edge, as in standard locked-loop interpolators. The final achieved resolution is 156 ps. More details about the histogram computation will be presented in Section 5.

# 3.4. Discussion about the selected TDC architectures

In the present Section 3, we analyzed TDC architectures based on locked loops and ring oscillators or specifically designed to build histograms. The performance comparison of the selected architectures is summarized in Table 1.

The best resolution (38 ps, corresponding to about 6 mm distance resolution) is achieved with an RO architecture (Hutchings [79]), but similar resolutions can also be achieved with locked-loop approaches. Typically, hTDCs have worse resolutions, mainly because, when building a histogram, the number of histogram bins is limited by silicon area considerations.
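As a quick check of the figure quoted above, and assuming that equation (1) is the usual round-trip relation $D = c \cdot TOF / 2$ (with $c$ the speed of light), one TDC LSB maps to a distance step of:

$$\Delta D = \frac{c \cdot LSB}{2} = \frac{3 \times 10^{8}\ \text{m/s} \cdot 38 \times 10^{-12}\ \text{s}}{2} \approx 5.7\ \text{mm}$$

Under the same assumption, the 560 ps bins of the hTDC in [79] correspond to about 84 mm per bin.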

Since most of the presented approaches are based on a coarse and a fine structure, the TDC FSR is given by the number of bits of the coarse counter, as already mentioned in Section 2, while the linearity does not seem to be affected by the employed architecture, since it is probably more related to the stability of the employed reference clock. A notable case is Sesta [75], which reaches a much better linearity than the other presented TDCs (0.15% LSB DNL, 0.48% LSB INL), thanks to the implementation of the sliding-scale technique.
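The relation between reference clock, coarse bits and fine interpolation can be illustrated with a short sketch. It assumes an idealized coarse-fine TDC in which the fine interpolator divides one reference clock period into equal steps; the function name and its parameters are illustrative and do not come from the cited works.

```python
def coarse_fine_tdc(f_clk_hz, coarse_bits, fine_steps):
    """Return (lsb_s, fsr_s) of an idealized coarse-fine TDC.

    Assumption: the fine interpolator splits one reference clock period
    into `fine_steps` equal parts, and the coarse counter spans
    2**coarse_bits clock periods.
    """
    t_clk = 1.0 / f_clk_hz
    lsb = t_clk / fine_steps
    fsr = t_clk * (2 ** coarse_bits)
    return lsb, fsr


# Seo [84]: 100 MHz reference clock, 5-bit coarse block, 6-bit fine interpolator
lsb_seo, fsr_seo = coarse_fine_tdc(100e6, coarse_bits=5, fine_steps=2 ** 6)

# Zhang [77]: 640 MHz reference clock, 16-tap replica delay line
# (the coarse bit count is not used for the LSB, so a placeholder is passed)
lsb_zhang, _ = coarse_fine_tdc(640e6, coarse_bits=0, fine_steps=16)

# Perenzoni [78], high-speed mode: 40 MHz clock, 8-bit coarse counter
# (the fine step count is not used for the FSR, so a placeholder is passed)
_, fsr_perenzoni = coarse_fine_tdc(40e6, coarse_bits=8, fine_steps=1)

print(f"Seo [84]: LSB = {lsb_seo * 1e12:.2f} ps, FSR = {fsr_seo * 1e9:.0f} ns")
print(f"Zhang [77]: LSB = {lsb_zhang * 1e12:.2f} ps")
print(f"Perenzoni [78]: FSR = {fsr_perenzoni * 1e9:.0f} ns")
```

With these inputs the sketch gives values in line with Table 1 (156 ps and 320 ns for [84], about 97.65 ps for [77], 6400 ns for [78]).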

Power consumption values are not easily comparable, because often manuscripts only provide the overall chip power dissipation and the conditions at which they measure it (e.g. count rate) can be different. In general, configurations based on GRO have low power consumption, because the high frequency clocks are locally generated (and not distributed across the entire array) and the ring oscillates for a limited period of time between START and STOP pulses.

Multi-hit operation can be achieved with both locked-loop and RO-based architectures, provided that neither the coarse counter nor the fine interpolator is stopped and their states are sampled instead. This can cause several issues in the synchronization between the coarse and fine TDC stages, leading to conversion errors. This issue has been addressed in Niclass [76] by using two coarse counters fed by complementary (counter-phase) clocks, and in Ximenes [81] by using SA-FFs to speed up the sampling operation.

Another important feature to be considered is the TDC reconfigurability. In most of the presented architectures, LSB can be traded off with FSR just by modifying the reference clock frequency. The TDC presented in Perenzoni [78] can be completely reconfigured to accommodate both high-speed and low-speed operations, just by modifying the connection between the two counters on which the TDC is based. The architecture in Hutchings [79] can operate in two completely different modalities, i.e., TCSPC time stamping or histogramming, for high-resolution single-hit measurement or low-resolution multi-hit operation, respectively.

In summary, a multistage GRO-based TDC such as the one presented in Zhang [58] is a good option when background rejection and high dynamic range are not primary requirements (since only single-hit operation is possible), whereas fine resolution and long FSR are a must (e.g., in night vision). Single-hit TDCs are also an option when the background rejection is performed at the sensor level (i.e., with digital SiPM and coincidence detection approaches), so that only laser-related triggers reach the TDC. For outdoor environments with strong solar illumination (e.g., automotive applications), multi-hit approaches are typically preferred, and in particular hTDCs simplify the task of building on-chip histograms, at the expense of a worse resolution or a shorter FSR. In any case, with strong background illumination, heavy limitations on resolution and FSR are set by system considerations (e.g., laser power and pulse width), thus the specifications for the TDCs are also less stringent.

# 4. Data processing for pulsed-LiDAR

In principle, a TDC could provide the distance measurement with just one conversion, i.e., one detected photon. However, both SPAD dark counts and background light can trigger spurious TDC conversions. Given the impossibility to distinguish between signal photons and background photons with a single SPAD, a set of repetitive measurements is typically performed. By exploiting the TCSPC technique [15], it is possible to repeat the pulsed laser excitation to record each TOF measurement, and to build a histogram of arrival times. Thereafter, the histogram peak detection and computation of the peak centroid provides the average TOF, hence the distance information using equation (1). In this way, the Signal-to-Noise Ratio (SNR) improves, also improving the final distance precision. In fact, the TOF precision can be estimated as follows:

$$\sigma_{TOF} = \frac{\sigma_{\text{single-shot}}}{\sqrt{N}} \tag{7}$$

where $\sigma_{\text{single-shot}}$ is the single-shot precision, reported in (3), and N is the number of valid signal photons detected by the considered pixel.
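The processing chain described above (histogram accumulation, peak search, centroid, distance) can be sketched as follows. This is a minimal illustration, not the method of any cited chip: the function names, the ±2-bin centroid window and the example data are assumptions, no background subtraction is performed, and the distance conversion assumes that equation (1) is the usual round-trip relation $D = c \cdot TOF / 2$.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def build_histogram(tdc_codes, n_bins):
    """Use each TDC code as the address of the bin to increment."""
    hist = [0] * n_bins
    for code in tdc_codes:
        if 0 <= code < n_bins:
            hist[code] += 1
    return hist


def peak_centroid(hist, half_window=2):
    """Centroid (in bins) of the counts in a window around the highest bin."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    lo = max(0, peak - half_window)
    hi = min(len(hist), peak + half_window + 1)
    total = sum(hist[lo:hi])
    return sum(i * hist[i] for i in range(lo, hi)) / total


def tof_to_distance(centroid_bins, lsb_s):
    """Round-trip conversion, assuming D = c * TOF / 2."""
    return C * centroid_bins * lsb_s / 2.0


def tof_precision(sigma_single_shot_s, n_signal_photons):
    """Equation (7): precision improves with the square root of N."""
    return sigma_single_shot_s / math.sqrt(n_signal_photons)


# Made-up conversions: a laser echo around code 101 plus four background counts
codes = [100] * 6 + [101] * 10 + [102] * 6 + [7, 33, 180, 250]
hist = build_histogram(codes, n_bins=256)
centroid = peak_centroid(hist)
distance = tof_to_distance(centroid, lsb_s=78e-12)
sigma = tof_precision(sigma_single_shot_s=100e-12, n_signal_photons=100)
print(centroid, round(distance, 3), sigma)
```

In this example a hypothetical 100 ps single-shot precision improves to 10 ps after 100 valid signal photons.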

However, in the presence of ambient light or sunlight, the pixels of a SPAD array may be triggered by earlier-arriving ambient photons, and the signal photons may be detected at a later time. If the TDC is able to convert only one event per laser pulse (single-hit operation), the ambient photon masks the signal and leads to a distortion in the final TOF histogram, known as photon pile-up [18], which makes it difficult to locate the laser pulse and results in large distance errors. The simplest approach to reduce pile-up is the optical attenuation of the photon flux, but it also significantly reduces the signal photons. Therefore, various computational and hardware [20,68–70] solutions have been proposed to mitigate this effect. Among the hardware solutions, multi-hit TDCs are a valid option: they maximize the detection of laser photons and can be combined with photon coincidence techniques, which reduce the conversions due to ambient light.

In the histogram computation, the measured TOF is used as address of the histogram bins and the value stored in each bin represents how many times the corresponding TOF has been measured. However, when considering real-time high-performance LiDAR systems with many pixels, storing histograms with many bins and sufficient depth for each bin requires a huge amount of memory, which is critical to implement in a single ASIC.
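The memory cost mentioned above can be estimated with a one-line calculation. The sketch below is only an order-of-magnitude illustration: the function name is hypothetical, and the second example (a 12-bit TDC with 10-bit bins in each pixel of a 64 × 64 array) uses made-up parameters rather than figures from a specific sensor.

```python
def full_histogram_bits(tdc_bits, depth_bits, n_histograms=1):
    """Memory (in bits) for full histograms: one counter per TDC code."""
    return (2 ** tdc_bits) * depth_bits * n_histograms


# Single shared histogram with the figures reported for [75] in Section 5:
# 256 bins (8-bit TDC) and 14-bit depth per bin
bits_single = full_histogram_bits(tdc_bits=8, depth_bits=14)

# Hypothetical per-pixel full histograms in a 64 x 64 array
bits_array = full_histogram_bits(tdc_bits=12, depth_bits=10, n_histograms=64 * 64)

print(bits_single, "bits")
print(bits_array / 8 / 2 ** 20, "MiB")
```

The first case needs 3584 bits, while the hypothetical per-pixel case already requires 20 MiB, which illustrates why full histograms are rarely integrated in every pixel.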

The TOF histograms can be created off-chip by reading all TOF values of each pixel, using an external DSP/FPGA board. However, the readout of all raw TOF values is time-demanding and a very high-speed I/O system is needed in order to guarantee real-time acquisitions.

Therefore, different ASIC architectures and algorithms have been proposed to integrate histogram building and processing on-chip, in order to reduce data collection and transfer [71–74]. A straightforward solution is to integrate a simple Full Histogram (FH). However, due to area constraints, usually only a few full histograms can be integrated, with some limitations in terms of bins (i.e., range and resolution) and memory (i.e., depth). In some cases, a limited-size histogram is shared among a group of SPAD pixels, activated by a photon coincidence to reduce the pile-up distortion, as described in Section 1.

To decrease the memory required by the histogram and allow an in-pixel implementation, various architectures have been proposed, using a Partial Histogram (PH) with a lower number of bins. The Scanning Histogram (SH) is one of these algorithms: it stores only a part of the full histogram (i.e., a partial histogram) at a time and is referred to as the time-gated scanning approach [72]. It consists of partitioning the full range and creating a partial histogram for each sub-range. In post-processing, all partial histograms are combined in order to reconstruct the full histogram and detect the TOF peak. This reduces the on-chip histogram memory, since one partial histogram can be reused for each sub-range, with a number of bins lower than the number needed to cover the full range. Considering $2^{N_f}$ bins for the full range and $2^{N_p}$ bins for the partial histogram, this algorithm requires $2^{N_f - N_p}$ partial histograms to scan the full range. The main disadvantage of this approach is the need to scan the full-range histogram, thus the overall frame rate decreases by a factor equal to the number of partial histograms to be scanned.
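The scanning procedure can be sketched in a few lines. The code below is a behavioral illustration under stated assumptions: `acquire` is a hypothetical callable that returns the TDC codes of one new acquisition, and the time gating is modeled by simply discarding the codes that fall outside the current sub-range.

```python
def scanning_histogram(acquire, n_full_bits, n_partial_bits):
    """Reconstruct a full histogram from 2**(Nf - Np) partial histograms."""
    n_sub = 2 ** (n_full_bits - n_partial_bits)
    n_bins = 2 ** n_partial_bits
    full = []
    for k in range(n_sub):
        offset = k * n_bins
        partial = [0] * n_bins          # the only histogram memory needed on-chip
        for code in acquire():          # one new acquisition per sub-range
            if offset <= code < offset + n_bins:
                partial[code - offset] += 1
        full.extend(partial)            # read out and stitched in post-processing
    return full


# Made-up data: the same list of codes is returned at every acquisition
codes = [37, 37, 38, 36, 37, 5, 60]
calls = []


def acquire():
    calls.append(1)
    return codes


full = scanning_histogram(acquire, n_full_bits=6, n_partial_bits=3)
peak = max(range(len(full)), key=full.__getitem__)
print(peak, len(calls))  # prints "37 8"
```

The eight calls to `acquire` make the frame-rate penalty explicit: with $N_f = 6$ and $N_p = 3$, eight acquisitions are needed for one full-range measurement.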

To further improve the frame rate, an alternative approach called Double-Stage Histogram (DH) has been proposed. It consists of building two different histograms: the MSBs-Histogram with $2^{N_m}$ bins is created from the most significant bits and the LSBs-Histogram with $2^{N_l}$ bins from the remaining least significant bits [74]. Since the noise spread along the full histogram is folded into both histograms and typically $N_m < N_l$, the noise floor of the MSB histogram is higher than that of the LSB histogram due to the lower number of bins, which in turn is higher than the noise floor of a conventional full histogram. The first approximation of the TOF measurement is available after two histogram acquisitions and is computed by summing the peak bins of both histograms. Therefore, only two histogram acquisitions are needed to perform the measurement and the overall frame rate decreases only by a factor of 2, which is a smaller penalty than in the SH approach. However, an uncertainty error can occur when the laser pulse straddles multiple MSB histogram bins. In this condition, an error equivalent to $2^{N_l}$ bins can be made in the centroid identification. An additional histogram could be acquired around the centroid estimated from the MSBs-Histogram and the LSBs-Histogram, to further correct the uncertainty errors and to compute the TOF centroid more accurately.
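A minimal behavioral sketch of the double-stage idea is given below. It is an illustration under assumptions, not the implementation of [74]: the function name is hypothetical, and the same list of codes is reused for both stages, whereas a real sensor would use two separate acquisitions.

```python
def double_stage_tof(codes, n_m, n_l):
    """First TOF approximation from an MSB histogram and an LSB histogram."""
    msb_hist = [0] * (2 ** n_m)
    lsb_hist = [0] * (2 ** n_l)
    for code in codes:                      # first acquisition: MSBs only
        msb_hist[code >> n_l] += 1
    for code in codes:                      # second acquisition: LSBs only
        lsb_hist[code & (2 ** n_l - 1)] += 1
    peak_msb = max(range(len(msb_hist)), key=msb_hist.__getitem__)
    peak_lsb = max(range(len(lsb_hist)), key=lsb_hist.__getitem__)
    return (peak_msb << n_l) + peak_lsb     # sum of the two peak positions


# Echo well inside one MSB bin (true peak at code 172), plus background
codes_ok = [172] * 8 + [171] * 4 + [173] * 4 + [3, 90, 200, 250]
tof_ok = double_stage_tof(codes_ok, n_m=3, n_l=5)

# Echo straddling the boundary between MSB bins 4 and 5 (codes 159 to 161)
codes_border = [159] * 6 + [160] * 5 + [161] * 2
tof_border = double_stage_tof(codes_border, n_m=3, n_l=5)

print(tof_ok, tof_border)  # prints "172 191"
```

In the second case the MSB peak falls in bin 5 while the LSB peak is the last bin of MSB bin 4, so the result (191) is off by $2^{N_l} = 32$ bins with respect to the true peak at 159, which is the border uncertainty described above.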

# 5. Selected data processing solutions

As shown in Section 4, raw TOF information requires a long time to be read out, especially in large SPAD arrays with multi-hit TDCs. Some techniques to reduce the amount of information to be read out are reported here, involving on-chip processing and other smart solutions.

A first approach is the implementation of an on-chip Digital Signal Processor (DSP) to process in real time the TOF conversions provided by the TDCs. In Ximenes [81], a Digital Processing and Communication Unit (DPCU) has been designed and implemented on-chip. The solution consists of an Arithmetic Logic Unit (ALU) that performs a low-pass filtering on the 14-bit TOF conversions. The logic implements a digital Infinite Impulse Response (IIR) filter that averages the signal at each repetition, drastically reducing the uncertainty due to all the jitter sources (i.e., laser, SPAD and TDC). The filter pole can be tuned by a right-shift of a parameter ($\lambda$), thereby a smaller value will perform better filtering at the cost of a slower response. Note that such a solution is effective in a low-noise environment, where the laser pulse is clearly distinguishable from the noise floor.
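The kind of shift-based averaging described above can be illustrated with a first-order IIR filter. The sketch below shows one common hardware-friendly form and is not necessarily the filter implemented in [81]: the class name, the accumulator with extra fractional bits and the `shift` parameter are assumptions. The effective coefficient is $2^{-shift}$, so a smaller coefficient (larger shift) gives heavier smoothing and a slower response.

```python
class ShiftIIR:
    """First-order IIR average: y[n] = y[n-1] + (x[n] - y[n-1]) / 2**shift.

    The accumulator stores y scaled by 2**shift, so that only additions,
    subtractions and right shifts are needed (no multiplier).
    """

    def __init__(self, shift):
        self.shift = shift
        self.acc = 0

    def update(self, x):
        self.acc += x - (self.acc >> self.shift)
        return self.acc >> self.shift


# Constant input: the output settles exactly on the input value
flt = ShiftIIR(shift=2)
for _ in range(100):
    y_const = flt.update(1000)

# Jittery TOF codes around 1000 (made-up values): the output stays close to the mean
flt = ShiftIIR(shift=3)
for x in [1000, 1004, 996, 1002, 998] * 40:
    y_jitter = flt.update(x)

print(y_const, y_jitter)
```

Only one accumulator per pixel is needed, which is far less memory than a histogram, but a single averaged value cannot separate the laser echo from a strong background, consistent with the low-noise restriction noted above.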

As already described in Section 4, various solutions have been proposed to build and reconstruct the timing histogram directly on-chip, reducing data throughput and avoiding further off-chip processing of TOF values. Concerning the Full Histogram (FH) reconstruction, the classical architecture has been proposed in Sesta [75], whose TDC architecture has already been described in Section 3. Since the TDC presents two separate START and STOP interpolators, the TOFs are computed as the STOP − START conversion difference by an on-chip subtractor, and then the TOFs of all TDCs are accumulated in the same FH builder. The histogram has 256 bins, 78 ps resolution (20 ns FSR) and 14-bit depth for each bin. However, this approach is not scalable to long FSRs, due to the huge on-chip memory footprint required.

Another FH solution can be found in Hutchings [79]. As described in Section 3, this array can operate either in TCSPC mode or in histogramming mode. In histogramming mode, the METDC, which provides a "one-hot" output code, is directly connected to the register used to store the histogram. The width of the 16 bins can range from 560 ps to 560 ns, depending on the selected clock generated by the GRO. The result is a very efficient histogram builder that requires compact electronics.

As already seen in Section 4, a trade-off exists between resolution, histogram FSR and area occupation, thereby new strategies have been studied to provide just a portion of the histogram that includes the laser echo peak. The result is a processing that requires less storage, by reducing the number of bins with a Partial Histogram (PH). The first important solution has been proposed in Zhang [80]. They implemented the Partial Histogram Readout (PHR) method, which is based on the Double-Stage Histogram (DH) approach and consists of a Peak Detection (PD) phase and a final histogram phase. The first one is used to detect the bin related to the peak, while the second one is needed to build the final partial histogram. In the PD phase, the peak value is computed in three steps in which sub-histograms of 8 bins are accumulated. In fact, the 10-bit TDC conversions are divided into groups of 3 bits (bit-0 is not considered in this first phase) and 3 different histograms are computed in these 3 steps, as shown in Fig. 6. At the end, a final histogram is computed by generating a 16-bin window around the detected peak (exploiting also bit-0) and accumulating a partial histogram of 16 bins and 5-bit depth. This technique achieves a very high compression, even if two noticeable problems, typical of all DH architectures, arise. The first is that the 3 sub-histograms show an increased noise floor, due to the refolding of the FH noise and background into only 8 bins. The second concerns the peak value estimation error made if the laser echo falls between 2 different bins, which leads to a possible error of a full sub-histogram range.
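The two PHR phases can be sketched behaviorally as follows. Several details are assumptions made for illustration and are not taken from [80]: each PD step folds all conversions onto its own 3-bit group, the same list of codes is reused in every step (a real sensor would use a new set of laser shots per step), the final window is centered on the detected peak, and the 5-bit bins saturate at 31.

```python
def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def phr_peak_detection(codes):
    """Three 8-bin sub-histograms on bits 9..7, 6..4 and 3..1 of 10-bit codes."""
    peak = 0
    for shift in (7, 4, 1):                 # bit-0 is ignored in this phase
        sub_hist = [0] * 8
        for code in codes:
            sub_hist[(code >> shift) & 0b111] += 1
        peak = (peak << 3) | argmax(sub_hist)
    return peak << 1                        # back to 10-bit code units


def phr_final_histogram(codes, peak, n_bins=16, depth_bits=5):
    """16-bin, 5-bit-deep partial histogram in a window around the peak."""
    start = min(max(peak - n_bins // 2, 0), 1024 - n_bins)
    max_count = 2 ** depth_bits - 1
    hist = [0] * n_bins
    for code in codes:
        idx = code - start
        if 0 <= idx < n_bins and hist[idx] < max_count:
            hist[idx] += 1
    return start, hist


# Made-up conversions: echo around code 613 plus six background counts
codes = [613] * 20 + [612] * 10 + [614] * 10 + [12, 250, 431, 700, 901, 1003]
peak = phr_peak_detection(codes)
start, hist = phr_final_histogram(codes, peak)
print(peak, start, start + argmax(hist))  # prints "612 604 613"
```

The on-chip storage in this sketch is one 8-bin sub-histogram at a time plus the final 16 × 5-bit window, instead of the 1024 bins of a full 10-bit histogram.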

Zhang [77] proposes a post-processing solution based on the PHR technique, where a matched filter is applied to the fine histogram to reconstruct the peak position. To compensate for the error of border peaks, the Histogram Distortion Correction (HDC) is implemented, so that the relation between the actual peak and the measured peak is defined through a mathematical model. The model is obtained by shifting the laser pulse to the right and left borders and subsequently applying the matched filter.

**Fig. 6.** Peak Detection (PD) phase in the PHR method proposed in [80].

Another improved solution has been implemented in Vornicu [85] with the Shifted Inter-Frame Histogram (SiFH). In contrast to PHR, the SiFH performs a real subtraction to reconstruct the fine histogram instead of a bit masking. A first coarse peak is computed exploiting only the $N_{coarse}$ MSBs of the $N_{tot}$ total bits. The fine histogram is accumulated by filtering the TDC conversions with respect to two thresholds:

$$TH_{+} = 2^{N_{tot} - N_{coarse}} \cdot b_{peak,coarse} + SB - b_{os} \tag{8}$$

$$TH_{-} = 2^{N_{tot} - N_{coarse}} \cdot b_{peak,coarse} - SB - b_{os} \tag{9}$$

The coarse peak ($b_{peak,coarse}$) is exploited here together with some extra bins (SB) and the offset $b_{os}$, which corrects the peak position. The fine histogram is computed as follows: the filtered TDC conversions are shifted to the first bin by subtracting a defined value ($\Delta$ in Fig. 7). The fine bits ($N_{tot} - N_{coarse}$) are then computed by building the new histogram. The fine histogram is reconstructed without noise folding and border effects, thanks to a subtraction in the bins instead of discarding bits.
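The thresholding and shifting steps of equations (8) and (9) can be sketched as follows. This is a behavioral illustration, not the circuit of [85]: the function names are hypothetical, the shift $\Delta$ is assumed to be equal to $TH_{-}$, the fine histogram is assumed to span the whole window between the two thresholds, and the same list of codes is reused for the coarse and fine stages.

```python
def coarse_peak(codes, n_tot, n_coarse):
    """Peak bin of the coarse histogram built from the N_coarse MSBs."""
    hist = [0] * (2 ** n_coarse)
    for code in codes:
        hist[code >> (n_tot - n_coarse)] += 1
    return max(range(len(hist)), key=hist.__getitem__)


def sifh_fine_histogram(codes, b_peak_coarse, n_tot, n_coarse, sb, b_os):
    """Filter the codes between TH- and TH+ and shift them to the first bin."""
    scale = 2 ** (n_tot - n_coarse)
    th_plus = scale * b_peak_coarse + sb - b_os     # equation (8)
    th_minus = scale * b_peak_coarse - sb - b_os    # equation (9)
    delta = th_minus                                # assumption: shift equals TH-
    hist = [0] * (th_plus - th_minus)
    for code in codes:
        if th_minus <= code < th_plus:
            hist[code - delta] += 1                 # real subtraction, no bit masking
    return th_minus, hist


# Made-up 10-bit conversions: echo at code 641, right at the edge of coarse bin 10
codes = [641] * 12 + [640] * 6 + [642] * 6 + [639] * 3 + [100, 900]
b_peak = coarse_peak(codes, n_tot=10, n_coarse=4)
th_minus, fine = sifh_fine_histogram(codes, b_peak, n_tot=10, n_coarse=4, sb=32, b_os=0)
fine_peak = th_minus + max(range(len(fine)), key=fine.__getitem__)
print(b_peak, th_minus, fine_peak)  # prints "10 608 641"
```

Note that the counts at code 639, which belong to the neighboring coarse bin, are kept in the fine histogram: this is how the subtraction-based window avoids the border effect of bit masking.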

All the presented PH architectures require at least as many counters as PH bins to build the sub-histograms, so some silicon area must be reserved for such electronics. An innovative solution has been introduced in Kim [86]. The histogram builder circuit is based on the Successive Approximation Register (SAR) approach, already used for ADCs. The method divides the observation time into two windows so that, once the window containing the peak is detected, the related sub-frame is halved again and the peak detection continues on the new window. A window generator is needed to gate the SPAD pixels in one portion of the FSR, while an up/down counter counts the photons in the first half of the search region (in up mode) and in the second one (in down mode), thus the related bit value is given by the counter sign bit. The architecture requires just an up/down counter and is designed with an exponential counting mode, in order to speed up the conversion: the more coincident photons are detected, the higher the counter increment. After these N steps, in which N TOF bits are computed, some extra fine bits are defined. The last sub-frame is divided into two windows and the counter counts the photons in these two regions ($Count_{\text{first sub-frame}}$ and $Count_{\text{second sub-frame}}$, respectively), similarly to an iTOF [87] acquisition. The fine value results:

$$T_{\text{last sub-frame}} \cdot \frac{Count_{\text{first sub-frame}}}{Count_{\text{first sub-frame}} + Count_{\text{second sub-frame}}} \tag{10}$$

This SAR-based histogram builder architecture can be easily integrated in-pixel due to the small area electronics involved.
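The successive-approximation search can be sketched behaviorally as follows. The code is an illustration under assumptions and not the circuit of [86]: `acquire` is a hypothetical callable returning the photon arrival times of one gated acquisition (in arbitrary integer time units), the exponential counting mode is omitted, and the fine value is computed exactly as printed in equation (10).

```python
def sar_tof(acquire, n_bits, fsr):
    """Binary search of the echo position with a single up/down counter."""
    start, width, code = 0, fsr, 0
    for _ in range(n_bits):
        half = width // 2
        counter = 0
        for t in acquire():                         # one acquisition per bit
            if start <= t < start + half:
                counter += 1                        # up mode: first half
            elif start + half <= t < start + width:
                counter -= 1                        # down mode: second half
        bit = 1 if counter < 0 else 0               # the sign bit selects the half
        code = (code << 1) | bit
        if bit:
            start += half
        width = half
    # Fine step on the last sub-frame, as in equation (10)
    half = width // 2
    first = second = 0
    for t in acquire():
        if start <= t < start + half:
            first += 1
        elif start + half <= t < start + width:
            second += 1
    total = first + second
    fine = width * first / total if total else 0.0
    return code, start, fine


# Made-up arrival times: echo at 149 and 150, plus four background photons
arrivals = [149] * 4 + [150] * 8 + [20, 90, 201, 240]
code, start, fine = sar_tof(lambda: arrivals, n_bits=6, fsr=256)
print(code, start)  # prints "37 148"
```

Six acquisitions resolve the 256-unit range down to a 4-unit sub-frame using one counter, at the cost of one acquisition per resolved bit.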

**Fig. 7.** SiFH processing diagram, showing the measured coarse histogram (top), the filtered data (center) and the achieved fine histogram (bottom).

Another histogramming TDC, already introduced in Section 3, has been implemented in Seo [84]. The histogram builder is based on a mixed-signal architecture and a series of 64 capacitors represents the histogram storage. In this case, by exploiting the TOF measured by the coarse hTDC and expressed in a one-hot code, a current pulse is injected into the capacitor bin corresponding to the measured TOF, so that after some repetitions the analog voltage is proportional to the number of events per bin. The bin whose voltage exceeds a fixed threshold (chosen to discriminate the peak from the noise floor) is considered the coarse histogram centroid. An additional feature has been added to make the system immune to interference from other active LiDAR systems. In fact, the laser generates two light pulses at a well-defined time delay (characteristic of the specific system), and the coarse centroid is identified when two peaks with the characteristic delay are detected. Once the coarse peak position is identified, SPADs are activated (gated) in a limited temporal window around the peak, and all the conversions are used in the fine histogram computation, without the need to filter the converted data, as in the other selected architectures [77,80,85], and with the additional advantage of reducing the pile-up effect. The fine histogram is computed with an analog approach similar to the coarse histogram, thus by exploiting the same 64 capacitors and current generators, but this time considering the conversion of the fine hTDC.
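The double-pulse criterion used for interference rejection can be illustrated with a simple search over the bin levels. The sketch is a digital stand-in for the analog comparison described above: the function name, the 32-bin list of levels, the threshold and the delay are all hypothetical values chosen for the example.

```python
def find_coded_peak(bin_levels, threshold, delay_bins):
    """Return the first bin that exceeds the threshold together with the bin
    located `delay_bins` later, or None if no such pair exists."""
    for i in range(len(bin_levels) - delay_bins):
        if bin_levels[i] > threshold and bin_levels[i + delay_bins] > threshold:
            return i
    return None


# Made-up bin levels (standing for capacitor voltages, in arbitrary units)
levels = [1] * 32
levels[5] = 9                     # single pulse from an interfering LiDAR
levels[12] = 8                    # first pulse of the system's own pair
levels[15] = 8                    # second pulse, 3 bins later

own_peak = find_coded_peak(levels, threshold=5, delay_bins=3)
print(own_peak)  # prints "12"
```

The isolated peak at bin 5 is ignored even though it is the highest one, because no second peak follows it at the characteristic delay.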

In summary, we reviewed some on-chip post-processing strategies to compute the final TOF without reading out all the TDC conversions at every frame. Histogram building is usually the best option. As Sesta [75] and Hutchings [79] propose, an FH can be acquired employing a reduced number of bins (256 in [75] and 16 in [79]) at the cost of a short FSR. To overcome this issue, the PH approach is exploited, reducing the memory occupation at the cost of more repetitions. Indeed, PH requires TOF accumulations in multiple phases, therefore many TDC conversions are wasted in order to zoom in and select the region of interest within the FSR. Zhang [80] and Vornicu [85] employ this multi-phase histogramming to provide the final TOF. Kim [86] proposes a SAR TDC that requires counters and window generators and that computes one TOF bit at a time, similarly to an ADC. Finally, the architecture proposed by Seo [84] implements a mixed-signal hTDC with coarse and fine steps, requiring a sliding observation window. All these solutions are summarized in Table 2. In the future, combinations of the presented architectures could be explored, for instance by exploiting the idea introduced in Seo [84] of gating the SPADs around the peak to compute the fine histogram, together with the digital approaches to compute the PH proposed in Zhang [77,80] and Vornicu [85].

# 6. Conclusion

In this work, we have presented a review of TDC architectures and data processing solutions to be integrated in CMOS SPAD arrays for compact and high-performance pulsed-LiDAR systems. In addition, the TDC performance parameters and the principal data processing algorithms are described and analyzed specifically for this application.

Over the years, various SPAD arrays and SiPMs with integrated TDCs and data processing algorithms have been published for long-range and high-resolution pulsed LiDAR. These arrays have been studied and compared in terms of TDC and data processing performance, in view of highlighting the most innovative solutions. Table 1 summarizes the main figures of merit of TDCs integrated in modern CMOS SPAD and SiPM arrays for pulsed-LiDAR, while post processing solutions are compared in Table 2.

As shown throughout the paper, recent developments in scaled technologies and in 3D stacking make it possible to integrate low-power, high-performance TDCs and also smart data processing for final TOF information extraction. While TDC architectures are already mature and suitable for the most advanced LiDAR requirements, further improvements in data processing algorithms could be possible. A decisive upgrade would be to implement more advanced algorithms on-chip, even including neural networks, to prevent or compensate for distortions due to photon pile-up or real-world conditions (e.g., fog and rain), or to

#### Table 2

Performance of the selected data processing solutions based on histogram generation.

| Ref.                    | Year         | Architecture                                                 | Number of channels | Channel area<br>(mm <sup>2</sup> ) | Histogram<br>steps | LSB (ps)                          | FSR (ns)                         | Number of<br>bins <sup>a</sup> | Approach           |
|-------------------------|--------------|--------------------------------------------------------------|--------------------|------------------------------------|--------------------|-----------------------------------|----------------------------------|--------------------------------|--------------------|
| Sesta [75]<br>Hutchings | 2021<br>2019 | Full Histogram (FH)<br>Full Histogram (FH)                   | 1<br>4096          | 0.6<br>N.A.                        | 1<br>1             | 78<br>560/<br>560000 <sup>b</sup> | 20<br>9.17/<br>9170 <sup>b</sup> | 256<br>16                      | Digital<br>Digital |
| Zhang [80]<br>Seo [84]  | 2021<br>2021 | Partial Histogram Readout (PH)<br>Coarse-Fine Histogram (PH) | 72<br>36           | 1.22<br>0.23                       | 4<br>2             | 48.8<br>156.25                    | 25<br>320                        | 8<br>64                        | Digital<br>Analog  |
| Vornicu [85]            | 2019         | Shifted Inter-Frame Histogram<br>(PH)                        | N.A.               | 0.4 <sup>c</sup>                   | 2                  | N.A.                              | N.A.                             | 256                            | Digital            |
| Kim [86]                | 2021         | Successive Approximation<br>Register (PH)                    | 1920               | 0.004                              | 8                  | 90                                | 640                              | 1                              | Digital            |

<sup>a</sup> Corresponding to on-chip memory width.

<sup>b</sup> The FSR can be extended at the cost of wider LSB.

<sup>c</sup> Estimated with a 90 nm CMOS technology.

compute the TOF centroid in order to reach increasingly high distance precision without any off-chip processing.
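As a rough illustration of the centroid computation mentioned in the conclusion, the Python sketch below estimates the TOF as the count-weighted mean of the bins around the histogram peak, then converts it to distance with the direct-TOF relation D = c·TOF/2. The function names, the window of ±2 bins and the convention that bin i represents the time i·LSB are assumptions, not details taken from the reviewed chips.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_to_distance(tof_s):
    """Direct-TOF relation: the pulse travels to the target and back."""
    return C * tof_s / 2.0

def centroid_tof(hist, lsb_s, half_window=2):
    """Count-weighted mean of the bins around the peak, in seconds.

    Assumes bin i represents the time i * lsb_s; half_window is a
    hypothetical parameter selecting how many neighbours are averaged.
    """
    peak = hist.index(max(hist))
    lo = max(0, peak - half_window)
    hi = min(len(hist), peak + half_window + 1)
    total = sum(hist[lo:hi])
    if total == 0:
        raise ValueError("empty histogram: no TOF can be estimated")
    mean_bin = sum(i * hist[i] for i in range(lo, hi)) / total
    return mean_bin * lsb_s

if __name__ == "__main__":
    # Echo split evenly between bins 3 and 4: the centroid falls between them.
    hist = [0, 0, 0, 10, 10, 0]
    tof = centroid_tof(hist, lsb_s=78e-12)     # 78 ps LSB, as for [75] in Table 2
    print("centroid TOF (s):", tof)
    print("distance (m):", tof_to_distance(tof))
    # Depth spanned by one LSB and range covered by the FSR of [86] in Table 2
    print("depth per 90 ps LSB (m):", tof_to_distance(90e-12))
    print("range for 640 ns FSR (m):", tof_to_distance(640e-9))
```

The same relation links the timing figures of Table 2 to distance: a 90 ps LSB corresponds to about 13.5 mm of depth, and a 640 ns FSR to roughly 96 m of range. The centroid can resolve positions finer than one LSB only when the echo spreads over more than one bin.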

#### CRediT authorship contribution statement

Vincenzo Sesta: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Alfonso Incoronato: Data curation, Formal analysis, Writing – original draft, Visualization. Francesca Madonini: Data curation, Formal analysis, Writing – original draft, Visualization. Federica Villa: Writing – original draft, Writing – review & editing, Supervision.

#### Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

#### Data availability

The authors do not have permission to share data.

#### References

- [1] M.E. Warren, Automotive LIDAR Technology, 2019 Symposium on VLSI Circuits, 2019, pp. C254-C255, https://doi.org/10.23919/VLSIC.2019.8777993.
- [2] A.F. Elaksher, S. Bhandari, C.A. Carreon-Limones, R. Lauf, Potential of UAV lidar systems for geospatial mapping, in: Proc. SPIE 10406, Lidar Remote Sensing for Environmental Monitoring, San Diego, CA, USA, 2017, 104060L, https://doi.org/10.1117/12.2275482.
- [3] A.W. Yu et al., Orbiting and in-situ lidars for earth and planetary applications, IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, 2020, pp. 3479-3482, https://doi.org/10.1109/IGARSS39084.2020.9323088.
- [4] G. Bui, B. Morago, T. Le, K. Karsch, Z. Lu, Y. Duan, Integrating videos with LIDAR scans for virtual reality, IEEE Virtual Reality (VR) 2016 (2016) 161–162, https://doi.org/10.1109/VR.2016.7504703.
- [5] D. Bronzi, Y. Zou, F. Villa, S. Tisa, A. Tosi, F. Zappa, Automotive three-dimensional vision through a single-photon counting SPAD camera, IEEE Trans. Intell. Transport. Syst. 17 (3) (2016) 782–795, https://doi.org/10.1109/TITS.2015.2482601.
- [6] B. Behroozpour, P.A.M. Sandborn, M.C. Wu, B.E. Boser, Lidar system architectures and circuits, IEEE Commun. Mag. 55 (10) (2017) 135–142, https://doi.org/10.1109/MCOM.2017.1700030.
- [7] D.J. Lum, S.H. Knarr, J.C. Howell, Frequency-modulated continuous-wave LiDAR compressive depth-mapping, Opt. Express 26 (2018) 15420–15435, https://doi.org/10.1364/OE.26.015420.
- [8] F. Zhang, L. Yi, X. Qu, Simultaneous measurements of velocity and distance via a dual-path FMCW lidar system, Opt. Commun. 474 (2020), 126066, https://doi.org/10.1016/j.optcom.2020.126066.
- [9] C. Rogers, A.Y. Piggott, D.J. Thomson, et al., A universal 3D imaging sensor on a silicon photonics platform, Nature 590 (2021) 256–261, https://doi.org/10.1038/s41586-021-03259-y.

- [10] F. Villa, F. Severini, F. Madonini, F. Zappa, SPADs and SiPMs arrays for long-range high-speed light detection and ranging (LiDAR), Sensors 21 (11) (2021) 3839, https://doi.org/10.3390/s21113839.
- [11] B. De Monte, R.T. Bell, Development of an EMCCD for lidar applications, in: Proc. SPIE 10565, International Conference on Space Optics 2010, Rhodes Island, Greece, 2010, https://doi.org/10.1117/12.2309150.
- [12] L. Cester, A. Lyons, M.C. Braidotti, D. Faccio, Time-of-flight imaging at 10 ps resolution with an ICCD camera, Sensors 19 (1) (2019) 180, https://doi.org/10.3390/s19010180.
- [13] G. Adamo, A. Busacca, Time of flight measurements via two LiDAR systems with SiPM and APD, in: 2016 AEIT International Annual Conference (AEIT), 2016, pp. 1-5, https://doi.org/10.23919/AEIT.2016.7892802.
- [14] D. Phillips, R.C. Drake, D.V. O'Connor, R.L. Christensen, Time correlated single-photon counting (TCSPC) using laser excitation, Instrum. Sci. Technol. 14 (3–4) (1985) 67–292, https://doi.org/10.1080/10739148508543581.
- [15] W. Becker, Advanced Time-Correlated Single Photon Counting Techniques. Berlin, Germany: Springer-Verlag, 2005, https://doi.org/10.1117/12.529143.
- [16] V. Sesta, F. Severini, F. Villa, R. Lussana, F. Zappa, K. Nakamuro, Y. Matsui, Spot tracking and TDC sharing in SPAD arrays for TOF LiDAR, Sensors 21 (9) (2021) 2936, https://doi.org/10.3390/s21092936.
- [17] D. Portaluppi, E. Conca, F. Villa, 32 × 32 CMOS SPAD imager for gated imaging, photon timing, and photon coincidence, IEEE J. Sel. Top. Quantum Electron. 24 (2) (2018) 1–6, https://doi.org/10.1109/JSTQE.2017.2754587.
- [18] A. Gupta, A. Ingle, M. Gupta, Asynchronous single-photon 3D imaging, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, https://arxiv.org/abs/1908.06372v1.
- [19] B. Buttgen, P. Seitz, Robust optical time-of-flight range imaging based on smart pixel structures, IEEE Trans. Circuits Syst. I Regul. Pap. 55 (6) (2008) 1512–1525, https://doi.org/10.1109/TCSI.2008.916679.
- [20] A. Incoronato, M. Locatelli, F. Zappa, Statistical modelling of SPADs for time-of-flight LiDAR, Sensors 21 (13) (2021) 4481, https://doi.org/10.3390/s21134481.
- [21] W.C. Kwong, W.-Y. Lin, G.-C. Yang, I. Glesk, 2-D Optical-CDMA modulation in automotive time-of-flight LIDAR systems, in: 2020 22nd International Conference on Transparent Optical Networks (ICTON), 2020, pp. 1-4, https://doi.org/10.1109/ICTON51198.2020.9203019.
- [22] H. Wenzel, et al., High pulse power wavelength stabilized 905 nm laser bars for automotive LiDAR, IEEE High Power Diode Lasers and Systems Conference (HPD) 2019 (2019) 7–8, https://doi.org/10.1109/HPD48113.2019.8938682.
- [23] S. Tancock, E. Arabul, N. Dahnoun, A review of new time-to-digital conversion techniques, IEEE Trans. Instrum. Meas. 68 (10) (2019) 3406–3417, https://doi.org/10.1109/TIM.2019.2936717.
- [24] M.W. Fishburn, E. Charbon, Time-to-digital converters for PET: An examination of metrology aspects, in: 2012 IEEE Nuclear Science Symposium and Medical Imaging Conference Record (NSS/MIC), 2012, https://doi.org/10.1109/NSSMIC.2012.6551222.
- [25] R. Sumner, A sliding scale method to reduce the differential non linearity of a time digitizer, in: 2001 IEEE Nuclear Science Symposium Conference Record (Cat. No.01CH37310), 2001, pp. 803-806 vol. 2, https://doi.org/10.1109/NSSMIC.2001.1009679.
- [26] F. Arvani, T.C. Carusone, E.S. Rogers, TDC sharing in SPAD-based direct time-of-flight 3D imaging applications, in: 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019, https://doi.org/10.1109/ISCAS.2019.8702586.
- [27] J. Kalisz, Review of methods for time interval measurements with picosecond resolution, Metrologia 41 (1) (2004) 17–32, https://doi.org/10.1088/0026-1394/41/1/004.
- [28] R. Machado, J. Cabral, F.S. Alves, Recent developments and challenges in FPGA-based time-to-digital converters, IEEE Trans. Instrum. Meas. 68 (11) (2019) 4205–4221, https://doi.org/10.1109/TIM.2019.2938436.
- [29] A. El-Hadbi, O. Elissati, L. Fesquet, Time-to-digital converters: a literature review and new perspectives, in: 2019 5th International Conference on Event-Based Control, Communication, and Signal Processing, 2019, https://doi.org/10.1109/EBCCSP.2019.8836857.
- [30] S. Tancock, J. Rarity, N. Dahnoun, Developments in time-to-digital converters during 2020, in: 2021 7th International Conference on Event-Based Control, Communication, and Signal Processing (EBCCSP), 2021, https://doi.org/10.1109/EBCCSP53293.2021.9502397.

- [31] P. Kwiatkowski, D. Sondej, R. Szplet, Subpicosecond resolution time interval counter with multisampling wave union type B TDCs in 28 nm FPGA device, Measurement 209, 112510, https://doi.org/10.1016/j.measurement.2023.112510.
- [32] L. Xiang, P. Yang, T. Wu, M. Zhou, Ultra compact pulse shrinking TDC on FPGA, Measurement 203, 111874, https://doi.org/10.1016/j.measurement.2022.111874.
- [33] N. Lusardi, F. Garzetti, A. Geraci, Digital instrument with configurable hardware and firmware for multi-channel time measures, Rev. Sci. Instrum. 90 (2019), 055113, https://doi.org/10.1063/1.5028131.
- [34] F. Garzetti, N. Corna, N. Lusardi, A. Geraci, Time-to-digital converter IP-core for FPGA at state of the art, IEEE Access 9 (2021) 85515–85528, https://doi.org/10.1109/ACCESS.2021.3088448.
- [35] A.O. Korkan, H. Yuksel, A novel time-to-amplitude converter and a low-cost wide dynamic range FPGA TDC for LiDAR application, IEEE Trans. Instrum. Meas. 71 (2022) 1–15, Art no. 2005015, https://doi.org/10.1109/TIM.2022.3200117.
- [36] N. Corna, F. Garzetti, N. Lusardi, A. Geraci, Digital instrument for time measurements: small, portable, high-performance, fully programmable, IEEE Access 9 (2021) 123964–123976, https://doi.org/10.1109/ACCESS.2021.3109155.
- [37] F. Garzetti, et al., Assessment of the bundle SNSPD plus FPGA-based TDC for high-performance time measurements, IEEE Access 10 (2022) 127894–127910, https://doi.org/10.1109/ACCESS.2022.3227462.
- [38] N. Lusardi et al., High-resolution imager based on time-to-space conversion, IEEE Trans. Instrum. Meas. 71 (2022) 1-11, Art no. 2004811, https://doi.org/10.1109/TIM.2022.3198442.
- [39] G. Acconcia, M. Ghioni, I. Rech, 4.3ps RMS Jitter Time to amplitude converter in 350nm Si-Ge Technology, in: 2021 7th International Conference on Event-Based Control, Communication, and Signal Processing (EBCCSP), 2021, https://doi.org/10.1109/EBCCSP53293.2021.9502398.
- [40] D. Morrison, S. Kennedy, D. Delic, M.R. Yuce, J.-M. Redouté, A 64 × 64 SPAD flash LIDAR sensor using a triple integration timing technique with 1.95 mm depth resolution, IEEE Sens. J. 21 (10) (2021) 11361–11373, https://doi.org/10.1109/JSEN.2020.3030788.
- [41] R. Rashidzadeh, M. Ahmadi, W.C. Miller, Short time interval measurement using a time amplifier, Can. Conf. Electr. Comput. Eng. 2008 (2008) 000321–000324, https://doi.org/10.1109/CCECE.2008.4564548.
- [42] R. Nutt, Digital time intervalometer, Rev. Sci. Instrum. 39 (9) (1968) 1342–1345, https://doi.org/10.1063/1.1683667.
- [43] N. Lusardi, F. Garzetti, A. Geraci, The role of sub-interpolation for delay-line time-to-digital converters in FPGA devices, Nucl. Instrum. Methods Phys. Res. Sect. A: Accelerat. Spectromet. Detect. Assoc. Equip. 916 (2019) 204–214, https://doi.org/10.1016/j.nima.2018.11.100.
- [44] Y. Wang, W. Xie, H. Chen, D.D. Li, High-resolution time-to-digital converters (TDCs) with a bidirectional encoder, Measurement 206 (2023), 112258, https://doi.org/10.1016/j.measurement.2022.112258.
- [45] E. Zhou, et al., An FPGA-based 48-channel, 250 mega samples per second throughput time measurement system, Nucl. Instrum. Methods Phys. Res., Sect. A 1046 (2023), 167668, https://doi.org/10.1016/j.nima.2022.167668.
- [46] T.E. Rahkonen, J.T. Kostamovaara, The use of stabilized CMOS delay lines for the digitization of short time intervals, IEEE J. Solid State Circuits 28 (8) (1993) 887–894, https://doi.org/10.1109/4.231325.
- [47] Y. Arai, A high-resolution time digitizer utilizing dual PLL circuits, Proc. IEEE Nuclear Science Symp. Conf. Rec. 2 (2004) 969–973, https://doi.org/10.1109/NSSMIC.2004.1462368.
- [48] S. Hua, D. Wang, L. Wang, Y. Liu, J. Li, A PVT-insensitive all digital CMOS time-to-digital converter based on looped delay-line with extension scheme, in: 2015 IEEE 11th International Conference on ASIC (ASICON), 2015, pp. 1-4, https://doi.org/10.1109/ASICON.2015.7517127.
- [49] M. Zlatanski, W. Uhring, J. Le Normand, V. Zint, A new high-resolution Time-to-Digital Converter concept based on a 128 stage 0.35 µm CMOS delay generator, in: 2009 Joint IEEE North-East Workshop on Circuits and Systems and TAISA Conference, 2009, pp. 1-4, https://doi.org/10.1109/NEWCAS.2009.5290425.
- [50] A. Hejazi, et al., A low-power multichannel time-to-digital converter using all-digital nested delay-locked loops with 50-ps resolution and high throughput for LiDAR sensors, IEEE Trans. Instrum. Meas. 69 (11) (2020) 9262–9271, https://doi.org/10.1109/TIM.2020.2995249.
- [51] A. Mantyniemi, T. Rahkonen, J. Kostamovaara, A CMOS time-to-digital converter (TDC) based on a cyclic time domain successive approximation interpolation method, IEEE J. Solid State Circuits 44 (11) (2009) 3067–3078, https://doi.org/10.1109/JSSC.2009.2032260.
- [52] B. Markovic, S. Tisa, F.A. Villa, A. Tosi, F. Zappa, A high-linearity, 17 ps precision time-to-digital converter based on a single-stage vernier delay loop fine interpolation, IEEE Trans. Circuits Syst. I Regul. Pap. 60 (3) (2013) 557–569, https://doi.org/10.1109/TCSI.2012.2215737.
- [53] Z. Cheng, X. Zheng, M.J. Deen, H. Peng, Recent developments and design challenges of high-performance ring oscillator CMOS time-to-digital converters, IEEE Trans. Electron Devices 63 (1) (2016) 235–251, https://doi.org/10.1109/TED.2015.2503718.
- [54] C.-C. Wang, K.-Y. Chao, S. Sampath, P. Suresh, Anti-PVT-variation low-power time-to-digital converter design using 90-nm CMOS process, IEEE Trans. Very Large-Scale Integration (VLSI) Syst. 28 (9) (2020) 2069–2073, https://doi.org/10.1109/TVLSI.2020.3008424.

- [55] A. Hajimiri, S. Limotyrakis, T.H. Lee, Jitter and phase noise in ring oscillators, IEEE J. Solid State Circuits 34 (6) (1999) 790–804, https://doi.org/10.1109/4.766813.
- [56] I. Nissinen, J. Kostamovaara, Time-to-digital converter based on an on-chip voltage reference locked ring oscillator, in: 2006 IEEE Instrumentation and Measurement Technology Conference Proceedings, 2006, https://doi.org/10.1109/IMTC.2006.328409.
- [57] J.-P. Jansson, P. Keränen, J. Kostamovaara, A. Baschirotto, CMOS technology scaling advantages in time domain signal processing, in: 2017 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 2017, pp. 1-5, https://doi.org/10.1109/I2MTC.2017.7969659.
- [58] J. Richardson, R. Walker, L. Grant, D. Stoppa, F. Borghetti, E. Charbon, M. Gersbach, R.K. Henderson, A 32×32 50ps Resolution 10 bit time to digital converter array in 130nm CMOS for time correlated imaging, in: 2009 IEEE Custom Integrated Circuits Conference, 2009, https://doi.org/10.1109/CICC.2009.5280890.
- [59] M. Perenzoni, L. Gasparini, D. Stoppa, Design and characterization of a 43.2-ps and PVT-resilient TDC for single-photon imaging arrays, IEEE Trans. Circuits Syst. II Express Briefs 65 (4) (2018) 411–415, https://doi.org/10.1109/TCSII.2017.2694482.
- [60] M.Z. Straayer, M.H. Perrott, A multi-path gated ring oscillator TDC with first-order noise shaping, IEEE J. Solid State Circuits 44 (4) (2009) 1089–1098, https://doi.org/10.1109/JSSC.2009.2014709.
- [61] W.B. Pierce, The Vernier delay unit, IEEE Trans. Nucl. Sci. 32 (1) (1985) 95–99, https://doi.org/10.1109/TNS.1985.4336800.
- [62] P. Dudek, S. Szczepanski, J.V. Hatfield, A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line, IEEE J. Solid State Circuits 35 (2) (2000) 240–247, https://doi.org/10.1109/4.823449.
- [63] Y. Li, H. Yu, S. Liu, X. Huang, L. Jiang, A CMOS time-to-digital converter for real-time optical time-of-flight sensing system, IEEE Commun. Mag. 56 (8) (2018) 113–119, https://doi.org/10.1109/MCOM.2018.1700653.
- [64] P. Lu, A. Liscidini, P. Andreani, A 3.6 mW, 90 nm CMOS gated-vernier time-to-digital converter with an equivalent resolution of 3.2 ps, IEEE J. Solid State Circuits 47 (7) (2012) 1626–1635, https://doi.org/10.1109/JSSC.2012.2191676.
- [65] M. Lee, A.A. Abidi, A 9 b, 1.25 ps resolution coarse-fine time-to-digital converter in 90 nm CMOS that amplifies a time residue, IEEE J. Solid State Circuits 43 (4) (2008) 769–777, https://doi.org/10.1109/VLSIC.2007.4342701.
- [66] J.-P. Jansson, A. Mantyniemi, J.A. Kostamovaara, CMOS time-to-digital converter with better than 10 ps single-shot precision, IEEE J. Solid State Circuits 41 (6) (2006) 1286–1296, https://doi.org/10.1109/JSSC.2006.874281.
- [67] N. Narku-Tetteh, A. Titriku, S. Palermo, A 15b, Sub-10ps resolution, low dead time, wide range two-stage TDC, in: 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS), 2014, pp. 13-16, https://doi.org/10.1109/MWSCAS.2014.6908340.
- [68] A.K. Pediredla, A.C. Sankaranarayanan, M. Buttafava, A. Tosi, A. Veeraraghavan, Signal processing-based pile-up compensation for gated single photon avalanche diodes, arXiv preprint arXiv:1806.07437, 2018.
- [69] F. Heide, S. Diamond, D.B. Lindell, G. Wetzstein, Sub-picosecond photon-efficient 3D imaging using single-photon sensors, Sci. Rep. 8 (1) (2018), https://doi.org/10.1038/s41598-018-35212-x.
- [70] J. Rapp, Y. Ma, R.M. Dawson, V.K. Goyal, Dead time compensation for high-flux ranging, IEEE Trans. Signal Process. 67 (13) (2019) 3471–3486, https://doi.org/10.1109/TSP.2019.2914891.
- [71] J. Van de Weijer, R. Van den Boomgaard, Local mode filtering, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR) 2 (2001) II-428–II-433, https://doi.org/10.1109/CVPR.2001.990993.
- [72] O. Levy, L. Wolf, Live repetition counting, in: Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 3020–3028, https://doi.org/10.1109/ICCV.2015.346.
- [73] A.K. Sharma, A. Laflaquiere, G.A. Agranov, G. Rosenblum, S. Mandai, SPAD array with gated histogram construction, U.S. Patent 2017 0 052 065 A1, Feb. 23, 2017.
- [74] S. Lindner, C. Zhang, I. Antolovic, M. Wolf, E. Charbon, A 252×144 SPAD pixel flash lidar with 1728 dual-clock 48.8 ps TDCs, integrated histogramming and 14.9-to-1 compression in 180nm CMOS technology, in: Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 9–14, https://doi.org/10.1109/JSSC.2018.2883720.
- [75] V. Sesta, K. Pasquinelli, R. Federico, F. Zappa, F. Villa, Range-finding SPAD array with smart laser-spot tracking and TDC sharing for background suppression, IEEE Open J. Solid-State Circ. Soc. (2021) 1–1, https://doi.org/10.1109/OJSSCS.2021.3116920.
- [76] C. Niclass, M. Soga, H. Matsubara, S. Kato, M. Kagami, A 100-m range 10-frame/s 340 × 96-pixel time-of-flight depth sensor in 0.18-μm CMOS, IEEE J. Solid State Circuits 48 (2) (2013) 559–572, https://doi.org/10.1109/JSSC.2012.2227607.
- [77] C. Zhang, N. Zhang, Z. Ma, L. Wang, Y. Qin, J. Jia, K. Zang, A 240 × 160 3D stacked SPAD dToF image sensor with rolling shutter and in pixel histogram for mobile devices, IEEE Open J. Solid-State Circ. Soc. (2021) 1–1, https://doi.org/10.1109/OJSSCS.2021.3118332.
- [78] M. Perenzoni, D. Perenzoni, D. Stoppa, A 64 × 64-pixels digital silicon photomultiplier direct TOF sensor with 100-Mphotons/s/pixel background rejection and imaging/altimeter mode with 0.14% precision up to 6 km for spacecraft navigation and landing, IEEE J. Solid State Circuits 52 (1) (2017) 151–160, https://doi.org/10.1109/JSSC.2016.2623635.
- [79] S.W. Hutchings, N. Johnston, I. Gyongy, T. Al Abbas, N.A. Dutton, M. Tyler, S. Chan, J. Leach, R.K. Henderson, A reconfigurable 3-D-stacked SPAD imager with in-pixel histogramming for flash lidar or high-speed time-of-flight imaging, IEEE J. Solid State Circuits 54 (11) (2019) 2947–2956, https://doi.org/10.1109/JSSC.2019.2939083.

wise integrated histogramming, IEEE J. Solid State Circuits 54 (4) (2019) 1137–1151, https://doi.org/10.1109/JSSC.2018.2883720.

- [81] A.R. Ximenes, P. Padmanabhan, M. Lee, Y. Yamashita, D. Yaung, E. Charbon, A modular, direct time-of-flight depth sensor in 45/65-nm 3-D-stacked CMOS technology, IEEE J. Solid State Circuits 54 (11) (2019) 3203–3214, https://doi.org/10.1109/JSSC.2019.2938412.
- [82] P. Padmanabhan, C. Zhang, M. Cazzaniga, B. Efe, A.R. Ximenes, M.-J. Lee, E. Charbon, 7.4 A 256×128 3D-stacked (45nm) SPAD flash lidar with 7-level coincidence detection and progressive gating for 100m range and 10klux Background Light, in: 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, https://doi.org/10.1109/ISSCC42613.2021.9366010.
- [83] E. Conca, V. Sesta, M. Buttafava, F. Villa, L.D. Sieno, A.D. Mora, D. Contini, P. Taroni, A. Torricelli, A. Pifferi, F. Zappa, A. Tosi, Large-area, fast-gated digital SiPM with integrated TDC for portable and wearable time-domain NIRS, IEEE J. Solid State Circuits 55 (11) (2020) 3097–3111, https://doi.org/10.1109/JSSC.2020.3006442.
- [84] H. Seo, H. Yoon, D. Kim, J. Kim, S.-J. Kim, J.-H. Chun, J. Choi, Direct TOF scanning lidar sensor with two-step multievent histogramming TDC and embedded interference filter, IEEE J. Solid State Circuits 56 (4) (2021) 1022–1035, https://doi.org/10.1109/JSSC.2020.3048074.
- [85] I. Vornicu, A. Darie, R. Carmona-Galan, A. Rodriguez-Vazquez, Compact real-time inter-frame histogram builder for 15-bits high-speed TOF-imagers based on single-photon detection, IEEE Sens. J. 19 (6) (2019) 2181–2190, https://doi.org/10.1109/JSEN.2018.2885960.
- [86] B. Kim, S. Park, J.-H. Chun, J. Choi, S.-J. Kim, 7.2 A 48×40 13.5mm depth resolution flash lidar sensor with in-pixel zoom histogramming time-to-digital converter, 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, https://doi.org/10.1109/ISSCC42613.2021.9366022.
- [87] S. Bellisai, D. Bronzi, F.A. Villa, S. Tisa, A. Tosi, F. Zappa, Single-photon pulsed-light indirect time-of-flight 3D ranging, Opt. Express 21 (4) (2013) 5086–5098, https://doi.org/10.1364/OE.21.005086.