Abstract: We present a smart pixel based on a single-photon avalanche diode (SPAD) for advanced time-of-flight (TOF) and time-correlated single photon counting (TCSPC) applications, fabricated in a cost-effective 0.35-µm CMOS technology. The large CMOS detector (30-µm active area diameter) shows very low noise (12 counts per second at room temperature at 5-V excess bias) and high efficiency in a wide wavelength range (about 50% at 410 nm and still 5% at 800 nm). The analog front-end electronics promptly senses and quenches the avalanche, thus leading to an almost negligible afterpulsing effect. The in-pixel 10-bit time-to-digital converter (TDC) provides 312-ps resolution and 320-ns full-scale range (FSR), i.e., 10-cm single-shot spatial resolution within 50-m depth range in a TOF system. The in-pixel 10-bit memory and output buffers make this smart pixel the viable building block for advanced single-photon imager arrays for 3-D depth ranging in safety and security applications and for 2-D fluorescence lifetime decays in biomedical imaging.
Introduction
An imaging sensor capable of precisely detecting the arrival time of a faint light pulse is essential in many different applications. For instance, in depth ranging (e.g., using the LIDAR approach) by means of the time-of-flight (TOF) principle, such a sensor can directly provide the distance of an object, or of different objects, in a given scene [1], requiring almost no additional signal post-processing. In life sciences, such a sensor plays an extremely important role in applications based on the time-correlated single-photon counting (TCSPC) [2] approach, e.g., fluorescence lifetime imaging (FLIM) [3], fluorescence correlation spectroscopy (FCS) [4], DNA sequencing [4], or TOF-resolved PET systems [6]. Single-photon avalanche diodes (SPADs) [8] provide single-photon sensitivity, high photon detection efficiency (PDE), low dark counting rate (DCR) and low time jitter, and can be fabricated using commercial CMOS technologies. In fact, it has been demonstrated that arrays of SPADs, monolithically integrated with readout and quenching electronics, are able to provide fast 2-D images in low light conditions [9], [10].
SPAD-based imagers for TOF-based ranging, using a time-to-digital converter (TDC) in each pixel, have already been presented in the literature [11]-[14]. In [11], very high resolution (10 ps) is achieved but with a very low fill-factor of just 0.5%. In [12], 119 ps resolution and 2.3% fill-factor are obtained by using a scaled technology, but at the expense of poor SPAD performance, namely a circular photoactive area with only 8.6 µm diameter, a DCR of 11 kHz, a peak PDE of 36%, and a time jitter of 128 ps FWHM [15]. In [13], a resolution of 55 ps has been obtained at the expense of the fill-factor (1%) and of the SPAD performance: a DCR of 160 Hz and a peak PDE of 27.5%, obtained using a SPAD photoactive area of 5.6 µm diameter. In [14], a 6% fill-factor is reached but with a multiplexed architecture, which shares just one TDC per column of the detector array.
Our aim was to develop a dual-function pixel able to combine very high SPAD performance with cost-effective smart in-pixel electronics to be used for both photon timing, i.e., for measuring the photon arrival time with sub-nanosecond resolution, and photon counting, i.e., for counting the number of impinging photons in a defined amount of time. To fabricate the sensor, we selected a 0.35 µm CMOS technology, aiming at a viable compromise between the SPAD performance, the cost and the SPAD pixel fill-factor. In fact, up to now, more scaled technologies (e.g., 130 nm) have failed to deliver SPADs with good performance. We conceived the smart pixel as the building block for easily customizable imagers. When a large number of pixels is desired, it would be very difficult to route all the analog signals from the SPADs off-chip, both for geometrical reasons and for signal integrity concerns (worse jitter). For this reason, all the functionality of the circuit has been implemented on-chip. Fig. 1 shows the block diagram of the pixel, containing a large photoactive area SPAD (30 µm in diameter), an analog Variable Load Quenching Circuit (VLQC) for fast avalanche sensing and quenching [16], pulse shaping electronics with a well-defined output, and a 10 bit TDC to measure the photon arrival time. A 10 bit memory latch stores the data during the readout, performed using the output buffers already sized to drive a bus shared among 16 pixels. On-chip, there are also a frequency doubler and a global Delay Locked Loop (DLL) to feed all in-pixel TDCs with common signals.
In Section 2 of this paper, we present the measured SPAD performances and the architecture of the quenching circuit; in Section 3, we discuss the functionalities of the pixel, in particular the operation of the TDC; finally in Section 4, we show the measurements performed to characterize the TDC. 
SPAD Performances and VLQC Architecture

SPAD Performances
One important issue when using a CMOS technology to fabricate a SPAD array, where the SPADs themselves and the smart pixel electronics share the same semiconductor substrate, is the proper isolation of the avalanche photoactive region of the SPAD from the surrounding elements. For this reason, the SPAD structures were designed using a relatively deep n-well acting as a cathode, a shallow p+ layer on its surface used as an anode, and a p-well based guard-ring incorporated on the edges of the p+ photoactive area to define the space-charge region with high electric field for photon absorption and avalanche multiplication within the n-well in the region directly beneath the p+ layer. Another issue that must be considered when an array of SPADs is fabricated is the crosstalk between different SPADs. However, we know from previous works on SPAD arrays that crosstalk is not an issue (much below $10^{-4}$) for CMOS pixels with similar pitch and SPAD diameter [9]. This device presents a breakdown voltage of 26 V. The typical measured detector performances, obtained from about 20 SPAD structures fabricated as described, are summarized in Table 1. Fig. 2 reports the DCR (in counts per second, Hz) dependence on temperature and excess bias of the 30 µm diameter SPAD. As can be seen, by cooling the SPAD to 0 °C, the DCR flattens at 1 Hz at 5 V overvoltage (corresponding to a peak PDE of about 50% at 400 nm wavelength), i.e., just one spurious ignition every second, corresponding to a "dark" generation rate of about $1.4 \times 10^{-3}$ counts/(s·µm²), i.e., 22 fA/cm². Even at room temperature, the DCR is very low, only 10 Hz. This is about three orders of magnitude lower than the results obtained from the state-of-the-art SPADs developed in a similar 0.35 µm CMOS technology [16], in the same operating conditions, and at least five orders of magnitude better than other scaled technologies reported so far.
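The normalization quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a perfectly circular 30 µm active area and one elementary charge per dark count (both assumptions of this sketch, as are the function names); it is not the authors' analysis code.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one carrier, in coulombs


def dark_generation_rate(dcr_hz, diameter_um):
    """Return the DCR normalised to a circular active area, in counts/(s*um^2)."""
    area_um2 = math.pi * (diameter_um / 2.0) ** 2
    return dcr_hz / area_um2


def equivalent_current_density_fa_per_cm2(rate_per_um2):
    """Convert a generation rate per um^2 into a current density in fA/cm^2.

    ASSUMPTION: each dark count corresponds to one elementary charge.
    """
    um2_per_cm2 = 1e8
    amps_per_cm2 = rate_per_um2 * um2_per_cm2 * ELEMENTARY_CHARGE_C
    return amps_per_cm2 * 1e15


rate = dark_generation_rate(dcr_hz=1.0, diameter_um=30.0)
current = equivalent_current_density_fa_per_cm2(rate)
print(f"{rate:.2e} counts/(s*um^2), {current:.1f} fA/cm^2")
```

With a DCR of 1 Hz the sketch gives a rate close to the quoted $1.4 \times 10^{-3}$ counts/(s·µm²) and a current density between 22 and 23 fA/cm², in line with the rounded figures in the text.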
For instance, in a 0.13 µm CMOS technology, the best DCR obtained is 189 counts/(s·µm²) with a 35% peak PDE [15]. In a 90 nm technology, the DCR is 318 counts/(s·µm²) with a 12% peak PDE [17]. Fig. 3 shows the PDE obtained from the SPADs for wavelengths between 400 nm and 1 µm at the same three excess bias voltages $V_{EX}$. For the SPAD smart pixels presented in this work, the fast roll-off of the PDE with increasing wavelength is mainly due, on the one hand, to the low absorption of silicon at longer wavelengths and, on the other, to the thin depleted region of the standard CMOS process employed, which is limited by the n-well depth. Only custom processing can drastically widen the absorption region [18]. Nevertheless, the PDE of 10% at 740 nm is still high compared to the PDE achieved with the S20 (1%) and S25 (5%) photocathodes of photomultiplier tubes [19]. The intrinsic SPAD time jitter, measured in a conventional TCSPC setup [8], is 100 ps full-width at half-maximum (FWHM) when the substrate is biased at the same voltage as the well, in order to avoid current path narrowing.
VLQC Architecture
For the TOF and TCSPC measurements, we aim at synchronizing the sensor irradiation time with the active optical excitation (usually a laser pulse). For this reason, we conceived a front-end able to gate the detector on and off in well-defined time slots. After each ignition, the VLQC quenches the SPAD and keeps it quenched until the next reset, when it is driven back to operation (i.e., above breakdown). The high voltage (about 30 V) required to bias the SPAD is applied only to the SPAD cathode through a dedicated pad, so it does not have any impact on the circuitry, while the VLQC is connected to the anode.
A schematic of the VLQC is shown in Fig. 4. The SPAD cathode is biased above the breakdown voltage $V_B$ by an excess bias $V_{EX}$, while the anode is at 0 V during quiescence, and at $V_{EX}$ for quenching. During the gate-off phase, transistor M3 is ON, hence no current can flow through the SPAD since both M4 and M5 are shut OFF. Transistors M2 and M5 are ON only during the reset transition: M5 is switched ON to quickly reset the SPAD by rapidly lowering the anode voltage to 0 V. Transistor M4, instead, is kept ON with a relatively high impedance, thanks to its very low W/L aspect ratio. Therefore, as soon as the SPAD is ignited, M1 senses the voltage onset across M4 and drives M4 definitively OFF, thus quenching the avalanche process. The SPAD is left quenched until the next reset pulse marks a new gate-on. Transistors M1, M4, and M5 are high-voltage MOS transistors, since they have to withstand at least the excess bias voltage. To properly size all front-end transistors, we had to employ a suitable model for the time-dependent SPAD behavior [20]. The digital pulse provided by the VLQC marks the photon arrival and is the START of the in-pixel TDC conversion.
Pixel Operation
The smart pixel can operate both in "photon timing" and in "photon counting" operation modes. In "photon timing" mode, the TDC converts the time delay between the START and STOP signals into a 10 bit code. The START is provided by the VLQC and is synchronous with the detection of a photon; the STOP, instead, is synchronized with the laser pulse. In this way, the TDC is forced to start a conversion only when the SPAD detects a photon and not at every laser excitation.
The TDC is divided into a coarse 6 bit counter and a 4 bit fine interpolator. The former asynchronously counts the clock periods between START and STOP. The frequency of the external reference clock is doubled inside the chip with a frequency multiplier based on a four-cell Delay Locked Loop (DLL), whose outputs are combined in an EX-NOR logic unit and whose delay is controlled by a voltage internally generated by a charge pump, as shown in Fig. 5. The coarse counter and the frequency multiplier were designed for an external 100 MHz clock to provide a coarse resolution of 5 ns and a full-scale range (FSR) of 320 ns. The 4 bit fine interpolator is based on a global DLL, similar to the one shown in Fig. 5. It is composed of 16 delay cells, a phase detector and a charge pump, and therefore it is insensitive to process-voltage-temperature drifts. It divides the clock period into 16 intervals of 312 ps each, and provides 16 multiphase clocks to all pixels. The in-pixel fine interpolator detects the phase of the START with respect to the 16 multiphase clocks, and provides a 16-level thermometer code that is then converted into a 4 bit binary code. Therefore, the TDC acts as a flash converter and requires a negligible conversion time of about 1 ns due to propagation delays. We employed an identical 4 bit interpolator also for the global STOP measurement, which was left asynchronous with respect to the reference clock. In this way, we could exploit the sliding scale technique [21], which improves the linearity by converting deterministic nonlinearity into stochastic jitter. This is possible because each conversion is affected by a different part of the interpolator characteristic. This feature is exploitable in every TOF-based distance measurement by computing the centroid of the arrival time distribution.
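The timing figures above follow directly from the clock frequency and the bit widths, and the thermometer-to-binary step can be illustrated in software. The sketch below is only an illustration: the 15-bit word with leading ones is an assumed encoding of the 16 levels, and all names are hypothetical rather than taken from the actual pixel logic.

```python
EXTERNAL_CLOCK_HZ = 100e6  # external reference clock
CLOCK_MULTIPLIER = 2       # on-chip frequency doubler
COARSE_BITS = 6            # coarse counter width
FINE_BITS = 4              # fine interpolator width

t_ck = 1.0 / (EXTERNAL_CLOCK_HZ * CLOCK_MULTIPLIER)  # internal clock period
fsr = (2 ** COARSE_BITS) * t_ck                      # full-scale range
lsb = t_ck / (2 ** FINE_BITS)                        # fine resolution


def thermometer_to_binary(word):
    """Decode a thermometer word (ones first, then zeros) into its level.

    ASSUMPTION: the 16 levels are encoded by a 15-bit word whose number of
    leading ones is the level (0..15). A one found after a zero (a bubble)
    is rejected instead of being corrected.
    """
    level = 0
    seen_zero = False
    for bit in word:
        if bit not in (0, 1):
            raise ValueError("thermometer bits must be 0 or 1")
        if bit == 1:
            if seen_zero:
                raise ValueError("invalid thermometer code (bubble)")
            level += 1
        else:
            seen_zero = True
    return level


print(f"T_ck = {t_ck * 1e9:.1f} ns, FSR = {fsr * 1e9:.0f} ns, LSB = {lsb * 1e12:.1f} ps")
print(thermometer_to_binary([1] * 5 + [0] * 10))
```

The exact fine step is 312.5 ps, which the text rounds to 312 ps.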
The result of each measurement consists of 6 bits from the in-pixel coarse counter ($N_{coarse}$), 4 bits from the in-pixel START interpolator ($N_{start}$), and 4 bits from the global STOP interpolator ($N_{stop}$). The arrival time $T_{meas}$ is computed using (1), where $T_{ck}$ equals one period of the reference clock, i.e., 5 ns. Therefore, the pixel provides one least significant bit (LSB) of $T_{ck}/16 = 312.5$ ps, with an FSR of 320 ns and 10 bit resolution.
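One plausible way of combining the three fields into a time value is sketched below. The sign convention is an assumption of this sketch, not a statement of the authors' formula: each interpolator code is taken to be the delay of its event after the preceding clock edge, and the coarse counter to be the number of clock edges between START and STOP. The depth helper simply applies the round-trip relation d = c·t/2; all function names are hypothetical.

```python
T_CK_S = 5e-9                       # internal clock period
COARSE_BITS = 6
FINE_STEPS = 16                     # interpolator levels per clock period
SPEED_OF_LIGHT_M_S = 299_792_458.0


def arrival_time_s(n_coarse, n_start, n_stop):
    """Combine coarse and fine codes into a START-to-STOP delay in seconds.

    ASSUMED convention: the fine codes measure the delay of each event
    after the preceding clock edge, so the STOP code adds and the START
    code subtracts.
    """
    for name, value, limit in (
        ("n_coarse", n_coarse, 2 ** COARSE_BITS),
        ("n_start", n_start, FINE_STEPS),
        ("n_stop", n_stop, FINE_STEPS),
    ):
        if not 0 <= value < limit:
            raise ValueError(f"{name}={value} outside 0..{limit - 1}")
    lsb_count = n_coarse * FINE_STEPS + n_stop - n_start
    return lsb_count * T_CK_S / FINE_STEPS


def round_trip_to_depth_m(round_trip_s):
    """Convert a round-trip time of flight into a one-way distance."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


print(f"{arrival_time_s(23, 4, 9) * 1e9:.4f} ns")
print(f"one LSB  -> {round_trip_to_depth_m(T_CK_S / FINE_STEPS) * 100:.2f} cm")
print(f"full FSR -> {round_trip_to_depth_m(64 * T_CK_S):.2f} m")
```

Under these assumptions, one LSB corresponds to a little under 5 cm of depth and the 320 ns FSR to just under 48 m, consistent with the roughly 50 m depth range stated in the abstract.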
Apart from the "photon timing" operation mode, we also added an extra feature to the pixel: the possibility to work in "photon counting" mode. To this aim, the 6 bit asynchronous (coarse) counter is reconfigured to count the SPAD ignitions, while the interpolator is not used. In this way, the pixel can be used to measure the intensity of constant or slowly varying optical signals within time slots that can be set from 50 ns to 500 ms, with maximum counting rates from 25 MHz to 115 Hz, respectively, considering a hold-off time of 40 ns and a minimal signal-to-noise ratio (SNR) of 10. For short integration periods (the minimal duration of which is limited by the readout), the maximum counting rate is the inverse of the hold-off time of the SPAD, while for long integration periods (the maximal duration of which depends on the desired SNR), the maximum counting rate is limited by the depth of the counter, that is, 6 bits. Here, the product of the maximum counting rate and the integration time must be lower than $2^6$.
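The two limits described above can be written as a single upper bound on the counting rate. This is a minimal sketch of that bound only; the function name is hypothetical, and the SNR requirement mentioned in the text is not modelled.

```python
COUNTER_BITS = 6    # depth of the in-pixel counter
HOLD_OFF_S = 40e-9  # SPAD hold-off time


def max_counting_rate_hz(integration_time_s,
                         hold_off_s=HOLD_OFF_S,
                         counter_bits=COUNTER_BITS):
    """Return the upper bound on the counting rate for one time slot.

    The rate can exceed neither the inverse of the hold-off time nor the
    rate that would fill the counter within the integration time.
    """
    if integration_time_s <= 0:
        raise ValueError("integration time must be positive")
    hold_off_limit = 1.0 / hold_off_s
    counter_limit = (2 ** counter_bits) / integration_time_s
    return min(hold_off_limit, counter_limit)


for slot in (50e-9, 1e-3, 500e-3):
    print(f"{slot:.1e} s -> {max_counting_rate_hz(slot):.3g} Hz")
```

For a 50 ns slot the hold-off time dominates and the bound is 25 MHz. For a 500 ms slot the counter depth dominates and the bound is 128 Hz; the 115 Hz quoted in the text lies below it, as the strict inequality requires.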
Characterization of the TDC
To check the accuracy and reliability of the TDC, we characterized it at different temperatures and power supply voltages in terms of differential nonlinearity (DNL), integral nonlinearity (INL) and single-shot precision, i.e., the standard deviation of the TDC conversion results when a constant time interval is measured a large number of times.
TDC Linearity
For the TDC linearity, we performed a statistical code density test, consisting of an acquisition long enough to collect about 8000 photons for each of the 1024 TDC bins, using a random START-STOP distribution. We then computed the DNL at each bin by comparing the actual histogram with the ideal flat histogram at the mean value $n_{mean}$, as expressed by (2), where $i$ is the index of the bin and $n(i)$ is the number of counts in that bin:

$$\mathrm{DNL}(i) = \frac{n(i) - n_{mean}}{n_{mean}} \qquad (2)$$

Then we computed the INL as the cumulative sum of the DNLs, as expressed by

$$\mathrm{INL}(i) = \sum_{j=1}^{i} \mathrm{DNL}(j) \qquad (3)$$
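The code density analysis described above can be sketched in a few lines. The sketch assumes the usual definitions, i.e., the DNL of a bin is its relative deviation from the mean bin content (in LSB) and the INL is the running sum of the DNL; the function names and the toy histogram are illustrative only and are not measured data.

```python
import math


def dnl_inl(histogram):
    """Return (dnl, inl) lists, in LSB, from a code density histogram."""
    if not histogram:
        raise ValueError("histogram must not be empty")
    n_mean = sum(histogram) / len(histogram)
    # DNL: relative deviation of each bin from the ideal flat histogram.
    dnl = [(n - n_mean) / n_mean for n in histogram]
    # INL: cumulative sum of the DNL up to and including each bin.
    inl = []
    running = 0.0
    for value in dnl:
        running += value
        inl.append(running)
    return dnl, inl


def rms(values):
    """Root-mean-square of a sequence of values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Toy 4-bin histogram (illustrative numbers only).
dnl, inl = dnl_inl([8000, 8400, 7600, 8000])
print([round(v, 3) for v in dnl], [round(v, 3) for v in inl])
print(f"DNL rms = {rms(dnl):.4f} LSB")
```

Multiplying the results by the 312.5 ps LSB converts them from LSB to picoseconds.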
Fig. 6 shows the counts histogram and the computed DNL and INL for the overall conversion. In the computation of the rms values of DNL and INL, we ignored the first and the last 10 ns of the conversion range, which are impaired by the transients of switching the SPAD ON and OFF. In order to appreciate the effectiveness of the sliding scale technique, we also computed the DNL and INL of the individual START and STOP interpolators, as shown in Fig. 7.
As expected, the linearity of the total conversion is much better than the linearity of the two individual interpolators: with a measured DNL rms of 56.3 ps and 79.4 ps for the START and STOP interpolators, respectively, i.e., 18% LSB and 25% LSB, as shown in Fig. 7, we get an overall DNL rms of 15.3 ps, i.e., 4.9% LSB, as shown in Fig. 6. Similarly, from a measured INL rms of 96.9 ps and 66.6 ps for the START and STOP interpolators, respectively, i.e., 32% LSB and 21% LSB, as shown in Fig. 7, we get a final INL rms of 36.6 ps, i.e., 11.7% LSB, as in Fig. 6. Therefore, the present TDC attains one of the best linearity behaviors reported so far in the literature, as can also be observed in Table 2, which compares the performance of the smart pixel presented in this work with that of other smart SPAD-based pixels and TDCs reported in the literature.
TDC Precision
In order to measure the TDC characteristic, we fed the TDC with two external START and STOP pulses and varied their delay in 20 ns steps over the whole 320 ns FSR. Fig. 8 (left) shows the measured delays versus the theoretical delay, obtained by averaging $2 \times 10^5$ measurements for each interval. As can be seen, the TDC characteristic is very linear, as expected from the code density test.
We then characterized the TDC precision by providing two external START and STOP pulses to the TDC, delayed in 500 ps steps from 115 ns to 120 ns, i.e., spanning one bin of the coarse counter. We performed $2 \times 10^5$ measurements for each time delay and then computed the histogram and the standard deviation. Two examples of such measured histograms are shown in Fig. 9 for delays of 116 ns and 120 ns. We also performed precision measurements of the TDC circuit employing a pulsed laser emitting 30 ps-wide pulses at 850 nm wavelength. We used different time delays, obtained by adding 2 meters of optical fiber, corresponding to about 10 ns delay, for each new measurement. Fig. 10 shows the results obtained from three such measurements. The mean precision was 282 ps, larger than the value obtained from the electrical test (see Fig. 8, right), because the optical measurement is also affected by the laser and SPAD detector jitter [8], [22].
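The two quantities used in these measurements, the single-shot precision and the delay added by a length of fiber, can be sketched as follows. The group index of 1.47 is an assumed typical value for silica fiber, the population standard deviation is one possible choice of estimator, and the codes are made-up illustrative values, not measured data.

```python
import statistics

LSB_S = 312.5e-12                   # TDC least significant bit
SPEED_OF_LIGHT_M_S = 299_792_458.0


def single_shot_precision_s(codes):
    """Standard deviation of repeated conversions of one delay, in seconds.

    ASSUMPTION: the population standard deviation is used; for about
    2e5 samples the sample estimator gives practically the same value.
    """
    return statistics.pstdev(codes) * LSB_S


def fiber_delay_s(length_m, group_index=1.47):
    """One-way propagation delay through a fiber of the given length.

    ASSUMPTION: group_index = 1.47 is a typical value for silica fiber.
    """
    return length_m * group_index / SPEED_OF_LIGHT_M_S


codes = [370, 371, 371, 372]        # made-up TDC output codes
print(f"precision = {single_shot_precision_s(codes) * 1e12:.0f} ps")
print(f"2 m fiber = {fiber_delay_s(2.0) * 1e9:.2f} ns")
```

With the assumed index, 2 m of fiber adds slightly less than 10 ns, in agreement with the approximate figure given in the text.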
We designed the DLL to be insensitive to process, voltage, and temperature (PVT) variations and drifts. To verify this, we performed measurements (with external electrical START and STOP pulses) at different temperatures and supply voltages, but using fixed time delays. Small modifications to the test board were required for this, leading to slightly worse TDC performance. The mean value of the computed time delay and the precision at different temperatures and power supply voltages are plotted in Figs. 11 and 12, respectively. As can be observed, they proved to be almost the same in all conditions, thus confirming the correct PVT-independent operation of the DLL as well as the reliability of the overall pixel operation.
Conclusion
In conclusion, we developed a dual-function smart pixel using a SPAD detector and in-pixel electronics, for both "photon timing" and "photon counting" applications. To the best of our knowledge, the pixel shows the lowest DCR per unit active area reported so far for CMOS SPADs, together with low afterpulsing, high detection efficiency and good timing performance. Thanks to the sliding scale technique, the photon timing TDC attains one of the best linearity behaviors reported in the literature so far. The precision is practically limited by the quantization noise and the SPAD jitter, and the TDC needs no additional calibration. The smart pixel presented here is therefore a suitable building block for arrays of SPAD-based smart pixels able to work in both operation modes, to be employed in 2-D and 3-D imaging with single-photon sensitivity.
