In this paper, we present the architecture and the experimental characterization of an improved version of a previously developed 32 × 32 array of Single Photon Avalanche Diodes (SPADs) and Time to Digital Converters (TDCs), and of two new arrays (with 8 × 8 and 128 × 1 pixels) with the additional capability of actively gating the detectors with sub-nanosecond rise time. The arrays include high performance SPADs (0.04 cps/µm², 50% peak PDE) and provide down to 410 ps Full-Width at Half-Maximum (FWHM) single shot precision and excellent linearity. We developed a camera to exploit these imagers in time-resolved, single-photon applications.
Introduction
Single Photon Avalanche Diodes (SPADs) are single-photon detectors that, after being used for several decades in research applications, have recently been gaining interest in industrial, automotive and consumer electronics as well. Key parameters that favor the commercial exploitation of SPADs are small size, the possibility of integration in CMOS processes, ruggedness to high-intensity light, room-temperature operation, the low power supply required to bias the detector and the possibility of being rapidly enabled or disabled. Other single-photon detectors, such as Photomultiplier Tubes (PMTs), Superconducting Nanowire Single Photon Detectors (SNSPDs) and Hybrid Photon Detectors (HPDs), do not offer all of these properties at once.
The main strengths of SPAD arrays with respect to Charge Coupled Devices (CCDs) and CMOS Active Pixel Sensors (APSs) are the absence of readout noise (which allows counting down to a single photon within each integration time), the possibility to precisely time-stamp the photon arrival time, and the ability to rapidly gate the detector on and off. CMOS SPADs do however suffer from some limitations, namely a limited detection efficiency (especially in the near-infrared), a relatively large pixel pitch and a higher power consumption with respect to conventional image sensors. The new SPAD array designs presented in this paper aim to exploit technology-specific opportunities, in particular photon time-tagging.
We will focus on the architecture and on the experimental characterization of the improved version of a 32 × 32 SPADs and Time to Digital Converters (TDCs) array already presented in [1] and of brand new arrays with 8 × 8 and 128 × 1 pixels. The new designs, in addition to the photon timestamping capabilities of the former, also allow the detector to be actively gated with sub-nanosecond rise time.
Applications for the 32 × 32 SPAD and TDC array include Light Detection and Ranging (LiDAR) from satellites [2], first photon imaging [3], under-water obstacle identification [4], coincidence detection in quantum applications [5], multiple time constants Fluorescence Lifetime Imaging (FLIM) [6], Diffuse Optical Tomography (DOT) [7] and quantum physics [8]. The new 32 × 32 array overcomes the limitations which hindered the exploitation of the previous version, maintaining the same SPAD size (30 µm) and pixel pitch (150 µm), whereas the new 8 × 8 and 128 × 1 arrays halve the pixel pitch (75 µm) and include active gating of the detector. Applications like non-line-of-sight 3D ranging and time-domain DOT can benefit from such a feature to discard the strong first reflection, which would otherwise saturate the SPADs and prevent the measurement of the arrival time of the late photons carrying the useful information. The linear array is well suited for spectroscopy applications, with time-gating and time-tagging capabilities being desirable for advanced Raman techniques [9].
The rest of the paper is structured as follows: Section 2 describes the architecture of the three arrays, Section 3 shows the experimental characterization in terms of SPAD and TDC performance, while Section 4 summarizes the results and provides conclusions.
SPAD Arrays Architectures
The three arrays have been designed and fabricated in a 0.35 µm high-voltage CMOS technology; despite being quite an old technology node, it offers state-of-the-art SPADs [10] and thus represents an excellent technology choice when the pixel pitch can be relaxed. All the arrays are based on the same architecture, including an array of pixels (SPAD, front end circuit, counter and TDC, internal memories and output buffers) and the TDC and readout global electronics, very similar to the one described in [1].
Timing Electronics and Pixel Architecture
The timing electronics consists of a 16-phase clock interpolation scheme, with separate "START" and "STOP" interpolators to adopt the sliding scale technique. The "START" interpolator is shared by the whole array, while each pixel includes the "STOP" interpolator, triggered by a photon detection, with an 8-bit counter to extend the Full Scale Range; the counter can be repurposed to operate the detectors in photon-counting mode. Double-buffering allows global shutter operation and concurrent acquisition and readout to reduce dead time.
Differently from the previous 32 × 32 array [1], the TDC works with START and STOP in "direct" configuration, where the START is a global synchronization signal and the STOP is the in-pixel photon detection. The "reverse" configuration has clear advantages for a single TDC operating with a stable laser at high frequencies: in this case the START is provided only when a photon is detected and the STOP is provided by the subsequent trigger signal, so the TDC converts only the useful signal, reducing the power dissipation. With low-frequency or unstable lasers, however, the "reverse" configuration requires the synchronization signal to be provided to the chip at the end of the measurement cycle, which means that, unless the laser can be externally triggered by the camera, a long delay (potentially as long as the TDC full scale range) needs to be introduced on the laser sync by means of a delayer or long cables. The "direct" configuration removes this requirement because the sync is provided at the beginning of the measurement, so the arrays can be easily exploited also in applications with a non-constant laser repetition rate. Furthermore, in an array of TDCs the power consumption does not increase significantly with the "direct" approach with respect to the "reverse" one, because most of the dissipation is related to the distribution of the clock signals, with just a minor contribution from the running in-pixel counters.
The TDC performance has been improved, reaching 1 µs Full Scale Range (FSR) and 260 ps resolution, corresponding to 150 m and 4 cm respectively in LiDAR measurements. This was obtained by redesigning the clock generation and distribution circuits (which, by means of a higher operating frequency, improved the achievable resolution to 260 ps, down from the 312.5 ps of the previous chips) and by extending the in-pixel coarse counter (which increased the full scale range).
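The figures above can be cross-checked with a short calculation. The sketch below assumes, as a simplification not stated in the text, that the LSB equals one clock period divided by the 16 interpolation phases and that the FSR equals the 8-bit coarse counter span times the clock period; the 240 MHz clock frequency is a hypothetical value chosen only because it reproduces a 260 ps LSB.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def tdc_lsb(f_clk_hz, phases=16):
    """LSB of a clock-interpolating TDC (assumed: one clock period / phases)."""
    return 1.0 / (f_clk_hz * phases)


def tdc_fsr(f_clk_hz, counter_bits=8):
    """Full scale range (assumed: coarse counter spans 2**bits clock periods)."""
    return (2 ** counter_bits) / f_clk_hz


def lidar_range(round_trip_s):
    """One-way distance corresponding to a round-trip time of flight."""
    return C * round_trip_s / 2.0


F_CLK = 240e6  # hypothetical clock frequency, not taken from the paper
lsb = tdc_lsb(F_CLK)
fsr = tdc_fsr(F_CLK)

print(f"LSB = {lsb * 1e12:.1f} ps, FSR = {fsr * 1e6:.3f} us")
print(f"range for 1 us   = {lidar_range(1e-6):.1f} m")
print(f"range for 260 ps = {lidar_range(260e-12) * 100:.1f} cm")
```

Under the same assumptions, a 200 MHz clock gives the 312.5 ps LSB quoted for the previous chips.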
The layouts of the three arrays are shown in Fig. 1. Alignment marks for micro-lens mounting have been implemented in the chip design, as we expect to improve the overall equivalent fill-factor (FF) up to a theoretical limit of about 78% (given by the fact that the micro-lens is circular whereas the SPAD pixel is square), using micro-lens arrays (MLAs) already developed and tested [12] or new MLAs developed by Micro Photon Devices [13].
32 × 32 SPADs and TDCs Array
The main feature that sets apart the revised version of the 32 × 32 array from its predecessor is the ability to increase the measurement duty cycle in photon timing mode both when operating with low repetition rate lasers (thanks to the extended TDC range) and with higher repetition rate lasers (by allowing multiple detection windows within the same frame). In fact, the measurement duty cycle is set by the ratio of the time during which the SPADs are active to the frame time; in time-tagging mode, the limit is set by the TDC FSR. However, if multiple excitation windows can be opened within the same frame, the duty cycle (D) can increase to (Fig. 2):
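Consistent with the definition given above, and assuming N_W equal detection windows of duration T_W (at most the TDC FSR) within a frame of duration T_frame, the duty cycle can be written as follows. The symbols N_W, T_W and T_frame are introduced here for illustration only.

```latex
D = \frac{N_W \, T_W}{T_{\mathrm{frame}}}, \qquad T_W \le \mathrm{FSR}, \qquad D \le 1
```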
The previous array could approach a unity duty cycle by opening ∼30 gate windows per frame of acquisition, requiring a minimum sync frequency of 3 MHz, due to the 320 ns FSR and 10 µs frame time. The new revision, thanks to a longer FSR (< 1 µs) and a shorter frame time (5 µs minimum, thanks to a redesigned readout circuit), can obtain the same duty cycle with a sync frequency as low as 1 MHz, with the advantage of a doubled throughput and the potential to distinguish even faster variations in the imaged scene.
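As a numerical check of these figures, the following minimal sketch takes the duty cycle as the total active time (number of windows times window duration) divided by the frame time. It assumes each window lasts exactly one FSR, and takes the FSR of the new revision as 1 µs; function names are illustrative only.

```python
def duty_cycle(n_windows, window_s, frame_s):
    """Fraction of the frame in which SPADs and TDCs are active (capped at 1)."""
    return min(1.0, n_windows * window_s / frame_s)


def min_sync_frequency(fsr_s):
    """Lowest sync rate at which back-to-back windows of one FSR leave no gap."""
    return 1.0 / fsr_s


# Previous array: about 30 windows of 320 ns in a 10 us frame.
d_prev = duty_cycle(30, 320e-9, 10e-6)
f_prev = min_sync_frequency(320e-9)

# New revision (assumed values): 5 windows of 1 us in a 5 us frame.
d_new = duty_cycle(5, 1e-6, 5e-6)
f_new = min_sync_frequency(1e-6)

print(f"previous: D = {d_prev:.2f}, f_sync >= {f_prev / 1e6:.2f} MHz")
print(f"new:      D = {d_new:.2f}, f_sync >= {f_new / 1e6:.2f} MHz")
```

The halved frame time at the same duty cycle is what yields the doubled throughput mentioned above.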
Multiple detection windows are allowed by an out-of-pixel counter that, for each frame, stores the ID of the detection window where the photon is detected; this value is appended to the 12-bit TDC conversion and, together with a 64-entry global START interpolator memory, makes it possible to open multiple detection windows while still providing the correct time tag of the photon arrival time; the conversion rate is still limited to one TDC conversion per pixel per frame. We experimentally verified that it is possible to keep SPADs and TDCs active for about 80% of the frame, operating at 200 kfps, which represents a significant improvement with respect to the 35% of the first implementation of the 32 × 32 array [11].
Although the additional counters are outside the imaging area, the FF is only 3.14%; however, it will be recovered by means of an array of micro-lenses, whose effectiveness has already been proved for f-# larger than 16 [12]; recent microlenses developed by Micro Photon Devices (MPD) [13] have shown a concentration factor larger than 20 for f-# as small as 5, making it possible to approach the 78% fill-factor recovery limit for round microlenses.
Although many SPAD arrays for photon timing have recently been developed by several research groups [14]-[18], with different trade-offs and recommended applications, we believe that the array we present provides remarkable flexibility with its per-pixel TDCs with extended range and high duty cycle, coupled with extremely low-noise SPADs.
8 × 8 and 128 × 1 Gated SPADs and TDCs Array
The 8 × 8 and 128 × 1 SPAD arrays have one main additional feature with respect to the larger array, namely the possibility to be actively gated on and off by bringing the SPAD bias voltage above or below the breakdown voltage. Such active gating is performed by the same circuit that quenches the SPAD and senses the avalanche, as shown in the simplified schematic in Fig. 3 left. In particular, transistors M1 and M2 sense the avalanche, the "event detection" block masks spurious events synchronous with the SPAD disabling, and the "signal generation + holdoff" block ensures the correct hold-off duration after each triggering event and drives transistors M3 and, through a level shifter, M4, which reset and disable the SPAD when operated in gated mode.
In order to improve the FF of the 8 × 8 array, only the front-end transistors M1, M2, M3, M4 have been laid out close to the SPAD active region, whereas the remaining part of the circuitry has been located outside this area, as shown in Fig. 3 right. Given the 75 µm pitch and 30 µm diameter SPADs of both arrays, this made it possible to achieve a fill-factor of 12.5% for both the 8 × 8 and 128 × 1 variants.
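The fill-factor values quoted for the three arrays follow from the geometry alone. The sketch below assumes a circular active area of the stated diameter inside a square pixel cell of side equal to the pitch (for the linear array this means a 75 µm square cell is assumed); the microlens limit is the area ratio of a circle inscribed in a square.

```python
import math


def fill_factor(spad_diameter_um, pitch_um):
    """Ratio of a circular active area to a square pixel cell of side `pitch_um`."""
    active_area = math.pi * (spad_diameter_um / 2.0) ** 2
    return active_area / pitch_um ** 2


ff_32x32 = fill_factor(30.0, 150.0)   # 32 x 32 array, 150 um pitch
ff_small = fill_factor(30.0, 75.0)    # 8 x 8 and 128 x 1 arrays, 75 um pitch
microlens_limit = math.pi / 4.0       # circular lens inscribed in a square pixel

print(f"FF 32x32:        {ff_32x32 * 100:.2f} %")
print(f"FF 8x8 / 128x1:  {ff_small * 100:.2f} %")
print(f"microlens limit: {microlens_limit * 100:.1f} %")
```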
Experimental Characterization
In order to characterize the arrays and exploit them in final applications, a camera able to host each one of the three chips has been developed. Three interchangeable chip carriers have been designed to allow the connection of the different arrays to the camera. The camera provides the power supplies, manages the communication with the SPAD arrays and sends the acquired data to a remote computer through a USB 3.0 link. It has been optimized to facilitate the heat dissipation of the chip, specifically for the 32 × 32 array, whose power consumption may reach 5 W when operating for 80% of the frame duration. To this aim, the 32 × 32 array is directly glued on a copper heat sink that moves the heat towards the housing; cooling is completely passive, to avoid a fan which may introduce unwanted vibrations in optical setups. At the moment, only the 32 × 32 pixel camera has been fitted with the cooling system.
SPAD DCR and PDE
All the arrays include 30 µm diameter SPADs with the same performance presented in [10]. The Dark Count Rate (DCR) Cumulative Distribution Functions (CDFs) of the three arrays present the same trend, with 60 cps median DCR at 5 V excess bias and 5% of hot pixels. Fig. 4 left shows the DCR CDF of the 32 × 32 array at 4 V, 5 V and 6 V excess bias.
The Photon Detection Efficiency (PDE) at 4 V, 5 V and 6 V excess bias is shown in Fig. 4 right, with no appreciable differences among the three arrays. The peak PDE is about 50% at 450 nm and 4% at 850 nm.
Optical Crosstalk
We measured the optical crosstalk as in [19]: the arrays were operated in photon-timing mode while keeping them in a dark environment. SPADs were activated with 5 V excess bias over 350 ns gate windows and the arrival time of one ignition per frame per pixel was recorded in each window, collecting a total of $5 \cdot 10^{9}$ measurements. Then, a pixel was arbitrarily selected as an "aggressor", and for each "victim" pixel a histogram of the difference between "aggressor" and "victim" arrival times was built. The resulting distribution for a pixel adjacent to the "aggressor" is shown in Fig. 5. In order to get rid of spurious coincidences introduced by dark counts, the expected triangular cross-correlation in absence of crosstalk was subtracted (Fig. 5, red line); the resulting histogram contains only the $N_{xy}$ counts due to crosstalk, which were used to compute the crosstalk probability as:
where $N_x$ and $N_y$ are the total counts accumulated in "aggressor" and "victim" pixels, respectively. The results for each array are shown in Fig. 6. For the smaller arrays, crosstalk is higher for adjacent pixels ($10^{-3.8}$), decreases for the diagonal ones in the 8 × 8 array ($10^{-4.7}$) and is negligible for pixels farther away, resulting in a total crosstalk probability of $6.6 \cdot 10^{-4}$ for the 8 × 8 array, dropping to $5.6 \cdot 10^{-4}$ for the 128 × 1. The crosstalk for the 32 × 32 array is lower because of its larger pitch (150 µm), reaching $10^{-4.3}$ for adjacent pixels in the horizontal direction and resulting in a total crosstalk probability of $2.3 \cdot 10^{-4}$. Unexpectedly, the crosstalk in the horizontal direction is much stronger than in the vertical one, probably due to the non-symmetric layout of the pixel and of the metal lines.
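The baseline subtraction step of this procedure can be sketched as follows. The sketch assumes that dark counts in the two pixels are independent and uniformly distributed over the gate window, so that the difference of their arrival times follows a triangular distribution; `n_pairs` (the number of frames in which both pixels fired by chance) and the function names are hypothetical. The sketch stops at the residual counts $N_{xy}$ and does not normalize them to a probability.

```python
def accidental_baseline(centers, bin_width, window, n_pairs):
    """Expected accidental coincidences per bin for uncorrelated uniform arrivals.

    The density of the time difference d is (window - |d|) / window**2
    for |d| <= window, and zero elsewhere.
    """
    return [
        n_pairs * bin_width * max(0.0, window - abs(d)) / window ** 2
        for d in centers
    ]


def crosstalk_counts(histogram, baseline):
    """Residual counts left after subtracting the accidental baseline."""
    return sum(h - b for h, b in zip(histogram, baseline))


# Synthetic example: 350 ns window, 1 ns bins, 1000 accidental pairs.
WINDOW_NS = 350.0
centers = [-349.5 + i for i in range(700)]
baseline = accidental_baseline(centers, 1.0, WINDOW_NS, 1000)

# Add 50 correlated events around zero delay to mimic crosstalk.
measured = list(baseline)
measured[349] += 30
measured[350] += 20

n_xy = crosstalk_counts(measured, baseline)
print(f"baseline total = {sum(baseline):.1f}, residual N_xy = {n_xy:.1f}")
```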
TDC Precision and Linearity
The TDC single shot precision has been estimated as the Full-Width at Half-Maximum (FWHM) of the histogram shown in Fig. 7 left. The dependence of the measured TDC precision on the resolution is shown in Fig. 7 right for the three arrays, which present similar results. A single shot precision as good as 410 ps FWHM can be achieved with 260 ps resolution. In order to reduce power dissipation, it is possible to decrease the TDC clock frequency, up to a maximum LSB of 625 ps with a correspondingly longer FSR, resulting in about 1 ns FWHM single shot precision. The TDC non-linearity has been measured through a code density test and expressed in terms of Differential Non-Linearity (DNL) and Integral Non-Linearity (INL). The root mean square (rms) DNL is 0.61%, 1.4% and 2.2% of the Least Significant Bit (LSB), while the rms INL is 10.1%, 16.9% and 13.1% of the LSB, for the 32 × 32, 8 × 8 and 128 × 1 arrays respectively. As the 32 × 32 array is not designed for gated mode operation, the first four bins of the histogram have been discarded for DNL and INL computation, whereas for the 8 × 8 and 128 × 1 arrays the enabling and disabling transitions have also been considered; these provide the dominant contribution to non-linearity, as visible in Fig. 8, which shows the DNL for representative pixels. The very good TDC linearity has been achieved by implementing the sliding scale technique in the TDC architecture [20], which has two separate interpolators for START and STOP signals.
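A code density test of the kind mentioned above is commonly evaluated as in the following sketch, which uses the textbook definitions (DNL as the relative deviation of each bin from the mean bin content, INL as the running sum of the DNL). It is not necessarily the exact processing used for the reported figures, and the input histogram is synthetic.

```python
def dnl_inl(histogram):
    """Return (DNL, INL) in LSB units from a code density histogram.

    Assumes uniformly distributed input events, so that an ideal TDC
    would collect the same number of counts in every bin.
    """
    mean = sum(histogram) / len(histogram)
    dnl = [h / mean - 1.0 for h in histogram]
    inl = []
    running = 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl


def rms(values):
    """Root mean square of a sequence of numbers."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


# Synthetic histogram: one bin 10% too wide, the next one 10% too narrow.
counts = [100, 110, 90, 100]
dnl, inl = dnl_inl(counts)
print(f"rms DNL = {rms(dnl) * 100:.2f} % LSB, rms INL = {rms(inl) * 100:.2f} % LSB")
```

Discarding the first bins, as done for the 32 × 32 array, amounts to slicing the histogram before calling `dnl_inl`.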
Active Gate
The shape of the active gate has been characterized by illuminating the 8 × 8 and 128 × 1 arrays with a pulsed laser and by shifting the laser pulse with 50 ps steps (by means of a programmable delayer) while operating the arrays in photon counting mode. The results for a representative pixel are presented in Fig. 9 , which shows enabling edges of 750 ps (8 × 8 array) and 780 ps (128 × 1 array), considering 10%−90% transitions.
Falling edges are faster than rising ones, since they are given by the masking operation of the "event detection" block (Fig. 3, left), whereas rising edges are representative of the actual excess bias provided to the SPADs. The result is almost comparable to state-of-the-art single pixel fast gating circuits based on the SPAD-dummy approach, such as the one described in [21], which achieves 430 ps transition edges. In both curves an overshoot, whose amplitude is lower than 10% of the average counts and which is caused by the bond wire parasitic inductance, is visible at the beginning of the activation window. Note that the different background levels before and after the gate period in Fig. 9 (left) are caused by drifts in the chip temperature, which varied during the measurement, causing a variation in the SPAD DCR.
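The 10%−90% transition time of the enabling edge can be extracted from a delay scan as in the sketch below. It assumes that the baseline and plateau count levels are supplied by the user (for example as averages before the gate and inside it, away from the overshoot) and uses linear interpolation between the 50 ps delay steps; the data are synthetic and the function names are illustrative.

```python
def crossing_time(delays, counts, level):
    """First delay at which `counts` rises through `level` (linear interpolation)."""
    for i in range(1, len(counts)):
        low, high = counts[i - 1], counts[i]
        if low < level <= high:
            fraction = (level - low) / (high - low)
            return delays[i - 1] + fraction * (delays[i] - delays[i - 1])
    raise ValueError("level is never crossed on a rising edge")


def rise_time_10_90(delays, counts, baseline, plateau):
    """Time between the 10% and 90% points of the baseline-to-plateau swing."""
    swing = plateau - baseline
    t10 = crossing_time(delays, counts, baseline + 0.1 * swing)
    t90 = crossing_time(delays, counts, baseline + 0.9 * swing)
    return t90 - t10


# Synthetic scan: 50 ps steps, counts ramp linearly from 0 to 1000 over 1000 ps.
delays_ps = list(range(-200, 1201, 50))
counts = [min(max(d, 0), 1000) for d in delays_ps]

t_rise = rise_time_10_90(delays_ps, counts, baseline=0.0, plateau=1000.0)
print(f"10%-90% rise time = {t_rise:.0f} ps")
```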
Conclusions
Three SPAD arrays for photon timing applications have been presented and the micrographs are shown in Fig. 10. The arrays have a different number of pixels (32 × 32, 8 × 8 and 128 × 1) and different FF (3.14% for the 32 × 32 array and 12.5% for the 8 × 8 and 128 × 1 arrays). The 30 µm diameter SPADs integrated in the arrays present low DCR (0.12 cps/µm² at operating temperature) and 50% peak PDE at 450 nm. The crosstalk among adjacent pixels is kept below $10^{-3.8}$ in all the arrays. The TDCs have a long FSR of 1 µs (corresponding to 150 m in LiDAR measurements) and a resolution as fine as 260 ps, which leads to 410 ps single shot precision (FWHM). Very good linearity has been achieved by exploiting the sliding scale technique, limiting the rms DNL to a few % of the LSB and the rms INL to well below 20%. The two smaller arrays can also be operated with active gating with sub-nanosecond edges. Excellent performance has been achieved in all the most important parameters for SPAD imagers, enabling their exploitation in many single-photon time-resolved applications. Table I summarizes the main performance figures of the SPAD arrays presented in this paper, namely in terms of temporal resolution, gating capability and maximum frame rate. Table II compares this work with other CMOS SPAD imagers with time-tagging capabilities presented in the literature; it can be noticed that the arrays presented in this work compare favorably in terms of TDC FSR and, although limited in terms of temporal resolution by the old technology node, obtain best-in-class detector noise and detection efficiency.
