Abstract: Single-photon detectors play a key role in many research fields such as biology, chemistry, medicine, and space technology, and in recent years, single-photon avalanche diodes (SPADs) have become a valid alternative to photomultiplier tubes (PMTs). Moreover, scientific research has recently focused on single-photon detector arrays, pushed by a growing demand for multichannel systems. In this scenario, we developed a compact 32-channel system for time-resolved single-photon counting applications. The system is divided into two independent modules: a photon detection head including a 32 × 1 SPAD array built in custom technology, featuring high time resolution, high photon detection efficiency (44% at 550 nm), and low dark count rate (mean value < 400 cps at -10 °C) at 6-V excess bias voltage, and a 32-channel acquisition system able to perform time-correlated single-photon counting (TCSPC) measurements. The TCSPC module includes eight four-channel time-to-amplitude converter (TAC) arrays, built in 0.35-µm Si-Ge BiCMOS technology, characterized by low differential non-linearity (rms value lower than 0.15% of the time bin width) and variable full-scale range. The system response function of this TCSPC instrumentation achieves a mean time resolution of 63 ps FWHM, considering a mean count rate of 1 Mcps.
Introduction
In recent years, biomedical and chemical research has focused on optical analysis to better understand biological processes. These noninvasive measurements are indeed the best solution for in vivo experiments, since they preserve the integrity of the analyzed sample; moreover, they are particularly suitable in the medical field when diagnostic tests have to be run on the patient. This trend has pushed research in the electronic field toward the development of high-performance photodetectors, in order to meet the strict requirements imposed by applications. As an example, fluorescence correlation spectroscopy (FCS) [1] and single-molecule spectroscopy (SMS) [2] involve examining very low concentration samples; therefore, photodetectors with extremely high sensitivity and low noise are needed.
The best answer to these requirements is nowadays represented by single-photon detectors, and recent achievements in single-photon avalanche diodes (SPADs) [3] have made them preferable to photomultiplier tubes (PMTs) [4], thanks to their higher photon detection efficiency (PDE), lower power consumption, and the possibility of integration in arrays. Indeed, a demand for monolithic detector arrays has recently arisen, since they significantly reduce the measurement time in analyses such as SMS and fluorescence lifetime imaging microscopy (FLIM) [5], allowing the study of fast dynamic processes.
Moreover, the temporal response of the single-photon detector is of utmost importance when the time-correlated single-photon counting (TCSPC) technique [6] is employed to obtain fluorescence decay curves with subnanosecond accuracy, as required by analyses such as FLIM, fluorescence diffuse optical tomography (FDOT) [7], or Förster resonance energy transfer (FRET) [8], [9]. The TCSPC technique consists of measuring and recording the delay between the arrival time of a photon (START signal) emitted from the specimen and the reference pulse given by a laser (STOP signal), which periodically stimulates the sample. The measurement of the time delay can be performed by two classical architectures: the time-to-digital converter (TDC) or the time-to-amplitude converter (TAC) followed by an analog-to-digital converter (ADC) [6]. The histogram recorded after several periods represents the probability distribution of the photon detection times and, under the assumption that one photon, at maximum, is recorded in each cycle, this distribution corresponds to the fluorescence light signal emitted by the stimulated sample.
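As a minimal sketch of the histogramming step described above, the following Python fragment accumulates measured START-STOP delays into time bins. The function names, the exponential decay model, and the per-cycle detection probability are illustrative assumptions and are not taken from the presented system; the delays are treated as already-measured values, without modeling the reversed START-STOP ordering.

```python
import random

def build_tcspc_histogram(delays_ns, fsr_ns, n_bins):
    """Accumulate measured delays (ns) into n_bins bins spanning fsr_ns."""
    bin_width = fsr_ns / n_bins
    histogram = [0] * n_bins
    for delay in delays_ns:
        if 0.0 <= delay < fsr_ns:  # conversions outside the range are dropped
            index = min(int(delay / bin_width), n_bins - 1)
            histogram[index] += 1
    return histogram

def simulate_delays(n_cycles, lifetime_ns, detection_prob, seed=0):
    """Hypothetical source: at most one photon per excitation cycle,
    detected with probability detection_prob, exponential decay."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n_cycles):
        if rng.random() < detection_prob:
            delays.append(rng.expovariate(1.0 / lifetime_ns))
    return delays

if __name__ == "__main__":
    delays = simulate_delays(n_cycles=100_000, lifetime_ns=2.0, detection_prob=0.01)
    hist = build_tcspc_histogram(delays, fsr_ns=11.0, n_bins=64)
    print(sum(hist), "events recorded out of", len(delays), "detections")
```

Keeping the detection probability well below one per cycle mirrors the one-photon-per-cycle assumption under which the histogram reproduces the fluorescence waveform.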
Nowadays, state-of-the-art imaging sensors integrate thousands of single-photon detectors on the same chip. In particular, the devices developed under the Megaframe [10] and MiSpia [11] research projects employ a smart pixel [12] architecture, designed in CMOS technology, which includes the SPAD and the circuit required to measure the photon time-of-arrival (ToA) and to convert it into a digital code (TDC). Data generated by the imaging sensor can be processed on-chip to obtain the TCSPC measurement results. However, although CMOS technology makes it possible to fabricate matrices with an extremely high number of detectors, the performance is still far from that obtainable with custom technology detectors. Indeed, in the last few years, several research groups have produced prototype devices, preferably using high-voltage CMOS (HV-CMOS) technologies. In some cases, these devices suffer from a lower PDE [10], [13] with respect to custom SPADs, a significantly higher DCR per unit area [14], [15], and a higher afterpulsing probability, forcing the device designers to use smaller active areas and low excess bias voltages in order to limit detector noise; on the other hand, CMOS SPADs featuring low DCR and afterpulsing (comparable with custom devices) achieve a poorer time resolution [11], [12].
A different approach is aimed at maximizing the photodetector performance, employing a custom technology; as an example, the eight-channel module developed in the Parafluo research project [16] ensures extremely high time resolution and PDE, but it features only eight parallel channels. To collect TCSPC measurements, it has to be connected to an external TCSPC multichannel acquisition system, and the state of the art for these instruments is provided by Becker & Hickl GmbH (four channels) [17] and by PicoQuant GmbH (eight channels) [18] .
To break the tradeoff between performance and number of channels, we designed a complete 32-channel system for TCSPC measurements, which is presented in this paper. The system is composed of a photon detection head connected to a TCSPC acquisition system through a custom 32-differential-pair parallel cable. The photon detection head directly exports the photon-timing signals generated by a 32 × 1 SPAD linear array fabricated in custom planar technology; moreover, it allows on-board data processing for photon-counting applications. The TCSPC module receives and manages the 32 START signals exploiting the TAC-ADC architecture in order to guarantee a very low differential non-linearity (DNL).
Photon Detection Head
The photon detection head, shown in Fig. 1(a), is a stand-alone module based on the 32 × 1 linear SPAD array; the module provides the proper bias voltages to the detectors and allows a direct readout of the signals representing the photon-detection events.
A block diagram of the signal processing board is reported in Fig. 1(b). Inside the sealed chamber, photons are detected by a parallel structure that will be described in the next sections. Afterwards, two parallel signals are extracted: the photon-timing one (on the left), which carries the information on the photon arrival time with very low jitter, is directly exported off-board through a parallel connector; conversely, the photon-counting one (on the right) is handled by an FPGA (field-programmable gate array) that allows on-board data processing for photon-counting applications.
Pixel Structure
From a circuital point of view, the single pixel structure shown in Fig. 2(a) is parallelized to obtain a 32-channel single-photon detector.
The SPAD operates in Geiger-mode by connecting an active quenching circuit (AQC) to its anode; the photon-timing signal is extracted from the cathode through an integrated pick-up circuit, which has the function of a logic inverter. This architecture was chosen since a simple parallelization of a single-channel structure based on an external pick-up circuit [19] would have led to a strong tradeoff between time resolution performance and electrical crosstalk among channels. Indeed, with the presented pixel structure, it was possible to achieve a time jitter as low as 45 ps FWHM (full-width half-maximum), while the electrical crosstalk among channels is kept negligible with respect to the optical one [20] .
The behavior of the single-pixel circuit is quite simple: In the quiescent state, the SPAD is reverse biased above the breakdown voltage by $V_{\mathrm{SPAD}} = V_{\mathrm{OV}} + V_{\mathrm{POL}}$, where $V_{\mathrm{OV}}$ is the cathode voltage and $-V_{\mathrm{POL}}$ is the negative reference of the AQC. When the avalanche is triggered by an incoming photon, prompt passive quenching is achieved through a sensing resistor $R_{\mathrm{POLY}}$, which immediately causes a drop of the SPAD cathode voltage as the avalanche current starts to flow. The cathode signal causes a commutation of the pick-up circuit and, subsequently, of the comparator. Finally, the active quench is performed by the AQC, which raises the anode voltage and keeps the SPAD bias below the breakdown voltage for a fixed hold-off time.
Sealed Chamber
The single-pixel structure described in Section 2.1 was employed to build the eight-channel Parafluo module [16], which includes three 8 × 1 linear arrays: a SPAD array, an AQC array, and a comparator array. The presented system was designed to increase the channel count from 8 to 32 while maintaining the same pixel structure, but this required a strong modification at system level. Indeed, state-of-the-art custom SPADs feature the best dark count rate (DCR) performance when cooled [3], but this requires that the detectors be placed in a dry atmosphere; otherwise, they will be damaged by the formation of moisture. In the Parafluo module, this issue was solved by sealing the three integrated arrays inside a custom TO-like kovar package. However, this approach is not suitable for the 32-channel system, since too many output pins would be needed.
A solution was found by properly shaping the inner surface of the aluminum case that is used to cover the presented module, as shown in Fig. 2(b). The upper cover has a glass window that allows the photons to enter the sealed chamber, where the detectors are placed. The signal processing board, which will be described in Section 2.4, penetrates into the chamber (sealed by two O-rings); therefore, a large area can be used to export the signals generated by the 32 pixels, exploiting the inner planes of the PCB. The SPAD temperature is controlled by a thermoelectric cooler (TEC).
Integrated Arrays
In the presented system, the single-pixel structure is parallelized by connecting two 32 × 1 linear arrays (SPADs and AQCs) and four 8 × 1 comparator arrays through bonding wires, as shown in Figs. 1(a) and 2(b).
The 32 × 1 linear SPAD array was fabricated in custom planar technology by parallelizing the 8 × 1 array employed in the Parafluo module, maintaining the pitch of 250 µm. A mean breakdown voltage of 34.7 V among the 32 detectors was measured at room temperature (25 °C). Each pixel comprises a detector with 50-µm diameter and the integrated pick-up circuit connected to the SPAD cathode. A modification of the SPAD fabrication process was required to build the nMOS transistors that make up the pick-up circuit, but the process was still optimized to preserve the high performance of the detector. For this reason, the resulting logic inverter is far from the state of the art. Indeed, although the inverter output signal is CMOS-compatible, it cannot be used to directly export the photon-timing information off-board without a significant increase in electrical crosstalk and time jitter, since it is a single-ended and slow-varying signal. Therefore, we employed the same comparator used in the Parafluo module to export the output signal of the pick-up circuit. The 8 × 1 comparator array is built in 0.35-µm CMOS technology and features LVDS (low-voltage differential signaling) outputs. Since the crosstalk between channels must be minimized, the whole circuit has been designed with constant current structures that reduce disturbances injected through power supply nets during the input signal commutations.
To quench the SPAD avalanche and restore the reverse bias, a 32 × 1 mixed active-passive quenching circuit array was designed. Since high integration level, low power consumption, and high-voltage transistors were required, the custom technology process optimized for the SPAD performance could not be employed; therefore, the AQC array was fabricated in 0.18-µm HV-CMOS technology. The equivalent threshold at which the AQC reads the SPAD avalanche current, the reset, and the hold-off time can be easily adjusted by external resistors; in particular, a minimum dead time of 16 ns can be set for the AQCs, which corresponds to a maximum count rate of 60 Mcps per channel of the developed system. Finally, each AQC provides a digital 1.8-V signal every time a photon is detected by the corresponding SPAD; therefore, single-photon counting data elaboration can be performed both on-board and off-board, as will be described in Section 2.4.
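The relation between dead time and maximum count rate can be checked with a short calculation. The sketch below assumes a non-paralyzable detector that can fire at most once per dead time; the function name is illustrative. Under this assumption, a 16-ns dead time gives an upper bound of 62.5 Mcps, consistent with the roughly 60 Mcps quoted above.

```python
def max_count_rate_mcps(dead_time_ns):
    """Upper bound on the count rate (Mcps) when each detection is
    followed by a fixed dead time (assumed non-paralyzable model)."""
    if dead_time_ns <= 0:
        raise ValueError("dead time must be positive")
    return 1e3 / dead_time_ns  # 1 / ns = 1000 Mcps

if __name__ == "__main__":
    print(max_count_rate_mcps(16.0), "Mcps")
```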
Signal Processing Board
The photon detection head was specifically designed for photon-timing applications, and the 32 signals extracted from the integrated comparators have to be exported without worsening the low time jitter while keeping the crosstalk between channels as low as possible. Therefore, the parallel signal path shown in Fig. 1(b) was designed. The outputs of the four comparator arrays are buffered by 32 differential drivers: the SY54016AR buffer from Micrel was chosen, since it features very low time jitter (< 1 ps RMS) and provides a steep current mode logic (CML) differential output signal. Finally, to connect the photon detection head to the multichannel TCSPC acquisition system, we employed the 32-differential-pair custom cable from SAMTEC, which features high channel density, high bandwidth, and low crosstalk between signal pairs.
However, the photon-timing signals cannot be used for single-photon counting data elaboration. Indeed, although the AQC provides a count rate up to 60 Mcps per channel, at such a high commutation frequency, the slow-varying output signal of the integrated inverter would be heavily filtered, making it impossible to detect the single-photon counting events. Moreover, placing a data processing circuit for photon-counting elaboration in the photon-timing signal path would add a capacitive load that may worsen the timing performance and the signal integrity. For these reasons, the AQC outputs are used for photon-counting applications, employing a different parallel signal path, as shown in Fig. 1(b). Since the AQC ground reference is tens of volts below the common ground, AC-coupling is required. Time jitter is not critical for single-photon counting applications; therefore, the FPGA is used to route the 32 photon-counting signals. With this architecture, raw photon counting signals can be exported by a 68-pin SCSI VHDCI connector, while the processed data can be exported through a USB 2.0 transceiver (the FT2232H device from FTDI Chip) by connecting the photon detection head to a PC. A graphical user interface (GUI) was designed in C# language to allow the user to download the data resulting from the on-board photon-counting measurement and also to remotely control the power management board (which is described in the next section).
Power Management Board
The power management board included in the designed module implements a microcontroller that allows a digital adjustment of the bias voltages provided to the integrated arrays. As an example, the SPAD bias voltage can be set between 20 V and 43 V; therefore, the system can handle SPADs with a wide range of breakdown voltages, precisely controlling the excess bias voltage (which is a key variable in PDE, DCR, and time resolution performance). The microcontroller also implements a closed-loop temperature control by reading the SPAD temperature through a negative temperature coefficient (NTC) thermistor and by driving the TEC. Employing the temperature control, the SPAD temperature can be regulated down to -15 °C, reducing the DCR (as described in Section 2.2) and avoiding fluctuations of SPAD performance while a measurement is running.
The photon detection head works with a single 16-V DC input power supply and consumes less than 40 W with the temperature control active and at the maximum achievable count rate.
TCSPC System
The 32 photon-timing signals coming from the detection head are used to start the time-to-amplitude conversion in the 32-channel TCSPC module (see Fig. 3). It comprises four compact eight-channel TCSPC boards, each of them capable of managing and recording its TCSPC data independently. The START signals are fed to the TAC arrays included in the four eight-channel TCSPC boards through an interface board. Data recorded in the boards are exported using a USB HUB, mounted on a control board; this way, the data transmission to and from an external PC is ensured through a Hi-Speed USB 2.0 connection. The whole system works with a single DC power supply (from 8 V to 16 V), has a power consumption of about 30 W, and is enclosed in a small aluminum case (160 × 125 × 30 mm).
Eight-Channel Board
The eight-channel acquisition board [21] includes two four-channel TAC chips, whose outputs are fed to a commercial eight-channel ADC. As shown in Fig. 4, the ADC outputs are recorded by an on-board FPGA that organizes the collected measurements and manages the TAC operations. Finally, data are exported through a USB transceiver that allows a USB 2.0 link.
The TAC is the most critical block in the acquisition chain, and to obtain high performance, our system includes eight integrated four-channel TAC arrays built in 0.35-µm Si-Ge technology. The converters represent an improved version of a four-channel TAC chip developed in our research laboratory [22]. Moreover, each converter features a variable full-scale range (FSR) that can be changed by digitally setting the current generated to charge the capacitor. The four possible values are 11 ns, 22 ns, 45 ns, and 88 ns.
The operating principle of each converter is reported in Fig. 5(a): The START signal triggers a current generator that begins to charge a conversion capacitor ($C_{\mathrm{CONV}}$) with a constant current ($I_{\mathrm{CONV}}$), until the STOP signal arrives. A STROBE signal is driven high when the output value is ready to be sampled by subsequent circuit blocks, and after the conversion, the TAC returns to the initial conditions when the RESET signal is asserted.
As previously mentioned, the TAC-ADC structure can provide a very low system DNL, but this can be achieved only by reducing the impact of ADC DNL on the whole acquisition system. To this aim, we adopted a specific dithering technique, known as "sliding scale" [23], which consists of adding a signal, called dithering signal, to the TAC output.
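The charging process described above can be written as a worked relation, assuming an ideal constant-current source and an ideal capacitor. The symbols $\Delta t$ (START-STOP delay), $V_{\mathrm{TAC}}$ (converter output), and $V_{\mathrm{MAX}}$ (maximum usable output swing) are introduced here for illustration and do not appear in the original description.

```latex
% Ideal TAC transfer function (assumption: constant current, ideal capacitor)
\[
  V_{\mathrm{TAC}}(\Delta t) = \frac{I_{\mathrm{CONV}}}{C_{\mathrm{CONV}}}\,\Delta t ,
  \qquad 0 \le \Delta t \le \mathrm{FSR},
\]
% Full-scale range for a maximum output swing V_MAX (hypothetical symbol)
\[
  \mathrm{FSR} = \frac{C_{\mathrm{CONV}}\,V_{\mathrm{MAX}}}{I_{\mathrm{CONV}}} .
\]
```

Under these assumptions, halving $I_{\mathrm{CONV}}$ doubles the FSR, which is consistent with selecting the full-scale range by digitally setting the charging current.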
In the presented system, the variable signal is generated by a 10-bit digital-to-analog converter (DAC), integrated in the four-channel TAC chip [as shown in Fig. 5(b)]. The dithering signal is added to the TAC conversion and is uncorrelated with the TAC output [24], which is a statistical distribution for a TCSPC measurement. Downstream of the ADC, the FPGA must subtract the dithering contribution from the sampled digital value. The effectiveness of this technique was tested and demonstrated in a single-channel TCSPC system developed in our research group [25], and in order to minimize the area occupation, a 10-bit counter is integrated with the DAC to generate a triangular-shaped dithering signal. The DAC contribution is added to the four TAC outputs using four integrated adder stages that also adapt the resulting voltages to the ADC input dynamic range. For each channel, the ADC conversion is sampled by the on-board FPGA when the TAC provides the STROBE signal. The acquired value represents the address of the memory cell whose content has to be increased in order to build the TCSPC histogram. The chosen A/D converter is the AD9252 from Analog Devices, since it features eight independent channels with 14-bit resolution and 50-Msps conversion rate. The eight serial outputs of the AD9252 are in LVDS DDR standard, synchronous with a 350-MHz clock.
The eight ADC outputs are sampled by the XC6SLX150T device, a Spartan-6 FPGA from Xilinx, that deserializes the 14-bit codes using the built-in programmable Serializer/Deserializer blocks. Inside the FPGA, the value added by the DAC to the TAC outputs is subtracted from the deserialized result, thus compensating the dithering contribution. The digital value after the dithering compensation represents the time bin that received the photon detection event, and it corresponds to the RAM address whose content has to be updated. The eight histograms are stored inside the FPGA in eight dual-port RAM blocks divided into $2^{14}$ cells, each one having a 32-bit depth. The dual-port RAM allows simultaneous writing and reading operations, managed by two different processes. Writing operations are synchronous with the STROBE signal, while reading ones are synchronous with the FT2232H Hi-Speed USB 2.0 transceiver from FTDI Chip. The latter receives the TCSPC results and transmits these data to the control board, which gathers all four data streams into a single one through the USB HUB. In order to build a complete acquisition system, we also designed a GUI in Visual C# language to allow remote control of the developed module (see Fig. 6). The software can be used to start or stop the TCSPC acquisition, to read the data coming from the FPGAs, or to erase the FPGA memory. The histograms resulting from the measurements are shown in a dedicated chart area, and all the downloaded data can be saved in ASCII format.
During a measurement, the histograms inside the FPGA memory are continuously updated at the actual TAC conversion frequency, and they are downloaded at the rate allowed by the PC; consequently, the achievable USB transfer rate does not affect the maximum system count rate. The main limitation in the data download is introduced by the graphical data visualization, which is quite resource demanding. However, it does not represent a limitation of the TCSPC acquisition; it simply means that the histograms are updated on the PC screen only two to three times per second.
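The effect of the sliding-scale technique can be illustrated with a toy simulation. The sketch below is not the circuit of the presented system: it assumes a hypothetical ADC in which one code is 1.5 LSB wide and the next one 0.5 LSB wide, a uniformly distributed TAC output, and a dither that sweeps uniformly over a small number of integer offsets (an idealization of the triangular signal produced by the counter and DAC). All names and values are illustrative.

```python
import bisect

def make_adc(n_codes, wide_code):
    """Toy ADC: every code is 1 LSB wide except wide_code (1.5 LSB)
    and wide_code + 1 (0.5 LSB). Input and output are in LSB units."""
    edges = [float(k) for k in range(1, n_codes)]  # edges[c] = upper edge of code c
    edges[wide_code] += 0.5
    return lambda x: bisect.bisect_right(edges, x)

def peak_dnl(n_bins, dither_span, wide_code=20, samples_per_bin=20):
    """Peak |DNL| of the histogram for a uniform input, with the dither
    added before the ADC and subtracted digitally afterwards."""
    adc = make_adc(n_bins + dither_span, wide_code)
    hist = [0] * n_bins
    for k in range(n_bins):
        for j in range(samples_per_bin):
            x = k + (j + 0.5) / samples_per_bin   # uniform TAC output
            for d in range(dither_span):          # every dither value once
                hist[adc(x + d) - d] += 1         # subtraction done "in the FPGA"
    mean = sum(hist) / n_bins
    return max(abs(h / mean - 1.0) for h in hist)

if __name__ == "__main__":
    print("no dither:", peak_dnl(32, 1))
    print("8-step dither:", peak_dnl(32, 8))
```

In this toy model the error of the faulty ADC code is spread over as many histogram bins as there are dither steps, so the peak DNL drops roughly in proportion to the dither span.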
Interface Board
The 32 START signals coming from the photon detection head through the custom 32-differential-pair cable are read using an interface board that receives the signals from a 98-position connector and routes them to the four eight-channel TCSPC boards (through four 80-position SAMTEC connectors). This board is also used to manage the STOP signal coming from the laser and to send this reference to the four TCSPC boards without increasing its timing jitter. Finally, the interface board routes the four USB buses, coming from the TCSPC boards, to the control board, which is connected with two 40-position SAMTEC connectors.
Control Unit
As already described, all the data recorded inside the four TCSPC boards are read using the USB HUB (USB2517 from Microchip Technology) that is included in the control board. This HUB makes it possible to download the data from the module to the external PC over a single USB cable. Moreover, since the module works with a single DC power supply (from 8 V to 16 V), the control board receives and manages the voltage reference from an external AC/DC adapter.
Experimental Results
Several tests were made to characterize the developed system. We chose to show the results as a function of the channel number in the TCSPC module, rather than the SPAD position in the 32 × 1 array, since the former revealed a more interesting correlation between channels, as described in Section 4.3. With this numbering, each group of four consecutive channels refers to one four-channel TAC chip, while each group of eight consecutive channels refers to one eight-channel TCSPC board. All the measurements were made setting the minimum AQC dead time of 16 ns.
First, we will present the dark count rate and the PDE of the detection head, which were measured as a function of the SPAD excess bias voltage. The DCR variation versus temperature was also evaluated. Afterwards, we will show the DNL and the time resolution results of the 32-channel TCSPC module. The eight-channel boards included in the system have been already fully characterized [21] ; therefore, in this paper, we will only show the performance variation as a function of the 32 channels, highlighting the correlations between different eight-channel boards. Finally, we will discuss the time jitter performance and the crosstalk between channels, which involve the whole system from the SPAD array to the ADC input.
Dark Count Rate
Even if the SPAD is kept in an environment with absence of light, an avalanche can still be triggered by thermally generated electron-hole pairs [26], [27]. These events, called dark counts, represent a significant noise contribution in single-photon detectors; therefore, it is important to estimate their rate. The average dark count rate over a time window of 30 s has been evaluated for each SPAD of the 32 × 1 array by performing a photon counting measurement in a dark environment (zero optical power on the detectors). Fig. 7(a) shows the DCR (normalized to counts per second) of the 32 channels, sorted in ascending order, with the x-axis rescaled between 0% and 100%. The two measurements were made at two different temperatures, with 6 V of SPAD excess bias voltage. As a result, more than 90% of the channels feature a DCR lower than 20 kcps at 25 °C and lower than 600 cps at -10 °C. Although two different SPAD families can be identified from the reported graphs, the first one featuring a mean DCR of 40 cps and the second characterized by a 500-cps DCR, the displayed trend is completely random, uncorrelated with the SPAD position and different between different arrays. The correlation between different DCR families and the presence of impurities in custom SPADs is still under investigation [27]. Fig. 7(b) reports the DCR versus SPAD excess bias voltage, at two different temperatures. Each vertical bar shown in the chart represents the DCR distribution among the 32 channels. The two ends of each bar indicate, respectively, the maximum and minimum DCR values measured at a certain temperature and SPAD excess bias voltage. Moreover, the circles connected by the lines represent the mean DCR values measured for the 32 channels. The DCR behavior agrees with the results obtained for single SPADs [26].
In particular, the advantage of cooling the detectors is clear, since at -10 °C, the mean DCR is more than one order of magnitude lower than the DCR at room temperature.
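The normalization to counts per second and the percentage-of-channels reading of the sorted DCR plot can be reproduced with a few lines of Python. This is a minimal sketch; the function names and the sample values in the usage example are hypothetical and are not measured data.

```python
def dark_count_rate_cps(dark_counts, window_s=30.0):
    """Average DCR in counts per second over the acquisition window."""
    return dark_counts / window_s

def fraction_below(dcr_values_cps, threshold_cps):
    """Fraction of channels whose DCR is below the given threshold."""
    below = sum(1 for value in dcr_values_cps if value < threshold_cps)
    return below / len(dcr_values_cps)

if __name__ == "__main__":
    # Hypothetical raw counts collected in 30 s on four channels.
    raw_counts = [1200, 1500, 15000, 21000]
    rates = sorted(dark_count_rate_cps(c) for c in raw_counts)
    print(rates, fraction_below(rates, 600.0))
```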
PDE
A basic step in the characterization of a single-photon detector is the measurement of the PDE, which is defined as the ratio between the number of detected photons and the total number of photons that hit the detector. For each channel of the presented system, the PDE was measured versus the excitation wavelength, at room temperature (25 °C). The wavelength was set from 400 nm to 1000 nm with steps of 50 nm, using a monochromator (Oriel Spectraluminator 69050). The PDE results were obtained by comparing the photon counts with the output of a power meter that is used to calibrate the measurement setup. The mean PDE values among the 32 channels versus wavelength and SPAD excess bias voltage are shown in Fig. 8(a).
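The comparison between photon counts and optical power can be sketched as follows. The conversion from power to photon flux uses the photon energy $hc/\lambda$; the subtraction of the dark count rate and the assumption that the measured power refers to the detector active area are illustrative choices that are not stated in the text, and the function names are hypothetical.

```python
PLANCK_J_S = 6.62607015e-34       # Planck constant (J s)
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light in vacuum (m/s)

def photon_rate(power_w, wavelength_nm):
    """Photons per second carried by an optical power at one wavelength."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    return power_w / photon_energy_j

def pde(count_rate_cps, dark_rate_cps, power_w, wavelength_nm):
    """Detected photons over incident photons (dark counts removed,
    power assumed to be the power on the active area)."""
    return (count_rate_cps - dark_rate_cps) / photon_rate(power_w, wavelength_nm)

if __name__ == "__main__":
    # Hypothetical example: 1 pW at 550 nm, 1.2 Mcps measured, no dark counts.
    print(round(pde(1.2e6, 0.0, 1e-12, 550.0), 3))
```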
PDE increases at higher SPAD excess bias voltage, since the higher electric field enhances the avalanche triggering efficiency. As an example, considering the 7 V of excess bias voltage, the mean PDE peak is close to 50% and corresponds to a wavelength equal to about 550 nm. Moreover, the efficiency remains higher than 20% over all the visible spectrum. In Fig. 8(b) , the PDE versus channel is shown at three different wavelengths, with 6 V of SPAD excess bias voltage. The chart reveals a very good uniformity of PDE among the 32 channels.
DNL
The DNL of the 32-channel TCSPC module was tested to characterize the acquisition system and to compare it with the TCSPC state-of-the-art instruments. The employed setup consists of two pulse generators: one provides the STOP signal with a fixed frequency and the other generates a random reference that is used as START; this way, START and STOP signals are uncorrelated.
For better readability, only the rms values of the DNL, for the four FSRs, are reported in Fig. 9(a). The evaluated DNL values show very slight variation between the different channels and, in the reported trends, a strong correlation between the position of the TAC inside the four-channel chip and the measured DNL can be noticed. This is probably introduced by the physical layout of the chip and, in particular, by an imperfect symmetry of the power supply voltage distribution between the external pads and each converter. However, the obtained DNL values are extremely good, lower than 0.15% of LSB (least significant bit), with a maximum fluctuation lower than 0.02% of LSB.
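With uncorrelated START and STOP signals, every time bin is expected to collect the same number of counts, so the DNL can be estimated from the deviation of each bin from the mean. The following sketch shows one common way to compute the rms DNL from such a histogram; the exact procedure used for the reported values is not described in the text, so this is an assumption, and the function names are illustrative.

```python
import math

def dnl_per_bin(histogram):
    """Relative deviation of each bin from the mean bin content (in LSB)."""
    mean = sum(histogram) / len(histogram)
    return [counts / mean - 1.0 for counts in histogram]

def dnl_rms(histogram):
    """Root-mean-square value of the per-bin DNL."""
    dnl = dnl_per_bin(histogram)
    return math.sqrt(sum(v * v for v in dnl) / len(dnl))

if __name__ == "__main__":
    # Hypothetical flat-field histogram with small bin-to-bin differences.
    print(dnl_rms([1000, 1002, 999, 998, 1001]))
```

In practice, the number of counts per bin has to be large enough for the Poisson fluctuation of the counts to stay well below the DNL level being measured.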
Time Resolution
The time resolution of the TCSPC module was also measured. The START and STOP signals were obtained by splitting the output of a pulse generator, with an adjustable passive delay line placed on the STOP signal path. The four FSRs of each channel were measured, but, for better readability, only one measurement for a fixed time delay (40 ns for 88-ns FSR, 20 ns for 45-ns FSR, 10 ns for 22-ns FSR, and 5 ns for 11-ns FSR) is reported in Fig. 9(b). As can be noticed, the FWHM values decrease as the FSR is reduced. Switching from the 88-ns to the 45-ns FSR, an improvement by a factor of 2 in the time resolution can be observed, but, for shorter full-scale ranges, the improvement factor is lower.
As we have already explained [21], TAC resolution is limited by the analog noise of the conversion stage and the jitter of the input logic; the latter contribution is completely negligible for longer FSRs, but, for shorter ranges, it dominates over the analog noise.
After the TCSPC module characterization, we measured the time resolution of the whole system, connecting the photon detection head to the multichannel TCSPC instrument through the parallel cable. A mode-locked pulsed laser with 80 MHz of repetition rate, 780 nm of wavelength, and optical pulse with FWHM < 1 ps was employed to stimulate all the 32 SPADs in parallel. This laser also provides an output signal, synchronous with the optical pulse, used as STOP reference for the TCSPC measurement. Since the STOP period is equal to 12.5 ns, the acquisition system FSR was set to 11 ns.
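The choice of the full-scale range can be expressed as a small calculation. The sketch below assumes that the FSR is chosen as the largest available value that does not exceed the STOP period; this selection rule and the function names are assumptions, while the four FSR values are those of the converter.

```python
AVAILABLE_FSR_NS = (11.0, 22.0, 45.0, 88.0)

def stop_period_ns(repetition_rate_mhz):
    """Period of the laser reference for a given repetition rate."""
    return 1e3 / repetition_rate_mhz

def select_fsr_ns(repetition_rate_mhz):
    """Largest available FSR that fits within one STOP period (assumed rule)."""
    period = stop_period_ns(repetition_rate_mhz)
    candidates = [fsr for fsr in AVAILABLE_FSR_NS if fsr <= period]
    if not candidates:
        raise ValueError("STOP period shorter than the smallest FSR")
    return max(candidates)

if __name__ == "__main__":
    print(stop_period_ns(80.0), select_fsr_ns(80.0))
```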
In Fig. 10(a), the system response function (SRF) is shown for one channel; the measurement was carried out with 6 V of SPAD excess bias voltage and 1 Mcps of conversion frequency, and resulted in an FWHM value of 58 ps. The same measurement was made for all the 32 channels, at room temperature, as a function of the SPAD excess bias voltage and the mean conversion frequency. The obtained results are reported in Fig. 10(b). As can be noticed, the time resolution improves with the SPAD excess bias voltage [27]. This behavior is due to the higher electric field that makes the avalanche current grow faster; consequently, the statistical contributions to the avalanche current propagation have a reduced effect on the current time jitter [28]. Moreover, the time resolution worsens when the conversion frequency increases. The effect is caused by the slow-varying output signal of the integrated inverter, which does not perfectly return to the quiescent value before the following SPAD avalanche is detected.
Exploiting the same setup, the time resolution of the detection head itself was measured by substituting the developed TCSPC module with a commercial TCSPC acquisition system (SPC130 from Becker & Hickl). The measurement resulted in a mean time resolution of 60 ps FWHM (at 1-Mcps mean count rate and 6-V SPAD excess bias voltage), poorer than the 45 ps FWHM previously obtained with the same pixel structure [16]. The higher number of detectors leads to larger performance fluctuations across the array, and optimizing the time jitter of a single detector worsens the performance of other SPADs. Hence, the system parameters were adjusted to limit the time resolution fluctuations among all the channels.
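An FWHM value such as the one quoted for the SRF is typically extracted from the recorded histogram by locating the two half-maximum crossings around the peak. The sketch below uses linear interpolation between adjacent bins; it assumes a single-peaked histogram with negligible background, and the function name and bin width in the usage example are illustrative.

```python
def fwhm(histogram, bin_width):
    """Full width at half maximum of a single-peaked histogram,
    using linear interpolation at the two half-maximum crossings."""
    peak = max(histogram)
    half = peak / 2.0
    i_peak = histogram.index(peak)

    left = i_peak
    while left > 0 and histogram[left] > half:
        left -= 1
    right = i_peak
    while right < len(histogram) - 1 and histogram[right] > half:
        right += 1
    if histogram[left] > half or histogram[right] > half:
        raise ValueError("half-maximum crossing not found inside the histogram")

    # Fractional bin positions of the rising and falling crossings.
    x_left = left + (half - histogram[left]) / (histogram[left + 1] - histogram[left])
    x_right = right - (half - histogram[right]) / (histogram[right - 1] - histogram[right])
    return (x_right - x_left) * bin_width

if __name__ == "__main__":
    # Hypothetical triangular response sampled with 10-ps bins.
    print(fwhm([0, 2, 4, 2, 0], bin_width=10.0), "ps")
```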
Crosstalk
Crosstalk is a key parameter for a multichannel TCSPC system, since it introduces spurious correlations between the output signals of different channels.
Optical crosstalk occurs when an avalanche in one SPAD is triggered by photons emitted by the avalanche current flowing in another SPAD. The probability of optical crosstalk depends on many variables, in particular the amount of charge flowing during an avalanche, the pitch between pixels, and the SPAD structure [29]. The crosstalk probability for the SPAD array included in the presented system was measured, resulting in a probability of 10⁻² for adjacent detectors and a negligible probability for SPADs separated by more than one array pitch. This phenomenon will be investigated in more depth in a future paper.
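As a first-order illustration of what the measured adjacent-pixel probability implies, the sketch below estimates the spurious count rate induced on each pixel of a linear array by its nearest neighbors. It assumes that the probability applies to every avalanche of a neighbor, that only adjacent pixels contribute, and that crosstalk-induced avalanches do not trigger further crosstalk. The function name and the uniform 1-Mcps rates are hypothetical.

```python
def crosstalk_rates_cps(rates_cps, p_adjacent=1e-2):
    """First-order crosstalk rate on each pixel of a linear (N x 1) array."""
    n = len(rates_cps)
    induced = []
    for i in range(n):
        neighbor_rate = 0.0
        if i > 0:
            neighbor_rate += rates_cps[i - 1]   # left neighbor
        if i < n - 1:
            neighbor_rate += rates_cps[i + 1]   # right neighbor
        induced.append(p_adjacent * neighbor_rate)
    return induced


# Example: 32 pixels, each counting at 1 Mcps (assumed uniform illumination).
induced = crosstalk_rates_cps([1e6] * 32)
print(f"edge pixel: {induced[0]:.0f} cps, inner pixel: {induced[1]:.0f} cps")
```

In this simplified picture, an inner pixel sees twice the induced rate of an edge pixel because it has two adjacent neighbors.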
Electrical crosstalk, in contrast, is due to electrical coupling between different channels along the signal path that starts at the SPAD cathode (in the detection head) and ends at the ADC input (in the TCSPC system). Measurements were carried out on all the parts that make up the photon-timing signal path, revealing an electrical crosstalk that is negligible with respect to the optical one. Indeed, the effect is not strong enough to cause a TAC conversion on a channel in the absence of a photon-detection event on that same channel. However, electrical crosstalk can interfere with the photon-timing signal, generating a distortion in the DNL of the TCSPC system. This effect was evaluated for the whole signal path with a measurement setup similar to the one described in [21]. The contributions of the photon detection head and of the parallel cable proved negligible with respect to that of the TCSPC system. The crosstalk can be estimated as a peak-to-peak disturbance equal to 6% of the time bin width when three converters are interfering with the observed TAC. This value is very small, considering that the measurement represents the worst possible operating condition.
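DNL figures of this kind are commonly extracted from a histogram recorded with uncorrelated events, for which every time bin should ideally collect the same number of counts. The sketch below assumes this code-density approach; it is not necessarily the procedure of [21], the function names are hypothetical, and the histogram values are invented for illustration. In a real measurement the counting statistics also contribute to the bin-to-bin spread, so a large number of counts per bin is needed.

```python
import math


def dnl(counts):
    """Per-bin DNL as a fraction of the mean bin width (1 LSB)."""
    mean = sum(counts) / len(counts)
    if mean <= 0:
        raise ValueError("histogram must contain counts")
    return [(c - mean) / mean for c in counts]


def dnl_rms(counts):
    """Root-mean-square DNL, as a fraction of the LSB."""
    d = dnl(counts)
    return math.sqrt(sum(x * x for x in d) / len(d))


def dnl_peak_to_peak(counts):
    """Peak-to-peak DNL, as a fraction of the LSB."""
    d = dnl(counts)
    return max(d) - min(d)


# Invented 4-bin histogram with a +/-3% disturbance around 1000 counts.
example_counts = [1000, 1030, 970, 1000]
print(f"peak-to-peak: {dnl_peak_to_peak(example_counts):.1%} of the LSB")
print(f"rms: {dnl_rms(example_counts):.2%} of the LSB")
```

The invented histogram was chosen so that its peak-to-peak DNL equals 6% of the bin width, the same magnitude as the disturbance reported above.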
Conclusion
In this paper, we have presented a complete and compact 32-channel system for TCSPC measurements. As shown in Fig. 11, the single-photon detection part is a stand-alone module connected to the TCSPC processing unit through a parallel cable.
The system includes a 32 × 1 custom-technology SPAD array featuring low dark count rate (mean DCR < 400 cps at −10 °C) and high PDE (44% at 550 nm) at 6-V excess bias voltage. The photon-timing information is transmitted through a low-jitter interface to the multichannel TCSPC acquisition module. The time-to-digital conversion is performed by a TAC-ADC structure that features a time resolution on the order of tens of picoseconds and a DNL lower than 0.15% rms of the LSB (both meet the specifications for TCSPC applications). Finally, at a mean count rate of 1 Mcps, the mean FWHM of the SRF is 63 ps.
Fig. 11. Picture of the complete TCSPC acquisition system: it includes the photon detection head and the 32-channel module for TCSPC measurements.
Table 1 summarizes a performance comparison among state-of-the-art multichannel systems that can be employed in TCSPC measurements.
The presented system features very high detector performance, particularly in terms of PDE in the visible range. A comparison with state-of-the-art TCSPC instruments reveals that our system also features the lowest DNL and achieves the best time resolution. On the other hand, the number of channels of the presented system is still much lower than that of CMOS imaging sensors, and the occupied area and power consumption per channel are larger.
However, if we look at the complete system, the overall count rate obtainable with our device is comparable with that achieved by systems with many more pixels. Hence, since most TCSPC applications require more and more detectors to increase the overall count rate and reduce the measurement time, the presented system is able to address this requirement with a far lower number of channels.
