Abstract-
such as photomultiplier tubes and desktop computers. This bulky and relatively expensive hardware has limited the approach to a few channels, megahertz acquisition rates, and imaging based on mechanical scanning. More recently, CMOS manufacturing has permitted large arrays of SPAD detectors to be manufactured together with timing and signal processing electronics on a single chip.
SPAD arrays together with parallel TCSPC have been the enabling factor in the first high-volume quantum photonic consumer applications [2]. Large investment in LIDAR for autonomous vehicles is further propelling CMOS SPAD technology toward advanced nanometer nodes [3] with detector performance approaching that of custom devices.
A number of SPAD image sensors have been proposed, permitting TCSPC data to be acquired in parallel from every pixel [4]-[8]. They provide new capabilities to capture images of light in flight, non-line-of-sight targets, objects through diffuse media, two-photon fluorescence lifetime imaging microscopy (FLIM), super-resolved single molecules, and ToF depth at various range scales [9]-[14]. Despite their excellent timing performance, these arrays suffer from low fill factor (a few percent) or large pixel pitches (40-150 µm), limiting their sensitivity and spatial resolution. A number of recent SPAD image sensors resolve this tradeoff by using event-driven dynamic allocation of time-to-digital converters (TDCs), either off-focal plane in frontside illumination or vertically stacked [15], [16]. The per-SPAD TDC imager architecture is of interest because it provides the maximum capacity to convert simultaneous photon arrivals within the same laser cycle, such as those that occur in flash LIDAR on highly reflective targets.
In this paper, we present a SPAD-based TCSPC imager in 40-nm CMOS technology with the smallest TDC reported to date (9.2 µm × 9.2 µm) [17]. The 12-bit TDC achieves the finest timing resolution (tunable from 33 to 120 ps) of all reported TCSPC pixels at an energy efficiency figure of merit (FoM) of 34 fJ/conv, with <1 LSB differential nonlinearity (DNL) and <6 LSB integral nonlinearity (INL). The photon detection efficiency (PDE) of the array has been enhanced with cylindrical microlenses to provide a mean concentration factor of 3.25 and a 42% effective fill factor. The sensor also has a very low median dark count rate (DCR) of 25 Hz, obtained at 1.5-V excess bias and at room temperature. The SPADs have a peak photon detection probability (PDP) of 34% at 560 nm for 1-V excess bias at room temperature [3] and a quench time of 5 ns.
This combination of high sensitivity, low noise, and precise timing resolution offers a transformative capability for low-light, time-resolved, wide-field microscopy, where sensitivity is paramount to image very dim samples and lifetime contrast is the key requirement to answer many biological questions. The sensor improves on the architecture proposed in [4] by trading frame rate and jitter for higher pixel sensitivity and resolution. Indeed, many practical applications of FLIM demand video frame rate and a few hundred picoseconds of contrast between multiplexed fluorophores with lifetimes in the 2-10-ns range. In addition, the 18 kframes/s of timestamped pixels corresponds to the photon arrival rates typical of dim microscopic scenarios, thus minimizing pile-up loss.
The maximal TDC full-scale range of 490 ns also enables ToF laser ranging applications up to 73.5-m distance. The imager integrates an on-chip calibration scheme using a column of the imager which continuously measures full-periods of a clock to allow off-chip digital process, voltage, and temperature (PVT) compensation of every TCSPC frame on the fly. Full characterization results of the sensor are presented, as well as FLIM images.
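The link between the TDC full-scale range and the maximum ToF distance can be checked with a short calculation. The sketch below assumes an ideal 12-bit TDC and a round-trip light path; the function names are illustrative and not taken from the sensor software.

```python
# Speed of light in vacuum in m/s (exact SI value).
C_M_PER_S = 299_792_458.0


def tdc_full_scale_s(bits: int, lsb_s: float) -> float:
    """Full-scale range of an ideal TDC with `bits` bits and an LSB of `lsb_s` seconds."""
    return (2 ** bits) * lsb_s


def max_tof_range_m(full_scale_s: float) -> float:
    """Largest distance whose round-trip delay still fits in the TDC range."""
    # The laser pulse travels to the target and back, hence the factor of 2.
    return C_M_PER_S * full_scale_s / 2.0


if __name__ == "__main__":
    # 12 bits at the coarsest quoted LSB of 120 ps.
    print(f"full scale: {tdc_full_scale_s(12, 120e-12) * 1e9:.1f} ns")
    # Range for the 490-ns full scale quoted in the text.
    print(f"max range:  {max_tof_range_m(490e-9):.1f} m")
```

With a 120-ps LSB, 4096 codes span roughly the quoted 490 ns, and half the round-trip distance at that delay is close to the quoted 73.5 m.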
II. SENSOR DESIGN
A micrograph of the sensor is shown in Fig. 1. The 3.15 mm × 2.37 mm chip is fabricated in STMicroelectronics 40-nm CMOS technology offering industrialized SPADs [3]. The sensor block diagram in Fig. 2(a) consists of addressing circuitry, 64 parallel-to-serial converters, and a 192 × 128 array of 18.4 µm × 9.2 µm pixels. Each pixel comprises a TDC coupled to a SPAD. The last column of the pixel array implements a calibration scheme to allow off-chip digital PVT compensation of every frame on the fly. A serial interface (SI) allows configuration of the sensor in multiple modes such as TCSPC and photon counting, as well as optional enabling and disabling of selectable pixel rows and columns. The SPADs adopt a column pairwise well-sharing layout strategy in order to optimize the fill factor and allow future 3-D stacking at a regular 9.2-µm pitch [18]. The sensor operates with the greatest photon efficiency at a maximum frame rate of 18.6 kframes/s with laser repetition rates of around 2 MHz (assuming the conventional 1% pile-up limit).
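One way to read the relation between the 18.6-kframes/s frame rate and the 2-MHz laser repetition rate is as follows: each pixel timestamps at most one photon per frame, so its count rate cannot exceed the frame rate, and the 1% pile-up rule asks for at most one detected photon per 100 laser cycles. The sketch below follows that interpretation, which is an assumption rather than a statement from the design itself; the function name is illustrative.

```python
def min_laser_rate_hz(max_pixel_count_rate_hz: float,
                      pileup_limit: float = 0.01) -> float:
    """Laser repetition rate at which the given per-pixel count rate equals
    `pileup_limit` detected photons per laser cycle."""
    if not 0.0 < pileup_limit <= 1.0:
        raise ValueError("pileup_limit must be in (0, 1]")
    return max_pixel_count_rate_hz / pileup_limit


if __name__ == "__main__":
    # Assumption: at most one timestamp per pixel per frame, so the
    # per-pixel count rate is bounded by the frame rate.
    frame_rate_hz = 18.6e3
    print(f"{min_laser_rate_hz(frame_rate_hz) / 1e6:.2f} MHz")
```

Under this reading the result is just under 2 MHz, in line with the repetition rate quoted above.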
A. Circuit Architecture
A highly optimized version of the pixel architecture originally proposed in [4] has been implemented in order to attain a pitch compatible with scientific imaging or ToF applications and scalable to megapixel resolutions. This involves dispensing with any functions that are not necessary for the pixel and re-using hardware resources wherever possible. In particular, the D-type flip-flop found in the digital cell library has been optimized to remove redundant re-buffering of clock signals and outputs. This saves over 30% of the area of the flip-flops, which are the dominant area contributor to the pixel.
The circuits interfacing the SPAD to the TDC and photon counting functions are shown in Fig. 3 . The layout of the pixel overlaid to show the different blocks is shown in Fig. 2(b) . A single thick oxide 3.3-V nMOS biased with a global gate voltage VQ passively quenches the SPAD. SPAD pulses are level shifted to the 1.1-V digital V dd by a thick oxide inverter. All other circuits exploit the digital 40-nm transistors.
In TCSPC mode (TCSPC = 1), a compact edge-sensitive trigger circuit generates an enable signal S for the TDC by means of a pair of D-type flip-flops. The first flip-flop latches a 1 on the rising edge of the first SPAD pulse that falls within the exposure period (the time between R st pulses) and coincides with a high state of the WINDOW signal, thereby starting the TDC. The second flip-flop resets S to 0 on the next rising edge of the STOP waveform, provided that the TDC has been started. In this way, the WINDOW signal achieves global electrical masking of photon events, allowing suppression of ambient background in LIDAR applications or dark count events in FLIM. In particular, this reduces the likelihood that precious TDC resources are expended on photons likely to be uncorrelated with the laser excitation.
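The trigger behavior described above can be summarized by a small event-level model. This is a behavioral sketch only, not a description of the transistor-level circuit: it assumes a single WINDOW-high interval per exposure and ideal edges, and the function name and time units are illustrative.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple


def tcspc_trigger(spad_edges: Sequence[float],
                  stop_edges: Sequence[float],
                  window: Tuple[float, float]) -> Optional[Tuple[float, float]]:
    """Return (start, stop) times of the enable signal S for one exposure.

    S rises on the first SPAD rising edge inside the WINDOW-high interval
    and falls on the next STOP rising edge after it. Returns None when no
    photon is accepted or no STOP edge follows the accepted photon.
    """
    win_open, win_close = window
    stops = sorted(stop_edges)
    for t_spad in sorted(spad_edges):
        if win_open <= t_spad < win_close:
            # Index of the first STOP edge strictly later than the photon.
            i = bisect_right(stops, t_spad)
            if i == len(stops):
                return None
            # Only the first accepted photon is timed; later ones are ignored.
            return (t_spad, stops[i])
    return None


if __name__ == "__main__":
    # Times in ns: the photon at 5 ns is masked, the one at 12 ns starts the TDC.
    print(tcspc_trigger([5.0, 12.0, 30.0], [0.0, 100.0, 200.0], (10.0, 90.0)))
```

In this model the measured interval is the time from the accepted photon to the following STOP edge, so earlier photons within a laser cycle produce larger TDC codes.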
In photon counting mode (TCSPC = 0), another D-type flip-flop toggles on the rising edge of the SPAD pulse only if WINDOW is high, generating the SPADWIN signal. This signal also acts as the least significant bit (LSB) of the photon count. In this mode, the WINDOW signal provides global electrical masking of light intensity. This allows either global shutter imaging with zero parasitic light sensitivity or single photon synchronous detection (SPSD) operation [19] when operated over a number of frames and quadrature WINDOW gates.

Fig. 4 shows the TDC circuit, which consists of a 4-stage pseudo-differential gated ring oscillator (GRO) and level shifting and coupling stages [4]. The ring oscillator core is supplied from a separate power rail V ddro to allow global external tuning of the TDC resolution. The separate V ddro power rail also minimizes coupling of this critical high-frequency timing reference to the unrelated activity of other digital functions on the chip such as row addressing and readout. Both V ddro and V dd are gridded up to the top metal layers (metals 6 and 7) at the smallest allowed routing pitch. The top metals are thick and have low resistivity, minimizing the voltage droop due to high current activity from the fast switching ring oscillators and ripple counters.
Setting signal R high resets the TDC to an initial condition. The rising edge of signal S starts the ring oscillator, which operates over a range of 2-4 GHz depending on the V ddro setting. At the instant the signal S falls, the nodes T 3:0 and their complements regenerate to memorize the internal state of the oscillator. The state of these internal nodes is used to provide the three LSBs of the TDC by a decoding operation performed in software. Three balanced dynamic comparators act to level shift the states of T 3:0 from V ddro to V dd while reducing the loading on the loop to only two floating nMOS transistors. A cross-coupled level shifter couples T 3 and its complement to the first stage of a ripple counter and resolves potential metastability issues when S falls at the same instant as a positive transition on T 3.
The main pixel schematic integrated into a 9.2 µm×9.2 µm area is shown in Fig. 5 . An 8-bit ripple counter is multiplexed either to act as a photon counter or to count oscillator periods to extend the dynamic range of the TDC. In TCSPC mode, a dedicated high-speed toggle flip-flop immediately divides the ring oscillator frequency to allow this high-speed signal to pass the multiplexer. Thus, the coarse LSB in TCSPC mode (C 0 ) and the LSB in photon counting mode (SPADWIN) are derived from two different flip-flops. Tri-state inverters controlled by a row read signal drive the 14-bit state of the pixel onto a column output bus under control of the row addressing circuit.
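The software decoding of the pixel word into a 12-bit timestamp might look like the sketch below. The exact mapping of the ring oscillator nodes to the three fine bits is not given here, so the sketch assumes a Johnson-code sequence (eight states per oscillator period), which is a common pattern for a 4-stage ring but is a hypothetical choice, as are the bit ordering and the function name.

```python
# Hypothetical Johnson-code sequence of (T0, T1, T2, T3) over one oscillator
# period; the real node-to-phase mapping of the sensor is an assumption here.
JOHNSON_PHASE = {
    (0, 0, 0, 0): 0, (1, 0, 0, 0): 1, (1, 1, 0, 0): 2, (1, 1, 1, 0): 3,
    (1, 1, 1, 1): 4, (0, 1, 1, 1): 5, (0, 0, 1, 1): 6, (0, 0, 0, 1): 7,
}


def decode_tdc(t_bits, c0: int, counter: int) -> int:
    """Combine fine phase, coarse LSB C0 and the 8-bit ripple counter
    into a single 12-bit TDC code (3 + 1 + 8 bits)."""
    fine = JOHNSON_PHASE.get(tuple(t_bits))
    if fine is None:
        raise ValueError(f"invalid oscillator state: {tuple(t_bits)}")
    if c0 not in (0, 1):
        raise ValueError("c0 must be 0 or 1")
    if not 0 <= counter < 256:
        raise ValueError("counter must fit in 8 bits")
    # C0 counts oscillator periods modulo 2; the ripple counter extends it.
    return (counter << 4) | (c0 << 3) | fine


if __name__ == "__main__":
    print(decode_tdc((1, 1, 0, 0), c0=0, counter=1))
```

Rejecting states outside the assumed sequence gives a simple way to flag corrupted or metastable samples before they enter a histogram.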
B. TDC Calibration
It is well known that gated ring-oscillator TDC resolution is strongly influenced by the power supply voltage and temperature [20] , [21] . In addition, a standard deviation of around 1% in the LSB has been determined across a single column. The wide TDC resolution tuning range is a useful feature to extend the dynamic range of the sensor for different fluorescence lifetimes or ToF distances. However, it also represents an uncertainty of the achieved time resolution in the case of unknown process, voltage and temperature variations affecting the ring oscillator. A column of pixels on the right side of the imager continuously measures full-periods of the STOP clock to allow off-chip digital PVT compensation of every frame on the fly.
These pixels, henceforth known as calibration pixels, occupy alternating rows of the right-most column of the pixel array, for a total of 96 calibration pixels. When enabled, the TDC data from the calibration column are read out of the chip in place of the four right-most ordinary pixel columns. The TDC in the calibration pixels differs from that in the imaging pixels by having the STOP clock connected in place of the SPAD anode. The TDC is started by the rising edge of the STOP clock and stopped by the rising edge of the subsequent STOP cycle, thus timing the period of the STOP clock.
The 64 parallel input-to-serial output (PISO) converters read out the data from each of the pixels in a rolling row readout scheme; 32 top and bottom PISOs read out the respective half columns of the array. Each PISO reads out four pixel columns from each row as shown in Fig. 6 .
C. Sensor Operation
Two exposure modalities are possible: high temporal aperture ratio (TAR) [22] rolling shutter or parasitic-light-insensitive global shutter. In the former, pixels are read and reset using a rolling shutter scheme with minimal motion artifacts due to the high frame rates. A token-passing row shift register reads pairs of rows of the pixel array from the central rows outwards in a rolling cycle and operates continuously at up to 18.6 kframes/s. An arbitrary pattern of rows can be read out at a faster frame rate upon identification of regions of interest. At any time, only the two currently addressed rows of the pixel array are not in integration, achieving a TAR of 99%, which is essential for low-light imaging applications.
Upon triggering and timing a single photon, each TDC is dead and unavailable to detect subsequent photons within the same exposure before readout. High illumination conditions and long exposures could decrease the effective TAR due to pixels firing at the start of the exposure and being inactive for the remaining temporal aperture. This effect is negligible due to the low light operating conditions of FLIM. On the other hand, longer exposures increase the probability of triggering more TDCs to avoid reading out empty frames. Adjusting the readout frequency and thus the exposure period in response to the illumination conditions can be used to optimize the TAR while maximizing the probability of reading out triggered pixels from each frame.
In global shutter mode, the WINDOW signal is used to enable TCSPC or photon counting within arbitrary frame durations. In TCSPC mode (Fig. 7), a laser is pulsed in synchrony with the STOP pulse distributed to the whole array via a clock tree. The TDC will only start up if the rising edge of the SPAD pulse is contained in the WINDOW high period. Only the first such photon will be captured within an exposure period. In photon counting mode, photons will be integrated precisely within the WINDOW high period, which allows exposure times to be set from nanosecond to second time scales.
Banks of 32 parallel-to-serial converters at the top and bottom of the array each convert 4 columns of 14-bit data into a 56-bit serial sequence, driving 64 I/O pads at a maximum rate of 100 MHz. The readout time of an entire frame is therefore 54.76 µs.
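The readout budget can be reproduced from the figures given in this section. The sketch below assumes 192 rows and 128 columns (32 converters × 4 columns per bank, and 96 calibration pixels on alternating rows), with the top and bottom banks each serializing half of the rows in parallel; it counts only raw serialization time and ignores any per-row or per-frame overhead, which is not specified here.

```python
ROWS = 192             # assumed row count (two banks of 96 rows each)
BITS_PER_PIXEL = 14    # pixel word width
COLS_PER_PISO = 4      # columns serialized by each converter
PISO_CLOCK_HZ = 100e6  # maximum serial output rate


def frame_readout_s(rows: int = ROWS, clock_hz: float = PISO_CLOCK_HZ) -> float:
    """Raw serialization time of one frame, with both banks working in parallel."""
    rows_per_bank = rows // 2
    bits_per_row = COLS_PER_PISO * BITS_PER_PIXEL  # 56-bit sequence per row
    return rows_per_bank * bits_per_row / clock_hz


def temporal_aperture_ratio(rows: int = ROWS, rows_in_readout: int = 2) -> float:
    """Fraction of rows integrating at any instant in rolling mode."""
    return 1.0 - rows_in_readout / rows


if __name__ == "__main__":
    t = frame_readout_s()
    print(f"readout: {t * 1e6:.2f} us, frame rate: {1.0 / t / 1e3:.1f} kframes/s")
    print(f"TAR: {temporal_aperture_ratio() * 100:.1f} %")
```

Under these assumptions the raw serialization time comes out about 1 µs below the quoted frame readout time, and its reciprocal is close to the quoted 18.6 kframes/s; the source of the difference is not stated in the text.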
III. SENSOR CHARACTERIZATION
The TDC INL and DNL are measured using a code density test with ambient light providing a random input to populate a histogram with over 300-k photon timestamps. The DNL/INL plot of a typical pixel is shown in Fig. 8 with the TDC operating at nominal V ddro = 1.1 V over 140 ns (92.5% of full scale at this voltage).
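A code density test of this kind can be sketched in a few lines: with uniformly distributed arrival times, each histogram bin count is proportional to the width of the corresponding TDC code. The example below is a generic illustration with simulated data and hypothetical bin widths, not the analysis script used for Fig. 8.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def dnl_inl(hist):
    """DNL and INL in LSB from a code density histogram."""
    mean = sum(hist) / len(hist)           # expected count for an ideal bin
    dnl = [h / mean - 1.0 for h in hist]   # relative bin width error
    inl = list(accumulate(dnl))            # running sum of DNL
    return dnl, inl


def simulate_histogram(bin_widths, n_events, seed=0):
    """Histogram of codes for uniformly random arrival times (ambient light)."""
    rng = random.Random(seed)
    edges = list(accumulate(bin_widths))   # upper edge of each code bin
    hist = [0] * len(bin_widths)
    for _ in range(n_events):
        t = rng.uniform(0.0, edges[-1])
        # Clamp guards the rare case where t equals the full-scale edge.
        hist[min(bisect_right(edges, t), len(hist) - 1)] += 1
    return hist


if __name__ == "__main__":
    # Hypothetical TDC with every eighth bin 20% wider than nominal.
    widths = [1.2 if i % 8 == 0 else 1.0 for i in range(64)]
    dnl, inl = dnl_inl(simulate_histogram(widths, 300_000))
    print(f"max |DNL| = {max(abs(d) for d in dnl):.2f} LSB")
    print(f"max |INL| = {max(abs(v) for v in inl):.2f} LSB")
```

The statistical noise on each DNL value shrinks with the number of counts per bin, which is why a large number of timestamps is collected before the nonlinearity is evaluated.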
The impulse response function (IRF) of a typical pixel, measured using a Hamamatsu PLP-10 685-nm laser diode, is shown in Fig. 9. The SPAD is biased at 1.5-V excess bias, and a typical jitter characteristic with a diffusion tail [3] and FWHM/100 of around 1 ns is observed. A map of IRFs of the full pixel array is shown in Fig. 10, with the IRF of hot pixels set to 0 (highlighted in blue) and removed from calculations. The mean pixel FWHM jitter is 219 ps with a variance of 26.7 ps. This is close to the native jitter of the SPAD of 170 ps [3], suggesting that around 138 ps (by subtraction in quadrature) is due to the laser (FWHM = 70-100 ps) and the TDC. The deviation on the TDC LSB at full TDC range is 45 ps for nominal TDC operating conditions. The SPAD median DCR is measured to be 25 Hz at a 1.5-V excess bias and at room temperature. Each SPAD has a 22-µm² active area, implying a 1.14-Hz/µm² median DCR.
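The two derived figures in this paragraph follow from simple arithmetic. The sketch below assumes that independent jitter contributions add in quadrature, which is the usual approximation for roughly Gaussian terms; the function names are illustrative.

```python
import math


def residual_jitter_ps(total_fwhm_ps: float, known_fwhm_ps: float) -> float:
    """Jitter left after removing a known contribution in quadrature."""
    if known_fwhm_ps > total_fwhm_ps:
        raise ValueError("known contribution exceeds total jitter")
    return math.sqrt(total_fwhm_ps ** 2 - known_fwhm_ps ** 2)


def dcr_density_hz_per_um2(dcr_hz: float, active_area_um2: float) -> float:
    """Dark count rate normalized to the SPAD active area."""
    return dcr_hz / active_area_um2


if __name__ == "__main__":
    # Measured pixel jitter minus the native SPAD jitter.
    print(f"{residual_jitter_ps(219.0, 170.0):.0f} ps")
    # Median DCR over the SPAD active area.
    print(f"{dcr_density_hz_per_um2(25.0, 22.0):.2f} Hz/um^2")
```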
The TDC resolution as a function of ring oscillator supply voltage V ddro is measured using the calibration column. The results reported in the graph in Fig. 11 show a TDC resolution varying from 33 to 112 ps when varying V ddro from 1.2 to 0.7 V.
The TDC data reported by the calibration column are used to calibrate the TDCs of the imaging pixels off-chip. To test the TDC calibration of the pixels in response to ring oscillator supply voltage variation, the IRF of a Hamamatsu PLP-10 654-nm laser diode is measured at different V ddro values from 1.2 to 0.75 V in steps of 100 mV. The uncalibrated TDC data show a shift in the IRF across the histogram bins in Fig. 12, consistent with the varying TDC resolution across the voltage values. Increasing V ddro results in a narrower bin size; thus, the same laser pulse is shifted to a higher bin position in the histogram. The width of the histogram IRF, measured in bins, also increases with increasing V ddro, in line with a narrower bin width. A Hamamatsu C10196 Picosecond Light Pulser is used to drive the laser and provide a 10-MHz STOP clock to the sensor. The calibration pixels output the TDC data corresponding to the measurement of the STOP clock period. This is used together with knowledge of the STOP clock frequency to calculate the TDC LSB resolution. The TDC resolution computed by the calibration column for each voltage step allows correction of the voltage-dependent time shift in the IRF and of distortions in the histogram. By multiplying the imaging pixel TDC data by the resolution reported by the calibration pixels, the IRF is shown to align in Fig. 13.

Fig. 14 shows the centroid of the pixel IRF before and after applying the TDC calibration to the collected TCSPC data. The calibrated IRF centroid undergoes a deviation of 75 ps across the V ddro range (blue trace in Fig. 14), a factor of 100 improvement on the maximum voltage-induced IRF centroid variation in the non-calibrated histogram (red trace in Fig. 14). Though not demonstrated here, the same calibration can be extended to correct for process- and temperature-dependent variations in the TDC resolution.
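The calibration arithmetic described above reduces to two steps: estimate the LSB from the calibration codes and the known STOP period, then scale the imaging pixel codes by that LSB. The sketch below is a minimal illustration with made-up code values; the function names and the example numbers are assumptions, not measured data.

```python
from statistics import mean


def estimate_lsb_s(cal_codes, stop_clock_hz: float) -> float:
    """TDC LSB from calibration pixels that time one full STOP period."""
    avg_code = mean(cal_codes)
    if avg_code <= 0:
        raise ValueError("calibration codes must be positive")
    return (1.0 / stop_clock_hz) / avg_code


def calibrate(codes, lsb_s: float):
    """Convert raw TDC codes to time in seconds using the estimated LSB."""
    return [c * lsb_s for c in codes]


if __name__ == "__main__":
    stop_hz = 10e6  # 10-MHz STOP clock, i.e. a 100-ns period
    # Hypothetical readings of the same 20-ns interval at two supply voltages:
    # (calibration codes, raw imaging-pixel code).
    for cal, raw in (([2500, 2501, 2499], 500), ([3030, 3031, 3029], 606)):
        lsb = estimate_lsb_s(cal, stop_hz)
        t_ns = calibrate([raw], lsb)[0] * 1e9
        print(f"LSB = {lsb * 1e12:.1f} ps, raw = {raw}, calibrated = {t_ns:.2f} ns")
```

The raw codes differ with the supply voltage, while the calibrated times agree, which is the alignment effect the calibrated histograms are meant to show.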
A single calibration pixel is sampled for 10 000 cycles with V ddro set to 1.1 V to determine the deviation of the measured TDC LSB over multiple calibration cycles. The TDC LSB distribution of a single calibration pixel has a standard deviation of 0.087 ps as shown in Fig. 15. Fig. 16 shows the distribution of the TDC LSB when sampling the whole calibration column, comprising 96 pixels, over 10 000 calibration cycles. The standard deviation in the TDC LSB increases to 0.401 ps, 1% of the TDC LSB. A low deviation in the calibration column TDC LSB means that samples from all calibration pixels can be combined to obtain a single calibration factor to correct for PVT-induced histogram distortions in the imaging pixel data as shown in Figs. 12 and 13 . Combining the calibration data from all pixels in the calibration column reduces the time required to collect enough samples to calculate the reference TDC LSB resolution for the TCSPC histogram data, thus being able to perform calibration on the fly with a maximum of 96 calibration samples per frame. The number of samples required for effective calibration depends on the sensor application and the expected variations. As calibration TDCs are also affected by ring oscillator accumulated jitter, a higher number of samples might be required when using a low frequency STOP clock for calibration. Temperature variations from on-chip power consumption, in turn determined by the illumination intensity, might also require more frequent calibration cycles to monitor temperature-related TDC LSB variations.
Measurements of the power consumption of the sensor on each of the chip supplies as a function of the incident illumination are shown in Fig. 17(a). As expected, the power consumption of such an array is proportional to light level. Owing to the low duty cycle operation of the TDCs, the power consumption of the ring oscillators is negligible, saturating at 0.41 mW on V ddro, or 16.5 nW per TDC. The dominant contributors are 90 mW for the SPADs, 10 mW for the core electronics supply, and 3 mW of I/O power, for a maximum total sensor power consumption of 140 mW under high illumination. The power consumption on the ring oscillator supply shown in Fig. 17(b) increases with light level due to an increasing number of TDCs triggered per frame. The V ddro power saturates at the level corresponding to all pixel TDCs triggered within a 10-MHz STOP clock period. Power consumption measurements are presented for a light intensity sweep reaching illumination levels orders of magnitude higher than would be expected in FLIM applications. In such applications, the sensor power consumption for all supplies would be expected to lie in the linear regions of the curves in Fig. 17 rather than at peak power saturation, giving an expected total sensor power consumption of less than 10 mW for count rates below the sensor pile-up limit.
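The per-TDC figure and the saturated ring oscillator power are related through the pixel count. A minimal check, assuming all 192 × 128 pixels contribute equally (the names below are illustrative):

```python
N_PIXELS = 192 * 128  # one TDC per pixel


def ring_oscillator_power_w(per_tdc_w: float, n_tdcs: int = N_PIXELS) -> float:
    """Total power on the ring oscillator supply when every TDC is active."""
    return per_tdc_w * n_tdcs


if __name__ == "__main__":
    total_w = ring_oscillator_power_w(16.5e-9)
    print(f"{total_w * 1e3:.2f} mW across {N_PIXELS} TDCs")
```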
Cylindrical microlenses have been implemented on a per-die basis [23], achieving a mean concentration factor of 3.25 (Fig. 18) and increasing the effective SPAD fill factor from 13% to 42%. These microlenses focus light onto a pair of SPADs between a column of two TDCs, as shown in Fig. 19. Some light is lost in the n-well isolation region between the two shared-well SPAD rows.
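The fill factor figures can be cross-checked from the pixel geometry. The sketch below assumes the native fill factor is the 22-µm² SPAD active area divided by the 18.4 µm × 9.2 µm pixel area, and that the effective fill factor is the native value scaled by the concentration factor; the function names are illustrative.

```python
def native_fill_factor(active_area_um2: float,
                       pitch_x_um: float, pitch_y_um: float) -> float:
    """Ratio of photosensitive area to total pixel area."""
    return active_area_um2 / (pitch_x_um * pitch_y_um)


def effective_fill_factor(native: float, concentration_factor: float) -> float:
    """Fill factor after microlensing, capped at 100%."""
    return min(1.0, native * concentration_factor)


if __name__ == "__main__":
    ff = native_fill_factor(22.0, 18.4, 9.2)
    print(f"native:    {ff * 100:.0f} %")
    print(f"effective: {effective_fill_factor(ff, 3.25) * 100:.0f} %")
```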
IV. FLUORESCENCE LIFETIME IMAGING
In order to demonstrate wide-field FLIM, onion cells stained with the dye DASPMI [24] were studied on a microscope setup using a HORIBA Scientific DeltaDiode DD-485L laser as the excitation source. The SPAD array was packaged into a camera module referred to as QuantiCam, integrated into commercial FLIM software, and mounted on a simple microscope demonstration system. The HORIBA Scientific EzTime Image software enables a "region of interest" to be selected in the photon counting intensity image [see Fig. 20(a), where the red box indicates the region selected]. TCSPC data were collected only from pixels in the region of interest, which covered an area including the cell walls. The lifetime data were analyzed globally using the EzTime software as the sum of two exponentials, and lifetimes of 1.58 and 2.55 ns were obtained. Maps showing the average lifetime [Fig. 20(b)] and the normalized pre-exponential components for each of the lifetimes are given in Fig. 20(c) and (d). This shows that the longer-lived decay component is predominantly associated with the cell wall and provides a contrast to the cell interior.
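The two-exponential model used in the global analysis can be written out explicitly. The sketch below uses the two fitted lifetimes from the text with hypothetical pre-exponential amplitudes, and it assumes an amplitude-weighted definition of the average lifetime; the definition used by the EzTime software is not stated here and may differ.

```python
import math


def biexp_decay(t_ns: float, amplitudes, lifetimes_ns) -> float:
    """Sum of exponentials: I(t) = sum_i a_i * exp(-t / tau_i)."""
    return sum(a * math.exp(-t_ns / tau)
               for a, tau in zip(amplitudes, lifetimes_ns))


def amplitude_weighted_lifetime(amplitudes, lifetimes_ns) -> float:
    """Average lifetime weighted by the pre-exponential amplitudes."""
    total = sum(amplitudes)
    if total <= 0:
        raise ValueError("amplitudes must sum to a positive value")
    return sum(a * tau for a, tau in zip(amplitudes, lifetimes_ns)) / total


if __name__ == "__main__":
    taus = [1.58, 2.55]  # fitted lifetimes in ns
    for label, amps in (("interior-like", [0.8, 0.2]), ("wall-like", [0.3, 0.7])):
        # Amplitudes are hypothetical values chosen only for illustration.
        print(f"{label}: {amplitude_weighted_lifetime(amps, taus):.2f} ns")
```

Because the lifetimes are shared by all pixels in a global fit, the contrast between regions comes entirely from the per-pixel amplitudes, which is what the normalized pre-exponential maps display.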
Wide-field FLIM acquisition with the QuantiCam is compared to scanning FLIM using a modified HORIBA Scientific DynaMyc (including FiPho timing electronics, DeltaDiode laser excitation, and HPPD-720 detection). In both cases, data were collected and analyzed using the EzTime Image software. Because the speed advantage of the QuantiCam's parallel data acquisition is otherwise only theoretical, a "real world" measurement on a sample to the same precision (225 million photon events and a similar peak histogram count) was made using a Convallaria root sample.
The outcome of these measurements is shown in Fig. 21 . The time to collect the data with the scanning system was ∼16 min, while the equivalent measurement using the QuantiCam on a simple microscope demonstrator took 15 s. Thus, even on a very simple setup, the QuantiCam can collect data orders of magnitude faster than a conventional scanning system.
To further investigate the potential for fast-FLIM acquisition, a test was performed taking into account the report that >185 photon events are required for a basic analysis of TCSPC data [25]. Measurements with different data acquisition times were taken on the Convallaria sample and the images assessed to check that sufficient pixels contained over 200 photon events to facilitate data analysis. Fig. 22 shows the image quality obtained with a data acquisition time of 100 ms. It should be noted that both permanently on and dead pixels can be masked or removed from the image using the acquisition software (EzTime Image) and thus do not contribute to the fitting of the lifetime data. However, this can give rise to the speckle effect seen in some of the FLIM data and, at this point, no attempt has been made to disguise this by interpolation of the data from surrounding pixels. Also, no correction has been made for the fact that the pixel size is not the same in the x and y directions. This can simply be corrected by adjustment of the resultant image aspect ratio. For the purposes of this paper, we show the "raw" images produced by the device to illustrate its potential.

Table I shows a comparison of QuantiCam with other SPAD imagers with per-pixel TDC. Our device is fabricated in the most advanced CMOS technology node, resulting in the smallest TDC and pixel pitch while at the same time providing the highest fill factor after microlensing. The technology node also makes it possible to achieve a finer time resolution at a comparable energy efficiency FoM to other sensors. The 12-bit dynamic range has been chosen to allow the TDC to cover practical time ranges for common organic fluorescent dyes, as well as outdoor time-of-flight ranges. The low DCR of 25 Hz at 1.5-V excess bias is at a practical level for microscopy and could readily be improved by cooling.
Time gating the TDC in a range of interest where the signal is located can further reduce the impact of DCR on the signal-to-noise ratio. The percentage of hot pixels is considerably higher than in other solid-state low-light imaging technologies for microscopy and still represents an impediment to the use of SPAD imagers in these applications. Average power consumption of the sensor is extremely low, as the TDCs are only active at a low duty cycle (typically <0.1%). On the other hand, the peak power consumption of such a GRO-based architecture is potentially very high (watts) in the case that all the TDCs become active at the same instant. Such a scenario, which occurs if the laser is directly incident on the sensor, is, however, not a relevant microscopy use case.
V. CONCLUSION
Advanced nanometer CMOS nodes provide TDC pixels with practical pitch and fill factor for high-resolution imaging. The sensor enables parallel TCSPC and multi-exponential FLIM at two orders of magnitude faster acquisition rates than scanning systems. Enhanced PDE and low DCR offer competitive imaging performance with wide-field microscopy cameras at far superior time resolution.
ACKNOWLEDGMENT
The authors would like to thank T. A. Abbas and R. Walker for assistance in SPAD layout and chip finishing, as well as F. Zanella at CSEM, Muttenz, Switzerland, for microlens design. They would also like to thank STMicroelectronics within the ENIAC POLIS Project, for chip fabrication.

Richard Hirsch has more than 15 years' experience in FPGA and embedded systems development. He has been working at HORIBA Scientific, Glasgow, U.K., as an Electrical Engineer, developing TCSPC and FLIM systems.
David McLoskey has more than thirty years' experience of developing commercial photon counting and timing instrumentation for spectroscopy and other applications. He is currently the Managing Director of HORIBA Jobin Yvon IBH Ltd., Glasgow, U.K. 
Philip Yip

