Abstract-This paper presents a CMOS array of 64×48 pixels capable of detecting single photons with timing accuracies better than 80 ps. Upon photon arrival, a digital pulse is generated and routed by an event-driven digital readout scheme to a specific location for further processing. This method allows non-sequential row-wise and simultaneous column-wise detection while preserving photon arrival timing information. The readout scheme is scalable and is shown to have minimal impact on timing accuracy. Time-correlated fluorescence spectroscopy and optical rangefinders are among the target applications for this technology.
I. INTRODUCTION
Traditionally, imaging with high timing accuracy has been critical in some specific devices, such as optical rangefinders and ultra-fast cameras. Recently, time-correlated techniques have become widely used in imaging of molecular processes in physics and the life sciences. Examples of this trend include fluorescence decay measurements [1], Fluorescence Lifetime Imaging and Fluorescence Correlation Spectroscopy [2], [3], Förster Resonance Energy Transfer [4], and flow cytometry.
Over the years, conventional imagers based on CCD and CMOS APS architectures have consistently improved speed and sensitivity [5], [6]. However, new approaches are now required to ensure deep subnanosecond accuracy, even at low photon counts or in the presence of massive background illumination.
To cope with sensitivity and timing requirements, researchers have followed two main approaches. The first is based on improving the noise characteristics of conventional CCD and CMOS APS circuits, often by reducing the operating temperature or by using complex high-precision analog readout circuitry. The second is based on single photon detectors used in conjunction with a variety of imaging techniques, such as Time-Correlated Single Photon Counting (TCSPC). The latter approach is the focus of this paper.
II. SINGLE PHOTON DETECTORS
Non-solid-state devices, such as Photomultiplier Tubes (PMTs), have thus far been the detectors of choice, due to their noise, dynamic range, and timing performance [7]. However, cost and size considerations, as well as the need for cumbersome optical scanning, have prevented wide adoption of PMT-based bio-imagers. To avoid scanning, much larger arrays of miniaturized single photon detectors are necessary. While solid-state single photon detectors have been known for decades, only recently have researchers succeeded in designing compact single photon pixels in CMOS [8], [9].
These sensors are generally based on a device known as the Single Photon Avalanche Diode (SPAD). A SPAD is often implemented as a p-n junction reverse biased above its breakdown voltage, so that the internal gain becomes virtually infinite and the device operates in Geiger mode. In Geiger mode, when a photon impinges on the semiconductor surface, it may trigger an avalanche and cause a voltage pulse to be generated. This pulse can then be used to evaluate the impinging photon flux or to determine the arrival time of a photon.
The main limitation of SPAD arrays is the efficiency of accessing individual pixels. This is due to the dynamic nature of their output, which cannot be stored as a charge or a voltage signal as in CCD and CMOS APS architectures. In [9], for example, sequential access is used, so only one pixel can be processed at a time. While optical scanning is eliminated, only relatively low frame rates can be achieved. In sequential readout mode, an array of N rows and M columns can sustain a maximum frame rate of 1/(tMN), where t is the sum of the readout and exposure times of a pixel. For example, in a 32×32 array a maximum frame rate of approximately 250 fps can be sustained for a pixel exposure time of 4 μs [10].
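The sequential-readout bound can be checked numerically. The following is a minimal sketch, writing t for the per-pixel sum of readout and exposure times; the function name is illustrative, and the example assumes the readout time is negligible next to the 4 μs exposure.

```python
def max_sequential_frame_rate(rows: int, cols: int, t_pixel_s: float) -> float:
    """Upper bound on the frame rate when pixels are accessed one at a time.

    t_pixel_s is the per-pixel readout-plus-exposure time, in seconds.
    """
    if rows <= 0 or cols <= 0 or t_pixel_s <= 0:
        raise ValueError("rows, cols and t_pixel_s must be positive")
    return 1.0 / (t_pixel_s * rows * cols)


# 32x32 array, 4 us per pixel (readout time neglected: an assumption).
rate = max_sequential_frame_rate(32, 32, 4e-6)
print(f"{rate:.1f} fps")
```

Under these assumptions the bound comes out slightly below 250 fps, consistent with the rounded figure quoted above; any non-zero readout time lowers it further.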
In this paper we describe a next-generation integrated SPAD array in which sequential access has been replaced with column-wise parallel access and row-wise non-sequential event-driven readout. Details on the principle and implementation of SPADs are found in the literature [8], [9] and will not be described further.
III. EVENT-DRIVEN READOUT ARCHITECTURE
In this architecture the column operates similarly to a digital bus, with a few differences. The operation of the bus assumes that the event associated with photon absorption has a relatively low probability, so that events can be handled independently, one at a time. When a photon is absorbed in a given pixel, it may generate a pulse. This event in turn causes the pixel to issue an asynchronous ownership request for the readout channel. The corresponding pulse is pushed through T_P onto "nOUT", a high-impedance line forced high by a pull-up resistor. Simultaneously, an address associated with the pixel row is generated via T_C(1), …, T_C(N) and sent onto "nADDR". In order for every pixel to share the same interconnects, every address line also requires a pull-up device. The pixel diagram is shown in Fig. 1.

At the bottom of the readout channel, the pulse associated with an impinging photon is used to latch the address "nADDR" associated with the firing pixel. The latching mechanism is shown in detail in Fig. 2. The figure shows a complete column, comprising a set of pull-up resistors, metal interconnect, and the latch at the bottom. Once address "ADDR" is secured, the pulse is routed to counters for flux evaluation or to time discriminators for generic timing analysis. The result of the evaluation is then saved in the appropriate memory space, while "nRECH" signals availability of the channel at readout cycle completion, which includes the recharge of the SPAD in the pixel that fired.
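The single-event behavior of one column can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the lines idle high because of the pull-ups, a firing pixel pulls "nOUT" low, and it pulls low each "nADDR" line whose row-address bit is 1 (the actual bit encoding is not given in the text). The function names are hypothetical, and collisions between simultaneously firing pixels are deliberately not modeled.

```python
from math import ceil, log2
from typing import List, Optional, Tuple


def address_bits(n_rows: int) -> int:
    """Number of address lines needed to encode n_rows row indices."""
    return max(1, ceil(log2(n_rows)))


def drive_column(firing_row: Optional[int], n_rows: int) -> Tuple[int, List[int]]:
    """Return the (nOUT, nADDR) line levels for one column.

    All lines idle at 1 (pulled up). A firing pixel pulls nOUT to 0 and
    pulls to 0 every nADDR line whose address bit is 1 (assumed encoding).
    """
    width = address_bits(n_rows)
    if firing_row is None:
        return 1, [1] * width
    if not 0 <= firing_row < n_rows:
        raise ValueError("firing_row out of range")
    n_addr = [0 if (firing_row >> bit) & 1 else 1 for bit in range(width)]
    return 0, n_addr


def latch_address(n_out: int, n_addr: List[int]) -> Optional[int]:
    """Latch ADDR (the inverse of nADDR) when nOUT is low, else return None."""
    if n_out == 1:
        return None  # no event on the bus, nothing to latch
    addr = 0
    for bit, level in enumerate(n_addr):
        if level == 0:
            addr |= 1 << bit
    return addr


# A pixel in row 37 of a 48-row column fires and its address is recovered.
print(latch_address(*drive_column(37, 48)))
```

Using "nOUT" as the latch strobe is what lets row 0 be distinguished from an idle bus in this model, since in both cases every address line stays high.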
The adjustable delay line in Fig. 2 is required to avoid setup time violations. The readout scheme has a built-in collision resolution mechanism to avoid bus contention issues. A column-wise finite state machine controls all the readout phases, while the time required for a readout cycle to complete is known as column dead time.
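To get a feel for how often a second event would arrive while a column is busy, one can assume (this is an assumption, not a statement from the measurements) that detections in a column form a Poisson process with aggregate rate λ. The probability that at least one further event falls within a column dead time T_d is then 1 − exp(−λ·T_d). The rate and dead-time values below are placeholders chosen only for illustration.

```python
import math


def busy_arrival_probability(event_rate_hz: float, dead_time_s: float) -> float:
    """P(at least one more event arrives during the dead time), Poisson model."""
    if event_rate_hz < 0 or dead_time_s < 0:
        raise ValueError("rate and dead time must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-event_rate_hz * dead_time_s)


# Placeholder values: 100 kHz aggregate column rate, 50 ns column dead time.
p = busy_arrival_probability(100e3, 50e-9)
print(f"{p:.4%}")
```

For small λ·T_d the probability is approximately λ·T_d, which is why the low-event-probability assumption behind the bus operation becomes easier to satisfy as illumination or dead time decreases.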
There are several advantages to this scheme. First, the readout is scalable both column- and row-wise at little or no hardware cost. Second, due to the sparsity of the photon arrival distribution, processing may be performed on demand, thus drastically reducing power consumption at low levels of illumination. Third, the scheme provides a mechanism for the control of some of the parameters associated with SPADs [9], such as dead time and dynamic range, with minimal impact on timing accuracy.
IV. IMAGING SYSTEM ARCHITECTURE AND CHARACTERIZATION
To demonstrate the suitability of the approach and its scalability, a 64×48 SPAD array was implemented in a 0.8 μm CMOS technology. The chip area is 7.7×3.6 mm². Fig. 3 shows the sensor photomicrograph. The complete imaging system prototype consists of the sensor chip, an ARM™ processor core embedded in an ALTERA Excalibur™ FPGA, volatile and non-volatile memories, communication links, and a power supply. Time-uncorrelated detection is performed using banks of 16-bit counters in the FPGA.
The TCSPC mode of operation requires high-resolution time discrimination. In this prototype we have used a 20 ps Time-to-Digital Converter (TDC) that we designed in a 0.18 μm CMOS technology. The FPGA also implements the control of the chip. The timing accuracy of the sensor is characterized in terms of a histogram known as the Instrument Response Function (IRF). The IRF represents the statistics of the arrival time of a photon contained in a pulse of light. Such a pulse is generated by a high-performance laser source capable of generating femtosecond pulses. The jitter is quantified by means of the Full Width at Half Maximum (FWHM) of the IRF.
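As an illustration of how a FWHM figure can be extracted from an IRF histogram, the sketch below locates the two half-maximum crossings by linear interpolation between neighboring bins. It assumes a single-peaked histogram with negligible background; the function name and the synthetic Gaussian data are illustrative only and are not the measured IRF.

```python
import math
from typing import Sequence


def fwhm(bin_centers: Sequence[float], counts: Sequence[float]) -> float:
    """FWHM of a single-peaked histogram, interpolating the half-max crossings."""
    half = max(counts) / 2.0
    above = [i for i, c in enumerate(counts) if c >= half]
    left, right = above[0], above[-1]
    if left == 0 or right == len(counts) - 1:
        raise ValueError("peak is not fully contained in the histogram")
    # Rising edge: crossing lies between bins left-1 and left.
    x_left = bin_centers[left - 1] + (half - counts[left - 1]) * (
        bin_centers[left] - bin_centers[left - 1]
    ) / (counts[left] - counts[left - 1])
    # Falling edge: crossing lies between bins right and right+1.
    x_right = bin_centers[right] + (counts[right] - half) * (
        bin_centers[right + 1] - bin_centers[right]
    ) / (counts[right] - counts[right + 1])
    return x_right - x_left


# Synthetic example: Gaussian with sigma = 30 ps sampled on 20 ps bins.
sigma_ps = 30.0
centers_ps = [20.0 * k for k in range(-10, 11)]
counts = [1000.0 * math.exp(-0.5 * (x / sigma_ps) ** 2) for x in centers_ps]
print(f"FWHM = {fwhm(centers_ps, counts):.1f} ps")
```

For a Gaussian the exact value is 2·sqrt(2·ln 2)·σ, so the interpolated result can be compared against that closed form; with bins as coarse as the TDC resolution the linear interpolation introduces a small bias.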
FIG. 3. PHOTOMICROGRAPH AND BLOCKS OF THE SENSOR CHIP (LEFT). PHOTOMICROGRAPH OF THE PIXEL (RIGHT).
The measurement was obtained by illuminating the sensor with a Ti:Sapphire laser (COHERENT, 400 nm, 4.7 MHz) and evaluating the time difference with a TCC900 photon counting card (Edinburgh Instruments). Fig. 4 shows a linear plot of the IRF resulting from the measurement. The laser power was adjusted so that the sensor responded at a rate of about 10 kHz.

The noise performance of the sensor was evaluated through the intrinsic Dark Count Rate (DCR). The DCR corresponds to the average frequency at which a SPAD fires when not exposed to light. The measurement was performed on all the pixels to characterize the complete array spatially. The result is shown in the plot of Fig. 6. Fig. 7 shows the statistics of the DCR across the entire chip. While the average over the chip is approximately 370 Hz, 91.5% of the pixels exhibit a DCR of less than 100 Hz and 21.7% less than 10 Hz. Because the output of the chip is digital, it becomes trivial to use these statistics to select the best performing pixels through simple hardware and/or software filters.
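A software filter of the kind mentioned above can be very simple. The sketch below computes the fraction of pixels under a DCR threshold and lists the pixels that pass it; the function names and the tiny DCR map are hypothetical and serve only to show how a few noisy pixels can pull the mean well above the DCR of most pixels.

```python
from typing import List, Sequence, Tuple


def fraction_below(dcr_map: Sequence[Sequence[float]], threshold_hz: float) -> float:
    """Fraction of pixels whose DCR is strictly below threshold_hz."""
    values = [d for row in dcr_map for d in row]
    return sum(1 for d in values if d < threshold_hz) / len(values)


def select_pixels(
    dcr_map: Sequence[Sequence[float]], threshold_hz: float
) -> List[Tuple[int, int]]:
    """(row, col) coordinates of pixels whose DCR is below threshold_hz."""
    return [
        (r, c)
        for r, row in enumerate(dcr_map)
        for c, d in enumerate(row)
        if d < threshold_hz
    ]


# Hypothetical 2x3 DCR map in Hz, with one noisy pixel.
dcr_map = [[5.0, 8.0, 60.0], [12.0, 4000.0, 90.0]]
mean_dcr = sum(d for row in dcr_map for d in row) / 6
print(f"mean = {mean_dcr:.0f} Hz, below 100 Hz: {fraction_below(dcr_map, 100.0):.0%}")
print(select_pixels(dcr_map, 100.0))
```

The same mask could equally be held in a lookup table in the FPGA so that events from rejected pixels are discarded before counting.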
FIG. 7. DCR STATISTICS (HISTOGRAM AND CUMULATIVE) FOR THE ENTIRE SENSOR CHIP.
The sensitivity of the sensor is measured in terms of the probability that an impinging photon is detected given a known photon flux. This parameter is known as the Photon Detection Probability (PDP). The measurement was performed using an Oriel MS257 monochromator. The plot of Fig. 8 shows the PDP as a function of wavelength. The maximum is reached over a wide spectral range, from 430 to 550 nm. In the near infrared and UV, the PDP varies from 1 to 10%.
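One common way to turn such a measurement into a PDP value is to subtract the dark counts from the measured count rate and divide by the rate of photons incident on the active area. The sketch below follows that recipe; it is not necessarily the exact procedure used here, it ignores dead-time correction, and all names and numerical values are illustrative assumptions.

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant, J*s
LIGHT_SPEED_M_S = 299_792_458.0    # speed of light in vacuum, m/s


def incident_photon_rate(
    irradiance_w_m2: float, active_area_m2: float, wavelength_m: float
) -> float:
    """Photons per second hitting the active area at a single wavelength."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m
    return irradiance_w_m2 * active_area_m2 / photon_energy_j


def photon_detection_probability(
    count_rate_hz: float, dcr_hz: float, photon_rate_hz: float
) -> float:
    """PDP estimate: dark-count-corrected count rate over incident photon rate."""
    if photon_rate_hz <= 0:
        raise ValueError("photon_rate_hz must be positive")
    return (count_rate_hz - dcr_hz) / photon_rate_hz


# Placeholder numbers: 10.1 kHz measured, 100 Hz DCR, 1e5 photons/s incident.
print(photon_detection_probability(10_100.0, 100.0, 1e5))
```

Repeating the calculation at each monochromator setting would yield a PDP-versus-wavelength curve of the kind plotted in Fig. 8.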
