Abstract: Many demanding photonic applications require the acquisition of images at very low light levels and at high speed. Advanced imagers available on the market are generally not able to provide both in a single detector. We present a 2-D imager based on a 32 × 32 array of "smart pixels," each comprising a single-photon avalanche diode detector, an analog front end, and digital processing electronics, which provides single-photon sensitivity, high electronic noise immunity, and high readout speed. The imager can be operated at a maximum of about 100 000 frames/s with negligible blind time between frames, and provides high photon-detection efficiency in the visible range, high dynamic range, and low dark-counting rate, even at room temperature. To easily integrate the imager into different applications, we developed a complete single-photon camera system, which fully operates the array through a USB 2.0 link and user-friendly software that configures camera parameters and operating modalities and performs readout.
Introduction
Many applications require the acquisition of very weak optical signals, generally composed of a few photons, mostly in the visible and near-infrared (400 nm-850 nm) wavelength range. In order to detect such low-intensity signals, a very low-noise detector is necessary. In photodetection systems, sensitivity is mostly limited by the noise introduced by the very first electronic stage sensing the detector. If single-photon sensitivity is required, either a drastic decrease of the front-end electronic noise or an extremely long integration time is needed. Since the latter is generally not compatible with most applications, efforts must aim to reduce circuit noise or to amplify the electric signal generated by the detector before it reaches the first noisy electronic component. The typical example is the Photo-Multiplier Tube (PMT), where a series of dynodes provides an internal mechanism that amplifies the single photoelectron into a macroscopic signal of thousands of electrons. The disadvantages of PMTs reside in the limited number of independent "pixels" (i.e., spatial resolution), their overall dimensions, and the required high operating voltages (some kV). Micro-Channel Plates (MCPs) are composed of many "channels," but like PMTs, they suffer from magnetic field interference, require high voltages, and are fragile devices.
A solid-state device characterized by an internal amplification process is the Avalanche PhotoDiode (APD), a p-n junction that is reverse-biased slightly below breakdown to obtain a gain that boosts the photogenerated electron-hole pair signal. Unfortunately, the obtained amplification is neither high enough (the gain is generally much lower than 1000) nor noiseless (excess noise), thus preventing the detection of single photons.
Imagers manufactured in standard charge-coupled device (CCD) [1] and Complementary Metal-Oxide-Semiconductor (CMOS) Active Pixel Sensor (APS) [2] technologies are usually organized as arrays of pixels, each one delivering an "analog" electrical signal proportional to the light incident on the detector during defined integration time slots. Usually, the best sensitivity is attained by CCD imagers, where carriers photogenerated within the photosensitive area are transported over the entire CCD line, eventually reaching the readout node in a serial mode, one photogenerated packet of carriers after the other. This is the reason why CCDs are typically slow (low frame rates) and why there is usually a delay between the arrival of the first and the last packet. In the readout node, a charge-to-voltage conversion is performed before providing such analog information to the off-chip electronics.
Advanced CCDs can reach single-photon sensitivity, but their speed is fundamentally limited by the way the CCD is read out and by noise. In the Electron-Multiplying CCD (EM-CCD) [3], [4], internal gain is obtained through impact ionization within a multiplication register added to the output register, with a total gain of about 1000-2000. The output register bandwidth is increased to raise the frame rate, but this comes at the expense of an increased noise floor. Although single-photon counting may be possible, the ultimate limit depends on amplified clock-induced charge.
Single-Photon Avalanche Diodes
Single-Photon Avalanche Diodes (SPADs) [5], [6] are p-n junctions that are reverse-biased well above breakdown. In that way, the detector works in the so-called Geiger mode, as the absorption of a single photon is enough to generate a macroscopic (tens of mA), standard, and easily detectable current. Simple analog or digital electronics can be used to detect the avalanche triggering, and no readout noise affects the measurement [7], [8]. Ignitions caused by thermal generation or tunneling effects within the semiconductor are called dark counts, and their variance represents the SPAD intrinsic noise. Moreover, ignitions due to carriers that are first trapped (during an avalanche current flow) and then released by trapping centers are called afterpulses; they impair detection linearity and increase detector noise [5], [6].
The triggered avalanche is a self-sustaining phenomenon; therefore, proper front-end electronics, which quench and reset the detector, are necessary in order to operate the SPAD. This front-end electronics is generally referred to as a "quenching" circuit [5], [8]. State-of-the-art SPADs are fabricated in custom technologies, have spectral efficiency tailored to the wavelength of interest, and provide the best performance in terms of dark counts and afterpulsing. Unfortunately, the process used to make custom SPADs cannot easily be employed to integrate on-chip microelectronics as well. Therefore, in SPAD arrays fabricated with custom technologies, the sensor chip must be connected to off-chip quenching electronics. The required bonding wires and routing add parasitic capacitance that increases avalanche charge [9], detector afterpulsing, optical crosstalk, and power consumption. As a result, SPAD arrays in custom technologies are limited to tens or, at most, hundreds of pixels.
In recent years, some groups reported on SPADs fabricated in standard high-voltage [10], [11] or even low-voltage [16] CMOS processes with a sufficiently high degree of purity. Such CMOS compatibility opened the way to the development of arrays of SPADs and to the monolithic integration of SPADs with quenching/reset circuits [17] and with additional electronics [18]. Fig. 1 shows the cross section of a CMOS SPAD fabricated in a standard 0.35-µm high-voltage CMOS technology, based on a 0.8-µm structure previously reported in [18]. The active p+/n junction is formed between an n-well (e.g., a standard high-voltage p-type MOS (PMOS) well) and a shallow p+ implantation (e.g., a PMOS source/drain region). A deeper diffused guard ring (e.g., a high-voltage n-type MOS well) raises the breakdown voltage at the edge of the junction far above that of the useful active area ($V_{BD}$). In our CMOS technology, the former is 60 V, while the latter is 24 V. We chose a SPAD diameter of 20 µm, which provides a good tradeoff between low dark counts (which increase with the depleted volume) and large active area. The p-substrate is shared with the electronics; hence, it must be kept at ground in order to avoid forward biasing of the p-substrate/n-well junction.
Single-Photon Avalanche Diode Array
Early CMOS SPAD arrays reported in the literature were composed of either a relatively low number of pixels operated in parallel or a quite large number of multiplexed detectors [11], [19]. Both solutions are unsuitable for most applications. In particular, multiplexed access impairs the high-frame-rate capability of SPAD arrays. We present an expandable architecture for photon-counting arrays, based on compact smart pixels comprising both detector and electronics. In every pixel, there is a 20-µm CMOS SPAD, which is operated by a very compact quenching circuit that we designed to maximize performance while reducing the overall dimensions of the electronics. All array pixels are read out in parallel, with negligible (20 ns) deadtime between integration time slots (i.e., between adjacent frames).
Compared with EM-CCDs, the proposed SPAD array has the drawbacks of larger pixel dimensions (100 µm instead of a few micrometers), lower fill factor (if no microlens arrays are employed), and a limited number of pixels (some thousands instead of millions). However, the advantages are true single-photon sensitivity, all-digital in-pixel processing, global-shutter acquisition mode, no readout noise, and very high frame rate.
Pixel Architecture
We conceived the pixel with three goals: to maximize the detector active area; to reduce transistor count and sizes; and to act as the basic building block for easily expandable array architectures. Each pixel of the array comprises the SPAD detector, the analog front end, and the digital counting electronics. Fig. 2(a) shows a simplified diagram of the pixel. The in-pixel SPAD's cathode is connected to a high voltage $V_{REV} = V_{BD} + V_{EX}$, where $V_{EX}$ is the SPAD excess voltage. Depending on the application, $V_{EX}$ can range from 3 V up to 6 V. All pixel electronics work with a single supply of 3.3 V. The SPAD anode is connected to a variable-load quenching circuit (VLQC) [20], which senses the avalanches, quenches them, and then resets the SPAD. The reason why we decided to operate the SPAD from the anode can be understood by looking at Fig. 1: the wide depleted region across the p-substrate/n-well junction makes the cathode parasitic capacitance definitely larger than the anode one. Since the capacitance connected to the terminal used to operate the SPAD sets the charge flowing through the junction during every avalanche [9], the smaller the terminal capacitance, the shorter the avalanche duration and, hence, the less severe the related issues. Moreover, in order to further minimize the parasitic capacitance, the VLQC is designed with the minimum number of transistors connected to the SPAD [20].
A Linear Feedback Shift Register (LFSR) counter follows the quenching circuit and is used to count detector ignitions, i.e., the number of detected photons and dark counts. A latch stage [also shown in Fig. 2(a)] feeds the data bus and temporarily stores the photon counts for readout while the pixel starts acquiring a new frame (see next section).
Pixel Operations
Fig. 3 shows the operations performed by the pixel every time the SPAD is ignited: (i) the VLQC senses the detector's ignition, swiftly quenches it, and delivers a pulse to the following electronics; then, after a user-selectable hold-off time (40 ns-800 ns), it resets the SPAD; (ii) the ancillary electronics forms a standard pulse suitable to increment the 8-bit counter. Fig. 4 shows the operations performed at the end of every frame: (iii) the array global electronics generates a Stop pulse, which transfers the number of counted photons to the temporary storage register; (iv) finally, the global electronics generates a Start pulse, which resets the counter, thus marking the beginning of the next frame.
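As an illustration of how an LFSR can serve as an event counter, the following Python sketch models an 8-bit LFSR that advances once per detected avalanche and decodes the stored state back to a count through a lookup table. The feedback taps, the seed value, and all function names are assumptions chosen for illustration; the text does not specify the polynomial or the decoding scheme actually used.

```python
SEED = 0x01  # assumed state loaded by the Start pulse (must be non-zero)


def lfsr_step(state: int) -> int:
    """Advance an 8-bit Fibonacci LFSR by one detected event.

    The taps (bits 8, 6, 5, 4 in the usual Fibonacci notation) are an
    assumption; any maximal-length 8-bit tap set would work the same way.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (bit << 7)


def build_decode_table(seed: int = SEED) -> dict:
    """Map every reachable LFSR state to the number of steps from the seed."""
    table = {}
    state, count = seed, 0
    while state not in table:
        table[state] = count
        state = lfsr_step(state)
        count += 1
    return table


DECODE = build_decode_table()


def count_events(n_events: int) -> int:
    """Simulate n_events avalanches in one frame and decode the final state."""
    state = SEED
    for _ in range(n_events):
        state = lfsr_step(state)
    return DECODE[state]
```

In this sketch the decoding would be done off-pixel (for example in the readout electronics or in software), which is what makes an LFSR attractive as a compact in-pixel counter: the pixel only needs a shift register and a few XOR gates. A maximal-length 8-bit LFSR visits 255 distinct states, so the decoded count wraps after that many events.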
Array Architecture
The pixel described in the previous section is a basic building block for SPAD arrays with many pixels. We devised an array architecture capable of fully parallel operation of all pixels, thus allowing free-running image acquisition at very high frame rates. Frame acquisition begins with a global Start pulse, sent to all array pixels, which resets each pixel's internal counter. During the integration frame time, all pixels work in photon-counting mode and operate independently; each counter accumulates the number of photons (and dark counts) detected by the corresponding SPAD. At the end of the frame, a global Stop pulse is applied to all pixels to store each pixel's current count in its memory register. The data of the previous frame are read out during the acquisition of the new frame. The array global electronics sequentially addresses all pixels, as shown in Fig. 6, as in a standard memory array. Compared with CCDs, the impact of data transfer overhead on the maximum achievable frame rate is dramatically reduced. The minimum integration time (i.e., the maximum frame rate) is set by the time necessary to download all data. With a system clock of 100 MHz, each pixel is read out in 10 ns, and hence, a complete 32 × 32 pixel array (1024 pixels) is read out in about 10 µs, giving a frame rate of about 100 000 frames/s. Such an acquisition scheme acts as an electronic global shutter, which avoids any image smearing or rolling-shutter-related problem. Note that the proposed architecture allows reaching even faster frame rates by simply providing more 8-bit output data buses instead of the single one of the fabricated chip. In fact, with four data buses instead of just one, the frame rate would increase fourfold.
Note that the pin count is very limited: the 32 × 32 SPAD array with 1024 pixels requires just the single 8-bit data bus, a few synchronization signals (master clock, frame start, frame stop), and the power supplies: +3.3 V for the logic electronics and $V_H$ to bias the SPADs.
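The readout arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, and the model assumes (as the text suggests) that each data bus delivers one pixel value per clock cycle and that the readout time alone sets the minimum frame time.

```python
def readout_time_s(n_pixels: int, f_clk_hz: float, n_buses: int = 1) -> float:
    """Time to read one frame when each bus outputs one pixel per clock cycle."""
    pixels_per_bus = -(-n_pixels // n_buses)  # ceiling division
    return pixels_per_bus / f_clk_hz


def max_frame_rate(n_pixels: int, f_clk_hz: float, n_buses: int = 1) -> float:
    """Maximum frame rate when readout time sets the minimum integration time."""
    return 1.0 / readout_time_s(n_pixels, f_clk_hz, n_buses)


if __name__ == "__main__":
    # 32 x 32 array, 100 MHz clock, one 8-bit bus
    print(readout_time_s(1024, 100e6))     # about 10.24e-6 s
    print(max_frame_rate(1024, 100e6))     # about 97 656 frames/s
    print(max_frame_rate(1024, 100e6, 4))  # about four times higher
```

Under these assumptions the exact figures are 10.24 µs per frame and roughly 97 700 frames/s, consistent with the rounded values of about 10 µs and about 100 000 frames/s quoted in the text.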
Gate Modes
Some experiments require the possibility to gate the imager on and off very quickly. In the SPAD camera, we implemented two different gating modes.
Software Gating
The software gating is shown in Fig. 7(a). Counts are integrated by the counter only when the GATE signal is at a logic high level. Basically, an electronic switch enables the VLQC output signal to reach the counter. The term "software" refers to the fact that the counter is disabled during the gate-off time period, even though the SPAD keeps running, i.e., gets ignited by incoming photons or dark counts, since both the SPAD and the front-end electronics are always enabled.
Hardware Gating
The hardware gating works directly on the SPAD bias voltage: during the gate-on period, the voltage is kept above breakdown at $V_H = V_{BD} + V_{EX}$, while during gate-off, $V_H$ is swiftly pulled and kept below breakdown. Therefore, no avalanches can be ignited during gate-off. Both rising and falling edges last a few nanoseconds; hence, it is possible to promptly gate on the imager right after a laser pulse excitation, without the problem of saturating the array with the intense laser stimulus. Note that this is not an issue in fluorescence measurements, since excitation light can be separated from emitted light through proper wavelength filtering. Instead, when excitation and emission wavelengths are the same (as in photon migration measurements and near-infrared spectroscopy), this problem can be effectively solved with fast gating.
Fig. 6. Array operations during readout: data are read out sequentially by selecting the storage registers in a row-by-column addressing scheme. Note that a single 8-bit data bus is output from the imager.
Fig. 7(b) shows the circuit used for hardware gating the SPAD imager. The bias-tee component is composed of an inductor and a capacitor. When the fast pulse generator output is steady, the capacitor is an open circuit and the inductor behaves as a short circuit; thus, the power supply $V_{H1}$ feeds the imager through the $V_H$ pin. Since $V_{H1}$ is lower than the SPAD breakdown voltage $V_{BD}$, photons are not able to trigger the array pixels. On the contrary, when the pulse generator applies the fast rising edge, the capacitor acts as a constant voltage level shifter, while the inductor becomes an open circuit; as a result, the voltage at the bias-tee output is the sum $V_{H1} + V_{H2}$. Hence, the SPAD is enabled above breakdown. An impedance-matched transmission line terminated with a 50-Ω resistor avoids reflections of the steep rising edge. The capacitor $C_B$ filters the voltage $V_{H1}$ at the resistor terminal.
During both gate-on and gate-off, the voltage $V_H$ tends to decay to $V_{H1}$, at a rate that depends on the bias-tee capacitor size and the imager current consumption. Therefore, with the hardware gate, the gate-on time width cannot be arbitrarily long, and the gate-off duration should be long enough to allow the voltage $V_H$ to recover to $V_{H1}$.
Gate-Mode Selection Criteria
There are reasons to prefer one modality over the other, depending on the specific application. The parameters to be taken into account are the gate-on duration $T_{on}$ and the gate duty cycle (defined as the ratio between the gate-on duration and the gate period), compared with the deadtime $T_{dead}$, i.e., the time the SPAD is held off by the electronics after each ignition.
Very Short Gate-On and High Gate-Off Durations
The situation when $T_{on} \ll T_{dead}$ is the best scenario for hardware gating. That is true for several reasons. First of all, with software gating, a photon that triggered the SPAD during gate-off may keep the system "dead" during the adjacent gate-on window, thus decreasing the overall detection efficiency. In applications such as functional spectroscopy [12], but also in optical mammography [13] and molecular imaging [14], [15], there is the need to discriminate between early-arriving photons (often due to reflections and scattering in shallow layers of the sample under investigation) and late-arriving photons (which traveled much deeper in the sample) in order to increase the contrast of measured scattering and absorbing perturbations within the sample. The strong presence of the so-called "early photons" severely limits system performance, since their number exceeds (often by a few orders of magnitude) that of the "late photons" [21]. Thanks to hardware gating, the SPAD cannot be triggered when gated off; hence, every gate-on window is indeed available for detecting photons, as shown in Fig. 8(a). Moreover, the avalanche triggered during one gate-on window does not mask the next gate-on window because of the longer gate-off duration.
A further advantage is that the detection system completely gets rid of afterpulsing. Afterpulsing is a detrimental effect due to charge trapping in the detector depleted region during the avalanche current flow. When those charges are released after the avalanche ends, they can trigger the SPAD again. To avoid such a problem, the common solution is to enforce a long deadtime after each SPAD ignition, thus increasing the probability that traps release their charge during the hold-off period and hence lowering the probability of afterpulses. A deadtime that is too long is impractical with software gating, because the longer the deadtime, the higher the probability of blinding gate-on periods. Instead, when the SPAD is hardware gated with long gate-off durations between gate-on windows, trapped charges have sufficient time to be released without impairing the next gate-on windows.
Very Short Gate-On and Gate-Off Durations
The situation when again $T_{on} \ll T_{dead}$ but the duty cycle gets close to 50% (i.e., $T_{off}$ becomes similar to $T_{on}$) leads to a worsening effect of the deadtime on the following $T_{on}$ windows, as shown in Fig. 8(b). The longer the deadtime, the higher the number of gate-on windows that are kept "dead" by the previous SPAD ignition. For software gating, this applies to photons detected during both gate-on and gate-off periods. Instead, for hardware gating, this applies only to photons detected during the gate-on periods. In both situations, it is not possible to lengthen the deadtime enough to strongly suppress afterpulsing without causing a decrease in the overall detection efficiency.
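The masking of gate-on windows by deadtime can be illustrated with a small Python model. This is a deliberately simplified sketch, not the camera's logic: it assumes ideal rectangular gates starting at multiples of the period, a fixed (non-paralyzable) deadtime, no afterpulsing, and integer timestamps in nanoseconds; the function name and parameters are hypothetical.

```python
def count_gated(photon_times_ns, t_on_ns, period_ns, t_dead_ns, hardware):
    """Count photons registered inside gate-on windows [k*period, k*period + t_on).

    hardware=True  -> SPAD is below breakdown during gate-off (cannot ignite).
    hardware=False -> SPAD always ignites; the counter is only enabled in gate-on.
    """
    counted = 0
    ready_at = 0  # time at which the SPAD is re-armed after the last ignition
    for t in sorted(photon_times_ns):
        gate_on = (t % period_ns) < t_on_ns
        if hardware and not gate_on:
            continue  # no avalanche is possible, so no deadtime is started
        if t < ready_at:
            continue  # SPAD still held off after the previous ignition
        ready_at = t + t_dead_ns  # avalanche: start the hold-off time
        if gate_on:
            counted += 1
    return counted


if __name__ == "__main__":
    # One photon during gate-off (90 ns), one in the next gate-on window (105 ns)
    photons = [90, 105]
    print(count_gated(photons, 10, 100, 40, hardware=False))  # 0: window masked
    print(count_gated(photons, 10, 100, 40, hardware=True))   # 1: window usable
```

With software gating, the photon arriving during gate-off ignites the SPAD and its deadtime covers the following gate-on window, so the second photon is lost; with hardware gating, the first photon cannot ignite the detector and the second one is counted.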
Very Long Gate-On Duration
If the deadtime is negligible compared with the gate-on period, i.e., $T_{on} \gg T_{dead}$, software gating is the best solution. Photons detected during the gate-off period are not counted, and the enforced deadtime can be long enough to substantially reduce afterpulsing. Moreover, hardware gating is not suitable, because of the discharge of the bias-tee capacitor [shown in Fig. 7(b)] when multiple pixels get ignited during the long $T_{on}$ duration: the effective bias applied to the SPAD imager would no longer have the nominal value, as shown in Fig. 9.
Single-Photon Avalanche Diode Camera
We developed a complete camera module with system electronics, optics, and software to be used in high-speed imaging applications. As shown in Fig. 10(a), the SPAD imager is read out by a fast Field Programmable Gate Array (FPGA) device that is able to communicate with the chip at 100 MHz and store data in a 32-Mbyte SDRAM memory. A Cypress USB 2.0 controller is in charge of communicating and transferring data to and from a remote PC at high speed (up to 480 Mbit/s). Power supply voltages are all derived from the USB link, with no need for external power supplies. The imager biasing voltage $V_H$ (about 29 V, depending on the desired excess voltage) is generated from the 3.3 V supply by means of a DC/DC converter. The FPGA, the RAM, and the USB controller are placed on a high-performance FPGA development board by OpalKelly (XEM3010), whereas the imager and the power supply electronics are housed on a piggyback-mounted board; the two boards communicate through high-density connectors [see Fig. 10(b)].
The imager chip is glued into a 52-pin plastic leaded chip carrier package, which is in turn connected to the rest of the circuit through a socket. The camera can be provided with a standard C-Mount connector or with off-the-shelf optical objectives. Flexibility is one of the key features of the SPAD camera, allowing it to be configured for different applications. On the camera side, depending on the target application, the Very high speed integrated circuit Hardware Description Language (VHDL) code of the FPGA can be modified to perform the required imager setting and data processing. On the computer side, we developed interface software that allows the user to acquire and present only the important information out of the large data sets the camera can acquire. Ultimately, the right combination of FPGA firmware code and PC software can be developed to maximize the performance of the camera in a given application. In order to address the most typical experimental needs, we developed three firmware-software modalities.
Single-Shot Modality
In this modality, the camera works at the fastest speed, and data are stored in the 32-MB RAM available on-board. No communication with the PC occurs during data acquisition, as the camera-to-computer link speed would not allow the camera to work at top performance. In fact, the data rate from the imager can reach 102 Mbytes/s (one byte for each of the 1024 pixels at 100 000 frames/s), which is several times the maximum USB 2.0 speed. The user is required to configure the integration time (frame duration) and the number of frames to acquire (up to the RAM capacity). The PC retrieves all data when the acquisition is completed.
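As a rough sizing aid for the single-shot modality, the sketch below estimates how many frames fit in the on-board memory. It assumes that "32 MB" means 32 × 2^20 bytes, that the whole memory is available for image data, and that frames are packed without padding; none of these details are stated in the text, and the function name is hypothetical.

```python
def max_frames_in_ram(ram_bytes: int, n_pixels: int = 1024, bits_per_pixel: int = 8) -> int:
    """Number of whole frames that fit in RAM, assuming tightly packed pixels."""
    frame_bits = n_pixels * bits_per_pixel
    return (ram_bytes * 8) // frame_bits


if __name__ == "__main__":
    ram = 32 * 2**20  # assumed meaning of "32 MB"
    frames = max_frames_in_ram(ram)
    print(frames)            # frames at 8 bits per pixel
    print(frames / 100_000)  # seconds of acquisition at 100 000 frames/s
```

Under these assumptions, full-depth frames fill the memory in roughly a third of a second at the maximum frame rate, which is why the number of frames must be chosen together with the integration time.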
Free-Running Modality
The free-running modality is intended for experiments where the data to be acquired are larger than the 32 MB available on-board; hence, camera-to-computer communication is required during image acquisition. In this scenario, the USB communication is the bottleneck determining the overall system speed: if, on average, the imager generates more data than the link can sustain, data will be lost. In addition, the USB link does not guarantee a constant isochronous data rate; rather, it is characterized by high-speed data bursts followed by quite long delays. The RAM is used to solve the latter problem, working as a data buffer, while the former problem can be solved only by reducing the data rate read out from the imager, which is given by
$$DR_{IMAGER} = \frac{B \cdot N_R \cdot N_C}{T_{INT}}$$
where $B$ is the number of bits read out per pixel, $N_R$ and $N_C$ are the number of rows and columns, and $T_{INT}$ is the integration time (frame duration, i.e., the inverse of the frame rate). In order to avoid data loss, $DR_{IMAGER}$ must be lower than $DR_{LINK}$, i.e., the USB average data rate. We experimentally found that the maximum $DR_{LINK}$ the USB connection on the OpalKelly board can sustain is about 25 Mbytes/s. Therefore, the only way to reduce the imager data rate is to either lengthen $T_{INT}$ (decrease the imager frame rate) or reduce $B$ (read out fewer than 8 bits per pixel), $N_C$, or $N_R$. However, the integration time is generally application-dependent; hence, it should not be modified. Instead, in applications requiring short integration times or with very faint optical signals, a dynamic range of 8 bits (corresponding to a maximum count of 255 photons per frame) may be more than is needed. In some cases, 4 or even 2 bits will suffice. For this reason, the camera can be configured via software from the computer to use a lower number of bits: the pixel internal counter is always 8 bits, but depending on the internal FPGA routing, it is possible to collect only some of the least significant bits, thus avoiding wasting bandwidth on sending useless zeros. For example, the 32 × 32 SPAD imager running at 100 000 frames/s can be operated at the $DR_{LINK}$ limit of 25 Mbytes/s by reading out just 2 bits, corresponding to an average photon flux of 400 000 photons/s. This feature is also available for the single-shot modality, where it extends the maximum number of frames that can be packed into the 32-MB RAM. Finally, the on-chip global electronics improves imager flexibility by allowing the user to arbitrarily select any desired Region-of-Interest (ROI) of the overall array. This makes it possible to further reduce the imager data rate as well as to increase the maximum camera frame rate, which depends linearly on the number of pixels in the ROI.
The camera configuration is performed again through the computer software: Depending on the pixels the user wants to read-out, a proper setup is loaded into the chip at system startup, as shown in Fig. 11(a) .
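The trade-off between bit depth, region of interest, integration time, and link speed described in this section can be sketched as follows. The sketch assumes the data rate is simply bits per pixel times number of pixels divided by the integration time, with no protocol overhead; the function names are hypothetical, and the 25 Mbytes/s figure is the sustained link rate reported above.

```python
def imager_data_rate_bytes(bits: int, n_rows: int, n_cols: int, t_int_s: float) -> float:
    """Imager data rate in bytes/s for B bits per pixel and frame time T_INT."""
    return bits * n_rows * n_cols / t_int_s / 8.0


def min_integration_time_s(bits: int, n_rows: int, n_cols: int,
                           link_bytes_per_s: float) -> float:
    """Shortest frame time for which the imager data rate fits the link."""
    return bits * n_rows * n_cols / (8.0 * link_bytes_per_s)


if __name__ == "__main__":
    link = 25e6  # sustained USB data rate reported in the text, bytes/s
    print(imager_data_rate_bytes(8, 32, 32, 10e-6))  # full array, 8 bits
    print(min_integration_time_s(8, 32, 32, link))   # 8 bits, full array
    print(min_integration_time_s(2, 32, 32, link))   # 2 bits, full array
    print(min_integration_time_s(8, 16, 16, link))   # 8 bits, 16 x 16 ROI
```

For the full array at 2 bits per pixel, this estimate gives a minimum frame time of about 10.24 µs, which coincides with the readout time of 1024 pixels at 100 MHz; under these assumptions the link and the readout limit the frame rate equally in that configuration.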
Live Modality
Getting an experimental setup to run properly can sometimes be a time-consuming and difficult task: a perfect alignment of optical components plays a fundamental role in the overall performance of the experiment. For this reason, the live modality operates the SPAD camera as a low-speed real-time imaging system (much like an ultra-sensitive webcam), thus allowing the user to have a "live" look at what the imager is focused on. As in the free-running modality, data are transferred to the computer during the acquisition, but single frames are treated in a different way. Since a slow (e.g., movie-like, at about 25 frames/s) frame rate is now required, several acquisitions are accumulated in the FPGA in order to increase the movie signal-to-noise ratio. The user can select how many frames the camera has to accumulate as well as the movie frame rate. No data are saved on the computer.
This in-hardware accumulation of images is not peculiar to the live modality and is available in all the working modalities described so far as an efficient way to improve the imager dynamic range, as discussed in the next section.
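A software analogue of the frame accumulation can be written in a few lines. This is only an illustrative sketch: in the camera the accumulation is done in the FPGA, and the data layout used here (a frame as a list of rows of integer counts) and the function name are assumptions.

```python
def accumulate_frames(frames):
    """Sum equally sized frames (lists of rows of counts) pixel by pixel."""
    frames = iter(frames)
    first = next(frames)
    acc = [list(row) for row in first]  # copy so the input is not modified
    for frame in frames:
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                acc[r][c] += value
    return acc


if __name__ == "__main__":
    f1 = [[1, 2], [3, 4]]
    f2 = [[10, 20], [30, 255]]
    print(accumulate_frames([f1, f2]))  # [[11, 22], [33, 259]]
```

Note that the accumulated values can exceed the 8-bit range of a single frame (259 in the example), so the accumulator must be wider than the in-pixel counter.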
Dynamic Range
One of the problems that may impair photodetection is pixel saturation. For instance, in CCDs, a maximum number of electrons can be stored into the potential well, and as this limit is approached, electrons begin to escape the well, thus producing image artifacts such as saturation and blooming effects. In CMOS APS, saturation is reached when the photodiode junction capacitance gets completely discharged by the photocurrent. For SPAD pixels, the saturation is set by the SPAD detector itself and by the following counting electronics.
Concerning the detector, the deadtime $T_{dead}$ imposed by the quenching electronics after every ignition results in a maximum saturated counting rate equal to $1/T_{dead}$. SPAD saturation may have a strong effect on measurement linearity when the photon flux approaches the saturation level. In fact, if the measured photon rate is $N_{meas}$, a better estimate of the real photon rate $N_{real}$ is
$$N_{real} = \frac{N_{meas}}{1 - N_{meas} \cdot T_{dead}}$$
For example, with a deadtime of 40 ns, giving a saturated maximum count rate of 25 MHz, if the SPAD is ignited at 1 MHz (counts/s), the real average photon flux should be 1.042 MHz. The correction becomes significant (about 5%) when the measured count rate approaches 5% of $1/T_{dead}$, i.e., at about 1.25 MHz. Higher single-photon counting rates can be attained only by reducing the deadtime.
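The numerical example above can be checked with a short Python function. It assumes the standard non-paralyzable deadtime correction, N_real = N_meas / (1 - N_meas * T_dead), which reproduces the 1.042 MHz figure quoted in the text; the function name is hypothetical.

```python
def correct_dead_time(n_meas_hz: float, t_dead_s: float) -> float:
    """Estimate the real photon rate from the measured rate and the deadtime."""
    loss = n_meas_hz * t_dead_s  # fraction of time the SPAD is held off
    if not 0.0 <= loss < 1.0:
        raise ValueError("measured rate must be below the saturation rate 1/T_dead")
    return n_meas_hz / (1.0 - loss)


if __name__ == "__main__":
    print(correct_dead_time(1.0e6, 40e-9))   # about 1.042e6 counts/s
    print(correct_dead_time(1.25e6, 40e-9))  # about 5% above the measured rate
```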
Concerning the counting electronics, the maximum measurable count is limited by the number of available bits of the in-pixel counter. In our case, the 8-bit counter limits the count to $2^8 - 1 = 255$. If the application requires long integration times, the maximum light intensity is going to be limited. However, with the SPAD camera, there is an easy way to solve this problem: since the camera is designed to work at very high frame rates, the solution is to keep acquiring images at the maximum speed (shorter frame time) and then to reconstruct the requested long frame duration by accumulating several acquired images, as shown in Fig. 11(b). Such an accumulation can be performed off-board on the remote computer or directly on-board, at hardware level in the FPGA, depending on the user's needs. Because of the very short minimum acquisition duration (down to 10 µs in the presented SPAD imager), the corresponding increase in measured dynamic range can be dramatic. For a required frame rate of 100 frames/s (i.e., a frame time of 10 ms), the integration of 1000 frames of 8 bits each yields a dynamic range of 255 000, i.e., 108 dB.
In principle, the same strategy could be applied to CCDs and CMOS APS sensors; however, for the technique to work properly, it is necessary to have no readout noise together with negligible deadtime between consecutive acquisitions and, possibly, also a very short minimum acquisition time in order to boost the achievable dynamic range improvement.
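The dynamic-range figure can be reproduced with the sketch below. It assumes that the dynamic range is the ratio between the maximum accumulated count and a single count, expressed as 20·log10 of that ratio, which matches the 108 dB quoted above; the function names are hypothetical.

```python
import math


def frames_to_accumulate(target_frame_time_s: float, min_frame_time_s: float) -> int:
    """Number of short frames that make up one long frame."""
    return round(target_frame_time_s / min_frame_time_s)


def dynamic_range_db(counter_bits: int, n_frames: int) -> float:
    """Dynamic range in dB when n_frames full-scale frames are accumulated."""
    max_count = (2 ** counter_bits - 1) * n_frames
    return 20.0 * math.log10(max_count)


if __name__ == "__main__":
    n = frames_to_accumulate(10e-3, 10e-6)  # 10 ms frame built from 10 us frames
    print(n)                                # 1000
    print(dynamic_range_db(8, n))           # about 108.1 dB
    print(dynamic_range_db(8, 1))           # single 8-bit frame, about 48.1 dB
```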
Camera Characterization
We fabricated the imager using a standard 0.35-µm high-voltage CMOS technology. The overall dimension of the 32 × 32 SPAD array is 3.5 mm × 3.5 mm, whereas the active area measures 3.2 mm × 3.2 mm. The chip layout is shown in Fig. 12. The high voltage required to bias the SPAD above breakdown ranges from 27 V to 30 V, depending on the required excess bias, given the SPAD breakdown voltage of 24.6 V. The rest of the on-chip electronics works at 3.3 V.
The higher the excess bias, the higher the pixel Photon Detection Efficiency (PDE) and Dark-Counting Rate (DCR) [1], [6]-[8]. Fig. 13(a) shows the typical PDE of the pixels, measured at different excess voltages, while Fig. 13(b) shows the cumulative distribution of the dark-counting rates of all 1024 pixels. Measurements were performed at room temperature and at 4 V excess bias. As can be seen in Fig. 13, about 75% of pixels have a DCR lower than 4 kHz, while 92% are below 100 kHz. This means that, when running the camera at 100 000 frames/s, 92% of pixels will count (on average) at most one dark count (out of the available 255 counts) in every integration frame time. Even the "hottest" pixel, with 700 kHz, will accumulate on average seven dark counts in every 10-µs frame time.
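The dark-count figures above follow from multiplying the dark-counting rate by the frame time. A minimal sketch, with a hypothetical function name and treating the result as an average value:

```python
def dark_counts_per_frame(dcr_hz: float, frame_time_s: float) -> float:
    """Average number of dark counts accumulated by one pixel in one frame."""
    return dcr_hz * frame_time_s


if __name__ == "__main__":
    frame_time = 10e-6  # 10 us frame, i.e., 100 000 frames/s
    for dcr in (4e3, 100e3, 700e3):
        mean = dark_counts_per_frame(dcr, frame_time)
        # Fraction of the 8-bit counter range (255) used by dark counts
        print(dcr, mean, mean / 255)
```

For longer frame times the same product grows linearly, so the dark-count contribution should be re-evaluated whenever the integration time is changed.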
We also characterized crosstalk among pixels, which can be either electrical (due to spurious capacitive and substrate coupling among pixel electronics) or optical (due to photons emitted by hot carriers in the ignited SPAD that could trigger neighboring SPADs). We measured a residual crosstalk probability of $3.53 \times 10^{-5}$ between the closest pixels (100 µm apart) and $4.33 \times 10^{-6}$ for those 200 µm apart, i.e., only a few spurious ignitions, at most, for every 100 000 ignitions of the neighboring SPAD. Such a value is negligible for most applications.
Applications
Low Light Imaging
The high sensitivity combined with the high speed of the SPAD camera makes it an ideal instrument for security surveillance and traffic safety monitoring applications where no illumination is present, e.g., at night. In order to test the camera capabilities, we acquired several pictures at night in a poorly illuminated street. Fig. 14(a) shows one frame acquired by the SPAD camera with a frame time of 2 ms when running the real-time movie at 500 frames/s. It is possible to recognize a car passing by; the three bright spots are the car's tail lights and the license plate.
Fluorescence Correlation Spectroscopy (FCS)
The basic concept in single-molecule FCS is to excite and collect light from a very small volume of solution and to work in a concentration regime that results in rare burst-like events corresponding to the transit of a single molecule.
To work properly, this technique requires high-sensitivity photodetectors (possibly at the single-photon level) and long acquisition times in order to accumulate many events and achieve the required statistical accuracy. Although long acquisition times may be acceptable in basic research, they reduce the interest in the technique for clinical or biopharmaceutical environments. An obvious way to overcome this limitation is to acquire data from multiple identical spots. The SPAD camera presented in this paper is an ideal candidate for such an application, combining extremely sensitive detectors with the possibility of parallel acquisition (up to 1024 parallel detection channels) [22].
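A minimal sketch of how per-pixel photon-count traces could be analyzed in a multi-spot FCS measurement is given below. It is not the processing chain used with this camera: it assumes the usual normalized intensity autocorrelation G(lag) = <dn(t) dn(t+lag)> / <n>², computed for each pixel and then averaged over spots assumed to be identical. The function names and the input format (one list of counts per frame for each pixel) are hypothetical.

```python
def normalized_autocorrelation(counts, max_lag):
    """Return G(lag) = <dn(t) dn(t+lag)> / <n>^2 for lag = 1..max_lag (in frames)."""
    n_bins = len(counts)
    if max_lag >= n_bins:
        raise ValueError("max_lag must be smaller than the trace length")
    mean = sum(counts) / n_bins
    if mean == 0:
        raise ValueError("trace contains no counts")
    curve = []
    for lag in range(1, max_lag + 1):
        pairs = n_bins - lag  # number of (t, t + lag) pairs available
        acc = sum((counts[i] - mean) * (counts[i + lag] - mean)
                  for i in range(pairs))
        curve.append(acc / (pairs * mean * mean))
    return curve


def average_curves(curves):
    """Average the correlation curves of several identical spots, lag by lag."""
    return [sum(values) / len(values) for values in zip(*curves)]


if __name__ == "__main__":
    # Two toy traces standing in for two pixels observing identical spots.
    traces = [[0, 2, 0, 2, 0, 2], [2, 0, 2, 0, 2, 0]]
    curves = [normalized_autocorrelation(trace, 2) for trace in traces]
    print(average_curves(curves))
```

Averaging over many parallel channels is what allows the acquisition time to be shortened for a given statistical accuracy; the actual gain depends on how independent and how similar the spots are.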
Other Applications
Any application demanding single-photon sensitivity in the visible wavelength range combined with high imaging speed finds an excellent candidate in the SPAD camera. Among them is Adaptive Optics, an application in which the SPAD array has already been tested and used [23]. Quantum Imaging [25], Confocal Microscopy [26], and 3-D optical ranging [27] will also take advantage of SPAD arrays, as shown by preliminary measurements.
Future Developments
At present, the SPAD camera detector has no antireflection coating: further improvements in PDE can be expected with proper tailoring of the air-silicon interface of the chip. Furthermore, the imager fill-factor is currently just 3.14%, due to the 20-μm pixel diameter and the 100-μm pitch.
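The fill-factor value can be checked with a short calculation. The sketch below assumes a circular active area centered in a square pixel cell whose side equals the pitch; the function name is illustrative.

```python
import math


def fill_factor(diameter_um, pitch_um):
    """Ratio of a circular active area to the square pixel cell area."""
    active_area = math.pi * (diameter_um / 2) ** 2
    return active_area / pitch_um ** 2


if __name__ == "__main__":
    # 20-μm diameter SPAD on a 100-μm pitch, as stated in the text.
    print(f"{fill_factor(20, 100):.2%}")
```

With the dimensions given in the text, this geometric ratio is π × 10² / 100², i.e., about 3.14%.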
It is worth noting that not all applications require a high pixel fill-factor; for instance, biological applications requiring confocal imaging or multi-spot FCS [22] need pixels to be kept sufficiently far apart to work properly. Many other applications, instead, require a fill-factor as close as possible to 100%. Several approaches are available to address this limitation. The first, trivial possibility is to reduce the dimension of the pixel, i.e., of the in-pixel electronics. However, if the goal is to keep increasing the complexity and "smartness" of in-pixel processing, only a few degrees of freedom are available. For instance, the number of bits can be reduced, but for the same chip technology, only a 5% fill-factor can be achieved. Alternatively, the SPAD diameter can be increased; however, for a given technology, SPAD quality decreases for larger diameters as a result of the bigger active volume, which causes higher dark-counting rates and lower yield. With a practically feasible 50-μm diameter SPAD, the fill-factor would be 16% for the same amount of electronics.
Moving to a more scaled technology does not help either, because as the technology shrinks, the quality of the SPADs decreases considerably, resulting in higher dark-counting rates and afterpulsing [28].
Also, advanced array structures, in which SPADs and electronics lie on different chips that are then bonded together by means of 3-D vias, cannot provide a 100% fill-factor, because some basic front-end electronics and routing are still necessary on the SPAD chip. Moreover, high processing costs still make this solution difficult to adopt.
The only real solution to the fill-factor issue is to adopt an array of microlenses in combination with the imager. The maximum achievable fill-factor can be higher than 80%, and the lenses can be tailored to particular wavelengths to achieve even better results [29].
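The effect of enlarging the SPAD while keeping the same amount of electronics can be estimated with a simple area budget. This sketch rests on a strong simplification that is not stated in the text: all of the pixel area outside the circular SPAD in the present 20-μm, 100-μm-pitch design is treated as electronics, and that area is held fixed while the SPAD grows. The function names are illustrative.

```python
import math


def spad_area_um2(diameter_um):
    """Area of a circular SPAD active region."""
    return math.pi * (diameter_um / 2) ** 2


def fill_factor_fixed_electronics(new_diameter_um, ref_diameter_um=20.0,
                                  ref_pitch_um=100.0):
    """Fill-factor when the SPAD grows but the electronics area stays the same.

    Simplification: everything outside the SPAD in the reference pixel
    is counted as electronics area.
    """
    electronics_area = ref_pitch_um ** 2 - spad_area_um2(ref_diameter_um)
    active_area = spad_area_um2(new_diameter_um)
    return active_area / (active_area + electronics_area)


if __name__ == "__main__":
    for diameter_um in (20, 50):
        print(f"{diameter_um} μm: {fill_factor_fixed_electronics(diameter_um):.1%}")
```

Under this simplification, a 50-μm SPAD gives a fill-factor of roughly 17%, in the same range as the 16% quoted in the text; the exact value depends on layout details that are not modeled here.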
Conclusion
We designed and fabricated a high-speed single-photon camera based on a monolithic array of 32 × 32 "smart pixels" fabricated in a standard high-voltage 0.35-μm CMOS technology. Every pixel comprises all the circuitry required to perform photon counting, from the 20-μm diameter SPAD detector to an analog electronic front end and a digital in-pixel counter and memory. Each pixel is therefore a completely independent photon-counting channel. The camera has been fully characterized and shows a Photon-Detection Efficiency topping 43% at 5 V excess bias, moderate Dark-Counting Rates at room temperature, and good yield. The imager can be easily gated on and off either by a standard logic input (software gate) or by driving the SPAD bias voltage (hardware gate).
The SPAD camera comprises all the required electronics, optics, a USB link to a remote computer, and easy-to-use software. By means of the software, it is possible to set the camera integration time, dynamic range, active area, and many other parameters of interest. The integration time ranges from a few tens of nanoseconds to milliseconds, while the imager architecture allows a maximum frame rate of about 100 000 frames/s. In conclusion, the SPAD camera is a general-purpose imager for single-photon, high-speed applications such as confocal microscopy, biological imaging, adaptive optics in astrophysics, 3-D ranging, and low-light spectroscopy.
