Photomicrograph of the SPAD-based QVGA imager. Inset: zoomed-in view of the pixel array and layout sketch of the shared-well SPAD structure.

steady progress, toward the photon-counting regime, through increasing pixel conversion gain and optimized readout schemes [6]. Although CISs with sub-one-electron (1e−) noise have been reported recently [7], [8], the conditions for single-photon counting (SPC) (read noise of 0.3e− [9] down to 0.15e− [6]) have yet to be achieved. Fossum [10] projected these CIS trends of subelectron read noise, submicrometer pixel pitch (PP), multimegapixel resolution, and highly oversampled frame rates toward a quanta image sensor (QIS). The pixel concept of the QIS (the jot) demands a nanoscale single-photon-sensitive device exhibiting a binary state. Although SPADs were considered as jot candidates, the device-scaling issues toward a nanometer pitch precluded further investigation [9].
In this paper, we demonstrate that analog pixel electronics and scalable shared-well SPAD devices can be combined to assemble high-resolution, high-FF pixel arrays exhibiting photon shot-noise-limited statistics. In a binary operation mode, the pixel array offers a look-ahead to the properties of a future QIS, and a number of recent theoretical results are confirmed experimentally [9]. We present a detailed overview of a 320 × 240, 8-μm pitch, 26.8% FF SPAD-based image sensor fabricated in STMicroelectronics 130-nm imaging CMOS technology (Fig. 1), giving a more detailed treatment of the sensor than was first presented in [11]. We begin with a review of SPAD-based image sensors and explain the dual modes of operation of this sensor's SPAD pixel as both an analog counter and a single-bit memory. The device architecture and operation of the readout are described. Finally, the SPC results are shown along with fast QIS image capture.

0018-9383 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. SPAD IMAGE SENSOR PIXELS
Rochas [12] and Rochas et al. [13] first reported SPADs in CMOS and subsequently the first fully integrated CMOS SPAD array. Many SPAD sensor architectures have been demonstrated since, in different forms (single-point detectors, line sensors, large array photomultipliers, and image sensors) with a plethora of pixel designs. Unlike CISs with the active pixel sensor (APS)-based readout, no single SPAD-based pixel architecture has become dominant. Such image sensors have a different set of design constraints from SPAD-based digital silicon photomultipliers (DSiPMs) or line sensors. All DSiPM pixels and some line sensor pixels achieve high-FF arrays by placing only quench/recharge and addressing circuitry in the pixel, such as [14]–[17], and so these are not considered further in this paper. We consider SPAD-based image or line sensors that have the additional in-pixel capability to capture, measure, and store a photonic event. A review of the three categories of SPAD imager pixels follows: 1) all-digital; 2) single-bit memory; and 3) analog. The pixels highlighted in this section are plotted in Fig. 2 in terms of PP versus FF and compared against this work. A range of SiPM pixels is plotted alongside for comparison.
A. All-Digital SPAD Pixels
SPADs, when connected through an inverter, become a true digital imaging pixel with an application-specific time conversion or counting circuit; each photon is immediately represented as a digital pulse whose leading edge signals the photon's time of arrival with picosecond precision. The first demonstrated was a flash time-to-digital converter (TDC) pixel with a bump-bonded discrete SPAD [18]. The first fully monolithically integrated all-digital pixel for ToF was in a 60 × 48 array with 85-μm PP and 0.5% FF with two 8-b up/down counters [19]. Three architectures of digital SPAD pixel for time-resolved FLIM were developed concurrently [20]–[22]: 1) a flash TDC; 2) a time-to-analog converter (TAC) and analog counter with in-pixel single-slope ramp ADC; and 3) a gated-ring oscillator TDC. All three comprise a 32 × 32 array at 50-μm PP with 1.6% FF, with maximum 488 photons/pixel/s. The array was expanded to 160 × 128 in [23]. The first in-pixel delta-sigma TDC was implemented in a 44.65-μm PP at 3.1% FF for indirect time of flight (IToF) 3-D ranging at 128 × 96 resolution [2]. A number of TDC-based and counter pixels have recently been demonstrated at 150-μm PP with maximum 4% FF and low dark count rate (DCR) noise [24]–[27]. In a monolithic image sensor, the area overhead of the in-pixel digital logic limits the FF of these SPAD-based image sensors to a few percent. This low FF can be partially mitigated by microlensing [28], yet the large pitch remains, which precludes high-spatial-resolution imaging arrays. Chip-stacking technology offers a promising solution to this limitation [29].
B. Single-Bit Memory Pixels
The minimum digital in-pixel logic to capture one SPAD event is a single-bit dynamic or static memory. A SPAD event switches the state of the in-pixel memory within a time-gated exposure window. However, this limits the full well of the pixel to one event (a photon or a dark event) and requires an external frame store to build up a digital image. Such SPAD pixels offer the first practical step toward realizing the digital film sensor or QIS concept proposed in [9] and [10]. The first example of this time-gated binary pixel architecture combines a SPAD, an inverter, a time gate, and a static memory cell for time-gated FLIM in a 128 × 128 array at 25-μm pitch [30], later scaled to a 512 × 128 array [31]. The 12T nMOS-only pixel uses an nMOS SRAM to avoid hot well spacing rules. The downside is static power consumption during operation. The SRAM consumption scales linearly with array size, making it impractical for very high-resolution imaging arrays.
C. Analog Pixels
Recent research has investigated two analog pixel approaches for high-resolution SPAD-based image sensors: 1) TAC pixels for time-correlated SPC imagers and 2) analog counters for time-gated SPC imagers [32], [33]. Fig. 3 shows the two different methods of analog counting: 1) switched current source (SCS) and 2) charge transfer amplifier (CTA). A time-gated SPAD pixel using an SCS analog counter was reported for IToF [34]. The pixel was implemented in a line sensor format of 64 × 1 pixels, at a PP of 38 μm × 180 μm, 0.3% FF, with 25 CMOS transistors. The first example with ≤10 T in pixel using an SCS counter achieved ∼140 mV/SPAD event in 30-μm PP with 8.7% FF [35], [36]. However, the use of three pMOS transistors in the pixel limits the attainable FF. In a further example, a 32 × 32 array was implemented with 25-μm pitch SCS SPC pixels at a notable 20.8% FF [37], [38]. The 12T pixel circuit was the first example of a pixel employing nMOS-only transistors to achieve high FF. A front-end gating circuit generates a picosecond-duration input pulse, removing the SCS dependence on SPAD dead time. The nMOS-only inverter has static power consumption, however, making it unsuitable for scaling to larger arrays. The same group published 40-pixel linear test arrays with both SCS and CTA trials, replacing the nMOS inverter with CMOS devices, in [39] and [40]. The PP and FF are estimated at 40 μm × 20 μm and 10%, respectively. As a precursor to this work, an nMOS-only 11T hybrid counter architecture combining a CTA and SCS was presented in [32], achieving 9.8-μm pitch at 3.1% FF. That CTA pixel is redesigned here, reducing it to 9T and implementing SPAD well sharing to achieve 8-μm PP at 26.8% FF. Independently, [32], [39], and [40] all conclude that CTAs have lower variability, reporting ≤2% pixel response nonuniformity (PRNU) versus ≥8% PRNU for SCS.
III. IMAGE SENSOR OVERVIEW

A. SPAD-Based Pixel
This section details the operation of the hybrid CTA and SCS analog counter pixel shown in Fig. 4(e). The SPC operation is intended to function either in CTA mode, with small voltage steps (microvolt to millivolt range) and a resulting large full well, or in SCS mode, with large voltage steps (hundreds of millivolts to volts) and a small full well. In CTA mode, the pixel counter response is bias controllable (as detailed in [32]) using the V s bias voltage and can be expressed as follows:
where C P is the parasitic capacitance between M5 and M6, C T is the total capacitance of the capacitor [MC in Fig. 4(e)], and V EB is the excess bias (EB) of the SPAD (assuming that across the time gate switch M2, V EB < V TIMEGATE − V TM2 ). By the propagation of errors theorem, the pixel-to-pixel variability is
where V s is assumed to be a constant. In SCS mode, M5 is a source follower and M6 is a current source, both here considered only in the saturation region. The equation for the voltage step to first order is
where the dead time τ D is expressed as
where τ Q is the rise time or quenching period of the SPAD avalanche. On the other hand, the SCS pixel-to-pixel variability is described in the following expressions:
where the second two terms are expanded as
It is apparent from these expressions that the SCS mode inherently suffers from greater variability, yet with a higher achievable voltage step range (with control of either SPAD dead time or V GSM5 ). As a result, the SCS mode is used only as a dynamic memory for digital single-bit operation and the CTA mode for analog SPC. The use of nMOS dynamic memory removes the scalability limit and static power consumption of nMOS-only SRAM.

Fig. 4 shows the image sensor, pixel, and readout. The analog integrator structure of the pixel [Fig. 4(e)] consists of the dynamic source follower M5, the discharge transistor M6, and the polycapacitor (cap) MC. The counter voltage is output via the conventional nMOS CIS APS readout of M7 source follower, M8 read select, and M4 reset, the 3T structure proposed in [41] following Noble's classic array readout paper [42]. The image sensor has two distinct readout mechanisms: 1) single-channel sequential-read analog CDS and 2) column-parallel single-bit flash A/D conversion. Column-parallel CDS sample-and-hold stages [Fig. 4(d)] perform conventional APS row-wise sampling. Each of these CDS column buffers is sequentially scanned out using a single-channel analog bus through two single-ended op-amp buffers to an off-chip differential ADC. The buffer crowbar switch implements the delta-difference sampling vertical FPN (VFPN) minimization technique described in [41].
B. Image Sensor
The single-bit digital readout is designed to operate the sensor as a digitally oversampled binary image sensor, continuously at kiloframes per second. The column-parallel coarse flash conversion and single-bit digital readout are intended to function with the pixel biased in SCS mode with the highest counter step size. This operates the pixel as a time-gated photon-triggered dynamic memory. In this condition, the noise mechanisms and offsets (kT/C noise, source follower 1/f noise, source follower V t variation, and so on) are much lower in magnitude than the signal, and once input referred, these transistor-based noise sources are rendered insignificant. Hence, the need for CDS is removed and the data conversion is performed in a single step. Single-flash sampling (with no reset sample) allows a row line time in the region of hundreds of nanoseconds, much shorter than conventional CIS line timing, as the column does not need to settle twice per row in addition to the reset time. Incomplete settling of the column lines is permissible, although it may lead to higher conversion errors. The schematic of the differential single-bit dynamic latched comparator is shown in Fig. 4(c). The column bus is connected directly to the positive side of the nMOS input differential pair. A global voltage reference, from an off-chip DAC, is connected to the negative input of the comparator and is common to all columns. The single-bit comparator output is sampled into a 20-b-length serializer [Fig. 4(b)]. Each single-bit binary image read out from the sensor is referred to as a field image. The field image is streamed out across 16 serial outputs in 4800 clock cycles. Using a 76.8-MHz clock, a field image is, therefore, transferred in 62.5 μs, with the sensor operating at 16 kframes/s with an output data rate of 1.22 Gb/s.
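The readout figures quoted above follow from simple arithmetic, which can be checked with a short sketch. The array size, number of serial outputs, and clock frequency are taken from the text; the function name and the returned dictionary keys are illustrative only.

```python
# Readout-timing arithmetic for one binary field image.
# Input values come from the text; names are illustrative, not from the paper.

def field_readout(cols=320, rows=240, n_outputs=16, clock_hz=76.8e6):
    bits_per_field = cols * rows                 # one bit per pixel
    cycles = bits_per_field // n_outputs         # clock cycles to shift out one field
    field_time_s = cycles / clock_hz             # time to stream one field
    return {
        "serializer_bits": cols // n_outputs,    # columns handled by each serial output
        "cycles": cycles,
        "field_time_s": field_time_s,
        "field_rate_hz": 1.0 / field_time_s,
        "row_time_s": field_time_s / rows,       # average line time per row
        "data_rate_bps": n_outputs * clock_hz,   # aggregate output bit rate
    }

r = field_readout()
print(r["serializer_bits"], r["cycles"])
print(r["field_time_s"] * 1e6, "us per field")
print(r["row_time_s"] * 1e9, "ns per row")
print(r["data_rate_bps"] / 1e9, "Gb/s")
```

The aggregate rate evaluates to 1.2288 Gb/s, which the text quotes as 1.22 Gb/s. The average row time of roughly 260 ns is consistent with the line timing of hundreds of nanoseconds mentioned above.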
A nanosecond electronic shutter is created by one of the two time-gate pulse generators. Two timing-balanced clock H-trees connect to the time-gate row drivers. This row driver circuit selects one of the two time-gate pulses to drive onto each row of the imaging array. Other row drivers handle the row select and read signals, the pixel reset, and the time-gate disable function. Both the X and Y (row and column) addressing decoders are binary to one-hot thermometer code converters. The sensor is controlled by a field-programmable gate array (FPGA), which handles exposure control, pixel addressing, readout timing, and QIS oversampling, and manages the data pipeline to the PC.
IV. QIS IMAGE FORMATION
As described in [10], the QIS is an imaging array of photosensitive sites with single-bit output, each site referred to as a jot. Binary field images (or jot bit planes) are oversampled, either spatially or temporally, to form a multibit intensity frame image, where each pixel is composed of a summation of jots. Aggregation is performed in a frame store, summing jot bit planes with a memory location per pixel to the required output bit depth; each additional bit of output depth doubles the number of field images summed and therefore halves the output frame rate. This has been shown through software postprocessing in [43] and [44] and in a real-time FPGA implementation using this image sensor in [11]. To capture the image in Fig. 6, 32 field images are continuously captured, each with 2-μs exposure time, and temporally oversampled to form the shown image with 5-b depth. It highlights the global shuttering of the field capture, where blurring on the edge of the moving fan blades emanates from the temporal summation. It demonstrates that fast-moving objects in a scene can be captured continuously using the 16-kframes/s single-bit capture in conjunction with real-time oversampling.
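The temporal oversampling described above can be sketched as a per-pixel sum of binary field images. This is a minimal software illustration, not the FPGA implementation of [11]; the function names, the toy array size, and the firing probability are assumptions made for the example.

```python
import random

def aggregate_fields(fields):
    """Sum binary field images (lists of rows of 0/1) pixel by pixel."""
    rows, cols = len(fields[0]), len(fields[0][0])
    frame = [[0] * cols for _ in range(rows)]     # frame store, one location per pixel
    for field in fields:
        for r in range(rows):
            for c in range(cols):
                frame[r][c] += field[r][c]
    return frame

def simulate_field(rows, cols, p_fire, rng):
    """Hypothetical stand-in for one sensor field: each jot fires with probability p_fire."""
    return [[1 if rng.random() < p_fire else 0 for _ in range(cols)]
            for _ in range(rows)]

rng = random.Random(0)
fields = [simulate_field(4, 6, 0.4, rng) for _ in range(32)]   # 32 fields, as for Fig. 6
frame = aggregate_fields(fields)
for row in frame:
    print(row)   # each value is a count between 0 and 32
```

Summing twice as many fields adds roughly one bit of output depth at the cost of half the output frame rate, which is the trade-off stated in the text.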
V. EXPERIMENTAL RESULTS
A. SPAD Characterization
The p-well to shared deep n-well substrate-isolated SPAD is characterized in terms of dark noise, dead time, PDP, and temporal jitter. The median DCR of the image sensor at room temperature is 47 counts/s at 1.5 V EB; this is an improvement on the previous results in [11], obtained through process improvement. The recharge time of the SPAD after an avalanche event is known as the SPAD dead time, as the SPAD is unresponsive to subsequent photons during this period. This is an intrinsic pile-up distortion mechanism and must be minimized. The advantage of low-PP SPADs is that the diode capacitance is small, and fast recharge is attainable. The recorded dead-time data for three EBs are plotted in Fig. 7(a), indicating that 1.1-ns dead time is achieved with 1 V EB. The PDP, the SPAD equivalent of intrinsic quantum efficiency, is measured and plotted in Fig. 7(b). The PDP is in keeping with previously published results, with a 39.5% peak at 480 nm [45]. The integrated jitter with an 80-ps 425-nm laser impulse is 184.6-ps full-width at half-maximum (FWHM), and subtracting the quoted laser driver jitter in quadrature yields 166.4-ps FWHM.
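The quadrature subtraction in the jitter measurement can be reproduced numerically. The sketch below assumes that the laser contribution is the 80-ps figure quoted for the laser impulse and that both contributions are independent, so their FWHM values add in quadrature; the function name is illustrative.

```python
import math

def subtract_in_quadrature(total_fwhm_ps, source_fwhm_ps):
    """Remove an independent jitter contribution from a measured total (both FWHM, in ps)."""
    return math.sqrt(total_fwhm_ps ** 2 - source_fwhm_ps ** 2)

# 184.6 ps measured total, 80 ps assumed laser contribution (see lead-in).
spad_jitter_ps = subtract_in_quadrature(184.6, 80.0)
print(round(spad_jitter_ps, 1))   # 166.4
```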
B. Quanta Image Sensor Characterization
The quanta imaging performance is captured with the sensor exposed to uniform constant illumination with 1 V EB at VQ = 1 V (12-ns dead time). The recorded bit plane density (D) is calculated as the normalized number of pixels registering a SPAD event. The exposure H is likewise normalized, with a 3.7-μs field exposure corresponding to 1.0H. Fig. 7(c) displays the measured D versus H, plotted alongside the ideal DlogH QIS line and an ideal linear line. The overexposure latitude is calculated at 4.6×, which matches QIS theory [9]. The lower graph shows the noise versus exposure, highlighting that at lower exposures the sensor is photon shot noise limited, and at higher exposures the shot noise is compressed, as expected in a QIS. On average, the sensor measures 0.17% higher noise than the ideal, representing a bit error rate of 0.0017 and an equivalent read noise of 0.168e− using the equation in [10]. Fig. 8 shows the measured signal-to-noise ratio (SNR) of the sensor, with 18-dB SNR at 1.0H and 54-dB SNR peak at 10.0H, indicating shot-noise-limited performance in the underexposure region <1.0H and shot noise compression in the overexposure region >1.0H, as predicted by QIS theory.
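The ideal DlogH behavior referred to above can be sketched from Poisson statistics: a jot exposed to a mean of H photons registers at least one with probability D = 1 − exp(−H), and its binary output has standard deviation sqrt(D(1 − D)). The sketch assumes single-photon-threshold jots, and it assumes the overexposure latitude is the exposure at which D reaches 99%, a definition that is not stated in this section. Function names are illustrative.

```python
import math

def bit_density(h):
    """Ideal bit density D for normalized exposure H (Poisson arrivals, threshold of one photon)."""
    return 1.0 - math.exp(-h)

def exposure_for_density(d):
    """Inverse of bit_density: the exposure H that gives density D."""
    return -math.log(1.0 - d)

def jot_noise(h):
    """Standard deviation of a single binary jot output at exposure H."""
    d = bit_density(h)
    return math.sqrt(d * (1.0 - d))

print(round(bit_density(1.0), 3))            # density at 1.0H
print(round(exposure_for_density(0.99), 2))  # exposure for 99% density (assumed latitude definition)
for h in (0.01, 0.1, 1.0, 10.0):
    # At low H the noise tracks sqrt(H) (shot-noise limited); at high H it collapses.
    print(h, round(jot_noise(h), 4), round(math.sqrt(h), 4))
```

Under these assumptions the 99% point falls at about 4.6H, in line with the 4.6× latitude quoted above, and the noise column shows the compression of shot noise in the overexposure region.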
C. Single-Photon Imaging
The off-chip 14-b ADC temporal noise is measured as 167.3-μV rms. Read noise is measured at 916-μV rms. The dark fixed pattern noise (1σ) of the sensor is 91-μV horizontal FPN, 80.6-μV VFPN, and 99-μV pixel-to-pixel FPN. The maximum sensitivity in CTA mode is measured at 14.2 mV/SPAD event, and the histogram of the image sensor output in this condition is shown in Fig. 9. The counter step size is bias controllable, and the step size can be reduced to attain a higher maximum counting capability, or effectively a higher full well. However, this is to the detriment of read noise. Teranishi [6] describes the condition required for true electron counting as a maximum of 0.3e− read noise. Fossum [9] argues that photon counting can be achieved with a maximum of 0.15e− read noise. With higher read noise, discrete single-photon peaks cannot be accurately resolved. At the maximum sensitivity, the effective full well is ∼68 SPAD events with 0.06e− read noise. SPC imaging is achieved with this sensor at the 0.15e− input-referred read noise limit with 6.1 mV/SPAD event sensitivity or higher, with an effective full well of 163 counts. Moreover, a 0.3e− read noise limit is attained at 3.05 mV/SPAD event or higher, with a higher effective full well of 327 counts.
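The input-referred read noise figures above follow from dividing the measured read noise voltage by the counter step size. A minimal sketch of that conversion is given below, using the 916-μV rms read noise and the step sizes from the text; the function names are illustrative.

```python
def input_referred_noise(read_noise_uv, step_mv):
    """Read noise expressed in SPAD events (e-), given the counter step per event."""
    return read_noise_uv / (step_mv * 1000.0)

def min_step_mv(read_noise_uv, target_noise_e):
    """Smallest step size (mV per SPAD event) that meets a target input-referred noise."""
    return read_noise_uv / target_noise_e / 1000.0

READ_NOISE_UV = 916.0   # measured read noise, from the text

print(round(input_referred_noise(READ_NOISE_UV, 14.2), 3))   # at maximum sensitivity
print(round(min_step_mv(READ_NOISE_UV, 0.15), 2))            # step needed for 0.15 e-
print(round(min_step_mv(READ_NOISE_UV, 0.30), 2))            # step needed for 0.3 e-
```

The results, about 0.06e− at 14.2 mV/SPAD event and minimum steps of about 6.1 mV and 3.05 mV, agree with the values quoted in the text.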
To demonstrate the SPC image capture of a calibration chart, the pixel array is biased in CTA mode, with V s = 0.15 V, V g = 0.7 V, and 3 V EB, with 5- and 20-μs exposure times in Fig. 10(a) and (b), respectively. The histograms of these images are shown below them, indicating the discrete peaks of photon counting. To demonstrate global shutter imaging, a fast-moving fan is captured under 400 and 5 lx illumination in Fig. 10(c) and (d), respectively. The contrast of the four images has been scaled independently, and they are captured with a 90° field of view (FOV), F#2.0 lens. The large FOV creates a fish-eye distortion of the calibration chart in the images.
VI. CONCLUSION
This paper demonstrates that the compact layout of CMOS SPADs with analog pixel electronics achieves a high-spatial-resolution image sensor with the highest FF of a SPAD-based image sensor pixel to date. The pixel design is scalable to megapixel arrays and is a candidate for the realization of stacked SPAD image sensors. Single-photon-sensitive imaging is realized in a hybrid of two imaging modalities: a QIS mode with a single-photon full well at high frame rate, and an analog counting mode with a multiphoton full well.
The time-domain characterization of the time gating or shuttering will be reported in a future publication. The maximum frame rate achievable with this sensor can be increased to an extent through timing optimization but is fundamentally limited by the number of IO channels. A greater number of output channels will increase the frame rate, although the scaling of power consumption is a major concern in the QIS paradigm.
In binary image sensor operation, theoretical QIS DlogH intensity and noise characteristics are confirmed experimentally. Furthermore, images are shown that demonstrate SPC imaging with sub-0.15e− read noise and shot-noise-limited statistics. In [9] and [46], the concept of multiphoton QIS emerged, and it is envisaged that the analog CTA-based SPC in conjunction with fast binary readout will be employed to demonstrate this in future work.
