In several emerging fields of study such as encryption in optical communications, determination of the number of photons in an optical pulse is of great importance. Typically, such photon-number-resolving sensors require operation at very low temperature (e.g., 4 K for superconducting-based detectors) and are limited to low pixel count (e.g., hundreds). In this paper, a CMOS-based photon-counting image sensor is presented with photon-number-resolving capability that operates at room temperature with resolution of 1 megapixel. Termed a quanta image sensor, the device is implemented in a commercial stacked (3D) backside-illuminated CMOS image sensor process. Without the use of avalanche multiplication, the 1.1 μm pixel-pitch device achieves 0.21e− rms average read noise with average dark count rate per pixel less than 0.2e− ∕s, and 1040 fps readout rate. This novel platform technology fits the needs of high-speed, high-resolution, and accurate photon-counting imaging for scientific, space, security, and low-light imaging as well as a broader range of other applications.
INTRODUCTION
High-performance photon-counting detectors are widely sought after for applications such as low-light, scientific, and space imaging, as well as automotive sensors and security. Counting error rate, readout speed, spatial resolution, quantum efficiency (QE), and dark current (or dark count rate) are all key factors that contribute to the performance of these sensors. The photon-counting technologies currently available on the market include single-photon avalanche diodes [1] [2] [3] [4] [5] (SPADs) and electron-multiplication charge-coupled devices [6] (EMCCDs). Both devices rely on electron avalanche multiplication to generate a large voltage signal from a single photon. These structures require a high operating voltage to create the critical electric field needed for the avalanche effect, which is not typically compatible with advanced CMOS technology. Hence, these devices cannot take full advantage of advanced CMOS processes, resulting in larger detector size with lower spatial resolution and higher power dissipation. The use of avalanche multiplication also makes both devices more sensitive to dark current, which is usually caused by thermally generated electrons or the re-emission of an electron in an interface trap. At room temperature, the dark count rate for a SPAD-based image sensor ranges from as low as 20 [7] to hundreds of counts/s. The dark current for EMCCDs is often more than 30e− ∕pix∕s [8] , which limits the lowest illumination level they can detect, so external cooling is always required [9] . Additional in-pixel readout circuitry is required for SPAD-based image sensors to realize in-pixel signal integration for photon-number-resolving operations, which leads to a larger number of transistors to both quench the device and condition the output for integration, resulting in a limited fill-factor (<40%) and low QE (<30%) compared to CMOS image sensors (CISs). 
In an EMCCD image sensor, the signal photoelectrons must be read out through a long CCD array, which limits the readout speed compared to CISs and restricts it from being used for applications where high temporal resolution is required.
Quanta image sensors (QISs) are a third-generation solid-state image sensor technology [10] [11] [12] [13]. Compatible with baseline CIS technologies, they inherit CIS advantages in terms of pixel size, spatial resolution, dark current, QE, readout speed, and power dissipation. Beyond CIS and existing photon-counting technologies, the QIS aims to realize accurate photon counting without avalanche gain or cooling, while maintaining low dark current and manufacturing cost.
THEORY
A. Quanta Image Sensor
A QIS may contain up to several billion tiny specialized pixels, each called a "jot," meaning "smallest thing" in Greek. These jots accumulate photoelectrons during an integration period and output a single- or multi-bit value corresponding to the number of collected charges. Compared to a normal CIS pixel, a jot may have a small full-well capacity (FWC) of around 1 to 200 electrons. In the single-bit case, the array of jots must be scanned at a high frame rate (e.g., 1000 fps) to minimize the chance that a single jot receives more than one photon per frame. After the binary data is collected, image processing is used to combine the jot data over the spatial and temporal domains into image pixels that reflect the photon flux. In this paper, accurate photon-counting imaging is demonstrated with a 1Mjot QIS. The reported jot devices show an average read noise of 0.21e− rms, with a best-case read noise of 0.17e− rms from a subset of the measured jots, enabling accurate photoelectron counting. Additionally, an extremely low dark current of less than 0.2e−∕jot∕s at room temperature is also demonstrated.
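The link between frame rate and multi-photon events in a single-bit jot can be illustrated with a short sketch. It assumes Poisson-distributed photoelectron arrivals; the flux value and the function names are hypothetical and chosen only for illustration.

```python
import math

def exposure_per_frame(flux_e_per_s: float, frame_rate_fps: float) -> float:
    """Average photoelectrons per jot per frame for a given flux and frame rate."""
    return flux_e_per_s / frame_rate_fps

def prob_more_than_one(H: float) -> float:
    """P(k >= 2) for a Poisson-distributed count with mean H."""
    return 1.0 - math.exp(-H) * (1.0 + H)

if __name__ == "__main__":
    flux = 100.0  # hypothetical photoelectron flux per jot, in e-/s
    for fps in (30.0, 100.0, 1000.0):
        H = exposure_per_frame(flux, fps)
        print(f"{fps:6.0f} fps: H = {H:.3f}, P(k>=2) = {prob_more_than_one(H):.5f}")
```

Under these assumptions, raising the frame rate lowers the exposure per frame, and the probability of two or more photoelectrons falls roughly as H²∕2 for small H.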
B. Photon Counting and Read Noise
Incident photons are absorbed in silicon and generate electron-hole pairs, and these photoelectrons are measured as a voltage signal in a QIS after being transferred to the floating diffusion (FD) capacitor in a jot device. The voltage signal generated by a photoelectron is
$$V = \frac{Q_e}{C_{FD}}, \tag{1}$$
where $Q_e$ is the elementary electron charge and $C_{FD}$ is the capacitance of the FD node. The voltage signal generated by a single electron in a CIS is usually small. To be able to observe the absorption of a single photon, the FD-referred total electrical noise of the sensor, called the read noise, needs to be lower than 0.5e− rms [14] [15] [16]. The number of photoelectrons collected by a jot in each frame can be modeled by a Poisson distribution. The read noise in a QIS and other low-noise image sensors is dominated by the noise of the in-pixel source follower (SF). The major contribution is 1∕f noise (flicker noise), which is widely believed to be caused by carrier number fluctuation due to trapping and re-emission events associated with the Si-SiO2 interface traps near the SF channel. 1∕f noise can be well described by a Gaussian distribution [17]. The probability distribution for the normalized output signal of a QIS is given by [15]
$$P[U] = \sum_{k=0}^{\infty} \frac{1}{\sqrt{2\pi u_n^2}} \exp\left[-\frac{(U-k)^2}{2u_n^2}\right] \frac{e^{-H} H^k}{k!}, \tag{2}$$
where U is the normalized voltage signal in e−, u_n is the read noise in e− rms, k is the number of collected photoelectrons, and H is the quanta exposure, defined as the average number of photoelectrons collected over an integration period. Some simulation results illustrating these distributions are shown in Fig. 1(a), where the quantization of the photoelectron number disappears when the read noise reaches 0.5e− rms. The valley-to-peak ratio in the probability distribution is very sensitive to the read noise: the quantization of the photoelectron number is more distinct at lower read noise, which appears as a lower valley-to-peak ratio. Based upon this effect, a figure of merit called valley-to-peak modulation (VPM) is defined for the experimental characterization of QIS, as well as other photon-counting image sensors [18]. In the experiments, a histogram of the jot signals, called a photon-counting histogram (PCH), can be constructed from thousands of consecutive read values under stable illumination. The read noise can be characterized with the VPM extracted from the PCH, while the voltage signal generated per photoelectron, or conversion gain (CG), can be measured from the peak-to-peak distance. For read noise higher than 0.15e− rms, each broadened peak begins to overlap with its neighboring peaks, which leads to counting errors during the thresholding process of a QIS. As shown in Fig. 1(b), in a single-bit QIS, a threshold is set in the comparators corresponding to the signal level of 0.5e−. The overextension of peak-0 and peak-1 leads to "false positive" and "false negative" counts, respectively. The total bit error rate is directly affected by the read noise and varies slightly with the quanta exposure H, as shown in Fig. 1(c). To achieve a counting error rate of less than 0.1% for different levels of quanta exposure H, it is preferable to reduce the read noise to 0.15e− rms or lower.
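The distribution and the single-bit error rate discussed above can be sketched numerically. The following is a minimal illustration, assuming the output signal is a Poisson-weighted sum of Gaussian peaks centered on integer electron counts with standard deviation u_n, and a fixed comparator threshold of 0.5e−. The function names and the truncation limit `k_max` are illustrative choices, not part of the original work.

```python
import math

def poisson_pmf(k: int, H: float) -> float:
    """Poisson probability of k photoelectrons at quanta exposure H."""
    if H == 0.0:
        return 1.0 if k == 0 else 0.0
    # Log-domain evaluation avoids overflow for large k.
    return math.exp(-H + k * math.log(H) - math.lgamma(k + 1))

def pch_pdf(U: float, H: float, u_n: float, k_max: int = 50) -> float:
    """Probability density of the normalized signal U (in e-), truncated at k_max."""
    norm = 1.0 / (u_n * math.sqrt(2.0 * math.pi))
    return sum(
        poisson_pmf(k, H) * norm * math.exp(-((U - k) ** 2) / (2.0 * u_n ** 2))
        for k in range(k_max + 1)
    )

def gaussian_tail(x: float) -> float:
    """P(Z > x) for a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def single_bit_error_rate(H: float, u_n: float,
                          threshold: float = 0.5, k_max: int = 50) -> float:
    """False positives (k = 0 read as 1) plus false negatives (k >= 1 read as 0)."""
    false_pos = poisson_pmf(0, H) * gaussian_tail(threshold / u_n)
    false_neg = sum(
        poisson_pmf(k, H) * gaussian_tail((k - threshold) / u_n)
        for k in range(1, k_max + 1)
    )
    return false_pos + false_neg

if __name__ == "__main__":
    for u_n in (0.15, 0.25, 0.35, 0.50):
        ber = single_bit_error_rate(H=1.0, u_n=u_n)
        print(f"u_n = {u_n:.2f} e- rms -> bit error rate = {ber:.2e}")
```

With this model, the error rate at u_n = 0.15e− rms stays below 0.1% because each error term is bounded by the Gaussian tail beyond 0.5∕0.15 ≈ 3.3 standard deviations.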
METHODS FOR READ NOISE REDUCTION WITHOUT USING AVALANCHE GAIN
One can either reduce the output voltage noise or increase the CG to reduce the read noise down to photon-counting levels (<0.5e− rms). Our approach involves minimizing the capacitance of the FD node to increase the CG and overcome the voltage noise. As shown in Fig. 2(a), the FD capacitance in a CIS pixel includes the junction capacitance between the FD n+ node and the p-type substrate, the overlap parasitic capacitance between the FD and the transfer gate (TG) as well as between the FD and the reset gate (RG), the source-follower gate capacitance, and the inter-metal coupling capacitance from the wiring. Previously, a pump-gate (PG) jot device was developed by our group [19, 20], which significantly reduced the TG overlap capacitance with a distal FD and maintained the charge transfer efficiency with a specialized doping profile in the TG and FD regions. As shown in Fig. 2(b), the specialized doping profile consists of an n-type storage well (SW), a p-well region, a potential barrier (PB) region, and a virtual barrier (VB) region. The PG jot and the tapered reset PG jot (TPG) were prototyped in a 32 × 32 array, and sub-0.3e− rms read noise was achieved with a 4× improvement in CG over typical CIS pixels [21] [22] [23].
A punch-through reset (PTR) structure has been developed and applied to the PG jot to eliminate the RG overlap capacitance and further improve the CG. The PTR technique was originally invented to provide faster reset in large CIS arrays [24] and to reduce reset noise without using correlated double sampling (CDS) [25]. The architecture of the PTR diode is illustrated in Figs. 2(b) and 2(c). The PTR diode is an n-p-n junction, and the reset starts when a relatively high positive bias is applied to the reset drain (RD). In the punch-through "on" state, the p-region becomes fully depleted, punch-through occurs, and a current path is created between the FD and the RD. In the "off" state, holes accumulate in the p-region and create a potential barrier that stops current flow between the FD and the RD. A similar gateless reset device was recently used to improve the CG of a conventional CIS [26]. Because of the large FWC required by a conventional CIS, previous uses of the punch-through technique in CIS always required a high voltage (>20 V) for proper operation. This high voltage is not compatible with baseline CMOS processes, so the implementation becomes more complicated [27]. Since the FWC needed for a QIS is quite small, a PTR diode for QIS can function with regular CMOS operating voltages, such as 2.5 V [28, 29].
These read-noise-reduction inventions were implemented in a test chip that contains 20 different 1Mjot QIS imagers. The chip was designed in a TSMC stacked (3D) back-side-illumination (BSI) 45 nm/65 nm CMOS process. The fabrication of the new jots followed the baseline CIS process flow, while implantation modifications were made to realize the desired doping profile for the pump-gate and PTR structures.
OVERVIEW OF THE 1MJOT STACKING QIS CHIP
The QIS chip is designed in a two-layer stacked process; the jot devices are fabricated on one wafer, and the readout circuits and control signal drivers are located on the second wafer. The signal from the jots is sent to the signal-processing electronics through millions of tiny wafer interconnections. A cluster-parallel readout architecture is used for the high-speed and low-power operation required for a QIS. The illustrations of the cluster-parallel architecture are shown in Figs. 3(a) and 3(c). A 1Mjot sensor is divided into multiple independent sub-arrays, or clusters. Each cluster has its own dedicated readout unit, and the clusters function in parallel. The cluster-parallel approach allows for the simultaneous improvement of sensor size and readout speed. Additionally, since the cluster design is independent of the array size, this architecture also helps to maintain the speed and performance 
Research Article
Of the 20 1Mjot arrays implemented on this chip, 10 are built with analog readout circuitry for the purposes of jot characterization, while the other 10 use high-speed, low-power, single-bit digital readout. Within each group of 10 arrays, a different jot design is used in each array. All the jots are designed with a 2H × 1V shared architecture with a 1.1 μm pitch. As shown in Fig. 3(d), for the analog readout, the jot reset and signal voltages are first stored in the CDS circuitry, and then each value is amplified sequentially by a switched-capacitor programmable gain amplifier with a gain of 10 V∕V. An off-chip 14-bit analog-to-digital converter (ADC) is used to quantize the analog signal, the output of which is collected by a high-speed PC interface. There are 16H × 4V clusters and 16 parallel outputs in one analog QIS. As shown in Fig. 3(b), for the high-speed single-bit digital output, the jot outputs are stored in a CDS unit and then connected to the input of a fully differential charge transfer amplifier, which is followed by a low-power D-latch comparator [30, 31]. The CDS units in the digital clusters share the same architecture as the ones in the analog clusters. The differential CDS signal is compared to an externally supplied threshold voltage in the comparator to determine the binary state of a jot signal. The binary signal is then sent off-chip and collected by a high-speed PC interface. There are 16H × 16V clusters and 32 parallel outputs in one digital QIS.
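A rough throughput budget for the digital arrays can be sketched from the cluster and output counts given above. This is a back-of-the-envelope illustration only: the 1024 × 1024 array dimensions are an assumption (the text states only "1Mjot"), the 1040 fps frame rate is taken from the abstract, and the function and key names are hypothetical.

```python
def digital_readout_budget(jots_h: int = 1024,      # assumed array width
                           jots_v: int = 1024,      # assumed array height
                           frame_rate_fps: int = 1040,
                           clusters_h: int = 16,
                           clusters_v: int = 16,
                           n_outputs: int = 32) -> dict:
    """Single-bit readout budget: one bit per jot per frame."""
    total_bits_per_s = jots_h * jots_v * frame_rate_fps
    return {
        "jots_per_cluster": (jots_h // clusters_h) * (jots_v // clusters_v),
        "total_bits_per_s": total_bits_per_s,
        "bits_per_s_per_output": total_bits_per_s // n_outputs,
    }

if __name__ == "__main__":
    for name, value in digital_readout_budget().items():
        print(f"{name}: {value:,}")
```

Under these assumptions each cluster covers 64 × 64 jots, and the aggregate output is on the order of 1 Gbit∕s spread across the 32 parallel outputs.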
EXPERIMENTAL RESULTS

A. Demonstration of Photoelectron Counting
The characterization of the jots was performed with the analog output arrays. The PCH-VPM method was used to characterize the read noise and CG of the jots. In the experiments, each jot was continuously read out 10,000 times under stable illumination to form a PCH, and the VPM was extracted from the PCH and compared to the analytic model to identify a best-fit read noise value. The measurements were performed with a fixed integration time of 120 μs. The PCHs from a PTR jot with 0.17e− rms read noise are shown in Fig. 4 under four different illumination levels. With such a low read noise level, the quantization of the photoelectron number can be clearly observed, and photoelectron counting is demonstrated. The same testing was applied to 16,000 jots of each type to analyze the performance variation of the jots. As illustrated in Fig. 5, the PTR jots showed 0.21e− rms read noise on average, with 15% rms variation. The variation in the read noise is a combination of the variation in the CG and the variation in the output voltage noise of the in-jot SF. The voltage-referred read noise and CG of the tested jots are presented in a scatter plot [Fig. 6(a)]. The dashed lines are references indicating the electron-referred noise levels. The distribution of the CG and read noise appears to be random and uncorrelated. A reasonable hypothesis for the variation is that it is a superposition of small, random differences in each jot caused by the fabrication process. For example, small variations in mask dimensions and doping concentrations can lead to differences in CG, and small variations in the number of defects in the in-jot SF can lead to different voltage noise levels in the SF. The TPG jots showed 0.23e− rms read noise on average, with 15% rms variation. As shown in Fig. 5, the PTR jots have a lower read noise because of their higher CG: 345 μV∕e− for the TPG jots and 368 μV∕e− for the PTR jots on average, both with about 2% rms variation. The ∼7% improvement in CG comes from the reduction of the RG overlap capacitance in the PTR jots.
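The reported conversion gains can be related to an effective FD capacitance and a voltage-referred noise with a few lines of arithmetic. The sketch below assumes the measured CG is set by the charge-to-voltage relation CG = Q_e∕C_FD alone, ignoring the source-follower gain and any other readout attenuation, so the capacitances are effective values only. The function names are illustrative.

```python
Q_E = 1.602176634e-19  # elementary charge in coulombs

def effective_fd_capacitance_fF(cg_uV_per_e: float) -> float:
    """Effective FD capacitance (fF) implied by a conversion gain in uV/e-."""
    return Q_E / (cg_uV_per_e * 1e-6) * 1e15

def voltage_noise_uV(read_noise_e_rms: float, cg_uV_per_e: float) -> float:
    """Voltage-referred read noise (uV rms) from electron-referred noise and CG."""
    return read_noise_e_rms * cg_uV_per_e

if __name__ == "__main__":
    # Average values reported in the text for the two jot types.
    for name, cg, noise in (("TPG", 345.0, 0.23), ("PTR", 368.0, 0.21)):
        print(f"{name}: C_FD ~ {effective_fd_capacitance_fF(cg):.3f} fF, "
              f"noise ~ {voltage_noise_uV(noise, cg):.1f} uV rms")
    print(f"CG improvement: {(368.0 / 345.0 - 1.0) * 100:.1f}%")
```

Under this simplification both jot types show a similar voltage-referred noise of roughly 77 to 79 μV rms, which is consistent with the statement that the lower electron-referred noise of the PTR jots comes from their higher CG.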
In a single-bit or multi-bit QIS, the n-bit counting result is created by quantizing the jot signal using 2^n − 1 threshold levels. As a result, the counting result is a function of quanta exposure H and read noise, and can be obtained from Eq. (2) by integration [16]. As illustrated in Fig. 6, the counting result deviates from the quanta exposure as the read noise increases, especially in the sparse illumination region. This effect is caused by the accumulation of false positive counting errors, and the level of alignment between the average counting results and the actual quanta exposures reflects the average counting accuracy.
Vol. 4, No. 12 / December 2017 / Optica
In the experiments, an array of jots was exposed to different illumination levels, and the average count and quanta exposure were extracted at each illumination level. As shown in Fig. 6, the experimental data from an ensemble of 16,000 TPG jots matches the theory for an average read noise of 0.23e− rms. The same experiment was also performed with a single PTR jot, and the results match the analytic model for a read noise of 0.17e− rms. Note that the measured curve for the 16k jots does not extend to the region of H < 0.1e−. This is because some of the TPG jots exhibit an excess dark current in addition to the SW dark current. The excess dark current is proportional to the TG pulse width but is not correlated with the SW integration time. A similar but stronger effect was discovered with the previous TPG jot test chip, and the details were discussed in Ref. [22]. The suspected cause of this dark current is that the VB region in the pump-gate structure becomes fully depleted while the TG is "on", which substantially increases the thermal generation rate in that region.
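The dependence of the average counting result on quanta exposure and read noise can be sketched as follows. This is a minimal illustration that assumes Poisson-distributed photoelectrons, Gaussian read noise, and thresholds placed at 0.5, 1.5, ... e−; the count is taken as the number of thresholds the signal exceeds. The threshold placement and all function names are assumptions made for the example.

```python
import math

def poisson_pmf(k: int, H: float) -> float:
    """Poisson probability of k photoelectrons at quanta exposure H."""
    if H == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-H + k * math.log(H) - math.lgamma(k + 1))

def gaussian_tail(x: float) -> float:
    """P(Z > x) for a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def expected_count(H: float, u_n: float, n_bits: int = 1) -> float:
    """Mean n-bit counting result for quanta exposure H and read noise u_n (e- rms)."""
    thresholds = [i + 0.5 for i in range(2 ** n_bits - 1)]
    k_max = int(H + 10.0 * math.sqrt(H) + 20.0)  # generous Poisson truncation
    total = 0.0
    for k in range(k_max + 1):
        p_k = poisson_pmf(k, H)
        # The expected number of exceeded thresholds is the sum of P(U > t).
        total += p_k * sum(gaussian_tail((t - k) / u_n) for t in thresholds)
    return total

if __name__ == "__main__":
    for H in (0.01, 0.1, 1.0):
        row = ", ".join(f"u_n={u:.2f}: {expected_count(H, u):.4f}"
                        for u in (0.17, 0.23, 0.40))
        print(f"H = {H:5.2f} -> {row}")
```

In this model the average count at small H is inflated by false positives from the zero-electron peak, which is why the curves separate from the ideal response most visibly in the sparse illumination region.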
B. Dark Current
In sparse-light conditions, dark current limits the accuracy of a sensor's photon-counting capability. For example, to measure an illumination level of H = 0.1e− at 100 fps, the dark current needs to be lower than 10e−∕s∕jot. The dark current generation process can be well modeled by a Shockley-Read-Hall process [32, 33]. According to this model, mid-gap traps have the highest generation rate and are widely considered to be the major source of dark current in image sensors. Furthermore, the Si-SiO2 interface is considered to be a major source of mid-gap traps, as its defect density is much higher than that of the silicon bulk. The pinned photodiode (PPD) is a well-known device structure that reduces the dark current by covering the surface interface of the photodiode with a shallow p+ pinning layer [34, 35]. The pump-gate structure used in the jots can help further reduce the dark current generated from the TG region. As in the PPD, the surface of the silicon in a PG jot is covered by a shallow p+ pinning layer. Beyond the PPD, a vertical PB is created between the SW and the surface interface underneath the TG [Fig. 7(b)]. During the integration period, the PB region can protect the SW from the dark current generated under the TG and can effectively steer that dark current towards the FD. As shown in Fig. 7(a), the dark current of the PG jots is as low as 0.16e−∕s∕jot on average at room temperature (23°C), or 2.12 pA∕cm², and 1.06e−∕s∕jot at 60°C, or 13.9 pA∕cm². Since dark current electrons are also quantized, a PCH of the integrated dark current was collected [Fig. 7(c)] from 256 × 64 jots using 100 frames with a 1.28 s integration time at room temperature.
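The unit conversions in this section can be checked with a short sketch. It assumes the full 1.1 μm × 1.1 μm jot pitch is used as the area when converting a per-jot rate into a current density; the function names are illustrative.

```python
Q_E = 1.602176634e-19  # elementary charge in coulombs

def dark_current_density_pA_per_cm2(rate_e_per_s: float,
                                    pitch_um: float = 1.1) -> float:
    """Convert a per-jot dark rate (e-/s) into a current density (pA/cm^2)."""
    area_cm2 = (pitch_um * 1e-4) ** 2  # 1 um = 1e-4 cm
    return rate_e_per_s * Q_E / area_cm2 * 1e12

def max_dark_rate_e_per_s(H_min: float, frame_rate_fps: float) -> float:
    """Dark rate at which dark electrons per frame equal the exposure H_min."""
    return H_min * frame_rate_fps

if __name__ == "__main__":
    for temp_c, rate in ((23, 0.16), (60, 1.06)):
        density = dark_current_density_pA_per_cm2(rate)
        print(f"{temp_c} C: {rate} e-/s/jot ~ {density:.2f} pA/cm^2")
    print(f"Limit for H = 0.1 e- at 100 fps: {max_dark_rate_e_per_s(0.1, 100):.1f} e-/s/jot")
    print(f"Mean dark electrons in 1.28 s at 0.16 e-/s: {0.16 * 1.28:.3f}")
```

With the full-pitch area assumption, 0.16e−∕s∕jot corresponds to about 2.12 pA∕cm², and 1.06e−∕s∕jot to about 14 pA∕cm², close to the reported 13.9 pA∕cm².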
C. Quantum Efficiency
The CMOS BSI technology can greatly enhance the photosensitive area, or fill-factor, in CMOS image sensors, yielding significant improvements in the QE [36] [37] [38]. The QE of the jot devices was measured in the visible light region. In the experiments, an integrating sphere was used to ensure uniform illumination over the detectors. A group of white LEDs was used as the light source, and the wavelength of the illumination was selected by a narrowband (25 nm) bandpass filter. A NIST-traceable calibrated optical power meter was used to measure the photon flux for reference. The measured QE is defined as the ratio between the number of photoelectrons counted by the jots and the number of incident photons given by the reference meter during the integration time. Because of the small FWC (∼200e−) of the jot devices, the photon flux in each read was kept low (H < 10) to avoid saturating the jots. On the other hand, to ensure the accuracy of the reference meter, relatively strong illumination (>1 μW∕cm²) was chosen. To satisfy both needs, a short integration time (70 μs) was applied for the jots. Under these conditions, the quanta exposure was kept at around 5e− for each read. As shown in Fig. 8(a), the QE for the jot devices is between 70% and 80% in the visible light regime. Note that the measured jots are monochrome, meaning there is no color filter array deposited on the sensor surface. Moreover, as micro-lenses are absent in the measured jots, their addition may help further improve the QE in the future. Technology computer-aided design (TCAD) simulations were used for a better understanding of the experimental results. In the simulation, the backside of the jot device was prepared with a standard anti-reflection coating based on silicon nitride. Both results are illustrated in Fig. 8. As shown, the experimental QE matches the simulation well in the 550 to 650 nm wavelength regime.
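The QE definition used here can be made concrete with a small calculation. The sketch assumes monochromatic light at the filter's center wavelength and uses the full 1.1 μm jot pitch as the collection area; the irradiance of 2.8 μW∕cm² in the example is a hypothetical value chosen to give a quanta exposure near 5e−, not a measured number.

```python
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
C_LIGHT = 299792458.0      # speed of light, m/s

def incident_photons(irradiance_uW_per_cm2: float, wavelength_nm: float,
                     t_int_s: float, pitch_um: float = 1.1) -> float:
    """Mean number of photons incident on one jot during the integration time."""
    area_cm2 = (pitch_um * 1e-4) ** 2
    energy_J = irradiance_uW_per_cm2 * 1e-6 * area_cm2 * t_int_s
    photon_energy_J = H_PLANCK * C_LIGHT / (wavelength_nm * 1e-9)
    return energy_J / photon_energy_J

def quantum_efficiency(mean_photoelectrons: float, photons: float) -> float:
    """QE as counted photoelectrons divided by incident photons."""
    return mean_photoelectrons / photons

if __name__ == "__main__":
    photons = incident_photons(2.8, 550.0, 70e-6)  # hypothetical irradiance
    print(f"Incident photons per jot per read: {photons:.2f}")
    print(f"QE for a measured H of 5 e-: {quantum_efficiency(5.0, photons):.1%}")
```

The example shows why a short integration time is needed: at irradiances of a few μW∕cm², a 70 μs exposure delivers only a handful of photons per jot, keeping the signal well below the ∼200e− full-well capacity.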
On the other hand, the experimental QE is lower than the simulation results in the blue light region (450-500 nm). As the blue photons are absorbed closer to the backside surface, more interaction with the backside interface traps is expected. The random trapping events can cause the observed reduction of QE. It is hard to model this effect in the simulation without prior knowledge of the trap density and energy, and this might lead to the difference in QE in the blue light region. In general, the loss of incident photons can be categorized into three sources: (1) photons that are reflected off the backside surface; (2) photons that transmit through the silicon substrate; (3) photoelectrons that are lost in the silicon (recombined or collected by the in-jot transistors). The simulated distribution of the incident light is shown in Fig. 8(b) .
D. High-Speed Single-Photon Imaging
The high-speed operation of the 1Mjot QIS can be achieved with the single-bit output mode. In this mode, an external reference voltage is supplied to the comparators for thresholding the jot output into a binary number. The reference voltage corresponding to 0.5e− was calibrated using the average CG measurement results. A sample binary image taken with the QIS is shown in Figs. 9(a)-9(c). When the image was taken, the sensor was operating at 1040 fps, and the power dissipation of the whole sensor was 17 mW. Some fixed-pattern noise was observed in the results, and the cause is still under investigation. We believe that it can be fixed with some wiring improvements in the layout. Binning of jot bits was applied to create the grayscale image pixels. In this example, 8 × 8 × 8 jots are combined to create one image pixel, and a QIS de-noising algorithm recently developed at Purdue was applied to the final image [39, 40]. The result is shown in Fig. 9(d).
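The binning step can be illustrated with a simple sketch. It assumes the 8 × 8 × 8 cube spans 8 frames by 8 rows by 8 columns (the text does not state the axis order) and does not reproduce the de-noising algorithm of Refs. [39, 40]. The exposure estimate inverts the ideal single-bit response D = 1 − exp(−H), which neglects read noise; all names are illustrative.

```python
import math

def bin_jot_bits(frames, kt=8, ky=8, kx=8):
    """Sum single-bit jot data over kt x ky x kx cubes.

    frames is indexed as frames[t][y][x] with values 0 or 1; incomplete
    cubes at the edges are dropped. Returns sums indexed as [t][y][x].
    """
    n_t, n_y, n_x = len(frames), len(frames[0]), len(frames[0][0])
    binned = []
    for t0 in range(0, n_t - kt + 1, kt):
        plane = []
        for y0 in range(0, n_y - ky + 1, ky):
            row = []
            for x0 in range(0, n_x - kx + 1, kx):
                row.append(sum(frames[t][y][x]
                               for t in range(t0, t0 + kt)
                               for y in range(y0, y0 + ky)
                               for x in range(x0, x0 + kx)))
            plane.append(row)
        binned.append(plane)
    return binned

def exposure_estimate(ones: int, n_jots: int) -> float:
    """Estimate H per jot from the fraction of ones, assuming D = 1 - exp(-H)."""
    if ones >= n_jots:
        return math.inf  # saturated cube: exposure cannot be resolved
    return -math.log(1.0 - ones / n_jots)

if __name__ == "__main__":
    # Toy data: 8 frames of 16 x 16 jots with a checkerboard bit pattern.
    frames = [[[(x + y + t) % 2 for x in range(16)] for y in range(16)]
              for t in range(8)]
    pixels = bin_jot_bits(frames)
    print(pixels[0])
    print(f"H estimate for the first pixel: {exposure_estimate(pixels[0][0][0], 512):.3f}")
```

Each binned pixel takes a value from 0 to 512, so an 8 × 8 × 8 cube of single-bit jots yields slightly more than 9 bits of grayscale range per image pixel.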
CONCLUSION
In this paper, the concept behind the QIS is reviewed, the fundamental requirements for photon counting are discussed, and the characterization results of the QIS prototype chip are presented. A summary of the discussed results is shown in Table 1. To address the disadvantages of the current state-of-the-art single-photon detectors, jot devices with ultra-low read noise were developed for photon counting at room temperature without using electron avalanche gain. The read noise has been reduced to as low as 0.17e− rms through several inventions, and the photon-counting capability of the jot devices is demonstrated with a 1Mjot QIS prototype chip. Given its importance in high-quality photon-counting imaging, ultra-low dark current is demonstrated both at room temperature and in a heated environment (60°C). Moreover, the QE in the visible-light wavelength range is reported and discussed. High-speed single-photon imaging was tested, and a 1040 fps readout speed is demonstrated at 1Mjot resolution. The QIS technology is suited to high-speed photon-counting imaging with high spatial resolution, and we expect it will be widely adopted in scientific and space imaging, life science, security, automotive, and other applications in the near future.
