Abstract-For a 4T pixel-based CMOS image sensor (CIS) readout chain with column-level amplification and correlated double sampling (CDS), we show that the input-referred total noise in a standard 65 nm process can be reduced to 0.37 e − rms. Based on transient noise simulations using Eldo, this deep sub-electron noise performance has been reached using only circuit techniques and optimal device choices. The simulation results compare favorably with analytical noise calculations. The shot noise associated with the gate tunneling current has been simulated, and the possibility of photoelectron counting in this 65 nm process has been demonstrated.
I. INTRODUCTION
Low-light performance of CIS is becoming a major concern for low-cost consumer products as well as for niche applications requiring deep sub-electron noise performance [1]. It has been shown that read noise levels as low as 0.48 e − rms [2] can be reached in a standard process using circuit techniques and parameter optimization. Process refinements [3, 4] can also reduce the noise to slightly below 0.3 e − rms. However, the former requires the reset to be performed with a high-voltage clock of 25 V, while the latter obtains its noise reduction at the cost of a low pixel full-well capacity. Recently, more advanced technology nodes below 100 nm have been introduced for CIS [5]. It has been predicted analytically [6] that the read noise can be reduced by taking advantage of technology downscaling. In this paper we investigate the read noise of a CIS readout chain integrated in a 65 nm process.
Fig. 1 shows low-noise CIS readout chains based on 4T pixels with different types of source follower (SF) stages, column-level amplification and correlated double sampling (CDS). The 4T pixels embed a pinned photodiode (PPD) and a transfer gate (TX) used to transfer the photoelectrons to the sense node (SN) after reset (RST). The RS switch connects the pixel to the column for readout. The column-level amplifier controls the bandwidth and reduces the noise contributions of the subsequent stages. CDS consists of sampling the signal at the output of the column-level amplifier right after the sense node reset and again right after the photoelectron transfer. Taking the difference between these two samples cancels the kTC noise sampled at the sense node.
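The kTC cancellation performed by the CDS can be illustrated with a minimal Monte Carlo sketch. It is not the simulation used in this work: the 1 fF sense node capacitance, the 0.3 e − rms per-sample readout noise and the function names are hypothetical values chosen only for illustration, and the readout noise of the two samples is assumed to be white and uncorrelated.

```python
import math
import random
import statistics

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def ktc_noise_electrons(c_sn_farads, temperature_k=300.0):
    """rms reset (kTC) noise sampled on a capacitance, in electrons."""
    return math.sqrt(K_BOLTZMANN * temperature_k * c_sn_farads) / Q_ELECTRON


def simulate_cds(n_reads, signal_e, reset_sigma_e, read_sigma_e, seed=0):
    """Return (rms of the raw signal sample, rms of the CDS output)."""
    rng = random.Random(seed)
    raw, cds = [], []
    for _ in range(n_reads):
        # Reset noise is frozen on the SN, so both samples share it.
        reset_offset = rng.gauss(0.0, reset_sigma_e)
        # Each sample also carries its own uncorrelated readout noise.
        sample_reset = reset_offset + rng.gauss(0.0, read_sigma_e)
        sample_signal = reset_offset + signal_e + rng.gauss(0.0, read_sigma_e)
        raw.append(sample_signal)
        cds.append(sample_signal - sample_reset)
    return statistics.pstdev(raw), statistics.pstdev(cds)


if __name__ == "__main__":
    sigma_reset = ktc_noise_electrons(1e-15)  # hypothetical 1 fF sense node
    raw_rms, cds_rms = simulate_cds(20000, 5.0, sigma_reset, 0.3)
    print(f"kTC noise: {sigma_reset:.1f} e- rms")
    print(f"raw sample: {raw_rms:.2f} e- rms, after CDS: {cds_rms:.2f} e- rms")
```

In this simplified model the common reset offset disappears from the difference, while the uncorrelated noise of the two samples adds in power, so the CDS output noise is about √2 times the per-sample readout noise.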
II. READ NOISE & SCALING EFFECTS
The read noise calculation in such CIS readout chains has been detailed in [6, 7]. It has been shown that the major contributors to the input-referred thermal noise are the pixel-level SF stage and the column-level amplifier. The input-referred thermal noise charge variance can be expressed as
where A col is the ratio of the capacitances C in and C f and
1. C L is the load capacitance and C in the integration capacitance. A CG is the conversion gain, C ox the SF oxide capacitance per unit area, C P the sum of all the parasitic capacitances connected to the SN, and C e the extrinsic capacitance per unit width of the SF transistor, including the fringing-field and overlap capacitances. W and L are the width and length of the SF. k is the Boltzmann constant, T is the absolute temperature, and γ SF , G m,SF and γ A , G m,A are the noise excess factors and transconductances of the in-pixel SF transistor and the column-level amplifier, respectively. The 1/f noise power spectral density (PSD) [8] is inversely proportional to the gate area. Hence, all the transistors located outside the pixel can be designed with a much larger gate area than the in-pixel SF, making the latter the dominant 1/f noise source. Under this condition, the input-referred 1/f noise charge variance can be expressed as [7]
where α 1/f is a unitless circuit design parameter reflecting the impact of the CDS on the 1/f noise. As shown in [6], α 1/f can be calculated numerically; given sufficient settling time between the two samples, it ranges between 4 and 5. K F is the flicker noise parameter, expressed in [8] as
where q is the electron charge, λ the tunneling attenuation distance (about 0.1 nm), N t the oxide trap density, and K G a bias-dependent parameter close to unity when the transistor operates in weak and moderate inversion [8].
Both transient noise simulations and experimental results [7] show that the thermal noise can be reduced until it is negligible compared to the 1/f noise. This can be achieved by a high column-level gain and bandwidth control. Despite the impact of the CDS [7], the residual 1/f noise remains dominant in conventional low-noise CIS readout chains. Equation (2) shows the different design and process parameters involved in the input-referred 1/f noise. It suggests that technology downscaling can be exploited to reduce the 1/f noise through a higher C ox and a lower minimum gate width, assuming a constant N t . In this work, a 65 nm process is used to investigate this idea. State-of-the-art CIS use 3.3 V transistors. It has been shown in [2] that a lower-voltage transistor can also be implemented as the SF without degrading the dynamic range. In this 65 nm process, the transistors that can be used in the in-pixel SF are shown in Table I, with their parameters relevant to this analysis. The 3.3 V in-pixel SFs traditionally used in 180 nm CIS processes feature a C ox of about 4 fF/μm^2 with an N t of 1.5 · 10^17 eV^−1 · cm^−3 for nMOS and 3 · 10^17 eV^−1 · cm^−3 for pMOS. All the transistors shown in Table I feature a higher C ox , and their N t is lower than that of the 3.3 V nMOS from a typical 180 nm CIS process. Consequently, based on (2), a better 1/f noise performance can be expected from this 65 nm process. Specifically, the pMOS2.5 has the best N t /C ox^2 ratio, which makes it the best candidate for low 1/f noise performance, followed by the pMOS1.2 and the nMOS2.5. The input-referred flicker noise calculated from (2) with the parameters given in Table I
Fig. 1 shows the schematic of the simulated low-noise CIS readout chains. Each pixel is based on a different source follower: nMOS2.5, pMOS2.5 and pMOS1.2. The pMOS-based pixels use pMOS2.5 row selectors.
For the pMOS1.2 transistor, the bulk and drain voltages are shifted in order to keep the voltage between its terminals below 1.2 V. The three pixels share the same column-level readout chain, which consists of a column-level amplifier, designed as a fully cascoded single-ended amplifier, and a CDS stage. The bandwidth of the column-level amplifier has been set to 256 kHz, for a gain of 64 and a load capacitance of 200 fF. Consequently, the minimum time interval T CDS for sufficient settling of the signals is about 4 μs. The CDS is implemented with an analog circuit. The corresponding readout chain timing diagram is shown in Fig. 2. In an analog CDS, a first sample is held on a capacitor after resetting the pixel; then the TX is turned on and, after a time equal to T CDS , a second sample is stored on another capacitor. The difference of the two samples is taken after the rising edge of the signal S SH3 . The auto-zero (AZ) is performed in order to reset the feedback capacitor [9].
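The relation between the 256 kHz bandwidth and the T CDS of about 4 μs can be checked with a short calculation. The sketch below assumes that the column-level amplifier settles like a single-pole (first-order) system, which is a simplification not stated in this paper; the function names are illustrative.

```python
import math


def time_constant(bandwidth_hz):
    """Time constant of a single-pole system with the given -3 dB bandwidth."""
    return 1.0 / (2.0 * math.pi * bandwidth_hz)


def settling_error(bandwidth_hz, settle_time_s):
    """Relative residual error after settle_time_s for a first-order step response."""
    return math.exp(-settle_time_s / time_constant(bandwidth_hz))


if __name__ == "__main__":
    bandwidth = 256e3  # Hz, column-level amplifier bandwidth from the text
    t_cds = 4e-6       # s, minimum CDS interval from the text
    tau = time_constant(bandwidth)
    print(f"tau = {tau * 1e6:.2f} us")
    print(f"T_CDS / tau = {t_cds / tau:.1f}")
    print(f"residual settling error = {settling_error(bandwidth, t_cds) * 100:.2f} %")
```

Under this assumption, 4 μs corresponds to a little more than six time constants, which leaves a residual settling error well below 1 %.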
Given that the readout chain is a time-variant system, transient noise simulation is the most realistic way of simulating its noise. In this paper we used the Eldo simulator, which allows the 1/f noise and the thermal noise to be analyzed separately.
IV. RESULTS

1) Thermal noise:
The input-referred thermal noise obtained from transient noise simulations is shown in Fig. 3 as a function of the column-level gain A col for each of the three SF configurations. These curves show how the column-level gain decreases the thermal noise, as predicted analytically by (1). Note that the contributions of the pixel and the column-level amplifier have similar values. For a column-level gain of 64, a C L of 200 fF and a bandwidth of 256 kHz, the input-referred thermal noise of each configuration is below 0.3 e − rms , as for the 180 nm process in [2]. In fact, the readout chains based on the pMOS2.5 and the nMOS2.5 both feature an input-referred thermal noise of 0.22 e − rms , while the pMOS1.2 features a noise level of 0.24 e − rms . These simulation results are compared with the input-referred noise calculated using (1), showing an excellent match. For the noise calculation, both noise excess factors γ SF and γ A are taken equal to 1, C P has been obtained by simulation as 0.72 fF, C e has been taken as one tenth of C ox , G m,A is equal to 30 μS, and G m,SF is 13 μS for the pMOS2.5, 30 μS for the nMOS2.5 and 23 μS for the pMOS1.2. The pMOS2.5 and nMOS2.5 feature a minimum gate width of 0.4 μm and a minimum length of 0.28 μm, while the pMOS1.2 features a width of 0.2 μm and a length of 0.3 μm. All the width and length values were chosen to optimize the input-referred total noise. The simulation and calculation results show that downscaling does not increase the thermal noise and that the analysis leading to (1) is still valid for this 65 nm process. The outcome of this analysis is that the thermal noise of the readout chains can be efficiently reduced for all types of SF using column gain and bandwidth control. As will be shown in the next subsection, the 1/f noise is confirmed to be dominant.
2) 1/f Noise: The input-referred 1/f noise obtained by transient noise simulations for the three configurations is shown in Fig. 4. The 1/f noise of the pMOS2.5, pMOS1.2 and nMOS2.5 is calculated for a column-level gain of 64, a C L of 200 fF and a bandwidth of 256 kHz, and behaves as expected theoretically. The mismatch between the simulated and the calculated values can be explained by the different values of the parameter K G , which has been considered constant and equal to unity in the calculation. Indeed, K G depends on the inversion coefficient [8], which is not the same for the three types of transistors. The nMOS2.5 shows a high N t and a low C ox , hence it features the highest noise level. The pMOS1.2 features approximately the same N t as the nMOS2.5 but twice the C ox , resulting in an rms noise about two times lower. The pMOS2.5, although its C ox is not as high as that of the pMOS1.2, features a much lower N t , which makes it the lowest-noise device, with an input-referred 1/f noise of 0.32 e − rms .
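The trend discussed above can be summarized by a first-order scaling rule. The sketch below assumes that the input-referred 1/f charge variance scales as N t /C ox^2 when all other parameters (gate geometry, parasitic capacitances, K G and α 1/f ) are held equal, so that the rms noise scales as the square root of N t divided by C ox . This is only the figure of merit used in the text, not the full expression (2), and the ratios passed to the function are illustrative.

```python
import math


def relative_flicker_rms(n_t_ratio, c_ox_ratio):
    """rms 1/f noise of a device relative to a reference device.

    n_t_ratio  : N_t(device) / N_t(reference)
    c_ox_ratio : C_ox(device) / C_ox(reference)
    Assumes variance proportional to N_t / C_ox^2, everything else equal.
    """
    return math.sqrt(n_t_ratio) / c_ox_ratio


if __name__ == "__main__":
    # Same trap density, twice the oxide capacitance (pMOS1.2 vs nMOS2.5 trend).
    print(relative_flicker_rms(1.0, 2.0))
    # Hypothetical device: same C_ox, four times lower N_t.
    print(relative_flicker_rms(0.25, 1.0))
```

Both cases give the same factor of two in rms noise, which illustrates why the device with the highest C ox is not necessarily the best choice once N t is taken into account.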
3) Shot Noise: With a gate oxide scaled down to 3 nm and below, the gate leakage current due to direct carrier tunneling becomes important [10]. From Table I, we can observe that this is the case for the pMOS1.2. As shown in Fig. 5, in BSIM4 the gate tunneling current components include the tunneling current between gate and substrate and the current between gate and channel, which is partitioned between the source and drain terminals. Since these leakage currents are due to barrier-controlled processes, they give rise to shot noise. The input-referred charge variance due to the shot noise of the total leakage current is expressed in [6]. The shot noise current sources feature a white PSD and, when integrated on the SN capacitance, give rise to a variance increasing linearly with T CDS [6]. The BSIM4 model parameters igcMod and igbMod allow the activation of the gate leakage current components. This makes it possible to separate the thermal noise from the gate tunneling current shot noise in the simulation. The simulation shows that this shot noise is completely negligible for the thick-oxide transistors nMOS2.5 and pMOS2.5. For the thin-oxide pMOS1.2, with a column-level gain of 64, a C L of 200 fF and a bandwidth of 256 kHz, the input-referred noise increases dramatically to reach 1.88 e − rms .
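The linear growth of the shot noise variance with T CDS can be illustrated with a plain Poisson estimate. The sketch below is not the expression from [6]: it simply assumes that a constant leakage current integrated on the sense node during T CDS deposits a Poisson-distributed number of carriers, so that the variance in squared electrons equals the mean count. The 0.1 pA leakage current is a hypothetical value, not a figure taken from Table I.

```python
import math

Q_ELECTRON = 1.602176634e-19  # C


def shot_noise_variance_e2(leakage_a, integration_s):
    """Poisson variance (in electrons squared) of the integrated leakage charge."""
    return leakage_a * integration_s / Q_ELECTRON


def shot_noise_rms_e(leakage_a, integration_s):
    """rms shot noise in electrons for the same integration window."""
    return math.sqrt(shot_noise_variance_e2(leakage_a, integration_s))


if __name__ == "__main__":
    leakage = 0.1e-12  # A, hypothetical gate tunneling current
    for t_cds in (2e-6, 4e-6, 8e-6):
        var = shot_noise_variance_e2(leakage, t_cds)
        rms = shot_noise_rms_e(leakage, t_cds)
        print(f"T_CDS = {t_cds * 1e6:.0f} us: variance = {var:.2f} e-^2, rms = {rms:.2f} e-")
```

Doubling T CDS doubles the variance and increases the rms noise by a factor of √2, which is why lowering the bandwidth (and thus lengthening T CDS ) does not help against this noise source.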
4) Total noise:
The input-referred total noise is shown in Fig. 6. It appears clearly that the pMOS2.5 features the best noise performance, 0.39 e − rms , as expected theoretically. This noise lies between the photoelectron counting limit (0.3 e − rms ) and the photoelectron detection limit (0.4 e − rms ). In order to further reduce the thermal noise of the readout chain, we increased C L in addition to A col . This results in a lower bandwidth and a longer T CDS . Fig. 7 shows that the simulated total input-referred noise can be further reduced to reach 0.37 e − rms . Given the noise performance of the readout chain based on the pMOS2.5 SF, it is interesting to investigate the possibility of photoelectron counting. Fig. 8 shows the histogram of the input-referred signal of the readout chain based on the pMOS2.5 when injecting 5 e − at the SN. It demonstrates that reasonably accurate photoelectron counting can be performed.
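The reported figures can be cross-checked with a short calculation. The sketch below assumes that the thermal and 1/f contributions are uncorrelated, so that they add in power, and that the total read noise is Gaussian, so that a count obtained by rounding is correct whenever the noise stays within ±0.5 e − . Both assumptions are simplifications introduced here, and the function names are illustrative.

```python
import math


def total_noise_rms(*components_e):
    """Quadrature sum of uncorrelated rms noise contributions, in electrons."""
    return math.sqrt(sum(c * c for c in components_e))


def correct_count_probability(read_noise_e):
    """Probability that Gaussian noise stays within +/-0.5 e-, so rounding is correct."""
    return math.erf(0.5 / (read_noise_e * math.sqrt(2.0)))


if __name__ == "__main__":
    # pMOS2.5 values from the text: thermal 0.22 e- rms, 1/f 0.32 e- rms.
    total = total_noise_rms(0.22, 0.32)
    print(f"total noise = {total:.2f} e- rms")
    for sigma in (0.39, 0.37, 0.30):
        p = correct_count_probability(sigma)
        print(f"sigma = {sigma:.2f} e- rms: P(correct count) = {p:.3f}")
```

The quadrature sum of the pMOS2.5 thermal and 1/f noise values is consistent with the simulated total of 0.39 e − rms , and under the Gaussian assumption a read noise of 0.37 e − rms yields a correct count in roughly four out of five readings.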
V. CONCLUSION
The analytical calculation of the thermal noise (1) and the 1/f noise (2) is valid for this 65 nm process and shows a good match with the transient noise simulation results. For the simulated CIS readout chains based on different in-pixel transistor types, the thermal noise could be reduced to levels close to 0.25 e − rms using only column-level gain (64), a C L of 200 fF and bandwidth control (256 kHz). The 1/f noise analysis shows that the higher C ox , compared with older technology nodes, benefits the 1/f noise reduction. However, the best choice does not correspond directly to the highest C ox ; N t must also be taken into consideration. In addition, the oxide thickness of 2.8 nm, corresponding to the device featuring the highest C ox , made the gate tunneling current shot noise increase sharply and dominate the other noise sources, precluding sub-electron noise performance. The best flicker noise performance of 0.32 e − rms is obtained for the pMOS2.5 device, which has the lowest N t . The input-referred total noise of 0.37 e − rms obtained for this device makes photoelectron counting possible.
