Abstract-The stringent performance requirements of many infrared imaging applications warrant the development of precision high dynamic range, high speed focal plane arrays. In addition to achieving high dynamic range, the readout circuits for these image sensors must achieve high linearity and SNR at low power consumption. We first review two high dynamic range image sensor schemes that have been developed for visible range imaging and discuss why they cannot meet the stringent performance demands of infrared imaging. We then describe a new dynamic range extension scheme, Folded Multiple Capture, that can meet these performance requirements. Dynamic range is extended using synchronous self-reset, while high SNR is maintained using a few non-uniformly spaced captures and a least-squares fit to estimate the pixel photocurrent. We conclude with a description of a prototype of this architecture targeted for 3D-IC IR focal plane arrays.
I. INTRODUCTION
Precision high dynamic range (HDR), high speed imaging is finding growing applications in the automotive, surveillance, tactical, industrial, and medical and diagnostic instrumentation (e.g., fluorescence detection and spectroscopy) arenas. These applications can be broadly segmented into those operating in the visible range (typically 400 nm < λ < 800 nm) and those operating in the infrared (IR) range (typically 4 μm < λ < 12 μm). Precision HDR, high speed IR imaging applications, specifically, are fraught with challenges. In addition to capturing scenes with large variations in irradiance due to object temperatures, the imaging system must be able to deal with undesirable scene disturbances due to, for example, sun reflections or laser jamming. The imaging system must also have highly linear, shot noise limited readout in order to achieve the stringent sensitivity requirements. In [1], it is argued that low power IR focal plane arrays (FPAs) with > 120 dB dynamic range operating at 1000 frames/sec are needed for such applications. These performance requirements are far more aggressive than what is achievable with present-day IR FPAs.
Several HDR extension schemes have been developed in recent years, mainly for visible range imaging applications, e.g., [2]-[8]. While these schemes require pixels that are too large to be practical for visible range applications, they are better suited to IR FPAs, where pixel sizes are inherently larger due to the longer wavelengths and the use of bump-bonded detectors, e.g., [9]. Moreover, with the advent of 3D-IC technology, whereby multiple wafers can be stacked and vertically interconnected, the effective pixel area available to implement these schemes is increased [1]. However, as discussed in [10], [11], none of the existing HDR schemes can meet the aforementioned IR FPA performance requirements. In [12], an HDR extension scheme called Folded Multiple Capture (FMC) is presented that can meet all the requirements stated in [1]. Low power consumption is achieved while maintaining high SNR by using digital signal processing to relax the demands on the analog front-end (AFE). FMC also provides tolerance to scene disturbances that generate large transient spikes of photocurrent. A proof-of-concept of the FMC architecture has been fabricated and is readily extendable to a fully integrated imaging system using 3D-IC technology [13].
The rest of the paper is organized as follows. In Section II, we begin by reviewing the fundamentals of image sensors and introducing the needed terminology. We then discuss the stringent fidelity requirements in IR imaging applications and review two dynamic range extension schemes that have been developed for visible range imaging. In Section III, we discuss the architecture and operation of FMC, the implementation of a prototype, and the experimental results obtained.
II. BACKGROUND
An image sensor consists of an array of photodetectors followed by readout circuits. Sensor performance is therefore a function of both the photodetector and the readout circuits. Each photodetector in a conventional image sensor, e.g., CCD, CMOS APS, or IR FPA, converts incident photon flux into photocurrent $i_{ph}$. In visible range imaging, the incident photon flux corresponds to light reflected off objects in the scene, while in IR imaging it corresponds to object thermal radiation. A simplified Signal-to-Noise Ratio (SNR) ... Figure 1 ... for Medium Wavelength IR (MWIR), 4 μm < λ < 5 μm. The nonlinear relationship between temperature and photocurrent, coupled with the fact that the background temperature generally produces a large photocurrent, explains why high DR image sensors are required for IR imaging applications.
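To illustrate why photocurrent is a nonlinear function of temperature in the MWIR band, the sketch below integrates Planck's law for blackbody photon spectral radiance over 4 μm to 5 μm. It is not a reproduction of Figure 1. It assumes that photocurrent is proportional to in-band photon radiance, which implies constant quantum efficiency, optics, and emissivity, and it ignores atmospheric transmission. These are simplifying assumptions. The function name `band_photon_radiance` is hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def band_photon_radiance(temp_k, lam_lo=4e-6, lam_hi=5e-6, n=2000):
    """In-band blackbody photon radiance (photons / s / m^2 / sr).

    Integrates Planck's photon spectral radiance 2c / lam^4 / (exp(hc/(lam k T)) - 1)
    over [lam_lo, lam_hi] with the midpoint rule. Under the stated assumption,
    photocurrent is proportional to this value.
    """
    dlam = (lam_hi - lam_lo) / n
    total = 0.0
    for k in range(n):
        lam = lam_lo + (k + 0.5) * dlam
        total += (2.0 * C / lam**4) / math.expm1(H * C / (lam * K_B * temp_k)) * dlam
    return total


# Relative change in photocurrent per kelvin around a 310 K background.
r310 = band_photon_radiance(310.0)
rel_per_k = (band_photon_radiance(311.0) - band_photon_radiance(309.0)) / (2.0 * r310)
```

Under these assumptions, equal 10 K steps in temperature produce unequal steps in photocurrent, so the curve is convex. A small temperature difference also appears as a small fractional change on top of a large background photocurrent.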
A demanding imaging scenario that elucidates the need for precision imaging in IR is one that involves scenes having very small variations in temperature around a much larger background temperature. For example, for imaging the human body the target temperature range is within only ±2K around a nominal background of about 310K. Thus sensitivity is critical, necessitating shot noise limited readout with high SNR. Such sensitivity is typically quantified in the temperature domain as Noise Equivalent Temperature Difference (NETD) [14] . Assuming shot noise limited operation for a given integration time, NETD can be derived as
$$\mathrm{NETD}(T) = \frac{\sqrt{q\, i_{ph}(T)/t_{int}}}{d i_{ph}(T)/dT} \qquad (1)$$
In general, the numerator corresponds to the minimum detectable photocurrent, and the denominator translates it into the temperature domain. Note that achieving an NETD on the order of 10 mK, as is often required in medical imaging applications, requires detecting photocurrents of a few femtoamps in the presence of a nanoamp offset. This accuracy can be achieved with $t_{int}$ = 1 ms using a long wavelength IR FPA if Equation (1) holds. In practice, the charge-handling capacity of the pixel, which is constrained by its area, limits the achievable SNR, so frame averaging is typically performed.
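The sketch below evaluates the shot-noise-limited NETD. It assumes the form $\sqrt{q\, i_{ph}(T)/t_{int}}$ divided by $d i_{ph}/dT$, which matches the description of Equation (1) as a minimum detectable photocurrent translated into the temperature domain. The derivative is taken by central finite difference. The linear photocurrent model in the example (1 nA at 310 K with a 40 pA/K slope) is purely hypothetical and is not data from this work.

```python
import math

Q_E = 1.602176634e-19  # elementary charge, C


def netd_shot_limited(i_ph, temp_k, t_int, d_temp=1e-3):
    """Shot-noise-limited NETD in kelvin.

    i_ph: callable mapping temperature (K) to photocurrent (A).
    The numerator is the minimum detectable photocurrent sqrt(q * i_ph / t_int);
    the denominator d i_ph / dT converts it to a temperature difference.
    """
    i0 = i_ph(temp_k)
    di_dt = (i_ph(temp_k + d_temp) - i_ph(temp_k - d_temp)) / (2.0 * d_temp)
    return math.sqrt(Q_E * i0 / t_int) / di_dt


# Hypothetical photocurrent model around a 310 K background.
def example_model(t):
    return 1e-9 + 4e-11 * (t - 310.0)


netd = netd_shot_limited(example_model, 310.0, 1e-3)  # about 0.010 K
```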
Since calibration is typically performed in IR FPAs to compensate for large variations in detector parameters, such as dark current and gain mismatch, linearity of the readout is essential. Further, the frame averaging alluded to above relies on linear readout: averaging is only effective in minimizing NETD if the additive noise is zero mean and uncorrelated, as is the case for temporal noise. This is illustrated in Figure 2, where simulation results are shown for two different readouts imaging a scene with four temperature patches. For the linear, shot noise limited readout, the NETD achieved after averaging 100 frames is 5 mK, and the output after applying an edge-detection algorithm is satisfactory. For the readout with nonlinearity, however, the NETD achieved is 40 mK, and the output after applying the same edge-detection algorithm is clearly unsatisfactory.
... photocurrent, providing long integration times for pixels with small photocurrents and short integration times for pixels with high photocurrents [2], [4], [6], or by recycling the integrator and thereby extending dynamic range using self-reset or charge subtraction [5], [7], [16]. We briefly review two such schemes below.
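The point that averaging only removes zero-mean, uncorrelated noise can be seen in a toy model. The sketch below averages many noisy frames of a single pixel for a linear readout and for a readout with a hypothetical quadratic nonlinearity ($x + 0.02x^2$). Averaging shrinks the random error by roughly $\sqrt{N}$. The nonlinearity is deterministic, so its error survives unchanged. The noise level, frame count, and nonlinearity coefficient are illustrative assumptions, not the parameters behind Figure 2.

```python
import random
import statistics


def averaged_error(readout, true_signal, n_frames, noise_sigma, rng):
    """Error of the frame-averaged output relative to the true signal.

    Each frame is readout(true_signal) plus zero-mean Gaussian temporal noise.
    """
    frames = [readout(true_signal) + rng.gauss(0.0, noise_sigma)
              for _ in range(n_frames)]
    return statistics.fmean(frames) - true_signal


def linear(x):
    return x


def nonlinear(x):
    return x + 0.02 * x * x  # hypothetical fixed nonlinearity


# Same seed -> identical noise sequences, so only the readout differs.
err_lin = averaged_error(linear, 1.0, 10_000, 0.1, random.Random(0))
err_nl = averaged_error(nonlinear, 1.0, 10_000, 0.1, random.Random(0))
```

With 10,000 frames and σ = 0.1, the residual random error is about 0.001, while the nonlinear readout stays offset by about 0.02.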
Multiple-Capture
The multiple-capture scheme [2] can achieve high DR with high SNR but at moderate speed. This scheme increases dynamic range by sampling the signal nondestructively multiple times during integration. The HDR image can be constructed using the last-sample-before-saturation algorithm [17] as illustrated in Figure 3 .
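A minimal sketch of the last-sample-before-saturation idea for one pixel follows. The first nondestructive sample is assumed to be taken at $t = 0$ just after reset, and subtracting it plays the role of digital CDS. The estimate is the CDS-corrected value of the last unsaturated sample divided by its capture time. The function name, the `q_sat` saturation threshold, and the sample units are hypothetical.

```python
def last_sample_before_saturation(samples, capture_times, q_sat):
    """Photocurrent estimate (signal units per unit time) for one pixel.

    samples[0] is the reset level at capture_times[0] == 0. A sample is treated
    as saturated once its CDS-corrected value reaches q_sat (an assumption).
    """
    reset = samples[0]
    best = None
    for s, t in zip(samples[1:], capture_times[1:]):
        if s - reset >= q_sat:      # this and all later samples are saturated
            break
        best = (s - reset, t)       # keep the latest unsaturated sample
    if best is None:
        raise ValueError("pixel saturated before the first capture")
    charge, t = best
    return charge / t


times = [0, 1, 2, 3, 4]
bright = last_sample_before_saturation([5, 35, 65, 95, 105], times, 100)  # 30.0
dim = last_sample_before_saturation([5, 7, 9, 11, 13], times, 100)        # 2.0
```

The bright pixel is estimated from its third capture, and the dim pixel uses the full integration time. Using the full integration time for dim pixels is what preserves SNR at the low end.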
To define DR and SNR, we assume a uniform sampling interval $t_{capt}$ and that the filter performs only last-sample-before-saturation and digital CDS. The maximum nonsaturating signal is given by $i_{max} = qQ_{max}/t_{capt}$, and the minimum detectable signal is given by $i_{min} = q\sigma_{ReadOut}/t_{int}$. Thus
$$\mathrm{DR} = \frac{i_{max}}{i_{min}} = \frac{Q_{max}\, t_{int}}{\sigma_{ReadOut}\, t_{capt}}.$$
It can be shown that for $i_{ph} > qQ_{max}/t_{int}$, $\mathrm{SNR} > Q_{max}/2$ [10]. Note that this scheme provides high SNR at both the high and low ends. The accurate timing of the capture times and the cancellation of reset noise and offsets via digital CDS guarantee the high SNR of this readout architecture.
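As a worked example of the ratio $\mathrm{DR} = Q_{max} t_{int} / (\sigma_{ReadOut} t_{capt})$, the sketch below evaluates it in decibels. The well capacity, read noise, and timing values are hypothetical and are chosen only to show how the two factors combine.

```python
import math


def multiple_capture_dr_db(q_max_e, sigma_readout_e, t_int, t_capt):
    """DR = i_max / i_min = Q_max * t_int / (sigma_ReadOut * t_capt), in dB.

    q_max_e and sigma_readout_e are in electrons; t_int and t_capt in seconds.
    """
    return 20.0 * math.log10((q_max_e * t_int) / (sigma_readout_e * t_capt))


# Hypothetical: 1e5 e- well, 10 e- read noise, 1 ms frame, 1 us capture interval.
dr_db = multiple_capture_dr_db(1e5, 10.0, 1e-3, 1e-6)  # about 140 dB
```

The result splits into $20\log_{10}(Q_{max}/\sigma_{ReadOut}) = 80$ dB from a single capture plus $20\log_{10}(t_{int}/t_{capt}) = 60$ dB from multiple capture. This is why shortening $t_{capt}$, and therefore demanding a faster per-pixel ADC, is the lever for extending DR at the high end.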
DR at the high end is directly related to $t_{capt}$. The general implementation of the multiple-capture scheme requires a per-pixel ADC [17], and as discussed in [10], multiple capture achieves high dynamic range only at moderate speeds because of the limited speed/resolution performance of the per-pixel ADC. Increasing $i_{max}$ requires decreasing $t_{capt}$. For a given pixel area, the ADC speed can generally be increased only by reducing resolution, which reduces SNR.
Synchronous Self-reset with Residue Readout
The synchronous self-reset with residue readout scheme proposed in [16] promises high dynamic range at high speed with low power consumption, but cannot achieve high SNR. The scheme is described in Figure 4. The photocurrent is integrated and converted into a voltage $v(t)$, which is periodically compared to a reference voltage $V_{max}$. If $v(t) > V_{max}$, the comparator switches, the integrator is reset, and the counter is incremented. At the end of integration, the digitized value of $v(t_{int})$ and the reset count are combined to estimate the photocurrent. Let $n_{Reset}$ be the number of resets and $C$ the integrating capacitance; then
$$\hat{i}_{ph} = \frac{C\left(n_{Reset}\, V_{max} + v(t_{int})\right)}{t_{int}}.$$
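The behavioral sketch below models synchronous self-reset with residue readout. It assumes the estimate $C(n_{Reset} V_{max} + v(t_{int}))/t_{int}$, where each reset is credited with exactly $V_{max}$. The comparator is evaluated once per clock, so by the time a reset occurs the integrator may already be above $V_{max}$ or clipped at its saturation level. That excess charge is discarded, and the estimate is biased low at high photocurrents. All units are normalized, and `v_sat` and the function name are hypothetical.

```python
def sync_self_reset_estimate(i_ph, t_int, t_clk, c_int=1.0, v_max=1.0, v_sat=1.5):
    """Photocurrent estimate from synchronous self-reset with residue readout.

    Normalized, hypothetical units. The integrator output clips at v_sat, and the
    comparator decision (and any reset) happens only at clock edges.
    """
    n_clk = round(t_int / t_clk)
    v = 0.0
    n_reset = 0
    for _ in range(n_clk):
        v = min(v + i_ph * t_clk / c_int, v_sat)  # integrate over one clock period
        if v > v_max:                             # synchronous comparison
            n_reset += 1
            v = 0.0                               # charge above v_max is lost
    # Residue readout: reset count plus the final integrator value.
    return c_int * (n_reset * v_max + v) / t_int


low = sync_self_reset_estimate(0.005, 100.0, 1.0)   # ~0.005: no resets, exact
mid = sync_self_reset_estimate(0.3, 100.0, 1.0)     # 0.25: 0.2 lost per reset
high = sync_self_reset_estimate(2.0, 100.0, 1.0)    # 1.0: clipped every clock
```

In the middle case, the integrator reaches 1.2 before the comparator fires. Crediting only $V_{max} = 1$ per reset underestimates a true value of 0.3 as 0.25. This is a gain error that depends on the signal, which illustrates the high-end SNR loss discussed for this scheme.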
To compute DR and SNR, we first compute the distortion due to the underestimation of charge resulting from saturation before synchronous resetting takes place (see the waveform in Figure 4) ... DR at the high end increases as $t_{clk}$ is decreased, which is possible if a simple, low power regenerative comparator is used. However, as discussed in [10], synchronous self-reset suffers from low SNR at both the high and low ends. At the high end, it suffers from the underestimation of charge and from large gain FPN due to comparator and self-reset offsets. At the low end, it suffers because CDS is not performed.
As discussed, multiple capture achieves high SNR over the extended range but cannot achieve the required 120 dB of dynamic range at 1000 frames/sec. On the other hand, synchronous self-reset can achieve very high DR at high frame rates but suffers from poor SNR at both the low and the extended ends. Figure 5 plots SNR versus photocurrent for the two schemes. Note the drop in SNR for synchronous self-reset in the extended range.
In the following section, we discuss the new Folded Multiple Capture HDR scheme [13], which, by combining features of the synchronous self-reset and multiple capture schemes discussed above, can satisfy the precision imaging requirements in IR with low power consumption and robust circuits. We first discuss the architecture and operation of FMC. We then describe a prototype of the architecture and the experimental results obtained.
III. FOLDED MULTIPLE CAPTURE
A block diagram of the FMC architecture is shown in Fig. 6(a). Each pixel consists of an integrator whose reset is controlled by a comparator, a counter, and a sample-and-hold (S&H). The S&H output is digitized by a fine ADC, whose output, along with the counter values, is fed to a filter that generates the photocurrent estimate. At each clock cycle, the integrated photocurrent, $v(t)$, is compared to a threshold voltage $V_{th}$. The integrator is reset when the comparator output flips, creating the folded waveform shown in Fig. 6(b). Meanwhile, the integrator output is sampled and digitized at predefined sampling or capture times $t_1, t_2, \ldots, t_n$. The capture times are synchronized with Clk but shifted by $t_{Clk}/2$ to avoid simultaneous reset and capture. The counter is incremented by the clock and reset by the comparator output signal. Its value, which corresponds to the effective integration time $t_{last-i}$ (the time since the last reset), is read out at each capture time. The slope of the linear least-squares fit of the digitized capture values versus their corresponding effective integration times is used to estimate the photocurrent (see Fig. 6(c)). In effect, FMC performs $n$ regular captures during an exposure time and combines them to achieve a high fidelity estimate of the photocurrent. Dynamic range is extended by a factor of $2t_{int}/t_{Clk}$ over the integrating capacitor dynamic range. For example, for $t_{int}/t_{Clk} = 1000$, DR increases by $20\log_{10}(2000) \approx 66$ dB. Fig. 7 shows example waveforms for $t_{int}/t_{Clk} = 8$ and four capture times.
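The behavioral sketch below follows the FMC operation described above. A comparator decides at every clock edge whether to reset the integrator. Captures are taken half a clock later, and each capture records the effective integration time since the last reset together with the sampled integrator value. The photocurrent is then estimated as $C$ times the slope of an ordinary least-squares line through those pairs. The following are simplifying assumptions: the integrator is ideal and noiseless, `v_offset` is a constant S&H offset that the fit's intercept absorbs, and there is no ADC quantization. The function names and normalized units are hypothetical.

```python
import math


def fmc_captures(i_ph, t_clk, n_clk, capture_clks, c_int=1.0, v_th=1.0,
                 v_sat=2.0, v_offset=0.0):
    """Return (tau, v) pairs: time since last reset and sampled integrator output.

    Captures happen at (k + 0.5) * t_clk for k in capture_clks; comparator
    decisions happen at clock edges (k + 1) * t_clk.
    """
    t_last = 0.0
    points = []
    for k in range(n_clk):
        if k in capture_clks:
            tau = (k + 0.5) * t_clk - t_last
            points.append((tau, v_offset + min(i_ph * tau / c_int, v_sat)))
        t_edge = (k + 1) * t_clk
        if min(i_ph * (t_edge - t_last) / c_int, v_sat) > v_th:
            t_last = t_edge  # comparator flips: integrator is reset
    return points


def ls_slope(points):
    """Slope b of the ordinary least-squares line v = a + b * tau."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0.0:
        raise ValueError("all captures share one effective integration time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return sxy / sxx


def fmc_dr_extension_db(t_int, t_clk):
    """DR extension 2 * t_int / t_clk over the integrating capacitor, in dB."""
    return 20.0 * math.log10(2.0 * t_int / t_clk)


# t_int / t_clk = 8 with four captures, as in the Fig. 7 example (C = 1).
dim = fmc_captures(0.1, 1.0, 8, {1, 3, 6, 7})                     # no resets
bright = fmc_captures(0.7, 1.0, 8, {1, 3, 6, 7}, v_offset=0.05)   # folded
```

For the bright pixel, the effective integration times come out as 1.5, 1.5, 0.5, and 1.5 clock periods, and the fitted slope still recovers 0.7 despite the offset. If every capture happened to land at the same phase of the folded waveform, the fit would be undefined and `ls_slope` raises `ValueError`. This suggests one reason to spread the capture times non-uniformly.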
A low input photocurrent (see Fig. 7(a)) results in no reset, and the scheme reduces to a conventional FPA with Fowler readout [15]. A high photocurrent (see Fig. 7(c, d)) results in periodic resets. Unlike in the other self-reset schemes discussed earlier, however, the number of resets is not used to estimate the signal.
For low power, the number of captures used to achieve the high fidelity photocurrent estimate must be small. A surprising fact about FMC is that only 3 to 4 scene-independent, globally set captures are needed to achieve uniformly high SNR. We wish to select capture times that guarantee a minimum SNR of $Q_{max}/2$ for photocurrents $> qQ_{max}/t_{int}$. Note that $v(t)$ ... settling time of the S&H circuit.
A. Implementation
A prototype of the FMC architecture has been implemented in a 0.18 μm double-poly, five-metal-layer CMOS process. A block diagram of the pixel readout circuit is depicted in Fig. 8. To maintain compatibility with IR detectors, we use a Capacitive Trans-impedance Amplifier (CTIA) as the integrator. The comparator is implemented using a regenerative architecture. A slow-fall NAND gate is used to reduce random charge injection onto the feedback capacitor. The S&H block consists of a source follower with dynamic bias control, followed by the sampling circuit and then the column readout circuitry. All analog circuits operate at 3.3 V. The digital portion of the pixel consists of a 9-bit ripple counter with an output register. All digital circuits operate at 1.8 V, and a level shifter is used to drive the counter reset. After each capture, the analog capture values and the latched counter values are read out serially from each column off-chip.
The chip micrograph is shown in Fig. 9. Four columns have pixels with NWELL/PSUB diodes, and the fifth has pixels driven by external current sources. Provisions have been made for bump-bonding IR detectors adjacent to the diodes. The analog and digital periphery circuits are placed at opposite ends of the pixel array. The Timing Control block generates all control signals. The clock rate (and thus the dynamic range) and the capture times are programmable via a scan chain. Each pixel occupies an area of 30 μm × 150 μm (40% analog, 60% digital). The digital section is implemented using standard cells and can readily be miniaturized with custom design. The analog area is dominated by the CTIA and S&H, which are sized to meet the linearity requirement. In a 3D-IC implementation of the fully integrated imaging system [1], each pixel is estimated to be 30 μm × 30 μm, with 2 analog and 1 digital circuit layers.
B. Experimental Results
A uniform LED illuminator is used as the light source for characterization. The chip analog column outputs, digitized using an on-board ADC, and the chip digital column outputs are transferred to a PC via an FPGA-based data acquisition board. The least-squares fit of the digitized capture values and the corresponding effective integration times, which estimates the photocurrent, is then performed in software.
The linearity and SNR are characterized locally at multiple random intervals. Experimental SNR versus $i_{ph}$ results are shown in Figure 10. Read noise is expected to be lower with improvements to the test setup. Table I summarizes the chip characterization results. The power consumption per pixel is 25.5 μW and is dominated by the CTIA. This corresponds to an energy consumption of 25.5 nJ per pixel readout with DR = 138 dB and SNR = 60 dB. Note that the CTIA power consumption can be significantly reduced, e.g., using switched biasing, with knowledge of the detector parameters.
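The energy figure follows from the per-pixel power if each pixel readout is taken to span one 1 ms frame, which is the 1000 frames/sec operating point cited in [1]. This frame time is inferred from the two quoted numbers rather than stated explicitly:

```latex
E_{pixel} = P_{pixel}\, T_{frame} = 25.5\,\mu\mathrm{W} \times 1\,\mathrm{ms} = 25.5\,\mathrm{nJ}
```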
