techniques used in biology have a range of required specifications. On one end of the spectrum are photon-starved processes, such as multiphoton [2] and confocal fluorescent imaging [3] , which need very high sensitivity and low dark current. On the other hand, certain functional imaging techniques that look at fast dynamic changes, such as action potentials, require fast frame rates and a high dynamic range apart from good sensitivity. Examples include imaging voltage [4] and calcium [5] sensitive dyes.
While imaging in vitro systems, such as fixed specimens or tissue slices, can be used to study anatomy and functional connectivities, imaging in live animals has opened a window to investigate physiological processes in their native, unperturbed state. Most in vivo imaging is performed in anesthetized and restrained animals. However, a fast emerging area is imaging in awake and behaving animals. The majority of the work in this area has used optical-fiber bundles or electrical cables, tethering the animal to traditional imaging system components [6] [7] [8] . This relaxes power and size constraints on the photodetector, allowing the use of commercially available large, power-hungry cooled CCD or CMOS cameras. However, the tethers imposed by these systems greatly limit the nature and duration of imaging studies that can be performed. A few groups are attempting to move the entire imaging apparatus into a compact device that can be affixed to an animal for chronic imaging [9] [10] [11] , avoiding the use of optical fibers and cables. Photodetectors for these systems need to be compact and low powered in addition to having sufficient sensitivity and high signal-to-noise ratio (SNR) performance comparable to detectors traditionally used for biomedical imaging.
Though charge-coupled device (CCD) and complementary metal-oxide semiconductor (CMOS)-based image detectors were both invented around the late 1960s, their development took different routes [12]. From their inception until the 1990s, fabrication technology was not developed enough to take advantage of the key benefit of CMOS detectors: the ability to integrate circuits on the image plane. CCD detectors [13], [14] were optimized for applications requiring very high sensitivity and superior low-light performance. As a result, almost all low-light biomedical imaging is performed with CCD-based imagers. Today, CMOS detectors [15], [16] offer compact, single-chip, low-power integrated systems capable of not only detecting photons but also performing signal and image-processing operations [17], [18]. However, they still lag behind CCD detectors in sensitivity and low-light performance. For this reason, CMOS detectors are used only in medium- to high-light applications where dynamic range or speed rather than sensitivity is critical [19].
The requirement for high performance with a small power and size footprint has led to some recent work on high-sensitivity CMOS imagers for biomedical applications. Ng et al. presented a 176 × 144 imager with 7.5-µm pixels in a 0.35-µm CMOS process as part of an integrated system that could be implanted in deep brain structures for fluorescent imaging [20]. The minimum-detectable signal was 100 nW/cm² at 470 nm with a frame rate of 0.31 frames/s. Eltoukhy 128 imager with 7-µm pixels in a 0.18-µm CMOS process with an integrated fluorescence emission filter for contact imaging [22]. At 30 frames/s, the minimum detectable intensity was 400 nW/cm² at 450 nm with an SNR of 15 dB. Park et al. presented a 32 × 32 imager with 75-µm pixels in a 0.5-µm CMOS process for voltage-sensitive dye imaging [23]. At 40 frames/s, the minimum-detectable signal was about 10 lux with an SNR of 35.2 dB. None of the aforementioned systems combine the sensitivity, resolution, speed, and flexibility required for a detector capable of imaging single-cell fluorescence at low illumination levels.
We present a 124 × 132 imager with 20.1-µm pixels in a 0.5-µm CMOS process, capable of imaging single-cell fluorescence at light levels equal to those required by cooled CCD cameras. The five-transistor pixels feature n-well/p-sub photodiodes and a capacitive transimpedance amplifier (CTIA) for signal amplification [24] and noise reduction [25]. The novel design splits the amplifier, with only nMOS transistors inside the pixels; each column shares the pMOS transistors of the CTIA.
This paper is organized as follows. Section II describes the circuit design of the imager including the photodiode, the active pixel sensor circuit, and peripheral circuits. Circuit analysis including simulation and noise analysis is presented in Section III. Section IV shows measurement results from the array and Section V concludes this paper.
II. CHIP ARCHITECTURE
The design of the chip can be divided into three distinct modules: 1) the photodiode, 2) the capacitive transimpedance amplifier (CTIA), and 3) the peripheral circuits. All metal interconnects were done in metal1 and metal2 layers. Nonphotosensitive areas of the chip were covered by metal3 and care was taken to design the sensitive analog output pads without protection diodes to prevent nonspecific photo-induced output from outside the pixel array.
A. Photodiode
Based on our prior work characterizing photodiodes in 0.5-µm CMOS technology [26], the imager array was designed with n-well/p-sub photodiodes. Some of the advantages of this topology are: 1) the lightly doped n-well creates a wider depletion region, increasing the collection efficiency of the junction; 2) since n-wells are created by diffusion, the junction tends to be deeper than n+/p-sub junctions, with significant sidewalls, further increasing collection efficiency; and 3) the wider depletion region leads to a smaller capacitance, increasing the charge-to-voltage conversion ratio. The pixel was designed with a pitch of 20.1 µm. The photodiode area was 170 µm², leading to a fill factor of 42%. While this seems small, the scalable design rules for the process used impose a minimum spacing between n-wells. In 0.5-µm CMOS (λ = 0.3 µm), the largest photodiode that can fit in a 20.1-µm pixel has a fill factor of 53.5%. The perimeter of the photodiode, a measure of the lateral sidewall dimension, was 58 µm. The photodiode was designed to occupy a roughly square area in a corner of the pixel, with a two-sided border containing circuits and metal interconnect lines. Corners of the photodiode were cut to reduce dark current. The layout of the pixel showing the photodiode location and geometry is shown in Fig. 1.
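As a quick check of the geometry quoted above, the fill factor follows directly from the photodiode area and the pixel pitch. The sketch below reproduces the 42% figure and, as an inference that is not stated in the text, back-computes the side of a square photodiode that would give the quoted 53.5% maximum. All variable and function names are illustrative.

```python
import math

PIXEL_PITCH_UM = 20.1        # pixel pitch from the text, in micrometres
PHOTODIODE_AREA_UM2 = 170.0  # photodiode area from the text, in square micrometres
MAX_FILL_FACTOR = 0.535      # largest achievable fill factor quoted in the text


def fill_factor(photodiode_area_um2: float, pitch_um: float) -> float:
    """Fraction of the (square) pixel area that is photosensitive."""
    return photodiode_area_um2 / (pitch_um ** 2)


def square_side_for_fill_factor(ff: float, pitch_um: float) -> float:
    """Side of a square photodiode with fill factor ff.

    This is an inferred quantity, assuming a square photodiode;
    the dimension itself is not given in the text.
    """
    return math.sqrt(ff * pitch_um ** 2)


actual_ff = fill_factor(PHOTODIODE_AREA_UM2, PIXEL_PITCH_UM)
max_side_um = square_side_for_fill_factor(MAX_FILL_FACTOR, PIXEL_PITCH_UM)

print(f"fill factor of the implemented photodiode: {actual_ff:.1%}")
print(f"implied side of the largest square photodiode: {max_side_um:.1f} um")
```

The gap between the implemented 42% and the 53.5% ceiling is the area given up to the in-pixel transistors, the feedback capacitor, and routing.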
B. Capacitive Transimpedance Amplifier (CTIA)
CTIAs have been used in image sensors, initially as column amplifiers [27] and later as in-pixel amplifiers [24]. These designs included the entire CTIA layout in the pixel, requiring area-expensive pMOS transistors in the pixel. Our design partitions the amplifier and includes only nMOS transistors in the pixel. The pMOS transistors that complete the amplifier are shared by all pixels in a column. This partitioning is possible because the detector implements a rolling shutter, where only one row needs to be fully active at any time. Fig. 2 shows the schematic of a single pixel, enclosed within the dotted line, containing nMOS transistors M1-M5, the feedback capacitance, and the photodiode. Also shown are the column-level pMOS transistors that complete the CTIA (M6, M7).
Transistors M1, M2, M6, and M7 implement a cascoded high-gain inverting amplifier, which constitutes the CTIA when a particular row is selected, that is, when the minimum-sized switches M3 and M4 are closed by a logical high on the row-select line (RS). Transistor M1 was sized and biased to be in subthreshold, maximizing the gain-to-current ratio for maximum energy efficiency and minimum noise. M6 and M7 were sized relatively larger to ensure sufficient drive capability for the long output lines. A poly1/poly2 capacitor with a value of 5 fF acts as the feedback element in the closed-loop CTIA. Transistor M5 was a minimum-sized switch that served to set the amplifier output to its inversion point. Three bias voltages were generated off-chip. The layout and relative positions of the transistors and the capacitor are shown in Fig. 1, and a simplified block diagram is shown in Fig. 3(a). The CTIA output was connected to a column-level delta difference sampling (DDS) circuit that is described in the next section.
C. Peripheral Circuits
The peripheral circuits for the imager array consist of pMOS transistors that complete the column-level CTIA, sample-and-hold-based circuits for column-level delta difference sampling, circuits for row and column scanning, and output buffers.
1) Delta Difference Sampling: Fig. 3(b) shows the schematic of the circuit for performing DDS, which calculates the difference between a pixel's light-dependent signal value and the subsequent reset value as a measure of the photon flux [28]. The two capacitors were poly1/poly2 capacitors sized to 150 fF, each laid out as a parallel combination of six 25-fF unit capacitances for good matching [29]. Transmission gates T1 and T2 and nMOS switch M1 were realized with minimum-sized transistors. T1 and T2 were driven by nonoverlapping clocks generated on-chip from an external clock HOLD. M1 was driven by an external signal SAMPLE. The amplifier was realized as a single-stage cascoded inverting amplifier, as shown, with M2 sized and biased in subthreshold. Since the drive requirements for the amplifier are smaller than those for the in-pixel CTIA, M3, M4, and M5 were sized small to pitch match the circuit to the pixel. The DDS output was multiplexed to an output buffer controlled by the column-ring counter, described in the following section. Four bias voltages were generated on-chip using a resistor chain between Vdd and ground. Note that these biases were different from the ones for the in-pixel CTIA.
2) Row and Column Scanners: The row scanner consisted of circuits for selecting and resetting an addressed row, shown by the row-select and reset signals in Fig. 2. For addressing rows, a 124-b circular shift register was implemented. The column scanner consisted of the pMOS transistors of the pixel CTIA, the DDS circuit, and a 132-to-1 multiplexer for connecting the output of an addressed column to the output buffer. The multiplexer was implemented as a switch array with one transmission gate for each column. A similar multiplexer also connected the pixel output directly, without the DDS circuit, to a second output buffer. Column addressing was done by a 132-b circular shift register. The row and column shift registers were driven by nonoverlapping clocks generated from two external signals. Both registers could be programmed with a desired sequence from an external pin and the respective clock signals.
3) Output Buffer: A very simple output buffer was implemented as a single large nMOS transistor. The gate of the transistor was driven by the output of the 132-to-1 multiplexer in the column scanner. Both the source and the drain of the transistor were brought out to pads on the die. An off-chip resistor was used to configure the transistor as a source follower to buffer the output of the DDS circuit. A similar circuit buffered the output of the pixel directly without delta differencing.
III. CIRCUIT ANALYSIS AND SIMULATIONS

A. Pixel Operation
Consider the simplified circuit of the pixel when its row is selected as shown in Fig. 3(a) with and referring to the photodiode and feedback capacitances. Let the CTIA, composed of in-pixel nMOS transistors and column level pMOS 
Equation (3) describes the operation of the CTIA pixel. During the pixel reset, the reset signal is high, forcing the photodiode and the amplifier output nodes to the inversion point of the amplifier. Once the reset is released, the amplifier pins the photodiode node and forces the photocurrent to integrate on the feedback capacitance. The feedback capacitance is a design parameter, unlike the photodiode capacitance, which depends on the size and the nature of the photodiode junction. Effectively, by implementing a feedback capacitance smaller than the photodiode capacitance, a gain equal to the ratio of the two can be achieved over the operation of the standard three-transistor (3T) pixel [16]. This increases the sensitivity of the pixel by improving the charge-to-voltage conversion by a factor given by the CTIA gain. Another departure from 3T operation is that, due to the inverting nature of the CTIA, the output of the pixel charges upward toward Vdd in response to light as opposed to discharging toward ground.
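The sensitivity advantage described above can be illustrated with an idealized charge-to-voltage calculation. This is a minimal sketch, assuming an ideal amplifier, a 3T pixel that integrates only on the photodiode capacitance, and a photodiode capacitance of 250 fF chosen purely for illustration; only the 5-fF feedback value is taken from the design description.

```python
Q_E = 1.602176634e-19  # elementary charge in coulombs

C_F = 5e-15     # feedback capacitance from the text, in farads
C_PD = 250e-15  # ASSUMED photodiode capacitance, in farads (illustrative only)


def volts_per_electron(capacitance_f: float) -> float:
    """Ideal conversion gain when one electron is integrated on a capacitor."""
    return Q_E / capacitance_f


gain_3t = volts_per_electron(C_PD)   # 3T pixel: charge integrates on the photodiode
gain_ctia = volts_per_electron(C_F)  # CTIA pixel: charge integrates on the feedback capacitor
ctia_advantage = gain_ctia / gain_3t # equals C_PD / C_F in this idealized model

print(f"3T conversion gain:   {gain_3t * 1e6:.3f} uV per electron")
print(f"CTIA conversion gain: {gain_ctia * 1e6:.2f} uV per electron")
print(f"CTIA advantage:       {ctia_advantage:.0f}x")
```

In this model the advantage depends only on the capacitance ratio, which is why shrinking the feedback capacitor is the main lever for sensitivity.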
B. DDS Operation
The DDS operation [28] consists of two phases. With reference to Fig. 3(b) , in the first phase, and are high while is low. Let the input to the DDS circuit in this phase. Capacitors and store charges and , respectively, where is the inversion point of the amplifier. In the next phase, and go low while goes high. Let the input to the DDS circuit in this phase. Now, the charges stored in and are and , respectively. This follows from the fact that during the transition between the two phases, the input node of the amplifier is effectively floating and cannot change. Due to conservation of charge, and assuming , we can write (4) which simplifies to (5) Thus, the circuit effectively computes the difference between voltages applied to its input at two phases, offset by a bias . The following section describes how the circuit computes the light-dependent signal generated by the pixel. Fig. 3(c) shows a timing diagram for the peripheral circuits and serves to illustrate the readout sequence of the entire array including pixel addressing, DDS operation, and output multiplexing. Prior to the times shown in the diagram, the row and column-ring counters are loaded with a single 1 at the least-significant bit (LSB) position. Consider an arbitrary starting point . The rising edge of causes the row ring counter to increment and select the th row of the array by setting high. The CTIAs of all the pixels in row are now complete and their outputs are connected to the respective column level DDS inputs. Following this, is pulsed high which constitutes the first phase of the DDS operation, storing pixel outputs after photocurrent integration in addition to the pixel reset value. Next, nonoverlapping clocks and are inverted, leading to the second phase of DDS operation where the computation of (5) is performed. Now the output of each DDS circuit reflects the difference between the integrated photocurrent and the subsequent reset value. 
While this is similar to correlated double sampling (CDS) [30], which reduces reset noise, DDS operation adds to the noise since consecutive reset levels may not be the same.
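The difference between the two sampling schemes can be seen in a small Monte Carlo experiment. The sketch below is an idealized model, not a simulation of the actual circuit: it assumes a Gaussian reset level that is independent from one reset to the next, ignores read noise and the noise-reducing effect of the active reset, and uses hypothetical signal and noise amplitudes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SIGNAL_V = 0.100       # hypothetical light-induced signal, in volts
RESET_SIGMA_V = 0.001  # hypothetical rms reset noise, in volts
N_TRIALS = 20000

cds_samples = []
dds_samples = []
for _ in range(N_TRIALS):
    reset_before = random.gauss(0.0, RESET_SIGMA_V)  # reset level that starts this exposure
    reset_after = random.gauss(0.0, RESET_SIGMA_V)   # independent reset level of the next exposure
    integrated = reset_before + SIGNAL_V             # pixel value at the end of the exposure

    # CDS subtracts the same reset level that the exposure started from.
    cds_samples.append(integrated - reset_before)
    # DDS subtracts the subsequent reset level, which is uncorrelated here.
    dds_samples.append(integrated - reset_after)

cds_sigma = statistics.pstdev(cds_samples)
dds_sigma = statistics.pstdev(dds_samples)

print(f"CDS residual reset noise: {cds_sigma:.2e} V rms")
print(f"DDS residual reset noise: {dds_sigma:.2e} V rms")
```

Under these assumptions CDS cancels the reset term, while DDS leaves roughly the square root of two times the single-reset noise, because two independent reset levels appear in the difference.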
C. Peripheral Circuit Operation
By virtue of the single 1 loaded into the column-ring counter, the output of the 132-to-1 multiplexer is equal to the DDS output of the first pixel in the selected row. This value is read out by an off-chip analog-to-digital converter (ADC). Next, 132 pulses on the column clock allow the sequential acquisition of all the pixels in that row. Finally, the clocks are inverted again and the row clock is pulsed, deselecting the current row and selecting the next one. This cycle is repeated 124 times, at the end of which the entire array has been read out. It follows that the integration time equals the time taken for 124 row readouts. Without a mechanical shutter, this imposes a lower limit on the exposure time that depends on the ADC speed.
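The ADC-imposed lower limit on exposure can be estimated from the array dimensions alone. The following sketch assumes one ADC conversion per pixel, no per-row overhead for the DDS phases or clock inversion, and the 1.5 MS/s digitizer rate used in the characterization setup; the real limit will be somewhat higher once row overhead is included.

```python
N_ROWS = 124
N_COLS = 132
ADC_RATE_SPS = 1.5e6  # digitizer rate from the characterization setup, samples per second

# One conversion per pixel, rows read back to back (row overhead is ignored here).
pixels_per_frame = N_ROWS * N_COLS
min_frame_time_s = pixels_per_frame / ADC_RATE_SPS
max_frame_rate = 1.0 / min_frame_time_s

# At a given operating frame rate, each row gets an equal share of the frame time.
OPERATING_FRAME_RATE = 70.0
row_time_s = 1.0 / (OPERATING_FRAME_RATE * N_ROWS)

print(f"pixels per frame:          {pixels_per_frame}")
print(f"minimum exposure (approx): {min_frame_time_s * 1e3:.2f} ms")
print(f"maximum frame rate:        {max_frame_rate:.1f} frames/s")
print(f"row time at 70 frames/s:   {row_time_s * 1e6:.1f} us")
```

Because every row is reset as it is read, the exposure cannot be shorter than one full pass through the array, which is what ties the minimum exposure to the converter speed.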
D. Noise Analysis
Photodetector noise consists of several sources [31] . Temporal noise sources include shot noise, reset noise, and read noise. Shot noise is inherent in the production of photons from any source. Reset noise is the noise sampled onto the photodiode capacitance during pixel reset. Read noise is composed of thermal and flicker noise in the readout chain of the array. Spatial noise originates from fabrication mismatches across the array. This can manifest as an offset or a gain error. Since the active reset employed minimizes reset noise [25] , the dominant temporal noise source is the read noise.
The primary read noise contribution is from the CTIA formed by M1, M2, M6, and M7 with noise from M1 dominating. At the designed transconductance and bias current of M1, thermal noise dominates over shot noise. The input-referred noise power spectral density of a transconductance amplifier is given by [32] (6) where , , and are the Boltzmann constant, absolute temperature, and the amplifier transconductance, respectively. is a constant between 2/3 and 2 depending on the amplifier design. Considering the low-frequency small-signal equivalent circuit of the CTIA shown in Fig. 5 , the output-referred noise power is (7) where is the transfer function of the CTIA. From Fig. 5 , we can write (8) (9) where is the parasitic load capacitance driven by the CTIA. Eliminating from (8) and (9), we can write (10) Recognizing to be of the form and recalling (7) can be simplified to (11) 
From (3) and (12), the SNR increases monotonically with decreasing feedback capacitance. However, reset noise and mismatch limit the minimum feedback capacitance that can be implemented. Based on prior measurements from test structures, we chose 5 fF. With parameter values of 1 pF, 5 fF, 250 fF, and 1.5, the output-referred rms noise voltage due to read noise is about 560 µV at 300 K.
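The quoted read-noise figure can be reproduced with a textbook single-pole model of a CTIA. The sketch below is an assumption-laden reconstruction, not the paper's own derivation: it takes the amplifier's input-referred noise density as 4kTγ/g_m, a closed-loop gain of (C_pd + C_f)/C_f, and an effective load of C_L in parallel with the series combination of C_f and C_pd. It also assumes that 1 pF is the load capacitance and 250 fF the photodiode capacitance. Under those assumptions the transconductance cancels and the result depends only on the capacitances.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def ctia_output_noise_vrms(c_f: float, c_pd: float, c_l: float,
                           gamma: float, temp_k: float) -> float:
    """Output-referred rms thermal noise of a single-pole CTIA (assumed model).

    v_n^2 = gamma * k * T * (C_pd + C_f) / (C_f * C_eff),
    with C_eff = C_L + C_f * C_pd / (C_f + C_pd).
    """
    closed_loop_gain = (c_pd + c_f) / c_f
    c_eff = c_l + (c_f * c_pd) / (c_f + c_pd)
    return math.sqrt(gamma * K_B * temp_k * closed_loop_gain / c_eff)


# Parameter values from the text; the role of each capacitance is ASSUMED.
C_F = 5e-15     # feedback capacitance
C_PD = 250e-15  # assumed to be the photodiode capacitance
C_L = 1e-12     # assumed to be the parasitic load capacitance
GAMMA = 1.5     # amplifier noise constant
TEMP_K = 300.0

noise_vrms = ctia_output_noise_vrms(C_F, C_PD, C_L, GAMMA, TEMP_K)
print(f"output-referred read noise: {noise_vrms * 1e6:.0f} uV rms")
```

The model lands within a fraction of a percent of the stated 560 µV, which supports the assumed assignment of the capacitance values but does not prove it.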
IV. RESULTS
The 124 × 132 imager array was fabricated in a 0.5-µm three-metal, two-poly CMOS process. Fig. 6 shows the annotated micrograph of the chip. The die size was 3 mm × 3 mm. The area excluding the pads was 7.88 mm², with the pixel array occupying 6.61 mm², the column scanner 0.67 mm², and the row scanner 0.24 mm². Pads were confined to two sides to maximize area usage.
A. Characterization
For characterization, a microcontroller (Microchip, Chandler, AZ) was used to generate all control signals. Independent 3.3-V regulators were used to power the analog and digital supplies of the chip. Off-chip biases were generated by a 12-b digital-to-analog converter (DAC) chip. The analog output of the chip was buffered using a unity-gain amplifier, digitized to 16 b at 1.5 MS/s with a data-acquisition card (National Instruments, Austin, TX), and read into a computer for analysis.
Fig. 7 shows an oscilloscope plot illustrating the raw output of a pixel without delta differencing. The reset and row-clock signals are also shown. Prior to the trigger point, the pixel in the first row and column is selected, completing its CTIA. During the first 100 µs after the trigger, the pixel was reset, with the output going to the inversion point of the CTIA. Then, the reset was released and the positive slope of the pixel output can be seen. Following this, the row clock is pulsed, disconnecting the CTIA of the first pixel and completing the CTIA of the second pixel in the column. Immediately afterwards, the reset goes high, sending the output to the inversion point of the CTIA of the pixel in the second row, first column. Next, the row clock is pulsed 122 times, each pulse disconnecting the CTIA in the current row and connecting it in the next. During this period, the output of the pixels can be seen to remain at the inversion point of the respective amplifiers. Finally, the reset goes low and there is one more pulse on the row clock, which disconnects the CTIA in the 124th row and reconnects the CTIA of the pixel in the first row, first column. The output can be seen to follow the same linear trajectory as before scanning. This confirms the linear photoresponse of the pixel and the row scanning operation.
Fig. 8 shows the output of all column-level DDS circuits in response to three different light levels: dark, an arbitrary intensity, and half that intensity, obtained by inserting an OD 0.3 neutral-density filter in the optical path.
After removing the 240-mV offset in an off-chip calibration step, the DDS output can be seen to double from 190 to 380 mV in response to the light intensity increasing by a factor of two, indicating proper delta differencing. Column scanner operation can also be seen, as the DDS output remains stable while all the pixels in a row are scanned by pulsing the column clock. The outputs of all pixels are expected to be the same because they were under flat-field illumination.
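The factor of two between the two illuminated levels follows from the definition of optical density, where transmission is ten raised to the negative OD. The short check below uses the OD 0.3 filter and the offset-corrected DDS readings quoted above; the function name is illustrative.

```python
def nd_filter_transmission(optical_density: float) -> float:
    """Fraction of light transmitted by a neutral-density filter."""
    return 10.0 ** (-optical_density)


OD = 0.3                  # neutral-density filter used in the measurement
DDS_FILTERED_MV = 190.0   # offset-corrected DDS output with the filter in place
DDS_UNFILTERED_MV = 380.0 # offset-corrected DDS output without the filter

transmission = nd_filter_transmission(OD)
# For a linear detector, removing the filter should scale the output by 1/transmission.
predicted_unfiltered_mv = DDS_FILTERED_MV / transmission

print(f"filter transmission: {transmission:.3f}")
print(f"predicted unfiltered output: {predicted_unfiltered_mv:.0f} mV")
print(f"measured unfiltered output:  {DDS_UNFILTERED_MV:.0f} mV")
```

The prediction agrees with the measured value to within about 1 mV, consistent with a linear photoresponse over this range.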
While the results presented so far focus on the operation of a single pixel and the DDS circuit, we now show measurements from the entire imager array. Fig. 9 shows the average digitized output of all pixels in the chip for increasing light intensities measured by a radiometer. The imager was run at 70 frames/s with an exposure time of 14.3 ms. A blue light-emitting diode (LED) centered at 450 nm, driven by a constant current source, was used to illuminate the chip. A diffuser was used to generate flat-field illumination. A blue LED was chosen to obtain a lower limit of the imager performance: from spectral response measurements of the pixel [26], the photodiode sensitivity at 450 nm is about 65% of the peak sensitivity at 650 nm. Fig. 10 shows the average SNR of the array measured for the same intensities. Average SNR was defined by (13) in terms of the number of pixels and the mean and standard deviation of the output of each pixel, calculated over a thousand frames. To characterize the entire array, signal contributions of each pixel were added in phase and the noise contributions were added in quadrature. As can be seen, at 70 frames/s, the minimum detection limit is about 4 nW/cm², which compares favorably with several recent low-intensity imagers shown in Table I. The peak SNR of 44 dB at 1 µW/cm² is likely limited by the small full-well capacity imposed by the 5-fF sense node capacitance. Factoring in the pixel area and the exposure time, the detection limit corresponds to 36.4 photons/ms/pixel, suitable for single-cell fluorescence imaging [19].
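The conversion from the detection limit in nW/cm² to photons per millisecond per pixel can be checked from the photon energy at 450 nm and the pixel area. The sketch below assumes the full 20.1 µm square pixel as the collection area and a nominal intensity of exactly 4 nW/cm²; since the text gives the limit as "about 4 nW/cm²", a small difference from the quoted photon count is expected.

```python
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
C_LIGHT = 2.99792458e8     # speed of light, m/s

WAVELENGTH_M = 450e-9           # LED centre wavelength
INTENSITY_W_PER_CM2 = 4e-9      # nominal detection limit, taken as exactly 4 nW/cm^2
PIXEL_PITCH_CM = 20.1e-4        # 20.1 um expressed in centimetres

photon_energy_j = H_PLANCK * C_LIGHT / WAVELENGTH_M
pixel_area_cm2 = PIXEL_PITCH_CM ** 2
power_on_pixel_w = INTENSITY_W_PER_CM2 * pixel_area_cm2

photons_per_second = power_on_pixel_w / photon_energy_j
photons_per_ms = photons_per_second / 1000.0

print(f"photon energy at 450 nm: {photon_energy_j:.3e} J")
print(f"optical power per pixel: {power_on_pixel_w:.3e} W")
print(f"photon flux per pixel:   {photons_per_ms:.1f} photons/ms")
```

The result is within about 1% of the 36.4 photons/ms/pixel stated in the text, which is consistent with rounding of the measured intensity.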
The dark signal was measured over an exposure of 1 s with a response of 742.2 analog-to-digital counts. This implies a dark signal of 113 mV/s. Using (3) and our pixel area, this equates to a dark current density of 0.14 nA/cm². Comparing the dark signal with the saturation value from Fig. 10, the imager will saturate in darkness at integration times longer than 3 s. This limit can be extended if a low-dark-current, imager-optimized CMOS process is used instead of the standard mixed-signal 0.5-µm process used here.
Fig. 10. SNR of all pixels in the array over 1000 frames with a 14.3-ms exposure time (70 frames/s) for increasing 450-nm light intensity.
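The dark current density can be recovered from the dark signal slope, since the dark current integrates on the feedback capacitor just as photocurrent does. This sketch assumes an ideal CTIA, the 5-fF feedback capacitance, and normalization by the full 20.1 µm square pixel area (the area value itself did not survive in the text above, so this is an inference). The photodiode-area normalization is shown only for comparison.

```python
C_F = 5e-15                # feedback capacitance, farads
DARK_SLOPE_V_PER_S = 0.113 # measured dark signal, volts per second

PIXEL_AREA_CM2 = (20.1e-4) ** 2   # ASSUMED normalization: full pixel, 20.1 um pitch
PHOTODIODE_AREA_CM2 = 170e-8      # 170 um^2 expressed in cm^2, for comparison only

# I = C * dV/dt for charge integrating on the feedback capacitor.
dark_current_a = C_F * DARK_SLOPE_V_PER_S

j_dark_pixel_na_cm2 = dark_current_a / PIXEL_AREA_CM2 * 1e9
j_dark_diode_na_cm2 = dark_current_a / PHOTODIODE_AREA_CM2 * 1e9

# Output swing implied by saturation after about 3 s in the dark.
implied_swing_v = DARK_SLOPE_V_PER_S * 3.0

print(f"dark current per pixel:            {dark_current_a * 1e15:.3f} fA")
print(f"density over pixel area:           {j_dark_pixel_na_cm2:.2f} nA/cm^2")
print(f"density over photodiode area only: {j_dark_diode_na_cm2:.2f} nA/cm^2")
print(f"implied output swing at 3 s:       {implied_swing_v * 1e3:.0f} mV")
```

Normalizing by the full pixel area reproduces the 0.14 nA/cm² quoted in the text; normalizing by the photodiode area alone would give a value more than twice as large.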
At 75% of saturation, the fixed pattern noise (FPN) of the entire array was 0.84%. The FPN was calculated as the mean of the standard deviations of all the pixels computed over a thousand frames at an intensity of 0.5 µW/cm². The FPN of pixels within a single column, which share the pMOS transistors of the CTIA and the DDS circuit, was 0.66%. The corresponding numbers for 0% saturation were 0.99% and 0.86%. The FPN is relatively high due to the use of a very small feedback capacitance (5 fF) in a 0.5-µm technology, where the recommended size of well-matched capacitors is 100 fF [29]. Any mismatch in the in-pixel photodiodes and feedback capacitors, or in the column-parallel DDS circuits, will manifest as FPN.
The measured output-referred read noise was 824 µV. The corresponding input-referred noise is lower by a factor given by the CTIA gain. Using (3), the input-referred charge noise was estimated to be 26 electrons. To estimate the reset noise, we made use of the fact that each reset sample comprises fully correlated reset noise and uncorrelated read noise [31]. By analyzing several instances of two consecutive reset samples, the read noise can be estimated. From a thousand reset frames, we
With respect to CCD detectors, binning is a process in which the outputs of several pixels are combined before they are read off-chip, effectively creating larger photosites. The purpose is to trade off spatial resolution for increases in sensitivity and frame rate. Binning could be implemented in our detector by skipping rows and columns during the readout phase and not resetting the skipped photodiodes. Once those photojunctions saturate, the photoinduced electrons can diffuse to the junctions that are not being skipped, effectively increasing the size of each photosite. Fig. 11(a) shows the effect of 4 × 4 binning on the sensitivity and the SNR of the chip. The maximum frame rate increased to 675 frames/s from the earlier maximum of 70 frames/s. Alternatively, if the frame rate was kept at 70 frames/s, the detection limit could be reduced to about 0.8 nW/cm². Fig. 11(b)-(d) shows the effect of binning on spatial resolution. From left to right, the images are at the native resolution (132 × 124,
At 70 frames/s, the chip consumes a total of 718 µA of current from a 3.3-V supply, leading to a power draw of 2.37 mW. Digital circuitry, including pixel scanners and clock generation, consumes 58 µA, while the pixel array, DDS circuits, and bias generation circuits consume 660 µA.
Table I shows a comparison of several recently published low-intensity CMOS imagers designed for biomedical applications. The detection limits were reported at different SNRs, incident wavelengths, and exposure times.
The table shows the incident intensity and the number of photons required by each detector to achieve the reported SNR. For comparison, we measure how many photons our detector needs to achieve the same SNR at the reported wavelength. We also calculate the incident intensity required by our imager to equal the reported SNR after scaling our pixel area and exposure time to match the reported values. These data are shown in Table II. The calculations were performed by using the measured SNR versus intensity curve (Fig. 10) and the spectral sensitivity of n-well/p-sub photodiodes fabricated in the 0.5-µm CMOS process used [26]. Standard luminosity functions [33] were used to convert from units of lux to W/cm².
Fig. 12 shows images of mouse spinal cord neurons with immunostained neurofilament, a structural protein found in neurons. The primary antibody used for labeling was SMI-32, which binds to nonphosphorylated neurofilament [35], and the secondary antibody was conjugated to a fluorescent cyanine dye, Cy-3. Neurofilaments play a role in controlling axonal diameter and nerve conduction velocity. Abnormalities in the protein lead to phenotypes similar to amyotrophic lateral sclerosis (ALS) [36]. Fig. 12(a) was taken with a cooled CCD camera [34], the standard in biomedical imaging, while Fig. 12(b) is from the imager designed in this paper. Since the CCD pixels are smaller than the CMOS pixels (6.45 µm versus 20.1 µm), the CCD data were binned for fair comparison. The images were taken at a magnification of 20× through a Nikon epi-fluorescent microscope with a constant incident intensity and an exposure time of 36 ms for both detectors. Fig. 12(c) compares the average SNR of the two imagers, computed by using (13) for regions of different saturation levels within each image, in order to compare performance in bright and dark regions of an image.
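Two of the headline figures in this characterization, the input-referred charge noise and the power draw, can be cross-checked with simple arithmetic. The sketch below assumes an ideal CTIA, so that output voltage noise converts to charge through the 5-fF feedback capacitance alone; all other numbers are the measured values quoted earlier in this section.

```python
Q_E = 1.602176634e-19  # elementary charge, coulombs

# Input-referred charge noise from the measured output-referred read noise.
READ_NOISE_V = 824e-6  # measured output-referred read noise, volts rms
C_F = 5e-15            # feedback capacitance, farads (ideal CTIA assumed)

noise_charge_c = READ_NOISE_V * C_F
noise_electrons = noise_charge_c / Q_E

# Power budget at 70 frames/s.
SUPPLY_V = 3.3
DIGITAL_CURRENT_UA = 58.0   # pixel scanners and clock generation
ANALOG_CURRENT_UA = 660.0   # pixel array, DDS circuits, and bias generation

total_current_ua = DIGITAL_CURRENT_UA + ANALOG_CURRENT_UA
power_mw = total_current_ua * 1e-6 * SUPPLY_V * 1e3

print(f"input-referred noise: {noise_electrons:.1f} electrons rms")
print(f"total supply current: {total_current_ua:.0f} uA")
print(f"power draw:           {power_mw:.2f} mW")
```

Both results round to the values reported in the text, 26 electrons and 2.37 mW.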
B. Fluorescent Imaging
The sensitivity and the SNR performance of the imager are comparable to those of the cooled CCD camera in light and dark areas of the image. However, the CCD detector has better performance in the dark areas of the image (saturation < 40%) due to its shot-noise-limited nature and low dark current. Our detector is limited by read noise due to relatively high noise from the high-gain CTIA, which was required to boost the sensitivity. Migrating to an imager-optimized process with higher quantum efficiency should relax the requirement on the amplification. The CMOS imager achieves a higher frame rate than the CCD, 28 frames/s versus 20 frames/s, due to rolling shuttering, in which exposure and readout overlap.
V. CONCLUSION
We presented a high-sensitivity image sensor fabricated in a standard, non-imager-optimized, 0.5-µm CMOS process. With a 20.1-µm pixel size and a frame rate of 70 frames/s, the imager was capable of detecting down to 4 nW/cm² of 450-nm light while consuming 2.37 mW. Inclusion of a capacitive transimpedance amplifier enables enhanced sensitivity while controlling the reset noise. Binning was implemented to increase the frame rate (to 675 frames/s) or sensitivity (to 0.8 nW/cm² at 70 frames/s) at the cost of spatial resolution. We were able to image fluorescence down to the single-cell level with sensitivity and a signal-to-noise ratio comparable to cooled CCD cameras.
This work was geared toward applications that require the best of both worlds: the sensitivity and low-light performance of CCDs and the power requirements, modularity, and system-on-chip capability of CMOS detectors. In particular, we are integrating the presented detector with a miniaturized microscope to create a system that combines illumination, optics, and photodetection [11]. The power characteristics of the imager allow battery operation, and the high sensitivity and SNR performance will permit the creation of a small-footprint device that can be attached to a rat skull and used to chronically image the brain in awake and behaving animals. The lack of optical fibers or electrical cables will allow tether-free operation, enabling a wide range and duration of imaging studies. This device may extend our understanding of structural changes in the brain with normal development or in response to abnormal pathologies. Also, by using optical functional imaging techniques, such as voltage-sensitive dyes [4], calcium dyes [5], or laser speckle [37], one can probe neuronal and vascular activity in relation to behavioral activity. In closing, we expect the CMOS imager presented here to be an integral component of miniaturized systems enabling imaging studies in freely moving animals without any restraints.
