Introduction
Imaging detector arrays and image processing circuits are critical components in many consumer, industrial, and military focal plane imaging array systems. System specifications emphasize different aspects of imaging array technology, including high frame rate, large array size, high fill factor, and high pixel resolution. Over the past several years, significant advances in focal plane array (FPA) development have been achieved. The number of pixels has increased to several thousand on a side, and the resolution of the arrays has also increased [1]. The combination of higher resolution with a larger number of pixels has resulted in data rates that cannot be transmitted off the FPA through a one-port readout system. This data bottleneck is exacerbated when the application demands high frame rates, which further challenge the data transfer rate off the FPA. Important emerging high speed imaging applications include combustion, trans-Mach fluid flow, and aero-optic sensing, which require data rates that cannot be handled by conventional image data transfer methods [2] [3] [4].
In addition to enhanced detector array performance, the integration of image processing circuitry on the focal plane, or the realization of "smart" focal plane arrays, is an area of intense research. Preprocessing the raw image signal with circuitry integrated on the focal plane can address the data transfer limitation of the imaging system. Conventional imaging systems use X-Y readout of the sensor array data followed by transportation of the data to an off-chip serial analog-to-digital (A/D) converter (ADC). On-chip A/D conversion is potentially a superior choice for an integrated imaging/preprocessing smart imager, since it can reduce the cost, complexity, weight, and pick-up noise of the FPA system. The noise bandwidth depends on the on-chip architecture. Three different on-chip readout systems have been developed [1] [5] [6]. The first is a serial readout system, involving X-Y multiplexing of sensor data to a single on-chip ADC. The second approach is a semi-parallel readout system with an ADC for each column. The third option is a parallel readout system with an ADC dedicated to each pixel.
The parallel readout system is the best choice to accommodate ever increasing data rates for off-chip data transfer. This option performs the A/D conversion as early as possible in the signal chain to avoid processing and transporting analog signals, and instead utilizes the digital processing capability of modern CMOS processes. Per pixel A/D conversion is the extreme of this solution, namely, associating one ADC with each pixel. The advantages of this approach are that no signal degradation occurs when digital data is read out of the detector array, and the incident signal on the detector can be sensed and electrically integrated during the entire frame period. When this parallel readout architecture is combined with a three dimensional (3D) through-Si vertical optical communication link, a truly fully parallel readout system can be realized. Using this architecture, a virtually unlimited scalable high speed readout system for an FPA system can be demonstrated. This paper explores the design and implementation of a smart FPA using an integrated detector array and signal processing circuitry. Si CMOS detectors are used for the imaging detector array and are integrated directly with per-pixel Si CMOS sigma delta ADCs, which preprocess the detected image signals. Performance data for this integrated smart FPA is included herein. In a future system, these FPA signals can be passed down through a Si CMOS emitter driver, integrated onto the imaging chip, for 3D vertical through-Si data transfer to a second chip containing an image processor. The design of the smart FPA is explored in the context of the final two layer image processing system.
Architecture
Image processing systems that operate in real time on large images, with frame rates in the high kHz or MHz range, are beyond the capability of today's imaging systems. For example, a single first-order sigma delta analog-to-digital converter (ADC) generating a sequence of 500x500 8 bit images at a frame rate of 100 kHz must be clocked at more than 655 GHz, which is far from practical. Even when parallel ADCs are placed along the edge of the imaging array, the problem is only partially mitigated because the speed at which the ADCs must operate still increases with image size. To generate 500x500 8 bit images at a frame rate of 100 kHz, 500 ADCs are needed on the same die, each clocked at more than 1.31 GHz [7]. Fabricating that many high speed ADCs on one die is beyond any current technology. Thus, serial ADCs are inadequate for this task, and highly parallel ADCs must be examined for evolving high data rate systems.
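The clock rates quoted above follow from multiplying the number of pixels served by each ADC by the frame rate and the oversampling ratio. The short sketch below reproduces that arithmetic. The oversampling ratio of 26.2 is an assumption back-computed from the 655 GHz figure (655 GHz / (500 x 500 x 100 kHz)), and the function and variable names are illustrative only.

```python
# Assumed oversampling ratio for 8 bit output from a first-order sigma delta
# modulator, back-computed from the 655 GHz figure quoted in the text.
OSR_8BIT = 26.2


def adc_clock_hz(pixels_per_adc, frame_rate_hz, osr=OSR_8BIT):
    """Clock rate one ADC needs to convert all of its pixels once per frame."""
    return pixels_per_adc * frame_rate_hz * osr


side = 500           # 500x500 image
frame_rate = 100e3   # 100 kHz frame rate

serial_hz = adc_clock_hz(side * side, frame_rate)  # one ADC for the whole array
column_hz = adc_clock_hz(side, frame_rate)         # one ADC per column (500 ADCs)
pixel_hz = adc_clock_hz(1, frame_rate)             # one ADC per pixel

print(f"serial:     {serial_hz / 1e9:.0f} GHz")
print(f"per column: {column_hz / 1e9:.2f} GHz")
print(f"per pixel:  {pixel_hz / 1e6:.2f} MHz")
```

Under these assumptions, only the per pixel case keeps the modulator clock in the low MHz range, which is the motivation for the fully parallel architecture.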
In this paper, a fully parallel readout system is designed as a scalable FPA readout system. This readout system provides a scalable solution to the real time high frame rate image capture problem when it is coupled to a massively parallel optically interconnected processor. To keep the design scalable, the processors must reside beneath the imaging chip, which, in the final system, will use a through-Si vertical 3D optoelectronic interconnect for parallel connection to the detector plane. The design is scalable because, as the image size increases, the number of parallel vertical optical 3D links and the number of processors in the array can increase accordingly, thus maintaining the frame rate. The system design, using a through-Si integrated parallel optical data link, is illustrated in Figure 1 (a).
The fully parallel readout system was designed so that each pixel has an associated ADC, and subarrays of these pixels/ADCs are served by one vertical optical link and one digital signal processing (DSP) unit to perform image processing. To maximize the imager fill factor, it is necessary to minimize the area of the ADC circuitry. Thus, a first order current input sigma delta oversampling ADC was chosen, because space can be conserved by implementing only the front end of the sigma delta A/D converter on a per pixel basis on the detection plane. The advantage of this architecture is that the sigma delta ADC front end produces digital data, so shifting the data introduces no further noise into the signal. This is in contrast to analog data, which is the data format used in conventional FPA data links. An integrated optoelectronic emitter on each subarray allows vertical through-Si output of digital image data from the focal plane to the processor stacked below each subarray, as shown in Figure 1. These integrated through-Si vertical optical data links have been demonstrated using stacked foundry Si CMOS circuits, and are a viable technology for 3D system integration. This 3D vertical coupling to the image plane allows the detector and processor arrays to be scaled while maintaining a fixed level of processing per pixel, as shown in Figure 1.
Thus, the processing rate is independent of the total imager array size, resulting in a scalable readout system. The number of pixels in the subarray depends upon the bandwidth of the associated processor circuitry. For example, the type of processor used in this design is the SIMPil processor [8] [9]. If an 8x8 subarray is used, the sizes of the processor and the focal plane subarray match reasonably well: a 256x256 pixel imaging array with 8 bits of resolution on the focal plane could be achieved by tiling an 8x8 array of processors, each operating at 168 MHz. Each pixel block converts the analog light intensity into a digital signal. The entire system is synchronous, and after each clock pulse, every pixel block produces one bit of data. This generates a two dimensional array of bits. All the generated digital output signals are amplified by the emitter driver to drive an integrated optoelectronic emitter on each subarray, optically interconnecting the imaging/preprocessing array to the SIMPil processor on the second level of circuitry. After the optical signal is received by the receiver located on the second level of circuitry underneath the focal plane array, the signal is amplified and synchronized to the SIMPil digital signal processor by a clocked comparator. The serial output of the comparator is read into the SIMPil processor by serial to parallel conversion. The signal path from image detector to signal processor is shown in Figure 2. To achieve a 100 kHz frame rate, each SIMPil processor needs to process data at 167 Mbps (for an 8x8 subarray image oversampled by 26). Assuming that a fully pipelined 8 bit processor is clocked at 167 MHz, a frame rate of 100 kHz is possible. Figure 3 shows simulation results of the bandwidth as a function of resolution and array size for three different architectures.
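The data path described above, in which every pixel block emits one bit per clock and the resulting bit plane is sent over a single link and then converted back to parallel form, can be sketched as follows. This is a behavioral illustration only: the row-by-row serialization order and all function names are assumptions, not details taken from the emitter driver or the SIMPil interface.

```python
SUB = 8  # subarray side length (8x8 pixels per optical link, from the text)


def serialize_bit_plane(plane):
    """Flatten one SUB x SUB bit plane into a serial stream.

    Row-major order is an assumption made for illustration.
    """
    return [bit for row in plane for bit in row]


def deserialize_bit_plane(stream):
    """Serial to parallel conversion on the processor side."""
    return [stream[r * SUB:(r + 1) * SUB] for r in range(SUB)]


def accumulate(planes):
    """Count the ones seen at each pixel position over a sequence of bit planes."""
    counts = [[0] * SUB for _ in range(SUB)]
    for plane in planes:
        for r in range(SUB):
            for c in range(SUB):
                counts[r][c] += plane[r][c]
    return counts


# One hypothetical bit plane: a checkerboard of ones and zeros.
plane = [[(r + c) % 2 for c in range(SUB)] for r in range(SUB)]
stream = serialize_bit_plane(plane)
recovered = deserialize_bit_plane(stream)
counts = accumulate([recovered] * 26)  # 26 clocks per frame, as in the text
```

Each clock contributes 64 bits to the link, so 26 clocks per frame at a 100 kHz frame rate give the roughly 167 Mbps per processor quoted above.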
Figure 3 (a) was obtained under the following assumptions: the array size was 1000 x 1000; the frame rate of the system was 100 kfps; and 8 x 8 detection arrays were used as subarrays of the larger fully parallel system. These assumptions dictate the bandwidth of the first order sigma delta ADC, which is 168 MHz for a parallel system, 2.62 GHz for a semiparallel system, and 2.62 THz for a serial readout system. The system bandwidth was the same as the A/D converter bandwidth for the parallel and serial systems, but it was increased for the semiparallel readout system because there was only one processor for the whole FPA system. However, the semiparallel and serial readout system bandwidths increased rapidly with array size. In the simulation results, the semiparallel readout system had less bandwidth than the parallel readout system when the array was smaller than 64 x 64. These simulation results arose from the assumption that each row had its own signal processor for the semiparallel system. If there were only one processor for the whole system, the semiparallel system bandwidth would be the same as that of the serial readout system. From the above two graphs, it is clear that the parallel readout system had less bandwidth than the other two readout systems for the same resolution and array size. A first order sigma delta ADC, however, would not produce an optimal system if there were no area and power consumption restrictions. For the semiparallel readout system, it is better to use second-order sigma delta ADCs rather than first order sigma delta ADCs so long as the larger area is available. By using second-order sigma delta ADCs, the oversampling ratio can be decreased significantly [10]. Likewise, for serial readout systems, there is no limitation in choosing an ADC type. In the following simulations, a second-order sigma delta A/D converter was used for comparison. All the other assumptions were the same as previously stated.
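The first order bandwidth figures and the 64 x 64 crossover can be checked with a small sweep over array size. The sketch below assumes an oversampling ratio of 26.2 (back-computed from the quoted 2.62 GHz and 2.62 THz values), one ADC per row for the semiparallel case, and one link per 8 x 8 subarray for the parallel case. It is a simplified model, not the simulation used to produce Figure 3.

```python
OSR = 26.2           # assumed first-order oversampling ratio for 8 bits
FRAME_RATE = 100e3   # 100 kfps
SUBARRAY_SIDE = 8    # 8 x 8 pixels share one parallel link


def readout_bandwidths(side):
    """Return (parallel, semiparallel, serial) bandwidth in Hz for a side x side array."""
    parallel = SUBARRAY_SIDE * SUBARRAY_SIDE * FRAME_RATE * OSR  # fixed per subarray
    semiparallel = side * FRAME_RATE * OSR                       # grows with N
    serial = side * side * FRAME_RATE * OSR                      # grows with N^2
    return parallel, semiparallel, serial


def largest_side_favoring_semiparallel(max_side=1024):
    """Largest array side for which semiparallel needs less bandwidth than parallel."""
    best = 0
    for side in range(1, max_side + 1):
        parallel, semiparallel, _ = readout_bandwidths(side)
        if semiparallel < parallel:
            best = side
    return best


parallel_bw, semi_bw, serial_bw = readout_bandwidths(1000)
crossover = largest_side_favoring_semiparallel()
```

In this model the parallel bandwidth is independent of array size, while the semiparallel and serial bandwidths grow linearly and quadratically with the array side, so the parallel system is favored for every array of 64 x 64 or larger.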
Figure 4 (a) shows bandwidth as a function of resolution. At resolutions over 15 bits, the semiparallel readout system required less bandwidth than the parallel readout system (although the 4 GHz bandwidth is a challenging ADC bandwidth to achieve). Figure 4 (b) shows another interesting simulation result: the semiparallel readout system is better for array sizes under 288x288, provided that a signal processor supports each row.
Thus, the parallel readout system has the best performance for low resolution and large image arrays.
Sigma Delta Analog to Digital Converters
Modern short-channel CMOS processes offer speed that is often far beyond system requirements. Speed will continue to improve as shorter channel lengths become available. Accuracy and component matching, however, are expected to become worse with decreasing linewidth. For a fully parallel FPA system, this is a potentially serious problem because thousands of ADCs work together, which necessitates good device uniformity to produce a uniform image.
Hence, it is an advantage to trade off speed and accuracy, resulting in a flexible system that gains accuracy at the cost of speed. This trade off can be realized using an oversampling converter, which is an ADC that trades speed for tolerance to component mismatch. An example of one such ADC is a sigma delta current input 1st order modulator. The simplified architecture of this ADC is shown in Figure 5. The blocks that make up the system will now be briefly described.
Figure 5. First order oversampling modulator
Figure 6 shows the schematic of the first order modulator circuitry. In the smart FPA application discussed herein, a current buffer is needed between the photodetector and oversampling modulator. The proposed parallel readout system uses a current buffer as the front end of the readout circuit to provide low input impedance and a stable bias to the detector. A current buffer typically must provide a low input impedance to reduce the effects of the nonzero output admittance of the detector. It must also supply a specified DC bias voltage for the input device, to improve the linearity of the detector. In this work, a CMOS current buffer was used. The output of the current buffer was connected to an integrator and to a digital to analog (D/A) converter that was implemented by a current mirror with reset switches. The detector bias was controlled by a bias voltage source shared with the other pixels. The current source was also shared with the other pixels and placed outside the focal plane to save space and power. The current integrator was implemented with a capacitor whose value was determined by the input current size, readout speed, and noise. The last stage is a comparator, which compares the integrator voltage with a reference voltage and produces a one bit output data stream.
The digital output of the comparator is decimated and filtered by filters programmed in the subsequent DSP chip. The comparator output was also fed back to the DAC to control the feedback current, which made the average of the comparator output track the input value. [...] kfps system with a maximum 2 µA input current and an 800 fF integrator size. The power spectral density (PSD) of the output code is shown in Figure 7 (b). Consistent with sigma-delta modulator properties, the noise increases with frequency. The modulator output signal was decimated and low pass filtered to produce the binary output code.
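The feedback loop described above (integrator, comparator, and one bit current DAC) can be illustrated with a simple behavioral model. The sketch below is not the circuit of Figure 6: it assumes an ideal integrator and comparator, a feedback current equal to the 2 µA maximum input, a 2.62 MHz modulator clock (100 kfps times an assumed oversampling ratio of 26.2), and a plain averaging decimator standing in for the DSP filters, whose actual design is not specified here.

```python
def first_order_sigma_delta(i_in, n_clocks, i_ref=2e-6, c_int=800e-15,
                            f_clk=2.62e6, v_ref=0.0):
    """Behavioral model of a current input first order sigma delta modulator.

    i_in   : photocurrent in amperes, assumed constant and within 0..i_ref
    i_ref  : feedback (DAC) current, assumed equal to the 2 uA maximum input
    c_int  : integrator capacitance (800 fF, from the text)
    f_clk  : modulator clock in Hz (an assumption, see the lead-in)
    """
    t_clk = 1.0 / f_clk
    v_int = v_ref                 # integrator starts at the reference level
    bits = []
    for _ in range(n_clocks):
        bit = 1 if v_int > v_ref else 0            # clocked comparator
        bits.append(bit)
        # Integrate the input current minus the DAC current selected by the bit.
        v_int += (i_in - bit * i_ref) * t_clk / c_int
    return bits


def average_decimate(bits, i_ref=2e-6):
    """Crude decimator: the mean of the bit stream scaled by the feedback current."""
    return i_ref * sum(bits) / len(bits)


bits = first_order_sigma_delta(i_in=0.6e-6, n_clocks=1000)
estimate = average_decimate(bits)
print(f"estimated input current: {estimate * 1e6:.3f} uA")
```

Because the integrator voltage stays bounded, the average of the bit stream converges to the ratio of input current to feedback current, which is the tracking behavior described in the text. A longer averaging window gives a finer estimate at the cost of speed.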
CMOS Image Sensor
To integrate the detector imaging array onto the silicon circuitry, two options can be explored. One is a hybrid integrated detector array [11], and the other is a monomaterial detector [12]. The advantages of the hybrid detector are a higher fill factor, since the detectors can be integrated directly on top of the circuitry, and independent optimization of the detector and the circuitry. Thus, the performance of the array can be significantly improved, but the system cost is higher since there is an assembly cost associated with the hybrid integration. The other option is monomaterial integration, i.e., implementing the detector directly in the Si CMOS. The responsivity and wavelength of operation are limited, as is the fill factor; however, for many applications, the use of Si detectors is adequate.
CMOS-based image sensors offer the opportunity to integrate a significant amount of VLSI electronics on chip and to reduce component and packaging cost. A number of types of Si CMOS photodetectors have been reported in the literature. In this paper, a photodiode was used for the photodetector instead of a phototransistor because the photodiode has better linearity [13]. Additional problems are noise level and scalability. The Si CMOS pixel does not scale well to a larger array size and a faster pixel readout rate, since the bus capacitance and the readout noise increase. In this paper, these drawbacks were addressed by using a current buffer for the front end of the readout system and by building one ADC per pixel.
The photodiodes were realized using a standard 0.8 µm n-well CMOS process through the MOSIS foundry. The physical size of the photodiode was restricted to 60 µm x 77 µm because the pixel space was shared with the circuits and data lines. To increase the speed of the detector, four parallel photodiodes were used rather than one larger photodiode, which reduces the parasitic capacitance, as shown in Figure 9.
(a) Schematic of each photodetector pixel. (b) Layout.
Uniformity is one of the important characteristics of an imaging system. To obtain a good quality image with a minimum of correction, all detectors and their associated circuitry need to respond uniformly to optical input. To measure the uniformity of the system described herein, a test was run with the readout system running below a 1 MHz system clock frequency, and the maximum data value was set to 64. An uncalibrated halogen light source, large in comparison to the FPA, was used to illuminate the FPA from a distance of 3 m to produce a uniform intensity across the FPA. Figure 11 shows the test results of FPA uniformity. The standard deviation among the pixels was calculated to measure the uniformity. Twelve different light intensity values were used, and most showed a low standard deviation across the array. Table 1 shows the measured standard deviation for all 12 optical input intensities. The test results included all noise sources from the detector and circuits. These test results show that the standard deviation decreased with increasing input intensity. The optical intensity was controlled using neutral density filters to generate outputs between 0 and 64. The last test result in Table 1 was not valid because the circuit was saturated. The optical input intensity is reported as a percentage of the maximum incident optical input. When the optical input was more than 15% of maximum, the output was saturated.
Photodetector linearity was tested with the halogen optical source and four neutral density filters. By combining the four optical filters, 16 different optical input intensities were measured. However, the circuit was saturated at 5 of the 16 light intensities, and these were removed from the evaluation.
Conclusions
In this paper, a scalable fully parallel readout system for focal plane arrays is demonstrated. To realize a high speed parallel readout system, a compact 1st order current input sigma-delta modulator was designed to support each pixel in the imaging detector array. The array, with an ADC front end dedicated to each pixel, was implemented and tested in Si CMOS. Test results on the FPA indicate good array uniformity and linearity.
