Abstract A CMOS image sensor that can selectively vary the exposure time of each pixel is presented. The sensor's focal plane is divided into macro-pixels, and a modified X-Y addressing scheme together with a pixel sequential readout architecture increases the intra-scene dynamic range within each macro-pixel. Compared to our previous image sensor, the proposed architecture refines the granularity of exposure time control from a macro-pixel to a single pixel. As a result, under-exposure caused by a large illumination difference within a macro-pixel can be suppressed, which enables sophisticated high-level functionality such as scene recognition in machine vision applications. We fabricated a test chip in a 0.18-μm 1P5M standard CMOS process.
Introduction
In machine vision applications such as advanced car safety systems and surveillance systems, wide dynamic range and high sensitivity need to be enhanced to execute high-level tasks, for instance car and pedestrian detection/identification and tracking 1) 2) 3) . Even under severe illumination conditions, which vary from dark night to bright noon or sunset and include artificial lighting, the image sensor must not be over- or under-exposed, because either condition leaves no contrast or contrast too low to be computer-processed. It is a great advantage for image sensors in machine vision applications if they provide wide dynamic range, especially on the high-light (bright) side. This is because once the photo-diode (PD) in a pixel is saturated, contrast information is lost and the resulting image becomes flat. In contrast, assuming an ideal noiseless pixel readout channel, subtle contrast in low light can be reconstructed, albeit degraded by photon shot noise; in other words, objects in low light can fundamentally be detected whereas over-exposed pixels cannot. We consider this a key point, and it may call for specialized image sensors for machine vision applications.
Wide dynamic range imaging is an active research topic and many approaches have been proposed. Broadly, these approaches fall into six categories: 1. modifying the photon-induced pixel response 4)~6) , 2. modulating the exposure time 7)~10) , 3. incorporating analog memory 11) 12) , 4. incorporating different in-pixel amplification 13) , 5. adaptively varying the column amplifier gain 14) , and 6. uniformly varying the per-pixel programmable amplifier gain 15) . Of these, for extending the bright side, the log pixel 5) , multiple exposure time frames 8) , the spatially varying exposure (SVE) focal plane 9) , the lateral overflow integration capacitor 12) , and dual conversion gain 13) schemes are excellent and appear practical with current technology. The SVE focal plane has pixels with different exposure times arranged in pre-defined patterns. Thus, a large illumination range can be captured linearly at the cost of spatial resolution. For image sensors with a small number of pixels, as in machine vision, this loss of spatial resolution is critical. Multiple exposure time frames do not lose spatial resolution, but the equivalent frame rate drops because multiple frames must be captured to reconstruct a single frame. Lateral overflow integration capacitors also capture an extremely large illumination range linearly, but the pixel size must be increased to accommodate an additional capacitor and transistors. For dual conversion gain, which conversion gain to use must be decided before capturing, which requires a priori information.
Also, per-pixel selection is not available. Considering these points, the conventional technologies are not necessarily optimal for machine vision applications. We have been investigating a CMOS image sensor whose focal plane is divided into several segments, each of which we define as a macro-pixel 16) 17) . The macro-pixel based image sensor enables adaptive imaging that optimizes the frame rate and exposure time of each segment simultaneously within a scene, which suits machine vision tasks such as computer recognition. The concept behind the proposed extended dynamic range imaging is to capture incident light of any intensity properly and adaptively within the PD's full-well capacity. We believe that doing so for all pixels, with image processing support, realizes extended dynamic range imaging suitable for machine vision applications, which generally require short latency. With this in mind, flexible exposure time control over arbitrary shapes is necessary to provide extended dynamic range images with only one frame readout, unlike multiple frames with different exposure times, and to preserve contrast linearly, unlike log pixels. In addition, we learned from our previous chip that finer granularity of this flexibility is important to make the concept valuable as well as practical. In this paper, we describe an improved image sensor based on the macro-pixel structure. It extends the dynamic range while preventing the under-exposure that occurred in our previous chip by introducing per-pixel exposure time control in conjunction with a pixel sequential readout architecture, and it integrates an on-chip Analog-to-Digital Converter (ADC) for a faster frame rate.
Macro-Pixel Based Image Sensor Concept 16)
Fig. 1 depicts the conceptual structures of the conventional and the macro-pixel based image sensors. The conventional structure has a single focal plane in which exposure time and frame rate are controlled globally. In contrast, the proposed focal plane consists of several macro-pixels. This structure brings two benefits. First, because each macro-pixel has its own circuit to read out and control imaging parameters individually, frame rate and exposure time can be optimized independently across the focal plane, accommodating both a wide illumination range and fast moving objects by varying the temporal resolution to suppress motion blur. This flexibility creates adaptability that yields better imaging performance than the conventional structure. Second, the macro-pixel based structure is suitable for highly parallel processing because of its distributed nature. A macro-pixel can be coupled to an external processor engine, forming a tight feedback path to execute real-time machine vision tasks.
Selective Exposure Time Control in Macro-Pixel
Our previous chip demonstrated extended dynamic range imaging by adaptively varying each macro-pixel's exposure time individually 16) 17) . However, because the granularity of exposure time control was a macro-pixel, which is rather coarse, under-exposed regions appeared when the exposure time was set for high light. This is undesirable and should be avoided in our target machine vision application. Thus, we introduce an X-Y addressing scheme and a pixel sequential readout architecture to solve the issue. To realize the X-Y addressing scheme, two independent but identical scan circuits are implemented for each of the horizontal and vertical scanners, together with some logic gates, as shown in Fig. 2 to properly select the target pixels. The outputs from the correlated double sampling (CDS) V/H scanners are used directly as enable signals for the X and Y PD reset signals, PIXRSTR and PIXRSTC, respectively, whereas the outputs from the WDR V/H scanners are AND-gated with the extra PD reset enable signal, WDR EN. WDR EN is controlled by an external field programmable gate array (FPGA) on a pixel-by-pixel basis to assert the extra PD reset when necessary. As shown in Fig. 3 , at most two different pixels are reset simultaneously, as at time t1, or only one pixel is reset for the CDS readout, as at time t2. The exposure time of pixels without the extra PD reset is Texp1, which is identical to one frame readout time, whereas for pixels with the extra PD reset, that reset defines a new exposure time, Texp2.
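The relation between the two exposure times and the resulting dynamic range extension can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, the 128 × 128 array size and 400ns pixel readout time are taken from the figures quoted elsewhere in this paper, and the factor of 8 used for the second case is an assumption chosen because it corresponds to roughly 18dB.

```python
import math

def dr_extension_db(t_exp1: float, t_exp2: float) -> float:
    """Dynamic range extension (dB) gained by shortening exposure from t_exp1 to t_exp2."""
    if t_exp2 <= 0 or t_exp2 > t_exp1:
        raise ValueError("need 0 < t_exp2 <= t_exp1")
    return 20.0 * math.log10(t_exp1 / t_exp2)

def exposure_map(rows, cols, t_exp1, t_exp2, wdr_en):
    """Per-pixel exposure time: pixels whose WDR EN flag is set receive the extra PD reset."""
    return [[t_exp2 if wdr_en[r][c] else t_exp1 for c in range(cols)]
            for r in range(rows)]

# Example values: 128 x 128 pixels read one at a time, 400 ns per pixel.
N_PIXELS = 128 * 128
T_PIXEL = 400e-9
T_EXP1 = N_PIXELS * T_PIXEL  # Texp1 equals one frame readout time

# Shortest conceivable Texp2: a single pixel readout time.
print(round(dr_extension_db(T_EXP1, T_PIXEL), 1))
# Assumed case: Texp2 = Texp1 / 8.
print(round(dr_extension_db(T_EXP1, T_EXP1 / 8), 1))
```

Under these assumptions the two cases evaluate to about 84.3dB and 18.1dB, which is in line with the theoretical and implemented extension ranges reported for the prototype chip.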
Fig. 4 shows the proposed X-Y addressable pixel and its timing, in which we added two transistors, shown in bold, to the original circuit 18) . The X-Y addressing circuit includes an NMOS transistor M4 attached to the gate of the reset transistor M1. A three-transistor (3-Tr) type pixel is shown, but the scheme can also be applied to a four-transistor (4-Tr) type pixel. The NMOS transistor performs a Boolean AND, so that a pixel is reset only when both PIXRSTR and PIXRSTC are logic high, by shorting the PD to the known voltage VRST via M1. By selectively resetting pixels during exposure on a pixel-by-pixel basis, arbitrary shapes can be traced with a different exposure time; this is the foundation of the proposed extended dynamic range imaging. Although the original concept itself is appealing, noise from the floating node denoted as Vx, increased complexity and additional transistors were major drawbacks that prevented practical use. Among them, we focused on reducing the noise, namely pixel fixed pattern noise (FPN).
The noise generation mechanism is as follows. When the gate voltage of M4 goes logic low, the gate terminal of M1, denoted as Vx, becomes floating. The drain voltage of M4 is already logic low when its gate voltage starts decreasing to shut it off. Clock feed-through and charge injection then pull the floating node below the ground potential. The problem is that some of the charge enters the photo-diode via the gate-source overlap capacitance, denoted as Cgso1. As a result, the integration start voltage changes. The pixel FPN in the dark is caused mainly by dark current differences 19) 20) and cannot be removed by CDS. Similarly, the charge coupled from the floating node cannot be removed by CDS since it is not correlated, and its amount varies from pixel to pixel because of overlap capacitance variation, which causes pixel FPN. We introduced M5 as a minuscule constant current sink to create a virtual ground at the floating node. When the node potential falls below the ground potential, the same amount of charge with opposite polarity is supplied via the ground. This reverse current stops automatically once the node potential equals the ground potential, so the operation does not consume any static current. We added one more transistor, M6, aiming to improve image quality. This additional transistor is not specific to the X-Y addressing pixel but to 3-Tr pixels, which do not incorporate a pinned photo-diode and a transfer gate. The rationale is as follows. When the photo-diode potential starts decreasing due to incident light, the output node potential of the pixel source follower amplifier, denoted as Vy, also decreases almost instantly through M2's gate-source overlap capacitance, Cgso2. However, because no pixel source follower bias current is connected during the exposure time, only a very small leakage current flows.
This means that Vy actually rises until M2's drain and source voltages become almost identical, balancing the very small leakage current. Thus, some charge is fed back to the photo-diode through the same path, removing charge from the photo-diode, which reduces contrast or sensitivity. The root cause of the issue is the delayed feedback loop. We therefore added M6 to cut the feedback loop by constantly drawing a current that is small but larger than the leakage current. This is obviously not practical, as the array as a whole consumes a large current even if each pixel consumes less than a microampere. The experimental result showed only about 1% improvement in sensitivity, much less than our initial estimate. We are still analyzing the mechanism; however, we conclude that adding M6 is not justified.
Pixel Sequential Readout Architecture
The previous chip reads out all pixels in a row at once and samples them in the column circuits, as in the conventional pixel array architecture 16) 17) . However, now that pixels can be reset one at a time by the proposed X-Y addressing, we can take further advantage by implementing the pixel sequential readout architecture shown in Fig. 5 with its timing. Each column is equipped with an additional PMOS source follower amplifier transistor M1 and a select transistor M2; the NWELL of each transistor is connected to its own source terminal to enhance the amplifier performance. A single bias current transistor is located outside the column circuit and feeds current only to the selected column. The photo-signal and photo-diode reset voltages are sampled separately by different sample and hold buffers. Thanks to this architecture, the column circuit height becomes shorter, which leaves less photo-insensitive area between macro-pixels. Dual sample and hold buffers are provided because overlapped double sampling is needed. The outputs of the sample and hold buffers are multiplexed and fed into the swing and level conversion amplifier, which converts them into a fully differential voltage, and then into the pipelined ADC. If this amplifier were implemented with programmable voltage gain, high sensitivity could be obtained to enhance low light performance 8) 15) . The proposed X-Y addressing keeps pixels from saturating and thus avoids losing contrast in high light. Combined with per-pixel programmable voltage gain, the image sensor would offer wide dynamic range and high sensitivity simultaneously, which is highly desirable for machine vision applications. On the downside, one demerit of the architecture is its slow readout speed.
However, each macro-pixel incorporates its own readout circuit, and the number of pixels in a macro-pixel is typically not large, 128 × 128 for instance, so the readout speed is not an issue for our target machine vision application. The readout speed is mostly determined by the settling time of the pixel source follower amplifier, which is why a high performance PMOS source follower amplifier is placed at each column rather than per macro-pixel. We used a 4μA bias current, denoted as IB SF, and a ten times larger column PMOS source follower amplifier bias current, denoted as IB SFC, to settle within 200ns for each readout. With 128 × 128 pixels and a 400ns pixel readout time, one frame readout time is 6.55ms; thus 152 frames per second (fps) is achievable in theory, and the chip was designed for this rate. Although we believe 152fps is not sufficient for the target machine vision application, it is not difficult to increase the speed by implementing multiple readout channels. There are two reasons why flexibility in frame rate is needed in machine vision applications. 1) In a situation where only certain regions of the focal plane are very bright whereas the rest of the scene is very dark and an object is moving, it is not appropriate to set the frame rate globally because the chance of missing the moving object in the dark regions increases. Therefore, frame rate flexibility based on local illumination intensity is preferable.
2) When tracking an object, the frame rate of the regions that cover the object has to be increased. For other regions, however, the frame rate can be decreased to reduce wasteful power consumption and recorded data volume. Therefore, frame rate flexibility based on local motion intensity is preferable.
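The frame rate figure quoted for the pixel sequential readout follows directly from the pixel count and the per-pixel readout time. The sketch below reproduces that arithmetic; the function names and the `channels` parameter are hypothetical, the latter standing in for the multiple readout channels mentioned as a possible speed-up, under the simplifying assumption that pixels are divided evenly among channels.

```python
def frame_time_s(rows: int, cols: int, t_pixel_s: float, channels: int = 1) -> float:
    """Frame readout time when pixels are read one at a time on each of `channels` channels."""
    if channels < 1:
        raise ValueError("channels must be >= 1")
    pixels_per_channel = (rows * cols + channels - 1) // channels  # ceiling division
    return pixels_per_channel * t_pixel_s

def frame_rate_fps(rows: int, cols: int, t_pixel_s: float, channels: int = 1) -> float:
    """Theoretical frame rate, ignoring any blanking or control overhead."""
    return 1.0 / frame_time_s(rows, cols, t_pixel_s, channels)

# Values from the text: 128 x 128 pixels, 400 ns per pixel readout.
t_frame = frame_time_s(128, 128, 400e-9)
print(f"frame time = {t_frame * 1e3:.2f} ms")
print(f"frame rate = {frame_rate_fps(128, 128, 400e-9):.1f} fps")

# Hypothetical case: four parallel readout channels per macro-pixel.
print(f"4 channels = {frame_rate_fps(128, 128, 400e-9, channels=4):.1f} fps")
```

With a single channel this gives 16,384 × 400ns = 6.5536ms per frame, i.e. slightly above 152fps; in this idealized model the rate scales linearly with the number of channels.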
10-b 2.5MS/s 1.5-b/stage Pipelined ADC with Two Serial Outputs
For real-time machine vision processing, an on-chip Analog-to-Digital Converter (ADC) is preferable. Among the several ADC architectures available, we selected the pipelined architecture shown in Fig. 6 for two reasons. First, the pixel sequential readout with the dual double sampling buffers and the swing and level converter continuously generates one fully differential signal per cycle. The pipelined ADC is one of the fastest Nyquist converters and generates one output digital code per cycle, so its timing matches the pixel sequential readout architecture. Second, the ADC consists of a series of identical stages, aside from transistor sizing and bias current scaling for low power consumption. This regular layout topology is beneficial to a macro-pixel based CMOS image sensor because the ADC has to fit physically within the width of a macro-pixel. There are ten 1.5-b stages and a 1-b flash ADC at the last stage, aiming for 10-b with a quarter LSB resolution. Each stage consists of one multiplying DAC, which also performs sample and hold with a residue amplifier, and two dynamic comparators for the sub-ADC 21) . For simplicity, the designed ADC shares the same configuration across all stages except the last stage.
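To illustrate how the decisions of a 1.5-b/stage pipeline are combined into one code, the following behavioral sketch models ideal stages and the textbook overlapped shift-and-add correction. It is an assumption-laden illustration rather than the circuit or correction logic of this chip: the comparator thresholds at ±Vref/4, the residue gain of exactly 2, and all function names are assumed, and the raw code it produces has N+1 bits for N stages; how the raw code maps to the reported 10-b output is not described in the source.

```python
def stage_1p5b(v, vref=1.0, offset=0.0):
    """Ideal 1.5-b stage: two comparators give d in {0, 1, 2}; residue = 2*v - (d-1)*vref."""
    if v > vref / 4 + offset:
        d = 2
    elif v < -vref / 4 + offset:
        d = 0
    else:
        d = 1
    return d, 2.0 * v - (d - 1) * vref

def pipeline_convert(v, n_stages=10, vref=1.0, offset=0.0):
    """Run v through n_stages 1.5-b stages followed by a 1-b flash (sign of last residue)."""
    decisions = []
    for _ in range(n_stages):
        d, v = stage_1p5b(v, vref, offset)
        decisions.append(d)
    flash_bit = 1 if v >= 0.0 else 0
    return decisions, flash_bit

def correct(decisions, flash_bit):
    """Overlapped shift-and-add: each stage decision overlaps the next by one bit."""
    code = 0
    for d in decisions:
        code = 2 * code + d
    return code + flash_bit

# Redundancy check: a comparator offset below vref/4 does not change the final code.
v_in = -1.0 + (300 + 0.5) / 2 ** 10
print(correct(*pipeline_convert(v_in)), correct(*pipeline_convert(v_in, offset=0.2)))
```

The point of the half-bit redundancy is visible in the last line: shifting both comparator thresholds by an assumed 0.2·Vref changes individual stage decisions but leaves the corrected code unchanged. Ten stages with two comparators each plus the 1-b flash yield 21 raw output bits per conversion in this model.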
The ADC needs to reside in the periphery of the chip so as not to increase the photo-insensitive area. The peripheral circuit consists of the analog signal readout channel: a dual double sample and hold circuit, an ADC, a non-overlap clock generator, a level shifter, bias circuits, and digital control logic including a command serial I/F. Among them, the ADC occupies a large part of the area. The layout constraint is tight: assuming, for example, a 7μm pixel pitch and 128 × 128 pixels, the available width is only about 1,000μm. We envision a 3D-IC for the proposed focal plane structure 16) , in which only the top layer incorporates the photo-diode array whereas the next layer is tiled with the respective peripheral circuits. In that case, the layout of a peripheral circuit should fit in a 1,000μm × 1,000μm area. To alleviate the layout constraints, we moved the ADC's delay matching and error correction circuit to an FPGA. Also, we serialized the twenty-one sub-ADC outputs into two, reducing the required number of chip I/O pads from 252 to 24, excluding clock and control signals. This serialization is simple, but it is one of the key enablers of the 2D chip implementation.
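The pad count and width figures above can be checked with simple arithmetic. In the sketch below, the number of ADC channels (12) is not stated in the text; it is an assumption inferred from the quoted totals, since 252 / 21 = 12 and 24 / 2 = 12. All names are hypothetical.

```python
STAGES = 10                 # 1.5-b stages per ADC (from the text)
COMPARATORS_PER_STAGE = 2   # two dynamic comparators per stage (from the text)
FLASH_BITS = 1              # 1-b flash at the last stage (from the text)
N_ADCS = 12                 # ASSUMPTION: inferred from 252 / 21, not stated in the source

def sub_adc_outputs(stages: int, comparators_per_stage: int, flash_bits: int) -> int:
    """Raw comparator outputs produced by one pipelined ADC per conversion."""
    return stages * comparators_per_stage + flash_bits

def io_pads(n_adcs: int, lines_per_adc: int) -> int:
    """Data pads needed, excluding clock and control signals."""
    return n_adcs * lines_per_adc

outputs = sub_adc_outputs(STAGES, COMPARATORS_PER_STAGE, FLASH_BITS)
print(outputs)                    # raw outputs per ADC
print(io_pads(N_ADCS, outputs))   # pads without serialization
print(io_pads(N_ADCS, 2))         # pads with two serial outputs per ADC

# Macro-pixel width for a 7 um pitch and 128 columns, in micrometres.
print(128 * 7)
```

Under the assumed channel count, the parallel and serialized cases give 252 and 24 pads respectively, and the 128-column macro-pixel is 896μm wide, consistent with the "about 1,000μm" budget.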
Prototype Chip
We fabricated a prototype chip to confirm the ideas. Fig. 7 shows the chip layout and an enlarged view of the peripheral circuit layout. In the peripheral circuit, the digital place and route (P&R) logic resides on the right so that it can distribute clocks and control signals and receive the sub-ADC outputs without routing congestion. The column output signal is received by the dual double sample and hold buffers, whose output is connected to the swing and level converter, whose output in turn is fed to the pipelined ADC. The pipelined ADC is folded in layout, and its output is connected to the digital P&R logic after being level-shifted down. Then, after serialization, the ADC digital codes are output via the chip I/O pads. Table 1 shows the chip specification. The base dynamic range is 48dB, and in theory it can be extended by 84dB because the minimum exposure time is only one pixel readout time. However, because the area available for the digital counters that control the exposure time was insufficient, the bit widths of the counters were truncated. As a result, the dynamic range extension is limited to 18dB. The extension range of the current architecture depends on the number of pixels in a macro-pixel. Thus, we are considering more flexible pixel reset timing within one pixel readout time to achieve an extended dynamic range of over 150dB. The chip was designed for 152fps, but because of the performance of the image capturing board, all image capturing experiments were done at 12fps. Fig. 8 shows the effect on pixel FPN of the bias voltage of the current sink transistor, M5 in Fig. 4 , together with the equivalent per-pixel current consumption when it is activated. As the bias voltage increases beyond the threshold voltage, about 0.45V, the pixel FPN starts to decrease. This is because the charge injected by the added transistors is canceled by injecting the opposite charge back into the photo-diode. Three chips were measured, and the average improvement is 16.0% at a bias voltage of 0.5V.
It was confirmed that the added transistor is effective for better image quality. Unfortunately, there is no measurement data comparing the proposed pixel with the normal 3-Tr pixel. For machine vision applications, where the main purpose of imaging is computer recognition and/or identification, a certain level of pixel FPN is actually acceptable because of advanced computer algorithms. We set this acceptable level to that of the normal 3-Tr pixel and no higher. The proposed pixel provides useful functionality beyond what the normal 3-Tr pixel can do, so we believe that if its pixel FPN is comparable to that of the normal 3-Tr pixel, its advantage is established. This is why we care about lowering the pixel FPN even though it is a 3-Tr type pixel. Considering the above, we believe the 16.0% improvement obtained with one additional transistor is both large and important. Comparing the pixel FPN of the proposed pixel with that of the normal 3-Tr pixel is therefore our future work. Fig. 9 shows the ADC's differential and integral non-linearity (DNL and INL) after calibration. The ADC measurement was done at 0.78MS/s using external discrete 16-bit DACs to generate a ramp signal as a test input through test pins. Five hundred samples were averaged to reduce temporal noise and converted to the 10-bit ADC input range to match the equivalent 1LSB voltage. The DNL and INL are +0.94/-1.38LSB and +1.44/-2.88LSB, respectively. The relatively large DNL is due to the tight layout constraint on the ADC; the available area allowed only 196fF for each capacitor. Also, the ADC output lines have to be bent at the 5th stage because of the folded layout structure. We also suspect that the absence of dummy capacitors around the actual capacitors, for the same reason, is another factor. The ADC was calibrated using a histogram-based fully digital method 22) , modified for 1.5-bit/stage.
A major advantage of this fully digital calibration is that no modification of the analog signal readout chain is needed; the calibration uses the exact signal path, and the gain error is also corrected, unlike other fully digital calibration based on code boundary gaps 23) . Its disadvantage is that it is a foreground type, so the ADC needs to be calibrated before use. A detailed explanation of the calibration algorithm is beyond the scope of this paper; however, we are considering a more effective calibration scheme. Fig. 10 shows an image captured with a long exposure time. Only images from four macro-pixels are shown, but all macro-pixels can capture images. The clock and doll inside the room exhibit good contrast, whereas the regions corresponding to the outside of the room are completely over-exposed. From the viewpoint of machine vision, especially computer recognition, a captured image should not contain over-exposed regions. To achieve that, the previous chip adapts the exposure time on a macro-pixel basis as shown in Fig. 11 , in which the exposure time of the two left macro-pixels is reduced by 18dB. As a result, the over-exposed region recovers its contrast. However, if there is a distinct illumination difference within a macro-pixel, under-exposed regions are inevitably generated; the left half of the clockface becomes very dark and its contrast is lost. This was a drawback of the previous chip. Fig. 12 shows the result of the proposed X-Y addressing scheme. The regions enclosed by the dashed line were captured with the short exposure time and the rest with the long exposure time. Both the over-exposed region and the clockface recovered their contrast. The pixel selection pattern is not limited to rectangles or polygons; any shape can be traced on a pixel-by-pixel basis. Thus, for instance, once over-exposed regions are detected, the exposure time of those pixels can be optimized in the next frame.
Although the proposed architecture implements only dual exposure time control, it is not difficult to add a few more exposure time control circuits to realize a more versatile set of exposure times; for instance, very short and short exposure times for bright and fast moving objects and a long exposure time for dark objects. Fig. 13 shows an image captured using a vertical stripe pattern in which long and short exposure times alternate every two columns. This pattern simulates the SVE focal plane, where spatial resolution is traded off for illumination range. The middle image is sub-sampled and reconstructed using only the pixels with the long exposure time. The right image is an enlarged view around the string "12" on the clockface. The SVE focal plane provides simultaneous images and is hence preferable for fast moving objects. However, because spatial resolution is lost, it is somewhat difficult to read the string correctly.
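The frame-to-frame adaptation described for Fig. 12 (detect over-exposed pixels, then give those pixels the short exposure in the next frame) might be sketched as follows. This is an illustrative assumption, not the control algorithm run on the authors' FPGA: the function names, the saturation threshold of 1000 on a 10-bit code, and the linear rescaling step are all hypothetical.

```python
def next_frame_wdr_mask(frame, sat_threshold=1000):
    """Flag pixels whose code is at or above the threshold; flagged pixels would get
    the extra PD reset (short exposure) in the next frame."""
    return [[code >= sat_threshold for code in row] for row in frame]

def linearize(frame, wdr_mask, exposure_ratio):
    """Scale short-exposure pixels by Texp1 / Texp2 so all pixels share one linear scale."""
    return [[code * exposure_ratio if short else code
             for code, short in zip(row, mask_row)]
            for row, mask_row in zip(frame, wdr_mask)]

# Frame k: long exposure everywhere; two pixels are near full scale.
frame_k = [[100, 1023],
           [1010, 5]]
mask = next_frame_wdr_mask(frame_k)

# Frame k+1 (made-up codes): flagged pixels were captured with 1/8 of the exposure.
frame_k1 = [[100, 200],
            [150, 5]]
print(linearize(frame_k1, mask, exposure_ratio=8))
```

Because the mask is an arbitrary per-pixel Boolean array, the selected region can take any shape, which is the property the X-Y addressing scheme provides; the rescaling keeps the response linear in illumination, in contrast to a log pixel.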
Measurement Results
Conclusions
We have proposed a new low pixel FPN X-Y addressing scheme in conjunction with a pixel sequential readout architecture for extended dynamic range imaging based on a macro-pixel CMOS image sensor, and fabricated a chip in a 0.18-μm 1P5M standard CMOS process. We confirmed that the proposed pixel-by-pixel exposure time control successfully recovers contrast in over-exposed regions while suppressing the generation of under-exposed regions. We also confirmed that the proposed current sink transistor decreases the pixel FPN by 16.0% on average. Lastly, we integrated a 10-b, 2.5MS/s pipelined ADC, and the measurement showed a DNL of +0.94/-1.38LSB. Future work includes a programmable high voltage gain amplifier with pixel-by-pixel selection capability to enhance low light performance, more flexible exposure time control and a higher frame rate for better adaptive imaging in machine vision applications, and validation of the pixel FPN improvement against the normal 3-Tr pixel.
