Abstract-This paper reports on a QVGA vision sensor embedding 160 column-level digital processors that execute real-time, tunable scene background subtraction for robust event detection. Single-ramp column-parallel ADCs are used to estimate pixel variations and to detect anomalous behavior against two reference images stored on-chip. The sensor generates a 160×120-pixel bitmap associated with potential alert conditions. The chip is powered at 3.3 V/1.2 V for the analog/digital parts and consumes 1.6 mW when operating at 15 fps, dispatching a gray-scale image and a quarter-QVGA bitmap.
I. INTRODUCTION
Commercial cameras target visual tasks where image quality and resolution are the most important features. However, in applications such as surveillance and monitoring they are inefficient, since they force the processor to analyze images continuously, wasting a large amount of power. Embedding low-level image processing on-chip would make both the camera and the system more energy-efficient. Following this approach, we present a QVGA vision sensor embedding a low-power background subtraction algorithm [1]. The sensor detects anomalous motion in the scene and generates an alert bitmap as input for high-level processing (e.g. tracking and classification) to be executed by the processor. Several implementations of on-chip motion detection have been proposed [2]-[4], all based on the frame-difference technique. Although some of them can detect slow-moving objects, they cannot suppress noisy zones of the scene, such as swaying vegetation or rippling water, which are common in real scenarios. Unlike our previous fully analog implementation [5]-[6], we propose a digital approach that allows motion to be detected over a larger range and in harsh outdoor scenarios.
II. VLSI-ORIENTED ALGORITHM
The background is modeled with two thresholds [1], updated at each frame and stored in a frame buffer for subsequent operations. The embedded algorithm can be divided into two steps:
Learning step: two images are generated and updated at each frame: IMIN (containing the minimum reference value for each pixel) and IMAX (containing the maximum reference value for each pixel). For the generic i-th frame, the current value of each pixel P(x,y) is compared with its IMIN and IMAX; then the two reference images are updated as follows:
where ∆OPEN and ∆CLOSE (∆OPEN > ∆CLOSE) are user-defined parameters used to update the two reference images in opening and closing conditions.
Detection step: each pixel of the array is classified as "cold" or "hot", i.e. its behavior is normal or anomalous with respect to its past history:
where H(x, y) is the binary status (hot-pixel) of the pixel P(x, y) and ∆HOT sets the hot-pixel condition. Fig. 1 shows how the algorithm works when a pixel changes regularly (e.g. swaying vegetation). In this case, the two thresholds (Vmax, Vmin) track the current signal (Vpix) at different speeds, thus modifying the safe zone (cold-pixel); outside this zone, the pixel is a hot-pixel (red). From frame to frame, the two thresholds move toward the maximum and minimum peaks of Vpix, progressively suppressing the pixel. After about 170 frames the hot-pixel disappears and the oscillation is effectively registered as background.
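The behavior described for Fig. 1 can be illustrated with a short simulation. Since update rules (1)-(4) are not reproduced in this text, the rules below are an assumed reading of the description: a threshold moves outward by ∆OPEN when the pixel exceeds it and inward by ∆CLOSE otherwise, and a pixel is hot when it leaves the band by more than ∆HOT. The function names, the parameter values and the square-wave test signal are hypothetical.

```python
def step(p, i_min, i_max, d_open, d_close, d_hot):
    """Process one pixel for one frame; return (hot, new_i_min, new_i_max)."""
    # Detection (assumed form): hot when the pixel leaves the
    # [i_min, i_max] band by more than d_hot.
    hot = (p - i_max > d_hot) or (i_min - p > d_hot)
    # Learning (assumed form): open fast when exceeded, close slowly otherwise.
    if p > i_max:
        i_max += d_open
    else:
        i_max -= d_close
    if p < i_min:
        i_min -= d_open
    else:
        i_min += d_close
    return hot, i_min, i_max


def run(signal, start, d_open=1.0, d_close=0.25, d_hot=4.0):
    """Apply step() to a sequence of pixel values; return the hot flags."""
    i_min = i_max = start
    flags = []
    for p in signal:
        hot, i_min, i_max = step(p, i_min, i_max, d_open, d_close, d_hot)
        flags.append(hot)
    return flags


# Hypothetical oscillating pixel (e.g. swaying vegetation): 140, 100, 140, ...
signal = [140 if n % 2 == 0 else 100 for n in range(200)]
flags = run(signal, start=120.0)
print("hot in first frame:", flags[0])
print("hot in last 50 frames:", any(flags[-50:]))
```

With these illustrative values the pixel is flagged at first, and the flags stop once the two thresholds have reached the peaks of the oscillation. The number of frames this takes depends on the chosen parameters and is not meant to match the figure.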
III. VISION SENSOR ARCHITECTURE
The rolling-shutter vision sensor consists of an array of 320×240 pixels, 320 single-ramp 4 MHz 8-bit column ADCs, and a bank of 160 processors that implements the row-wise algorithm, updating the 10-bit reference images (IMIN, IMAX) stored in a 375 Kb 6T-cell SRAM. At each row readout phase, the Processing Elements (PEs) generate a 160-bit hot-pixel array. Each hot-pixel is filtered by a programmable Erosion Filter before being delivered off-chip. A quarter-QVGA hot-pixel bitmap (160×120 bits) is generated at the end of each frame according to (1)-(4). Fig. 2 shows the architecture of the vision sensor.
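As a quick consistency check on the memory figure, the arithmetic can be written out as below. This is a sketch under two assumptions that the text does not state explicitly: the reference images are held at the quarter-QVGA resolution of the bitmap (160×120), and "Kb" denotes 1024 bits. The variable names are illustrative.

```python
# Assumed: reference images at quarter-QVGA resolution, Kb = 1024 bits.
REF_COLS, REF_ROWS = 160, 120
N_REFERENCE_IMAGES = 2      # IMIN and IMAX
BITS_PER_VALUE = 10         # 10-bit reference values
PIXEL_BITS = 8              # 8-bit ADC output

total_bits = REF_COLS * REF_ROWS * N_REFERENCE_IMAGES * BITS_PER_VALUE
total_kb = total_bits / 1024

# The extra bits beyond the pixel width act as fractional bits.
fractional_bits = BITS_PER_VALUE - PIXEL_BITS
threshold_step_lsb = 1 / (1 << fractional_bits)

print(total_bits, total_kb, threshold_step_lsb)
```

Under these assumptions the total comes to 384000 bits, i.e. 375 Kb, and the two extra bits give a threshold step of 0.25 LSB of the 8-bit pixel value.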
A. Pixel Readout and A/D Conversion
The schematic of the 3T pixel column readout and A/D conversion is shown in Fig. 3. It is implemented with a folded-cascode amplifier, which is also reused as the voltage comparator for the single-ramp ADC. The readout phase starts with the pixel voltage driving the bit-line (Vbl): its value is sampled onto C1 (S=H) and then amplified with a gain of 2 (C1/C2=2) (S=L). Fixed-pattern noise is compensated by subtracting the pixel reset voltage: the reset value is stored on C1 with inverted polarity (Phl=H) and added to the signal on C2. Each of the signal and reset sampling phases can be repeated several times by pulsing S, thereby increasing the overall gain and averaging the noise of the pixel follower and of the amplifier in a multiple-sampling operation. After the pixel has been read out and stored onto the feedback capacitor, the A/D conversion starts. Capacitor C1 is disconnected from node A (S=H), while C2 is connected to the DAC (pre=L), which provides the voltage ramp starting from Vh. This operation pulls up the inverting node (A) of the amplifier, which now operates in open loop as a voltage comparator and forces its output (C) to ground. Node A, connected to the global DAC through C2, follows the decreasing voltage Vramp while the global counter is clocked. When the voltage on node A reaches Vref, the output of the amplifier switches toward Vdd and the 8-bit latch toggles, storing the value of the counter. The amplifier/comparator and ADC occupy a silicon area of 8 µm × 210 µm.
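The counter-and-latch principle of the single-ramp conversion can be sketched behaviorally as follows. This is an idealized model, not the circuit of Fig. 3: the function name, the normalized voltage range and the linear ramp are assumptions, and the capacitive coupling through C2, noise and comparator delay are ignored.

```python
def single_ramp_adc(v_in, v_low, v_high, n_bits=8):
    """Idealized single-ramp ADC: return the counter value latched when a
    ramp descending from v_high first reaches v_in."""
    n_codes = 1 << n_bits
    ramp_step = (v_high - v_low) / n_codes  # ramp decrement per counter clock
    ramp = v_high
    for count in range(n_codes):
        if ramp <= v_in:          # comparator toggles, latch stores the counter
            return count
        ramp -= ramp_step
    return n_codes - 1            # ramp ended without crossing: saturate


# Hypothetical normalized range 0.0 .. 1.0
for v in (1.0, 0.5, 0.0):
    print(v, single_ramp_adc(v, 0.0, 1.0))
```

In this model a full conversion always takes 2^8 = 256 counter clocks regardless of the input, because all columns share the same global ramp and counter.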
B. Processor and SRAM
The digital processors are organized in an array of 160 cells, processing pixels row-by-row during the sensor readout phase. Each Processing Element compares the 8-bit pixel signal P(x,y) against the two thresholds I(x,y)MAX and I(x,y)MIN stored in the embedded SRAM, detects opening/closing and hot-pixel conditions (1)-(4), and updates the thresholds accordingly. The updated thresholds are written back into the frame buffer, ready to be reused in the next frame. Isolated hot-pixels are removed by a programmable 3×3-pixel kernel erosion filter associated with each Processing Element.
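To illustrate how a 3×3 erosion removes isolated hot-pixels, a minimal software sketch is given below. The text does not specify what is programmable in the on-chip filter; here it is assumed, for illustration only, to be the minimum number of hot neighbors required for a hot-pixel to survive. The function and parameter names are hypothetical.

```python
def erode(bitmap, min_neighbors=1):
    """Keep a hot-pixel only if at least min_neighbors of its (up to 8)
    neighbors in the 3x3 window are also hot. bitmap is a list of rows of 0/1."""
    rows, cols = len(bitmap), len(bitmap[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if not bitmap[y][x]:
                continue
            # Sum the 3x3 window (clipped at the borders), excluding the center.
            window = sum(
                bitmap[j][i]
                for j in range(max(0, y - 1), min(rows, y + 2))
                for i in range(max(0, x - 1), min(cols, x + 2))
            )
            out[y][x] = 1 if window - 1 >= min_neighbors else 0
    return out


# One isolated hot-pixel (top-left) and one 2x2 cluster.
hot = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filtered = erode(hot, min_neighbors=1)
for row in filtered:
    print(row)
```

With `min_neighbors=1` the isolated pixel is cleared while the 2×2 cluster is preserved; raising the threshold makes the filter more aggressive.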
Since the rolling-shutter sensor operates at up to 15 fps, each row of pixels has to be read out, converted, processed and delivered off-chip in less than 278 µs. Pixel amplification takes 20 µs, while the 4 MHz ADC takes 64 µs for data conversion; updating and storing the thresholds into the SRAM takes 6 µs; the 320×8-bit gray-scale pixels are delivered in 80 µs.
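The row-time budget quoted above can be checked with a few lines of arithmetic. The sketch assumes that the four phases run strictly one after another within a row, which the text does not state; if any phases overlap, the actual margin is larger. The variable names are illustrative.

```python
FPS = 15
ROWS = 240
ADC_BITS = 8
ADC_CLOCK_MHZ = 4           # counter clocks per microsecond

# Time available per row at the maximum frame rate.
row_budget_us = 1_000_000 / (FPS * ROWS)

# A single-ramp conversion needs 2**ADC_BITS counter clocks.
adc_us = (1 << ADC_BITS) / ADC_CLOCK_MHZ

stages_us = {
    "amplification": 20.0,
    "adc": adc_us,
    "sram_update": 6.0,
    "readout": 80.0,
}
total_us = sum(stages_us.values())      # assumed sequential phases
slack_us = row_budget_us - total_us

# Off-chip data rate for 320 pixels x 8 bits (bits per microsecond = Mb/s).
output_rate_mbps = 320 * 8 / stages_us["readout"]

print(f"row budget {row_budget_us:.1f} us, used {total_us:.0f} us, "
      f"slack {slack_us:.1f} us, output {output_rate_mbps:.0f} Mb/s")
```

The 64 µs conversion time follows directly from 256 counter clocks at 4 MHz, and the four phases together use 170 µs of the roughly 278 µs available per row.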
In order to allow fine-tuning of the algorithm, the two reference images (IMIN, IMAX) need to be updated with 10-bit resolution (0.25 LSB). Therefore, a 240×160×10-bit memory has been embedded with the sensor to store the temporary results. In our case, a 6T-cell SRAM has been adopted, resulting in a cell size of 1.6 µm × 2.8 µm per bit. The entire double frame buffer occupies a silicon area of 1.9 mm², including the peripheral circuitry.
IV. EXPERIMENTAL RESULTS
Fig. 4 shows the microphotograph of the fully tested chip together with the sensor prototype controlled by an FPGA. A graphical user interface allows setting the sensor parameters: ∆OPEN, ∆CLOSE, ∆HOT and the exposure time. Fig. 5 shows an example of an outdoor scenario with a moving boat. The algorithm neglects the background and clearly detects the moving boat while suppressing the waves.
V. CONCLUSIONS
In this paper we presented a low-power QVGA vision sensor with programmable dynamic background subtraction. Experimental results show the capability of the sensor to robustly suppress the background (e.g. rippling water) while extracting salient moving features. The chip consumes 1.6 mW while delivering a QVGA gray-scale image and a quarter-QVGA bitmap at 15 fps. The main chip characteristics are listed in Table I.
