Abstract-Incorporating multi-resolution capabilities into imagers enables additional power-saving mechanisms in the subsequent image processing. In this paper, we show how, by exploiting a certain mask structure, 3 × 3 kernels can be reduced to 2 × 2 kernels if charge redistribution is provided at the focal plane of the imaging device. More precisely, by averaging and shifting a half-resolution pixel grid, we obtain a pre-processed image, subsampled by a factor of 2 in each dimension, that can be filtered with a mask of reduced size. Very useful image filtering kernels, like the 3 × 3 Gaussian kernel for image smoothing or the well-known Sobel operators, fall into this category of reducible kernels. Operating on the pre-processed image with one of these reduced kernels requires fewer operations per pixel than performing all the multiply-accumulate operations needed to apply a 3 × 3 kernel. Memory accesses are reduced by the same fraction. As for the difficulty of providing this pre-processed image representation, we propose a methodology for obtaining it at a very low power cost. It requires the implementation of user-definable image subdivision and subsampling at the focal plane. Experimental results are given, obtained from measurements on a CMOS imager prototype chip incorporating these multi-resolution capabilities.
I. INTRODUCTION
Advances in CMOS integration have permitted the development of smart CMOS image sensors [1]. These chips incorporate concurrent image sensing and processing. One of the main advantages of this integration is the possibility of transferring a large part of the computational load associated with early vision tasks to the focal plane. The sensor array becomes a specialized processor with an adapted architecture. In early vision, processing is characterized by regular, local computations with inherent pixel-level parallelism. These computations are precisely the most time- and power-consuming tasks on DSP-based systems [2]. By incorporating processing capabilities at the focal plane, in most cases by efficiently using devices operating in analog mode, the computational load of the main digital processor can be greatly alleviated. The result is the realization of early vision tasks with record performance in speed, power and area [3], [4], [5]. In this paper we show how very simple additional circuits at the focal plane lead to improvements in the system power consumption. In particular, we demonstrate that, by exploiting the internal structure of the convolution kernels, we can operate on a focal-plane pre-processed image and obtain virtually identical results with only about 45% of the operations needed when applying the original kernels.
One of the most basic operations that can be implemented at the focal plane without significant signal degradation is the averaging of disjoint groups of pixels. This can be achieved by charge redistribution right after, or even in parallel with, photocurrent integration. This is the starting point of our proposal. We explain how 3 × 3 kernels of a particular structure can be transformed into 2 × 2 kernels to be applied to the pre-processed images. We then show, with a working prototype chip, how this focal-plane pre-processing can be implemented with negligible power consumption.
II. IMAGE SUBDIVISION AND KERNEL REDUCTION
Let us consider an M × N-pixel array. The value of each pixel is represented by a voltage resulting from integrating a photocurrent into a sensing capacitor during the exposure time, which is a readily feasible implementation. Consider these capacitors to be 4-connected through switches, which permits dividing the focal plane into rectangular blocks. Within each block, charge redistributes itself, achieving voltage averaging. By configuring the grid to be regularly subdivided into blocks of 2 × 2 pixels, we obtain an image of M/2 × N/2 blocks. This state is depicted in Fig. 1(a), where every four pixels are labelled with the same value. Now let us redefine the grid by shifting the edges of the grouping scheme one pixel down and one pixel to the right (Fig. 1(b)). Once the new grouping is enabled, charge redistributes again, and the values of the pixels, originally $p_{ij}$, $p_{i,j+1}$, $p_{i+1,j}$ and $p_{i+1,j+1}$, are now averaged within each new block, resulting in:

$$p'_{ij} = \frac{p_{ij} + p_{i,j+1} + p_{i+1,j} + p_{i+1,j+1}}{4} \qquad (1)$$
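The two redistribution steps can be mimicked numerically. The following Python sketch is only an illustration: the function names, the treatment of blocks cut by the array border (they simply become smaller) and the choice of which pixel of each block is sampled are assumptions, not details of the chip.

```python
def block_average(img, size, offset=0):
    """Average img over size x size blocks whose edges are shifted by offset.

    Blocks cut by the array border are simply smaller (an assumption made
    for this sketch; border handling on the chip is not modelled).
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]

    def edges(n):
        # Block boundaries along one axis: 0, offset, offset + size, ..., n
        return sorted(set([0] + list(range(offset, n, size)) + [n]))

    r_edges, c_edges = edges(rows), edges(cols)
    for r0, r1 in zip(r_edges, r_edges[1:]):
        for c0, c1 in zip(c_edges, c_edges[1:]):
            cells = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(cells) / len(cells)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = mean
    return out


def preprocess(img):
    """2 x 2 averaging followed by averaging on the grid shifted by one pixel."""
    first = block_average(img, 2, offset=0)      # state of Fig. 1(a)
    second = block_average(first, 2, offset=1)   # state of Fig. 1(b)
    half = [row[0::2] for row in first[0::2]]      # half-resolution image p
    shifted = [row[1::2] for row in second[1::2]]  # pre-processed image p'
    return half, shifted


img = [[4 * r + c for c in range(4)] for r in range(4)]
half, shifted = preprocess(img)
print(half)           # [[2.5, 4.5], [10.5, 12.5]]
print(shifted[0][0])  # 7.5, the mean of the four half-resolution values
```

In this toy 4 × 4 example the first shifted block collects one pixel from each of the four 2 × 2 blocks, so its value is the mean of the four half-resolution pixels.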
Notice that the output image, since we started by averaging the 2 × 2-pixel blocks, will be a quarter of the size of the full-resolution sensor, i.e. half of the height and half of the width of the original image. Small arrows in Fig. 1(a) mark the quarter-size input image, while those in Fig. 1(b) point to the pixels that are sampled to obtain the M/2 × N/2 pre-processed image. This scheme can be extended to lower image resolutions provided that the size of the blocks is B × B, where B is an even number. In such a case, the grid must be shifted B/2 pixels to obtain a representation that is equivalent to that already described at the corresponding lower resolution. It can be seen that the result of applying the reduced kernel:
to the shifted image (Fig. 1(b)), where $p'_{ij}$ is the pixel weighted by the upper-left element of the kernel, $a$, is:
This is the same as applying a 3 × 3 kernel of the form:
to the original half-resolution image (Fig. 1(a)), centered on $p_{ij}$. In other words, applying kernel $K$ to the nine 2 × 2 blocks of Fig. 1(a) is equivalent to applying kernel $K'$ to the four central blocks of Fig. 1(b).
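The equivalence can be checked by direct substitution. The derivation below is a sketch under two assumptions that are not fixed above: the remaining entries of the reduced kernel are named $b$, $c$ and $d$, and the shifted pixels are indexed so that $p'_{ij}$ is the average of $p_{ij}$, $p_{i,j+1}$, $p_{i+1,j}$ and $p_{i+1,j+1}$. The symbol $y$ for the filter output is also introduced here for convenience.

```latex
\begin{align*}
K' &= \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
p'_{ij} = \tfrac{1}{4}\left(p_{ij} + p_{i,j+1} + p_{i+1,j} + p_{i+1,j+1}\right) \\[4pt]
y &= a\,p'_{ij} + b\,p'_{i,j+1} + c\,p'_{i+1,j} + d\,p'_{i+1,j+1} \\
  &= \tfrac{1}{4}\bigl[\, a\,p_{ij} + (a+b)\,p_{i,j+1} + b\,p_{i,j+2} \\
  &\qquad + (a+c)\,p_{i+1,j} + (a+b+c+d)\,p_{i+1,j+1} + (b+d)\,p_{i+1,j+2} \\
  &\qquad + c\,p_{i+2,j} + (c+d)\,p_{i+2,j+1} + d\,p_{i+2,j+2} \,\bigr] \\[4pt]
K &= \frac{1}{4}\begin{pmatrix} a & a+b & b \\ a+c & a+b+c+d & b+d \\ c & c+d & d \end{pmatrix}
\end{align*}
```

With this numbering the 3 × 3 window covers rows $i$ to $i+2$ and columns $j$ to $j+2$ of the half-resolution image; which pixel is labelled as its center depends only on how the shifted blocks are numbered. The expansion also makes the structural restriction explicit: a 3 × 3 kernel is reducible only if each edge entry equals the sum of its two adjacent corners and the central entry equals the sum of the four corners.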
Recall that the output image is half of the width and half of the height of the original image, as subsampling is required to perform the procedure. From the point of view of the digital implementation of the required signal processing, this simplification (provided that the pre-processed image can be efficiently generated at the focal plane, as we will see later) represents an important reduction in the computing needs. Instead of 9 MACs (multiply-accumulate operations), each output pixel can be obtained with 4 MACs. This means only about 45% (4/9) of the resources required for the convolution with the original 3 × 3 kernel. Some kernels can, however, be further decomposed into a horizontal and a vertical component. Applying the same decomposition to the proposed reduced kernels still reduces the resources to 66% of those originally required (a separable 3 × 3 kernel needs 3 + 3 = 6 MACs per pixel, a separable 2 × 2 kernel 2 + 2 = 4). Memory accesses are reduced as well, since only 4 pixels, instead of 9, need to be read to evaluate each output pixel. It must also be said that the relations required between kernel elements greatly restrict the number of kernels that can be reduced. Fortunately, some very useful templates in early vision processing fall into this category. For instance, the usual 3 × 3 binomial mask for image smoothing, which is a good approximation of a Gaussian filter with σ ≈ 0.7 [6], is transformed into a 2 × 2 kernel in this way:
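Since the pre-processed pixels already carry the 1/4 averaging factor, a 2 × 2 kernel with entries $a$, $b$, $c$, $d$ (names assumed here) acts on the half-resolution image as a 3 × 3 kernel whose corners are $a/4$, $b/4$, $c/4$ and $d/4$. Under that convention, and writing $G_s$ for the binomial mask and $G'_s$ for its reduced form (the primed name is an assumption), the corners of value 1/16 give $a = b = c = d = 1/4$:

```latex
G_s = \frac{1}{16}\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}
\;\longrightarrow\;
G'_s = \frac{1}{4}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
```

The edge entries (2/16) are the sums of adjacent corners and the center (4/16) is the sum of the four corners, so the mask satisfies the required relations.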
The Sobel operators [6] are another useful pair of templates. They compute an approximation to the components of the image intensity gradient:
where $G_x$ approximates the derivative in the horizontal direction and $G_y$ approximates it in the vertical direction. They are employed for edge detection. Both kernels have the prescribed structure.
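Whether a given 3 × 3 template has the prescribed structure can be tested mechanically. The Python sketch below is a hypothetical helper, not part of the reported system: it assumes the textbook Sobel masks with one particular sign convention and the 1/16-normalised binomial mask, and it keeps the 1/4 averaging factor in the pre-processed pixels, so the reduced Sobel kernels come out with a global gain of 4 that an implementation could drop or absorb elsewhere.

```python
from fractions import Fraction


def expand_kernel(k2):
    """3 x 3 kernel on the half-resolution image equivalent to the 2 x 2
    kernel k2 applied to the shifted, averaged image (1/4 from averaging)."""
    (a, b), (c, d) = k2
    q = Fraction(1, 4)
    return [[q * a, q * (a + b), q * b],
            [q * (a + c), q * (a + b + c + d), q * (b + d)],
            [q * c, q * (c + d), q * d]]


def reduce_kernel(k3):
    """Return the equivalent 2 x 2 kernel, or None if k3 is not reducible."""
    k3 = [[Fraction(v) for v in row] for row in k3]
    # The four corners of k3 fix the four entries of the reduced kernel.
    k2 = [[4 * k3[0][0], 4 * k3[0][2]],
          [4 * k3[2][0], 4 * k3[2][2]]]
    # Accept only if the edges and the center follow from those corners.
    return k2 if expand_kernel(k2) == k3 else None


# Assumed textbook definitions (sign and scale conventions vary).
G_S = [[Fraction(n, 16) for n in row] for row in ([1, 2, 1], [2, 4, 2], [1, 2, 1])]
G_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
G_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]  # a mask without the structure

for name, kernel in (("G_s", G_S), ("G_x", G_X), ("G_y", G_Y), ("Laplacian", LAPLACIAN)):
    print(name, reduce_kernel(kernel))
```

Exact rational arithmetic is used so that the structural test is not affected by rounding.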
III. FOCAL-PLANE IMPLEMENTATION OF PRE-PROCESSING
Previous work involving multiresolution and averaging [7] performs all signal processing outside the pixel array. However, the type of pre-processing described by Eq. (1) can be implemented in the focal plane by the architecture described in [8] and depicted in Fig. 2. Each pixel contains a photodiode that discharges a sensing capacitor at a rate proportional to the incident light power. There is also a set of switches that connect each capacitor to its nearest neighbors. One interesting property is that connections between neighboring rows and columns are user-selectable. Column and row selection signals are stored in serial-in/parallel-out shift registers located at the upper and leftmost sides of the pixel array, respectively. Any possible combination of 1's and 0's can be loaded into these registers in order to reproduce any particular connectivity pattern for, respectively, columns and rows. The connection pattern, however, does not become effective until a global signal is enabled after configuration. In this way, this enhanced imager is able to perform the operations described in Sect. II. First of all, pixels are grouped into 2 × 2 blocks by loading a bit pattern of alternating 1's and 0's into both the column and the row registers. After enabling this connectivity scheme, charge redistributes within each block, reaching the configuration depicted in Fig. 1(a). Right after that, connections are disabled and the bit pattern is shifted one position in both directions, horizontal and vertical. Then, the new connection scheme is enabled. The result is depicted in Fig. 1(b).
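The register-driven grouping can be illustrated with a small Python model. This is a sketch under a stated assumption: bit k of a register is taken to close the switch between line k and line k + 1, which is a hypothetical convention, since the actual register-to-switch mapping is not detailed here. Function names are illustrative.

```python
def groups_from_bits(bits):
    """Split lines 0..len(bits) into runs joined by closed switches.

    bits[k] == 1 is taken to mean that line k is connected to line k + 1
    (a hypothetical convention chosen for this sketch).
    """
    groups, current = [], [0]
    for k, bit in enumerate(bits):
        if bit:
            current.append(k + 1)      # switch closed: extend the current block
        else:
            groups.append(current)     # switch open: start a new block
            current = [k + 1]
    groups.append(current)
    return groups


def shift_right(bits, fill=0):
    """Shift the register content by one position, feeding `fill` at the input."""
    return [fill] + bits[:-1]


row_bits = [1, 0, 1, 0, 1]   # six lines, five inter-line switches
print(groups_from_bits(row_bits))               # [[0, 1], [2, 3], [4, 5]]
print(groups_from_bits(shift_right(row_bits)))  # [[0], [1, 2], [3, 4], [5]]
```

Applying the same pattern to rows and columns gives the 2 × 2 blocks of Fig. 1(a); a single shift of both registers gives the displaced blocks of Fig. 1(b), with smaller groups left at the array border.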
Notice that if we consider the M/2 × N/2 image to be our starting point (since the connection pattern can already be loaded and enabled so that charge redistribution takes place simultaneously with photocurrent integration), the only additional energy required is that of disabling the connection switches, shifting the registers one position, and re-enabling the connection switches with the new configuration. A rough estimate, obtained by evaluating the energy required to turn the switches on and off and to shift the register content, yields a power consumption below 0.33 µW at 30 fps for a QCIF-size array.
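To put this bound in per-frame terms, the short Python calculation below divides the stated power by the frame rate. It assumes the standard QCIF format of 176 × 144 pixels; the function name and the per-pixel figure are only illustrative.

```python
def frame_energy_budget(power_w=0.33e-6, fps=30.0, rows=144, cols=176):
    """Upper bounds on the pre-processing energy per frame and per pixel (J)."""
    per_frame = power_w / fps                # energy spent in one frame time
    per_pixel = per_frame / (rows * cols)    # same energy spread over the array
    return per_frame, per_pixel


per_frame, per_pixel = frame_energy_budget()
print(f"{per_frame * 1e9:.1f} nJ per frame, {per_pixel * 1e12:.2f} pJ per pixel")
```

Under these assumptions the bound corresponds to about 11 nJ per frame, or roughly 0.43 pJ per pixel and frame.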
In order to implement the complete operation at the focal plane, different circuits are required for each kernel. In the case of the binomial mask, one more diagonal shift of the block selection and an additional averaging step provide the required filtering. For the Sobel operators, extra hardware for image subtraction is required, as there are kernel elements with a negative sign. The contribution to the overall power consumption in the first case will be negligible. In the second case it will depend on the implementation of the subtraction block.
IV. CHIP RESULTS
We have implemented the focal-plane pre-processing in a prototype chip (Fig. 3) intended for low-power image processing, while the reduced kernels have been applied off-chip. The prototype chip contains all the elements needed to implement the required pre-processing at a low power cost. The main characteristics of the chip are summarised in Table I. As depicted in Figs. 4 and 5, we have operated on images captured at the laboratory (available at http://www.imse-cnm.csic.es/vmote/redkern). These images correspond to pictures of 'Lena' and the 'Baboon' displayed on a computer screen. Artifacts due to the screen grain can be observed. We proceeded in this way: first we took snapshots of the computer screen, showing either 'Lena' or the 'Baboon', and then read out the images from the chip and filtered them off-line by applying $G_s$, $G_x$ and $G_y$. This is shown in the first row of Figs. 4 and 5. Then we grouped the pixels of the input images to form the half-resolution images, read them out and applied the filters by convolution with the 3 × 3 kernels off-chip, as a reference. Then we shifted the pixel grouping on-chip and read out the pre-processed half-resolution image to apply, off-chip, the 2 × 2 kernels, i.e. the reduced versions of $G_s$, $G_x$ and $G_y$. RMSE values are always below 1% in the Gaussian blur experiment, and below 4% in the application of the two Sobel filters. This is consistent with the fact that edge detection is the result of the off-chip application of two masks, which contributes to error spreading. The error introduced by on-chip binning alone is 0.53% (RMSE) in both examples. Finally, although the maximum detected error in a single pixel can be high (46.91% in one case), it is not significant, as it occurs at the boundaries of the image.
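For readers who wish to reproduce this kind of comparison on the published images, the two figures of merit can be computed as in the following Python sketch. The normalisation is an assumption: errors are expressed as a percentage of a full-scale value (255 for 8-bit images), which may differ from the normalisation behind the numbers reported above. The function name is illustrative.

```python
import math


def error_stats(reference, test, full_scale=255.0):
    """Return (RMSE, maximum absolute error), both in percent of full_scale.

    reference and test are equally sized images given as lists of rows.
    """
    diffs = [abs(r - t)
             for ref_row, test_row in zip(reference, test)
             for r, t in zip(ref_row, test_row)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return 100.0 * rmse / full_scale, 100.0 * max(diffs) / full_scale


# Toy usage: one pixel out of four is off by the full scale.
rmse_pct, max_pct = error_stats([[0, 0], [0, 0]], [[0, 0], [0, 255]])
print(rmse_pct, max_pct)
```

Reporting the maximum error alongside the RMSE, as done above, separates isolated outliers such as boundary pixels from the overall deviation.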
V. CONCLUSIONS
In this paper we have reported an example of how multiresolution capabilities can lead to additional power savings. In particular, using the most elementary operations at the focal plane, namely charge redistribution and user-definable image subdivision, the pre-processed image can be subsequently filtered with a fraction of the computational resources required to apply the original kernels. In order to illustrate the validity of the approach, we have implemented the required image pre-processing in a prototype imager with focal-plane processing capabilities. The errors incurred, due to the analog nature of the processing in the focal plane, are kept below a reasonable bound for early vision applications.
