Multi-resolution low-power Gaussian filtering by reconfigurable focal-plane binning by Fernández Berni, Jorge et al.
Multi-resolution low-power Gaussian filtering by
reconfigurable focal-plane binning
J. Fernández-Berni a, R. Carmona-Galán a, F. Pozas-Flores a, Á. Zarándy b and
Á. Rodríguez-Vázquez b
aInstitute of Microelectronics of Seville (IMSE-CNM)
CSIC-Universidad de Sevilla, Spain.
bComputer and Automation Research Institute (MTA-SZTAKI)
Hungarian Academy of Sciences, Budapest, Hungary.
ABSTRACT
Gaussian ﬁltering is a basic tool for image processing. Noise reduction, scale-space generation or edge detection
are examples of tasks where diﬀerent Gaussian ﬁlters can be successfully utilized. However, their implementation
in a conventional digital processor by applying a convolution kernel throughout the image is quite ineﬃcient.
Not only is the value of every single pixel taken into consideration successively, but contributions from its
neighbors also need to be taken into account. Processing of the frame is serialized and memory access is intensive
and recurrent. The result is a low operation speed or, alternatively, a high power consumption. This inefficiency
is especially remarkable for filters with large variance, as the kernel size increases significantly. In this paper, a
diﬀerent approach to achieve Gaussian ﬁltering is proposed. It is oriented to applications with very low power
budgets. The key point is a reconﬁgurable focal-plane binning. Pixels are grouped according to the targeted
resolution by means of a division grid. Then, two consecutive shifts of this grid in opposite directions carry
out the spread of information to the neighborhood of each pixel in parallel. The outcome is equivalent to the
application of a 3×3 binomial filter kernel, which in turn is a good approximation of a Gaussian filter, on the
original image. The variance of the closest Gaussian ﬁlter is around 0.5. By repeating the operation, Gaussian
ﬁlters with larger variances can be achieved. A rough estimation of the necessary energy for each repetition until
reaching the desired ﬁlter is below 20nJ for a QCIF-size array. Finally, experimental results of a QCIF proof-
of-concept focal-plane array manufactured in 0.35μm CMOS technology are presented. A maximum RMSE of
only 1.2% is obtained by the on-chip Gaussian ﬁltering with respect to the corresponding equivalent ideal ﬁlter
implemented oﬀ-chip.
Keywords: Focal-plane processing, Gaussian kernels, binomial ﬁlter mask, low-power smart image sensors
1. INTRODUCTION
Gaussian kernels are a fundamental component of a computational approach to visual perception motivated
by physics and biological vision.1 Convolution with Gaussian kernels and Gaussian derivatives constitutes a
canonical class of image operators for early vision. As a family, Gaussian kernels form a semi-group. One
important property is that any coarser scale representation can be obtained from any representation at a ﬁner
level. Additionally, Gaussian kernels have the property of preserving local extrema in the image, i.e. no minima
or maxima are accidentally introduced when a Gaussian blur is applied in order to suppress finer scale details of
the image.2 Because of these properties, Gaussian ﬁlters are able to generate a scale space3 and, consequently, a
multi-scale image representation.4 It is worth mentioning that scale-space operators have a similar form to the
receptive ﬁelds observed in neurophysiological studies.5 This type of image representation is certainly useful for
image interpretation. As there is no a priori knowledge about the scale of the relevant elements in the scene, a
multi-scale representation covers all the possible ranges. Image features can then be extracted at diﬀerent scales
and scale-invariant features can be highlighted as characteristic of whatever takes place in the visual ﬁeld.6 It is
not strange that visual attention models based on saliency make extensive use of these operators.7
Further author information:
Jorge Fernández-Berni: E-mail: berni@imse-cnm.csic.es, Telephone: +34 954466666
Bioelectronics, Biomedical, and Bioinspired Systems V; and Nanotechnology V, edited by Ángel B. Rodríguez-Vázquez,
Ricardo A. Carmona-Galán, Gustavo Liñán-Cembrano, Rainer Adelung, Carsten Ronning, Proc. of SPIE Vol. 8068, 
806806 · © 2011 SPIE · CCC code: 0277-786X/11/$18 · doi: 10.1117/12.886555
The isotropic Gaussian kernel, centered at the origin, employed to generate a scale-space representation of a
two-dimensional image, is defined as a parametrized function $G : \mathbb{R}^{2} \times \mathbb{R}^{+} \to \mathbb{R}$ where:
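$$ G(\mathbf{x};\xi) = \frac{1}{2\pi\xi}\, e^{-|\mathbf{x}|^{2}/2\xi} \;\Leftrightarrow\; \hat{G}(\mathbf{k};\xi) = e^{-2\pi^{2}|\mathbf{k}|^{2}\xi} \tag{1} $$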
in which ξ is referred to as the scale parameter and corresponds to the variance of the Gaussian kernel (ξ = σ²),
and Gˆ(·) is the Fourier transform of G(·). One advantage from the point of view of the implementation is that
the Gaussian kernel is separable into two orthogonal functions G1(·) and G2(·):
$$ G(\mathbf{x};\xi) = G_{1}(x_{1};\xi) * G_{2}(x_{2};\xi) = \frac{1}{2\pi\xi}\left( e^{-x_{1}^{2}/2\xi} * e^{-x_{2}^{2}/2\xi} \right) \tag{2} $$
Given that the image plane is discretized, the function G(·) is only evaluated at valid points of the grid.
For a relatively large σ, i. e. higher scales, the number of elements of the kernel that cannot be neglected is
prohibitively large, as can be seen below:
σ = 0.4:
0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.04 0.00 0.00
0.00 0.04 1.00 0.04 0.00
0.00 0.00 0.04 0.00 0.00
0.00 0.00 0.00 0.00 0.00

σ = 0.6:
0.00 0.00 0.00 0.00 0.00
0.00 0.03 0.11 0.03 0.00
0.00 0.11 0.44 0.11 0.00
0.00 0.03 0.11 0.03 0.00
0.00 0.00 0.00 0.00 0.00

σ = 1.0:
0.00 0.01 0.02 0.01 0.00
0.01 0.06 0.10 0.06 0.01
0.02 0.10 0.16 0.10 0.02
0.01 0.06 0.10 0.06 0.01
0.00 0.01 0.02 0.01 0.00
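The growth of the kernel with σ can be reproduced with a short sketch that samples Eq. (1) on a 5 × 5 integer grid. The function names and the 0.005 threshold (entries that would not round to 0.00) are illustrative choices, not part of the original work; the printed values should agree with the kernels above up to rounding in the last digit.

```python
import math

def sampled_gaussian(sigma, radius=2):
    """Evaluate G(x; xi) = exp(-|x|^2 / (2 xi)) / (2 pi xi), with xi = sigma^2,
    at the integer grid points of [-radius, radius]^2."""
    xi = sigma ** 2
    return [[math.exp(-(x1 * x1 + x2 * x2) / (2 * xi)) / (2 * math.pi * xi)
             for x2 in range(-radius, radius + 1)]
            for x1 in range(-radius, radius + 1)]

def count_significant(kernel, threshold=0.005):
    """Number of kernel entries at or above the (assumed) threshold."""
    return sum(v >= threshold for row in kernel for v in row)

for sigma in (0.4, 0.6, 1.0):
    kernel = sampled_gaussian(sigma)
    print(f"sigma = {sigma}: {count_significant(kernel)} entries above 0.005")
    for row in kernel:
        print(" ".join(f"{v:.2f}" for v in row))
```

Under these assumptions the number of non-negligible entries within the 5 × 5 window grows from 5 to 9 to 21 as σ goes from 0.4 to 0.6 to 1.0.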
In fact, a minimum size of 6σ has been estimated in order to avoid excessive ripple in the stop band introduced by
truncation.8 In terms of the required computing power and resources, the dynamic adaptation of the kernel size
represents a significant drawback. An alternative approach is to time-multiplex the smoothing operators,
in other words, to apply smaller kernels repeatedly in order to obtain a higher scale parameter, which follows
directly from the semi-group property of Gaussian kernels:
$$ G(\mathbf{x};\xi_{1}+\xi_{2}) = G(\mathbf{x};\xi_{1}) * G(\mathbf{x};\xi_{2}) \tag{3} $$
that can easily be understood in the Fourier domain:
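$$ \hat{G}(\mathbf{k};\xi_{1}+\xi_{2}) = e^{-2\pi^{2}|\mathbf{k}|^{2}(\xi_{1}+\xi_{2})} = e^{-2\pi^{2}|\mathbf{k}|^{2}\xi_{1}} \cdot e^{-2\pi^{2}|\mathbf{k}|^{2}\xi_{2}} = \hat{G}(\mathbf{k};\xi_{1})\,\hat{G}(\mathbf{k};\xi_{2}) \tag{4} $$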
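The semi-group property can also be checked numerically in the spatial domain. The sketch below is only an illustration under stated assumptions: it uses the 1-D factor of the separable kernel (normalized by 1/√(2πξ)), sampled on the integers and truncated at a hypothetical radius of 12 samples, so the equality holds only approximately.

```python
import math

def gaussian_1d(xi, radius):
    """Sampled 1-D Gaussian of variance xi on the integers [-radius, radius]."""
    norm = 1.0 / math.sqrt(2 * math.pi * xi)
    return [norm * math.exp(-n * n / (2 * xi)) for n in range(-radius, radius + 1)]

def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def semigroup_error(xi1, xi2, radius=12):
    """Largest deviation between G(xi1) * G(xi2) and G(xi1 + xi2)."""
    cascaded = convolve(gaussian_1d(xi1, radius), gaussian_1d(xi2, radius))
    direct = gaussian_1d(xi1 + xi2, 2 * radius)  # same support as `cascaded`
    return max(abs(c - d) for c, d in zip(cascaded, direct))

print(semigroup_error(1.0, 2.0))
```

For variances of this size the deviation is expected to be tiny; it grows for very small ξ, where the sampled kernel no longer represents the continuous Gaussian well.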
Therefore, we need to select an elementary Gaussian ﬁlter, or an approximation, that can be easily implemented,
both in terms of the number of non-zero elements of the kernel and in terms of the relations between them. The
2-D binomial ﬁlter4 is a good candidate:
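$$ \mathbf{B}^{2} = B^{2} * (B^{2})^{T} = \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} * \frac{1}{4}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \tag{5} $$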
Figure 1: Focal-plane capacitor grid for charge redistribution
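which is the result of convolving a horizontal, $B^{2}$, and a vertical, $(B^{2})^{T}$, 1-D binomial mask. Each of these 1-D
filters is, in turn, the result of convolving the elementary averaging mask, $B^{1}$, with itself:
$$ B^{2} = B^{1} * B^{1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \end{bmatrix} * \frac{1}{2}\begin{bmatrix} 1 & 1 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \tag{6} $$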
Because of the central limit theorem, the transfer function and the mask of the binomial ﬁlter approximate
the Gaussian ﬁlter with an equivalent variance. In the case of the kernel expressed in Eq. (5) the variance is 0.5,
and the error committed in the approximation of the equivalent Gaussian ﬁlter is around 0.8%, depending on
the input image.
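The construction of the binomial masks and the variance quoted above can be verified with a few lines of code. This is a minimal sketch: the helper names are illustrative, and the variance is computed per axis on the 1-D mask, treating its weights as a discrete probability distribution.

```python
def convolve(a, b):
    """Full discrete convolution of two 1-D masks."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def variance(mask):
    """Second central moment of a normalized, symmetric 1-D mask."""
    center = (len(mask) - 1) / 2
    return sum(w * (n - center) ** 2 for n, w in enumerate(mask))

def repeated(mask, times):
    """Mask equivalent to applying `mask` `times` times in cascade."""
    out = [1.0]
    for _ in range(times):
        out = convolve(out, mask)
    return out

B1 = [0.5, 0.5]                                  # elementary averaging mask
B2 = convolve(B1, B1)                            # 1-D binomial mask, Eq. (6)
B2_2d = [[r * c for c in B2] for r in B2]        # row mask * column mask, Eq. (5)

print(B2, variance(B2))
print([[16 * v for v in row] for row in B2_2d])
print(variance(repeated(B2, 9)))
```

Because variances add under convolution, each further application of the mask adds 0.5 to the variance per axis, which is how larger Gaussian filters are reached by repetition.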
The rest of the paper is dedicated to an eﬃcient implementation of the binomial ﬁlter based on the use of focal-
plane multi-resolution capabilities. It is organized as follows. First we will show how reconﬁgurable resolution is
implemented by adding the possibility of binning pixels together and allowing for charge redistribution among
them. Then we will demonstrate that the eﬀect of repeatedly averaging the pixels in shifted divisions of the focal-
plane grid is that of applying a binomial ﬁlter. Finally, some experimental results, obtained with a prototype
chip fabricated in a 0.35μm CMOS technology, are displayed, conﬁrming the validity of the approach.
2. CHARGE REDISTRIBUTION AND PIXEL BINNING
At the focal plane of a CMOS imager, the photogenerated current is directly sensed and (or) integrated.9 In the
latter case, the pixel value is a voltage at the sensing capacitor. This voltage is stored, at least temporarily, so it
can be read out. If an electronic shutter is provided,10 the pixel voltage is maintained until the next reset, within
the accuracy permitted by leakage. Fully parallel operations can be performed on these voltages at the focal
plane without using an external memory, as these capacitors act as a distributed analog memory. If switches are
provided between the capacitors, as can be seen in Fig. 1, the stored charge redistributes, which results in the averaging
of the initial voltages. Let us consider that, by setting the appropriate control pattern, a sub-image of size m×n
is isolated. This is realized by turning on the m − 1 signals that control the connections between the m rows
of pixels, and the n − 1 signals that control the connections between the n columns in Fig. 1. By enabling the
electrical paths between the m × n capacitors, the pixels whose original values are $p^{0}_{ij}, \ldots, p^{0}_{i+m-1,j+n-1}$ end in:
$$ p_{i+k,j+l}\Big|_{\forall k\in\{0,\ldots,m-1\},\ \forall l\in\{0,\ldots,n-1\}} = \frac{1}{mn}\sum_{k=0}^{m-1}\sum_{l=0}^{n-1} p^{0}_{i+k,j+l} \tag{7} $$
It is worth mentioning that the result is exactly the same if the switches forming the m × n region are set
from the start, as charge redistributes in parallel with photocurrent integration. This is called pixel binning.11
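As an idealized illustration of Eq. (7), the following sketch replaces every pixel of an isolated m × n region by the mean of the original values. It assumes identical capacitors and lossless switches, uses 0-based indices, and the function name is hypothetical.

```python
def redistribute_region(image, i, j, m, n):
    """Return a copy of `image` in which the m x n region whose top-left
    pixel is (i, j) holds the mean of its original values, as in Eq. (7)."""
    mean = sum(image[i + k][j + l] for k in range(m) for l in range(n)) / (m * n)
    out = [row[:] for row in image]
    for k in range(m):
        for l in range(n):
            out[i + k][j + l] = mean
    return out

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
binned = redistribute_region(image, 0, 0, 2, 2)
for row in binned:
    print(row)
```

In this idealized model the sum of all pixel values is unchanged by the operation, which mirrors charge conservation among equal capacitors.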
Consider now a regular subdivision of the focal-plane grid. For instance, an alternate sequence of 1’s and 0’s
is loaded into the row and column connection control registers of Fig. 1. It means that the full-resolution image
of M ×N pixels is divided into 2× 2-pixel blocks. As the four pixels within each block are connected together,
they will end up having the same pixel value:
$$ p_{i,j}\Big|_{i\in\{1,3,5,\ldots,M-1\},\ j\in\{1,3,5,\ldots,N-1\}} = \frac{1}{4}\left( p^{0}_{ij} + p^{0}_{i,j+1} + p^{0}_{i+1,j} + p^{0}_{i+1,j+1} \right) \tag{8} $$
that is the average of the original values of the four pixels contained in the block. We have assumed that M and
N are even. The resulting image contains M/2×N/2 pixels, with the connection scheme depicted in Fig. 2(a). It
will be the starting point for the processing we will explain later. Another relevant assumption is that any feature
that we are interested in must be noticeable at this resolution. The following analysis applies to images divided
in blocks of any size as long as their dimensions are even and the results to be expected are M/2×N/2-pixel or
smaller images.
3. GAUSSIAN FILTERING BY GRID SHIFTING
Let us start with an image of M × N pixels, stored in a capacitor grid like that of Fig. 1. The grid has been
divided into 2 × 2-pixel blocks, within which charge has been allowed to redistribute. This means that our initial
image has M/2 × N/2 pixels, each represented by four capacitors storing the same voltage, i.e. the same pixel value
(Fig. 2(a)). Let us concentrate on the transformation undergone by the value $p_{ij}$ stored at the
position indicated by the arrow in Fig. 2(a). At a certain point in time, the alternate sequences of 1's and 0's at
the row and column connection control registers are shifted one space down and to the right, respectively. The
Figure 2: (a) Focal-plane division in 2× 2-pixel blocks and (b) shifted grid.
pixel grouping scheme changes from that of Fig. 2(a) to the one depicted in Fig. 2(b). Consequently, because of
a new redistribution of the charge in the newly formed blocks, the value of the marked node becomes:
$$ p'_{ij} = \frac{1}{4}\left( p_{i-1,j-1} + p_{i-1,j} + p_{i,j-1} + p_{ij} \right) \tag{9} $$
The values at the neighboring nodes, that were originally pij as well, are now averaged in their new 2× 2-pixel
blocks, so they have been transformed into:
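$$ p'_{i,j+1} = \frac{1}{4}\left( p_{i-1,j} + p_{i-1,j+1} + p_{ij} + p_{i,j+1} \right) \tag{10} $$
$$ p'_{i+1,j} = \frac{1}{4}\left( p_{i,j-1} + p_{ij} + p_{i+1,j-1} + p_{i+1,j} \right) \tag{11} $$
$$ p'_{i+1,j+1} = \frac{1}{4}\left( p_{ij} + p_{i,j+1} + p_{i+1,j} + p_{i+1,j+1} \right) \tag{12} $$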
If the control sequences are shifted back to the original position, one space up and to the left, then the new
values expressed by Eqs. (9)-(12) are averaged once more, resulting in:
$$ p''_{ij} = \frac{1}{16}\left( p_{i-1,j-1} + 2p_{i-1,j} + p_{i-1,j+1} + 2p_{i,j-1} + 4p_{ij} + 2p_{i,j+1} + p_{i+1,j-1} + 2p_{i+1,j} + p_{i+1,j+1} \right) \tag{13} $$
Notice that the M × N -pixel image has undergone two shifts of the connection scheme followed by the
averaging of the pixel values within the resulting 2 × 2-pixel blocks. Each combination of grid shifting and
averaging has the same eﬀect as applying the averaging mask:
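$$ \mathbf{B}^{1} = B^{1} * (B^{1})^{T} = \frac{1}{2}\begin{bmatrix} 1 & 1 \end{bmatrix} * \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \tag{14} $$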
over an M/2×N/2-pixel image. By doing it twice, we are applying the 3× 3 binomial filter mask of Eq. (5):
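$$ \mathbf{B}^{1} * \mathbf{B}^{1} = \frac{1}{4}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} * \frac{1}{4}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} = \mathbf{B}^{2} \tag{15} $$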
which is precisely what is expressed in Eq. (13). This theoretical result has been checked by numerical simulation∗,
yielding 0.16% RMSE for a 256 × 256-pixel image of Lena. This small error is associated with differences in the
rounding errors incurred by the two methods.
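The equivalence between grid shifting and binomial filtering can be illustrated with an idealized software model. This is a sketch and not the authors' Matlab code: all function names are hypothetical, charge redistribution is modeled as a plain average, and it is assumed that the pixels left at the border by the shifted grid form smaller groups of their own, which corresponds to replicating border pixels in the direct convolution.

```python
def block_edges(size, offset):
    """Group boundaries along one axis for a 2-pixel grid shifted by `offset`
    (0 or 1). A shifted grid leaves single-pixel groups at the borders."""
    starts = list(range(offset, size, 2))
    if offset:
        starts.insert(0, 0)
    edges = starts + [size]
    return list(zip(edges[:-1], edges[1:]))

def redistribute(v, offset):
    """Average the values of `v` inside every block of the (shifted) grid."""
    rows, cols = len(v), len(v[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0, r1 in block_edges(rows, offset):
        for c0, c1 in block_edges(cols, offset):
            cells = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(v[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out

def grid_shift_filter(low):
    """Apply two shifted redistributions to an M/2 x N/2 image `low`."""
    # Each low-resolution value is held by the four capacitors of its block.
    full = [[low[r // 2][c // 2] for c in range(2 * len(low[0]))]
            for r in range(2 * len(low))]
    full = redistribute(full, 1)   # grid shifted one space down and right
    full = redistribute(full, 0)   # grid shifted back to the original position
    return [row[::2] for row in full[::2]]

def binomial_3x3(low):
    """Direct convolution with the mask of Eq. (5), replicating border pixels."""
    rows, cols = len(low), len(low[0])
    w = (1, 2, 1)

    def px(r, c):
        return low[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [[sum(w[a] * w[b] * px(r + a - 1, c + b - 1)
                 for a in range(3) for b in range(3)) / 16
             for c in range(cols)] for r in range(rows)]

# Impulse response: a single pixel of value 16 spreads into the 3 x 3 mask.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 16.0
for row in grid_shift_filter(impulse):
    print(row)
```

Comparing `grid_shift_filter` with `binomial_3x3` on an arbitrary test image gives a direct check of Eq. (13); in this floating-point model the two agree up to rounding.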
4. CHIP MEASUREMENTS
Although the above described procedure may theoretically render the same results as the direct convolution with
the binomial ﬁlter mask, its physical implementation involves a number of switches to reconﬁgure and shift the
connection grid. Switching error becomes more apparent when the storage capacitors are small. In this section
we show the results obtained by implementing binomial filtering by shifted averaging grids in a prototype
chip with focal-plane reconfigurability and multi-resolution capabilities. The prototype chip (Fig. 3)12 has
been fabricated in a 0.35μm CMOS process with anti-reflective coating and reduced photodiode dark response.
A summary of the chip characteristics and features is given in Table 1. This chip was not originally designed to
operate following the scheme explained above, but it has a reconfigurable focal-plane connection grid, like that
in Fig. 1, that provides multi-resolution capabilities.
The ﬁltering procedure explained above has been programmed into the chip test environment. The results
obtained on-chip render a 1.12% RMSE for the ﬁrst application of the ﬁlter. This overall error is attributable
to the accumulated switching errors and also to the noisy readout. Fig. 4 depicts the original 176 × 144-pixel
∗Matlab® files for comparing the results of realizing binomial filtering either directly or by grid shifting and averaging
can be found at http://www.imse-cnm.csic.es/wivisnet/spie files/
Figure 3: General view and microphotographs of the CMOS prototype chip
Parameter                     Value
Technology                    0.35 μm CMOS 2P4M, 3.3 V
Vendor (process)              Austria Microsystems (C35OPTO)
Die size (with pads)          7280.8 μm × 5780.8 μm
Cell size                     34.07 μm × 29.13 μm
Fill factor                   6.45%
Resolution                    QCIF: 176 × 144 px
Photodiode type               n-well/p-substrate
FPN                           0.72%
PRNU (50% signal range)       2.42%
Sensitivity                   0.15 V/(lux·s)
Measured power consumption    5.6 mW @ 12 kSa/s
Maximum throughput            110 kSa/s (9 μs/Sa)
Table 1: Summary of the prototype chip features.
image captured by the chip, together with the 88 × 72-pixel version obtained by downsampling after pixel binning, which
is the initial image for both the on-chip and the off-chip (ideal) filtering. Starting from this image, successive
steps have been carried out in order to generate a scale space. Each step implies the convolution with the binomial
filter mask ($\mathbf{B}^{2}$), either by averaging and shifting the connection grid on-chip or by directly applying the mask
off-chip with Matlab®. This can be seen in Fig. 5. The first column represents the image filtered on-chip. The
second shows the off-chip, ideal version starting from the same input (Fig. 4(b)). The third column is the difference
normalized to the value of the maximum individual pixel error detected at each step. This maximum deviation
is 3.17%, 3.83%, 3.69%, 3.82%, 5.07%, 4.74%, 4.79%, 4.96% and 5.66%, respectively. For the complete image,
the measured RMSE is 1.12%, 1.39%, 1.55%, 1.69%, 1.82%, 1.92%, 2.02%, 2.12% and 2.23%, respectively for
each step. Notice that the ideal ﬁltering has the eﬀect of averaging the zero-mean noise introduced by readout at
every step of the on-chip ﬁltering. This noise is re-sampled each time a new image is delivered from the on-chip
processing. The consequence is that the error tends to increase as we go up the scale space.
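The error figures above compare the on-chip image with the ideal one at every step. The normalization behind the percentages is not spelled out in the text, so the sketch below assumes that errors are expressed relative to the full-scale signal range; the function names, the `full_scale` parameter and the sample values are purely illustrative.

```python
import math

def rmse_percent(measured, ideal, full_scale):
    """Root-mean-square error between two images, as a percentage of the
    (assumed) full-scale signal range."""
    diffs = [m - i for rm, ri in zip(measured, ideal) for m, i in zip(rm, ri)]
    return 100.0 * math.sqrt(sum(d * d for d in diffs) / len(diffs)) / full_scale

def max_error_percent(measured, ideal, full_scale):
    """Largest individual pixel error, as a percentage of full scale."""
    return 100.0 * max(abs(m - i) for rm, ri in zip(measured, ideal)
                       for m, i in zip(rm, ri)) / full_scale

# Made-up 2 x 2 example, not measured data.
on_chip = [[10.0, 20.0], [30.0, 40.0]]
off_chip = [[11.0, 20.0], [30.0, 37.0]]
print(rmse_percent(on_chip, off_chip, full_scale=100.0))
print(max_error_percent(on_chip, off_chip, full_scale=100.0))
```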
An important feature of this alternative method to compute the scale space is that its impact on the
power budget is far below one milliwatt. For each repetition, shifting the grid and averaging twice is estimated
to require 20nJ. This estimation is obtained by simulation and represents switching the complete connection grid
twice. Image capture and readout are excluded from this sum. At 30fps, it represents 0.6μW, which is certainly
negligible and below the precision of our measurement setup.
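The power figure follows from multiplying the energy per repetition by the frame rate. A minimal sketch of this arithmetic is given below; the function name and the `repetitions` parameter (several filter repetitions per frame) are illustrative additions rather than part of the original estimate.

```python
def average_power_w(energy_per_repetition_j, frames_per_second, repetitions=1):
    """Average power drawn by the filtering alone: energy per frame times frame rate."""
    return energy_per_repetition_j * repetitions * frames_per_second

one_step = average_power_w(20e-9, 30)        # one repetition per frame at 30 fps
nine_steps = average_power_w(20e-9, 30, 9)   # hypothetical: nine repetitions per frame

print(f"{one_step * 1e6:.2f} uW per repetition at 30 fps")
print(f"{nine_steps * 1e6:.2f} uW for nine repetitions per frame")
print(f"fraction of the 5.6 mW measured consumption: {one_step / 5.6e-3:.1e}")
```

Even with several repetitions per frame, the result stays orders of magnitude below the measured consumption of the chip listed in Table 1.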
5. CONCLUSIONS
Theoretical background for the implementation of an approximated Gaussian ﬁlter by using multi-resolution
capabilities at the focal-plane is given. Ideally, the only diﬀerence with the direct application of the binomial
ﬁlter convolution mask is rendered by the rounding error of the computing hardware. We have implemented this
procedure in a prototype chip with all the necessary means to reconﬁgure the focal-plane connection scheme.
The results evidence the validity of our assumption. The on-chip ﬁltering approximates the ideal within a 1.2%
error. The impact of this processing on the total power budget of the smart imager operation is negligible.
ACKNOWLEDGMENTS
This work is partially funded by the Andalusian Regional Government through project 2006-TIC-2352, by
the Spanish Ministry of Science and Innovation through project TEC 2009-11812, co-funded by the Euro-
pean Regional Development Fund, and also supported by the Oﬃce of Naval Research (USA), through grant
N000141110312.
REFERENCES
[1] Romeny, B. t. H., [Front-End Vision and Multi-Scale Image Analysis ], Springer (2003).
[2] Lindeberg, T. and Romeny, B. t. H., “Linear scale-space: (i) basic theory (ii) early visual operations,” in
[Geometry-Driven Diﬀusion in Computer Vision ], ter Haar Romeny, B. t. H., ed., 1–77, Kluwer Academic
Publishers (1994).
[3] Lindeberg, T., “Scale-space,” in [Encyclopedia of Computer Science and Engineering ], Wah, B., ed., IV,
2495–2504, John Wiley and Sons (2008).
[4] Jähne, B., “Multiresolutional signal representation,” in [Handbook of Computer Vision and Applications ],
Jähne, B., Haußecker, H., and Geißler, P., eds., 2, 67–90, Academic Press (1999).
[5] Soodak, R. E., “Two-dimensional modeling of visual receptive ﬁelds using Gaussian subunits,” Proceedings
of the National Academy of Sciences 20, 9259–9263 (December 1986).
[6] Lowe, D. G., “Object recognition from local scale-invariant features,” in [Proc. of the IEEE Int. Conference
on Computer Vision ], 2, 1150–1157 (1999).
[7] Itti, L., Koch, C., and Niebur, E., “A model of saliency-based visual attention for rapid scene analysis,”
IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 1254–1259 (November 1998).
[8] Sotak, G. E. and Boyer, K. L., “The Laplacian-of-Gaussian kernel: a formal analysis and design procedure
for fast, accurate convolution and full-frame output,” Comput. Vision Graph. Image Process. 48, 147–189
(November 1989).
Figure 4: Chip captured image (a) and downsampled version (b).
[9] Ohta, J., [Smart CMOS Image Sensors and Applications ], CRC Press (2007).
[10] Aw, C. H. and Wooley, B., “A 128x128-pixel standard-CMOS image sensor with electronic shutter,” Solid-
State Circuits, IEEE Journal of 31, 1922 –1930 (December 1996).
[11] Zhou, Z., Pain, B., and Fossum, E., “Frame-transfer CMOS active pixel sensor with pixel binning,” Electron
Devices, IEEE Transactions on 44, 1764 –1768 (October 1997).
[12] Fernández-Berni, J., Carmona-Galán, R., and Carranza-González, L., “FLIP-Q: A QCIF resolution focal-plane
array for low-power image processing,” IEEE J. of Solid-State Circuits 46, 669–680 (March 2011).
Figure 5: On-chip ﬁltering, ideal and ampliﬁed diﬀerence.
