A programmable VLSI filter architecture for application in real-time vision processing systems by Serrano-Gotarredona, Teresa et al.
A Programmable VLSI Filter Architecture for Application in
Real-Time Vision Processing Systems
Teresa Serrano-Gotarredona1, Andreas G. Andreou2, and Bernabé Linares-Barranco1
1Instituto de Microelectrónica de Sevilla (IMSE), Centro Nacional de Microelectrónica (CNM),
Ed. CICA, Av. Reina Mercedes s/n 41012 Sevilla, SPAIN. Phone: 34-5-4239923,
Fax: 34-5-4231832, E-mail: terese@imse.cnm.es
2Dept. of Electrical and Computer Engineering, The Johns Hopkins University,
Baltimore, MD 21218, USA
Abstract
An architecture is proposed for the realization of
real-time edge-extraction filtering operation in an
Address-Event-Representation (AER) vision system.
Furthermore, the approach is valid for any 2D
filtering operation as long as the convolutional kernel
F(p,q) is decomposable into an x-axis and a y-axis
component, i.e. F(p,q)=H(p)V(q), for some rotated
coordinate system {p,q}. If it is possible to find a
coordinate system {p,q}, rotated with respect to the
absolute coordinate system a certain angle, for which
the above decomposition is possible, then the
proposed architecture is able to perform the filtering
operation for any angle we would like the kernel to be
rotated. This is achieved by taking advantage of the
AER and manipulating the addresses in real time.
The proposed architecture, however, requires one
approximation: the product operation between the
horizontal component H(p) and vertical component
V(q) should be able to be approximated by a signed
minimum operation without significant performance
degradation. It is shown that for edge-extraction
applications this filter does not produce performance
degradation. The proposed architecture is intended
to be used in a complete vision system known as the
Boundary-Contour-System and Feature-Contour-
System Vision Model, proposed by Grossberg and
collaborators. The present paper proposes the
architecture, provides a circuit implementation using
MOS transistors operated in weak inversion, and
shows behavioral simulation results at the system
level operation and electrical simulation and
experimental results at the circuit level operation of
some critical subcircuits.
  I. Introduction
Human beings have the capability of recognizing
objects, figures, and shapes even if they appear embedded
within noise, are partially occluded or look distorted. To
achieve this, the human vision processing system is
structured into a number of massively interconnected
neural layers with feedforward and feedback connections
among them. Neurons communicate by means of
electrical streams of pulses. Each neuron broadcasts its
output to a large number of other neurons, which can be
inside the same or at different layers, and the way this is
done is through physical connections called synapses.
A major problem engineers face when implementing
bio-inspired (vision) processing systems is how to
overcome the massive interconnections. An interesting
way of addressing it is Address Event Representation
(AER) [1]-[3]. In AER each neuron codes
its activity as a pulse stream signal with very low duty
cycle, i.e. pulse width must be minimum but separation
between pulses should be fairly large. Each neuron has a
code or address, and every time it produces a pulse it will
try to write its code on a common digital bus. A receiving
system continuously reads this bus and sends each
pulse to the neurons that should be connected to the
sending neuron. In this manner the activity of a large
number of neurons can be time multiplexed on a common
bus. This principle makes it possible to structure a very
complex neural system hierarchically. For example, a retina chip
with AER output is continuously putting addresses on a
bus representing the sensed images. Several chips, each
with an AER receiver system, can be reading the same
bus, doing some specialized processing and broadcasting
the outputs of all their neurons using again AER on
another external bus, and so on. Furthermore, extra
processing can be added easily while the “addresses” go
from one chip to the next. For instance, image rotation or
translation can be performed in a straightforward manner
by inserting an EEPROM for which the transformation
operation has been programmed pixel by pixel (or address
by address). In the architecture proposed in this paper we
take advantage of this fact to simplify the processing chip.
As neuroscientists manage to unfold the internal
structure and functions of the vision system, it becomes
more feasible for mathematicians and computer scientists
to propose and understand bio-inspired vision models and
algorithms, and for engineers to build bio-inspired
artificial vision systems. One powerful vision model
proposed recently by Grossberg et al. [5] is the Boundary-
Contour-System (BCS) and Feature-Contour-System
(FCS) vision model. It consists of nine layers which, after
local illumination normalization and contrast
enhancement of an input image, perform local edge
extraction for different spatial orientations and scales, and
then identify consistent long range contours of
the shapes in the input image through processing layers
with feedforward and feedback connections. In this
vision model one of the stages performs a 2D filtering
operation for edge extraction, and other stages perform
other 2D filtering operations. The processing architecture
proposed in this paper is intended to be used in this vision
model to implement a simplified version of the filter that
performs the edge-extraction operation. The same processing
architecture can be reprogrammed to perform some of the
other 2D filters needed in the BCS-FCS vision model.
The present paper is structured as follows. In the next
Section we will briefly describe the structure,
functionality, and operations performed by the BCS-FCS
vision model. In Section III we introduce a modification of
the edge-extraction kernel in which the product
operation of the original kernel is replaced by a minimum operation.
Section IV describes briefly the essence of AER, and in
Section V we introduce a VLSI architecture capable of
implementing a 2D programmable filter. Section VI
provides system level behavioral simulation results of this
architecture programmed with a kernel to do an extraction
of vertically oriented edges, and finally Section VII
indicates the conclusions and future work.
  II. The Boundary-Contour-System and
Feature-Contour-System Vision Model
Fig. 1 shows a schematic representation of the
structure of the BCS-FCS model [5]. The BCS consists of
several identical subsystems (three in the case of Fig. 1)
each of which is tuned for a different spatial scale. Each
BCS spatial subsystem consists of 8 layers. Consecutive
layers have been drawn in Fig. 1 as connected by thick
shaded arrows. We may think of these arrows as the
representation of a convolution (or filter) operation
applied to the state of the previous layer and resulting in
the state of the next layer. For instance, the 2D input
image undergoes three different filtering or convolutional
operations, each of which is the starting point of a BCS
subsystem. From here on, each BCS subsystem operates
autonomously. From Layer 1 to Layer 3 there are only
feedforward filtering operations, while Layers 4 to 8 are
connected in a feedback loop configuration, which means
the system will reach a steady state after a certain number
of iterations (if the system is implemented sequentially on
a computer) or after a certain time constant (if the system
operates asynchronously and fully parallel, like in
biological brains). The outputs of Layers 1 and 5 of the
three BCS subsystems are fed to the FCS. Next we will
briefly describe the processing performed on the different
layers.
A. Stage 1: Center-ON OFF-Surround
Let us assume
$$I(p,q), \quad p = 1, \ldots, N, \quad q = 1, \ldots, M \tag{1}$$
is an N×M input image provided by a vision sensing
front end. This input image is applied to a 2D filter
whose impulse response, or kernel, has radial symmetry
and is shown in Fig. 2(a). Pixels close to the
center region of the kernel contribute with
positive weights to the convolution, while pixels further
away contribute negatively. The result of such a
convolution is local illumination normalization and
contrast enhancement. The mathematical expression for
this kernel is
$$S_1(p,q) = A_1 e^{-\frac{p^2+q^2}{\sigma_g}} - A_2 e^{-\frac{p^2+q^2}{\alpha\sigma_g}} \tag{2}$$
where $A_1 > A_2$ are positive parameters, $\alpha > 1$, and
$\sigma_g$ controls the spatial scale of the filtering. In the case of
Fig. 1 there are three BCS subsystems, which means
three Center-ON OFF-Surround filters are applied in
parallel to the same input image, each with a specific
$\sigma_g$ ($g = 1, 2, 3$). From now on the processing in each BCS
subsystem is independent.
Fig. 1: Schematic representation of the Boundary Contour
System (BCS) and Feature Contour System (FCS) vision model
of Grossberg et al. (input image, Layers 1 to 9, output image).
B. Stage 2: Simple Cells
The second stage of the BCS system applies an
orientation specific convolution for detecting edges
oriented within a narrow angle range. This is performed
by convolving the output of Layer 1 with the kernel shown
in Fig. 2(b) for different orientations. This is why the
output of Layer 1 in Fig. 1 undergoes several convolutions in
parallel, one for each orientation, resulting in as many
“Layer 2” images as orientations have been considered. The
kernel of Fig. 2(b) is mathematically described by the
difference between two displaced Gaussians
$$F_g(p_k,q_k) = \frac{1}{2\pi\sigma_{gh}\sigma_{gv}}\, e^{-\frac{1}{2}\left(\frac{p_k}{\sigma_{gh}}\right)^2} \left[ e^{-\frac{1}{2}\left(\frac{q_k}{\sigma_{gv}}+\frac{1}{2}\right)^2} - e^{-\frac{1}{2}\left(\frac{q_k}{\sigma_{gv}}-\frac{1}{2}\right)^2} \right] \tag{3}$$
where the coordinate system $(p_k, q_k)$ is rotated a certain
angle with respect to the coordinate system $(p, q)$ of the input
image,
$$p_k = p\cos\frac{\pi k}{n_R} - q\sin\frac{\pi k}{n_R}, \qquad q_k = p\sin\frac{\pi k}{n_R} + q\cos\frac{\pi k}{n_R} \tag{4}$$
with $n_R$ being the total number of orientations to be
considered.
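As an illustration of eqs. (3) and (4), the kernel value at an image offset (p, q) can be evaluated in a few lines of Python. This is only a sketch: the function names and the parameter values in the usage note are assumptions made here for illustration, not values taken from the BCS model.

```python
import math

def rotate(p, q, k, n_r):
    """Eq. (4): coordinates (p_k, q_k) for orientation index k out of n_r."""
    theta = math.pi * k / n_r
    p_k = p * math.cos(theta) - q * math.sin(theta)
    q_k = p * math.sin(theta) + q * math.cos(theta)
    return p_k, q_k

def edge_kernel(p, q, k, n_r, sigma_h, sigma_v):
    """Eq. (3): difference of two Gaussians displaced along q_k."""
    p_k, q_k = rotate(p, q, k, n_r)
    envelope = math.exp(-0.5 * (p_k / sigma_h) ** 2)
    lobes = (math.exp(-0.5 * (q_k / sigma_v + 0.5) ** 2)
             - math.exp(-0.5 * (q_k / sigma_v - 0.5) ** 2))
    return envelope * lobes / (2.0 * math.pi * sigma_h * sigma_v)
```

For example, with the illustrative values `edge_kernel(0, 1, 0, 4, 3.0, 2.0)` the result is negative and `edge_kernel(0, -1, 0, 4, 3.0, 2.0)` is its exact opposite: the kernel is odd in $q_k$ and vanishes along $q_k = 0$, which is what makes it respond to contrast changes across the k-th orientation axis.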
C. Stage 3: Complex Cells
After applying the filtering of Stage 2, a pixel in
Layer 2 for orientation k will display a high positive value
if the input image presents a positive change in contrast
with respect to the k-th orientation axis. If the change in
contrast is negative, the output of this pixel would be a
high negative value. In order to detect whether or not there
is an edge at that orientation around the given pixel there
is no need to distinguish between positive and negative
values. Therefore, the purpose of this processing stage is
simply to rectify the output of the previous one.
D. Stage 4: Hypercomplex Cells, Competition across
Space
At Layer 3 for orientation k, pixels will present a
positive value if around that pixel there is an edge at that
orientation. The higher the pixel value, the clearer the
edge was. At this stage, and independently for each
orientation, a 2D Center-ON OFF-Surround filter is
applied to contrast enhance the previous image. This is
equivalent to performing a spatial competition among
pixels, favoring those with higher values.
E. Stage 5: Hypercomplex Cells, Competition across
Orientations
At this stage, all Layer 4 pixels of the same spatial
position but for all possible orientations k, are going to
compete among them to contrast enhance those
orientations with higher pixel values. This is done by
applying a 1D Center-ON OFF-Surround filter to pixels
of Layer 4 of the same spatial position but for all k
orientation values.
F. Stage 6: Bipole Cells, Long-Range Cooperation
The operation of this stage is the most complicated. It
tries to identify long range “contours”, which can be
defined as edges that remain consistent over larger spatial
ranges. This is achieved by performing for each
orientation k the following sum of convolutions,
$$A^{g}_{pqk} = \sum_{r=1}^{n_R} C(p,q,k,r) \tag{5}$$
where $A^{g}_{pqk}$ is the resulting state of pixel $(p,q)$ of Layer
6 for spatial scale g and orientation k, subscript r
denotes orientation, and each convolution is given by
$$C(p,q,k,r) = \left[ y_r(p,q) - y_{\hat r}(p,q) \right] \otimes Z_r(p_k,q_k) \tag{6}$$
where $y_r(p,q)$ denotes the state of pixel $(p,q)$ of Layer
5 for orientation r, $\hat r$ denotes the orientation perpendicular
to r, and the kernel is defined by
$$Z_r(p_k,q_k) = \operatorname{sgn}(p_k)\, e^{-\beta(p_k^2+q_k^2)}\, e^{-\mu\left(q_k/p_k^2\right)^2} \times \cos^{\gamma}\!\left( \frac{(k-r)\pi}{n_R} - \operatorname{sgn}(p_k)\,\operatorname{atan2}\!\left(\frac{2q_k}{p_k},\, p_k\right) \right) \tag{7}$$
with $\beta$, $\mu$, and $\gamma$ being positive parameters. Fig. 2(c)
depicts this kernel for the case $k = r = 0$.
G. Stage 7: Hypercomplex Cells, Competition across
Space
This stage performs the same operation as Stage 4.
Fig. 2: Convolutional kernels used in the BCS system of
Fig. 1: (a) Center-ON OFF-Surround kernel used by
Stages 1, 4, and 7. (b) Edge-extraction kernel used by
Stage 2. (c) Bipole kernel used by Stage 6.
H. Stage 8: Hypercomplex Cells, Competition across
Orientations
This stage performs the same operation as Stage 5.
The output of Stage 8, Layer 8, is combined with the
output of Stage 3, Layer 3, to form the input image for
Stage 4. This way a feedback loop is formed, which once
settled will yield the proper output of each BCS
subsystem.
I. Stage 9: Feature Contour System (FCS)
The information about consistent long range
contours can be taken from Layer 5 (once the feedback
loop has settled), for all computed orientations. The FCS
takes the original (local illumination normalized and
contrast enhanced) image present in Layer 1 and performs
a selective diffusion operation between pixels, using the
contour information present at Layer 5: the contours at
Layer 5 act as barriers to the diffusion operation. The
result of all this processing is a clean noise-free image
with clear and consistent long range contours.
  III. An Edge-Extraction Filter
In the rest of this paper we will concentrate on Stages
2 and 3, the filter for edge-extraction and subsequent
rectification. We will first introduce a simplification of the
kernel of eq. (3) which will allow us to propose a very
compact and efficient hardware that takes advantage of
the AER as well.
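The kernel of eq. (3) is decomposable into two
factors, each of which depends only on either the
x-coordinate $p_k$ or the y-coordinate $q_k$,
$$F_g(p_k,q_k) = \frac{1}{2\pi}\, H_g(p_k)\, V_g(q_k), \tag{8}$$
with
$$H_g(p_k) = \frac{1}{\sigma_{gh}}\, e^{-\frac{1}{2}\left(\frac{p_k}{\sigma_{gh}}\right)^2}, \qquad V_g(q_k) = \frac{1}{\sigma_{gv}} \left[ e^{-\frac{1}{2}\left(\frac{q_k}{\sigma_{gv}}+\frac{1}{2}\right)^2} - e^{-\frac{1}{2}\left(\frac{q_k}{\sigma_{gv}}-\frac{1}{2}\right)^2} \right]. \tag{9}$$
The simplification proposed here consists in substituting
the product operation between $H_g(\cdot)$ and $V_g(\cdot)$ by the
signed minimum,
$$F'_g(p_k,q_k) = \frac{1}{2\pi}\, \operatorname{sgn}[H_g(p_k)] \times \operatorname{sgn}[V_g(q_k)] \times \min\{ |H_g(p_k)|, |V_g(q_k)| \}. \tag{10}$$
Fig. 3(b) shows the result of applying the filtering of eq.
(8) to the input image of Fig. 3(a), while Fig. 3(c) results
when using the kernel of eq. (10). As can be seen there
is no appreciable difference in the resulting images.
To evaluate quantitatively the effect of the proposed
approximation we can use the Normalized Square Error
defined as,
$$\mathrm{NSE} = \frac{\iint |F(x,y) - F_m(x,y)|^2 \, dx\, dy}{\iint |F(x,y)|^2 \, dx\, dy}. \tag{11}$$
This quantity helps us to evaluate the difference between
the original kernel $F(x,y)$ and the modified kernel $F_m(x,y)$
obtained when the product operation is
substituted by the signed minimum. Table 1 gives the
computed NSE for several kernels. All kernels in Table
1 are decomposable into the product of two functions that
depend separately on the $x$ and $y$ components.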
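The signed minimum of eq. (10) and a discrete analogue of the NSE of eq. (11) can be sketched as follows. The integer sampling grid, its half-width of 60 and the conversion to decibels as 10·log10(NSE) are assumptions made here; the sketch is not claimed to reproduce the figures of Table 1, whose integration domain is not stated.

```python
import math

def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def signed_min(h, v):
    """Signed minimum used in eq. (10) in place of the product h * v."""
    return sign(h) * sign(v) * min(abs(h), abs(v))

def nse(h_fn, v_fn, half_width):
    """Discrete analogue of eq. (11) on the grid [-half_width, half_width]^2."""
    num = den = 0.0
    for x in range(-half_width, half_width + 1):
        h = h_fn(x)
        for y in range(-half_width, half_width + 1):
            v = v_fn(y)
            f, fm = h * v, signed_min(h, v)
            num += (f - fm) ** 2
            den += f ** 2
    return num / den

# "Displaced Gaussians" kernel with sigma_x = 15, sigma_y = 5, y_s = 5.
def h_dg(x):
    return math.exp(-0.5 * (x / 15.0) ** 2)

def v_dg(y):
    return (math.exp(-0.5 * ((y - 5.0) / 5.0) ** 2)
            - math.exp(-0.5 * ((y + 5.0) / 5.0) ** 2))

nse_db = 10.0 * math.log10(nse(h_dg, v_dg, 60))  # assumed dB convention
```

Note that the signed minimum always has the same sign as the product, so the approximation only changes the magnitude profile of the kernel, never the polarity of its lobes.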
Fig. 3: Behavioral simulation results when performing
2D filtering for vertical edge-extraction. (a) Input image,
(b) using the edge-extraction filter kernel of eq. (8), (c)
using the modified filter kernel of eq. (10).
  IV. Using Address Event Representation
(AER)
Fig. 4 shows a schematic figure outlining the essence
behind the AER. Suppose we have an “emitter” chip
containing a large number of neurons or cells D1, D2, D3,
... whose activity changes in time with a “relatively slow”
time constant. For example, if Chip 1 is a retina chip and
each neuron’s activity represents the illumination sensed
by a pixel, the time constant with which this activity
changes is, at the most, equivalent to the frame rate (i.e., 25-
30 changes per second or a time constant of about 30-
40 ms).
The purpose of an AER based communication
scheme is to be able to reproduce the time evolution of
each neuron’s activity inside a second or “receiver” chip,
using a fast digital bus with a small number of pins. In the
“emitter” chip the activity of each pixel has to be
transformed into a pulse stream signal such that pulse
width is minimum and the spacing between pulses is
reasonably high to time multiplex the activity of a
relatively large number of neurons. Every time a neuron
produces a pulse its address or code should be written on
the bus. When several neurons produce pulses
simultaneously, a classical arbitration
tree can be introduced [1]-[3], or an arbiter based on Winner-
Take-All (WTA) row-wise competitions [6], or simply
no neuron is allowed to access the bus in case of a
“collision” [7]. Whatever method is used the result will be
the presence of a continuous sequence of addresses or
codes on the digital bus that one or more receiver chips
can read. Each receiver chip must contain a decoding
circuitry so that a pulse reaches the neuron (or neurons)
specified by the address read on the bus. If each neuron
integrates the sequence of pulses properly, the original
activity of the neurons in the emitter chip will be
reproduced. Note that in AER those neurons that are more
active access the bus more frequently. This property
optimizes the use of the bus, since neurons with
low activity will not consume much communication
bandwidth.
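The basic emitter/receiver scheme described above can be sketched behaviorally as follows. The integrate-and-fire pulse generator, the fixed scan order standing in for arbitration, and the plain event counter used as receiver integrator are simplifying assumptions of this sketch, not the circuits of the cited AER chips.

```python
def aer_emit(activities, steps):
    """Emitter: turn per-neuron activities (pulses per step, in [0, 1])
    into a time-multiplexed sequence of addresses on a shared bus."""
    phase = [0.0] * len(activities)
    bus = []
    for _ in range(steps):
        for addr, activity in enumerate(activities):
            phase[addr] += activity
            if phase[addr] >= 1.0:       # neuron fires a pulse
                phase[addr] -= 1.0
                bus.append(addr)         # its address is written on the bus
    return bus

def aer_receive(bus, n_neurons):
    """Receiver: decode each address and integrate (count) the pulses."""
    counts = [0] * n_neurons
    for addr in bus:
        counts[addr] += 1
    return counts

bus = aer_emit([0.5, 0.25, 0.0], steps=8)
counts = aer_receive(bus, 3)             # counts == [4, 2, 0]
```

The silent third neuron never occupies the bus, while the most active neuron uses it twice as often as the second one, which is the bandwidth property noted in the text.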
This is the simplest AER based communication
scheme among chips. However, AER makes it easy to add
more complicated processing. For example, input images
can be translated or rotated by remapping the addresses
while they travel from one chip to the next. By properly
programming an EEPROM as a look-up table any address
remapping can be implemented, by simply inserting the
EEPROM between the two chips. Furthermore, many
EEPROMs can be connected in parallel each performing,
for example, a rotation at a specific angle, and each
delivering the remapped addresses to a set of specialized
processing chips. It is also possible to include synaptic
weighting by having the EEPROM store the weight value and
dump it on a data bus, so that the “receiver” chip reads
both the address and the data bus and performs a weighted
integration in the destination neuron(s). It is also
possible to implement “projection fields”, i.e. for every
address that appears on the bus a small digital system
could generate a sequence of addresses around it and send
it to the “receiver” chip. This would be a time-multiplexed
projection field generation. In the architecture proposed
in this paper, we implement a synaptically weighted
projection field for each address read on the bus, and not
in a time-multiplexed manner but in parallel. As we will
see, the receiver chip will perform the following
operations: for every address read on the bus it will send
pulses to a bubble of neurons around that address. The
width of those pulses is modulated according to some
weights stored on chip. Time integration of those pulses
for the complete array of neurons in the receiver chip
implements a convolution operation. In the rest of the
paper we will concentrate on describing the circuits able
to implement such a convolutional or filtering chip.
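The two address manipulations mentioned in this section, look-up table remapping and projection field generation, can be sketched in software as follows. The dictionary standing in for the EEPROM, the 90-degree rotation chosen as example and the `weights` table indexed by offset are all hypothetical choices made for illustration.

```python
def rotation_lut(n):
    """Look-up table that rotates an n x n address space by 90 degrees
    (plays the role of the EEPROM inserted between two chips)."""
    return {(x, y): (n - 1 - y, x) for x in range(n) for y in range(n)}

def remap(bus, lut):
    """Replace every address travelling on the bus by its table entry."""
    return [lut[addr] for addr in bus]

def projection_field(addr, weights, n):
    """Expand one address into weighted events for its in-range neighbours.
    weights[(dx, dy)] is the synaptic weight assigned to offset (dx, dy)."""
    x0, y0 = addr
    events = []
    for (dx, dy), w in weights.items():
        x, y = x0 + dx, y0 + dy
        if 0 <= x < n and 0 <= y < n:    # drop neighbours outside the array
            events.append(((x, y), w))
    return events
```

This sketch generates the projection field one address at a time, i.e. in the time-multiplexed manner; the chip proposed in the paper instead delivers the whole weighted neighbourhood in parallel.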
  V. System Design
Fig. 5 shows the basic operating principle of the
proposed architecture. The address bus provides the
coordinates $(x_0, y_0)$ of the neuron (or pixel) around which
the kernel of eq. (10) should be applied. Pulses will be
applied to all rows with y-coordinate in the interval
$[y_0 - L, y_0 + L]$, and all columns with x-coordinate in the
interval $[x_0 - L, x_0 + L]$, where $2L + 1$ is the width
considered for the kernel.
Table 1
kernel | $F(x,y) = H(x)V(y)$ | parameters | NSE (dB)
Gaussian | $e^{-\frac{1}{2}(x/\sigma_x)^2}\, e^{-\frac{1}{2}(y/\sigma_y)^2}$ | $\sigma_x = 10$, $\sigma_y = 15$ | -24.92
Even Gabor | $e^{-\frac{1}{2}(x/\sigma)^2}\, e^{-\frac{1}{2}(y/\sigma)^2} \sin(2\pi y / y_s)$ | $\sigma = 15$, $y_s = 20$ | -19.04
Odd Gabor | $e^{-\frac{1}{2}(x/\sigma)^2}\, e^{-\frac{1}{2}(y/\sigma)^2} \cos(2\pi y / y_s)$ | $\sigma = 15$, $y_s = 20$ | -19.03
Displaced Gaussians | $e^{-\frac{1}{2}(x/\sigma_x)^2} \left[ e^{-\frac{1}{2}((y - y_s)/\sigma_y)^2} - e^{-\frac{1}{2}((y + y_s)/\sigma_y)^2} \right]$ | $\sigma_x = 15$, $\sigma_y = 5$, $y_s = 5$ | -22.73
Fig. 4: Address Event Representation Interchip Communication Scheme
Pulses will be modulated in width according to
function $V(y)$ (see eq. (8)) for the rows, and function
$H(x)$ for the columns. At each pixel there is an AND
gate which provides a pulse of width equal to the
minimum of $|V(y)|$ and $|H(x)|$. This pulse will generate a
fixed magnitude current pulse of the same width which
will be integrated on a capacitor. Each pixel contains two
integrators. One of them, called the “positive integrator”,
integrates the pulse of length $\min(|H(x)|, |V(y)|)$ when
$\operatorname{sgn}(H(x))\,\operatorname{sgn}(V(y)) > 0$; while the other, called the
“negative integrator”, integrates the pulse when
$\operatorname{sgn}(H(x))\,\operatorname{sgn}(V(y)) < 0$. The values of $H(x)$ and
$V(y)$ ($x, y = -L, \ldots, 0, \ldots, L$) are stored digitally on chip in a
small RAM.
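The reason the AND gate yields the minimum is that both pulses are triggered by the same address event and therefore start together, so their overlap lasts as long as the shorter one. A minimal discrete-time sketch, assuming integer pulse widths measured in arbitrary time steps and signed integer weights (both assumptions of this example):

```python
def and_pulse_width(row_width, col_width, n_steps):
    """Number of time steps during which the row pulse AND the column
    pulse are both high; both pulses start at t = 0."""
    overlap = 0
    for t in range(n_steps):
        row_high = t < row_width
        col_high = t < col_width
        if row_high and col_high:
            overlap += 1
    return overlap

def route(h, v):
    """Return (integrator, width) for signed weights h = H(x), v = V(y)."""
    width = and_pulse_width(abs(h), abs(v), max(abs(h), abs(v)))
    if width == 0:
        return None, 0                   # no overlap, nothing is integrated
    same_sign = (h > 0) == (v > 0)
    return ("positive" if same_sign else "negative"), width
```

For instance, `route(5, -2)` returns `("negative", 2)`: the pulse lasts min(5, 2) steps and, since the weights have opposite signs, it is sent to the negative integrator.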
Fig. 5: Schematic Representation of the Basic Operation
Principle behind the proposed Architecture
Fig. 6 shows the floorplan diagram of the system. It
consists of two input decoders that decode the address of
the arriving pulses, a C-element required for the AER
communication protocol [1]-[3], an array of $N \times M$
integrator cells $c_{ij}$, two sets of programmable
monostables $Mx_{-L}, \ldots, Mx_0, \ldots, Mx_L$ and
$My_{-L}, \ldots, My_0, \ldots, My_L$ whose pulse widths are controlled
by the bits stored in two RAMs, RAM X and RAM Y
(which store the digital words $Rx_{-L}, \ldots, Rx_0, \ldots, Rx_L$ and
$Ry_{-L}, \ldots, Ry_0, \ldots, Ry_L$, respectively), two arrays of
$(2L+1) \times N$ and $(2L+1) \times M$ selecting cells $Cx_{i-l,l}$ and
$Cy_{j-s,s}$, respectively, two output decoders to select the
cells to be scanned, and a column of scanning circuits
$Scan_j$ to read out the integrators' analog output current $I_o$.
Fig. 6: Floorplan of Complete 2D Filtering System
Note that in the present prototype of Fig. 6 the system
does not generate an AER output. This can be solved by
either adding the necessary circuitry to each pixel [1]-[3]
which will decrease the cell density, or by adding a post-
processing chip that scans sequentially all cells in the
array of Fig. 6 and generates an AER output.
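The operation of the system in Fig. 6 is as follows. In
RAM X and RAM Y digital words of $n + 1$ bits are stored
($Rx_{-L}, \ldots, Rx_l, \ldots, Rx_L$ and $Ry_{-L}, \ldots, Ry_s, \ldots, Ry_L$). The first
bit $Sx_l$ (or $Sy_s$) indicates the sign of the function $H(x)$ (or
$V(y)$). The following $n$ bits indicate the absolute value
$|H(x)|$ (or $|V(y)|$). These $n$ bits linearly control the length
of the pulse triggered by monostables $Mx_l$ (or $My_s$). The
pulses generated by the monostables are sent through
lines $Tx_l$ (or $Ty_s$) and are triggered whenever an external
pulse arrives at the system (whenever signal Rqst pulses).
When an external pulse arrives, the input decoders
activate lines $x_i$ and $y_j$ corresponding to the address of
the arriving pulse. The selection cells controlled by $x_i$
(cells $Cx_{i-l,l}$ in Fig. 6, $l = -L, \ldots, 0, \ldots, L$) connect the
pulse in line $Tx_l$ to line $Px^{+}_{i-l}$ if the sign bit $Sx_l$ is ‘1’. If
the sign bit $Sx_l$ is ‘0’ line $Tx_l$ is connected to the negative
line $Px^{-}_{i-l}$. This way, pulses $Tx_l$ (or $Ty_s$) are sent through
lines $Px^{+}_{i-l}$ or $Px^{-}_{i-l}$ ($Py^{+}_{j-s}$ or $Py^{-}_{j-s}$) depending on the
sign of the weight stored in $Rx_l$ (or $Ry_s$).
Each neuron $c_{ij}$ has two integrators. The positive
integrator accumulates charge when pulses are
simultaneously arriving through horizontal and vertical
lines of the same sign. That is, it integrates a pulse when
lines $Px^{+}_{i}$ and $Py^{+}_{j}$ (or lines $Px^{-}_{i}$ and $Py^{-}_{j}$) are
simultaneously high, or equivalently it performs the
operation $(Px^{+}_{i} \cap Py^{+}_{j}) \cup (Px^{-}_{i} \cap Py^{-}_{j})$. Hence, the
positive integrator in cell $c_{ij}$ computes along time the
following sum
$$I^{+}_{ij} = \sum_{\substack{p,q \\ \operatorname{sgn}(H(x_{pi}))\,\operatorname{sgn}(V(y_{qj})) > 0}} I_w \min\left(|H(x_{pi})|, |V(y_{qj})|\right) n_{pq} \tag{12}$$
where $x_{pi} = x_p - x_i$, $y_{qj} = y_q - y_j$, $n_{pq}$ is the (lossy)
integral over time of the number of pulses pixel $(x_p, y_q)$
is receiving, and $I_w$ is the fixed magnitude of the current
pulses being integrated.
Similarly, the negative integrator accumulates charge
when pulses arriving through horizontal and vertical lines
of opposite sign, $Px^{+}_{i}$ and $Py^{-}_{j}$ (or $Px^{-}_{i}$ and $Py^{+}_{j}$), are
simultaneously high, that is, it performs the operation
$(Px^{+}_{i} \cap Py^{-}_{j}) \cup (Px^{-}_{i} \cap Py^{+}_{j})$. Hence, along time it
computes the following sum
$$I^{-}_{ij} = \sum_{\substack{p,q \\ \operatorname{sgn}(H(x_{pi}))\,\operatorname{sgn}(V(y_{qj})) < 0}} I_w \min\left(|H(x_{pi})|, |V(y_{qj})|\right) n_{pq} \tag{13}$$
Consequently, the difference between the outputs of the
positive and negative integrators is given by,
$$I^{+}_{ij} - I^{-}_{ij} = I_w \sum_{p,q} \operatorname{sgn}(H(x_{pi}))\,\operatorname{sgn}(V(y_{qj})) \min\left(|H(x_{pi})|, |V(y_{qj})|\right) n_{pq}, \tag{14}$$
which is the filter operation we want to implement. In
what follows we will describe the circuit components
and operations of each block in Fig. 6.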
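A behavioral sketch of this operation is given below: every address event adds a pulse of width min(|H|, |V|) to the positive or negative integrator of each cell in its neighbourhood, and the filter output is the difference of the two. Ideal lossless integrators, unit current $I_w = 1$, integer weights and the list-of-lists cell array are assumptions of the sketch; the function names are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)

def process_event(pos, neg, x_p, y_q, h, v, half):
    """Apply one address event (x_p, y_q) to the integrator arrays.

    h[l + half] and v[s + half] hold the signed weights H(l) and V(s) for
    offsets l, s in [-half, half]; pos[j][i] and neg[j][i] model the two
    integrators of cell c_ij.
    """
    n_rows, n_cols = len(pos), len(pos[0])
    for l in range(-half, half + 1):
        i = x_p - l                      # column i sees x_pi = x_p - x_i = l
        if not 0 <= i < n_cols:
            continue
        for s in range(-half, half + 1):
            j = y_q - s                  # row j sees y_qj = y_q - y_j = s
            if not 0 <= j < n_rows:
                continue
            hl, vs = h[l + half], v[s + half]
            width = min(abs(hl), abs(vs))    # AND of the two pulses
            if sign(hl) * sign(vs) > 0:
                pos[j][i] += width
            elif sign(hl) * sign(vs) < 0:
                neg[j][i] += width

def filter_output(events, n_rows, n_cols, h, v, half):
    """Difference of positive and negative integrators after all events."""
    pos = [[0] * n_cols for _ in range(n_rows)]
    neg = [[0] * n_cols for _ in range(n_rows)]
    for x_p, y_q in events:
        process_event(pos, neg, x_p, y_q, h, v, half)
    return [[pos[j][i] - neg[j][i] for i in range(n_cols)]
            for j in range(n_rows)]
```

Because each event touches only the (2L+1)×(2L+1) cells around its address, the cost per event is independent of the array size, which is what makes the event-driven formulation attractive.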
A. Communication Protocol: The C-element
To perform a proper communication between two
chips a communication protocol must be implemented
[1]-[3]. In the AER scheme, the sender chip indicates
when the address of a pulsing neuron is ready on the bus
and the receiver chip must acknowledge that the pulse has
been received and that it is ready to receive a new pulse.
Fig. 7 shows the timing diagram of a valid
communication protocol for the two chips. The sender
chip generates a request signal Rqst and the receiver
generates an acknowledge signal Ack. When the sender
has put the address on the bus it pulls the request signal
Rqst high. Once the receiver detects a high
Rqst signal it latches the received address and pulls the
acknowledge signal Ack high. The sender can now pull Rqst
low and begin to process the following pulse. The
receiver must wait until all the monostables have sent
their pulses to the corresponding neurons and Rqst
has gone low before pulling the Ack signal low. Once the Ack
signal is low the sender can pull the Rqst signal
high again.
Fig. 7: Timing diagram of the address-event
communication protocol
Fig. 8 shows the schematic of the cell
used in the receiver chip to generate the Ack signal. This
cell is known as “C-element”. This element receives two
input signals: a request signal Rqst generated by the
sender system, and signal Vmonob which is the wired-
NOR of all the monostable output pulses
$Tx_{-L}, \ldots, Tx_L, Ty_{-L}, \ldots, Ty_L$. The C-element generates an
output acknowledge signal Ack which is sent back to the
sender system. When no pulses are being received,
Rqst is low. Signal Vmonob is high as no pulses are being
generated by the monostables. Consequently, the Ack
signal is low. When a valid address pulse arrives the
sender pulls the Rqst signal high. The rising edge of this
signal is used to trigger the monostables so that
signal Vmonob becomes low. Once Rqst is high and
Vmonob has been set low the C-element sets Ack to a
high value, meaning that the request pulse has been
received. The high value of Ack is used to latch the
present bus address and the signal Rqstm that triggers the
monostables. Latching the address ensures that the
corresponding neighborhood is kept selected until all
monostables finish their pulses. By latching Rqstm we
ensure that the monostable pulses do not end if signal
Rqst goes low before the monostable pulses have
finished. The C-element waits until Rqst goes low and all
the monostable pulses finish (Vmonob = 1) to pull the
Ack signal low again. Once the acknowledge is low the
sender is allowed to pull Rqst up again and a new
communication cycle can begin.
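The behavior of the C-element described above can be summarized by a small state-holding function: Ack rises when Rqst is high and Vmonob is low, falls when Rqst is low and Vmonob is high, and otherwise keeps its previous value. The sketch below is a logic-level abstraction written for illustration; it does not model the transistor schematic of Fig. 8, and the input sequence is a hypothetical example of one communication cycle.

```python
def c_element(ack, rqst, vmonob):
    """Next value of Ack given its current value and the two inputs."""
    if rqst and not vmonob:
        return 1          # request present and monostables triggered
    if not rqst and vmonob:
        return 0          # request released and all pulses finished
    return ack            # otherwise hold the previous state

def handshake_trace(inputs):
    """Ack values along a sequence of (Rqst, Vmonob) input pairs."""
    ack, trace = 0, []
    for rqst, vmonob in inputs:
        ack = c_element(ack, rqst, vmonob)
        trace.append(ack)
    return trace

# idle, request arrives, monostables fire, sender releases, pulses end
cycle = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1)]
trace = handshake_trace(cycle)           # [0, 0, 1, 1, 0]
```

The fourth step shows why the state-holding behavior matters: the sender has already released Rqst, but Ack stays high until the last monostable pulse has finished.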
B. The Monostables
The schematic of a monostable with $n$ controlling
bits is shown in Fig. 9. Transistors $M_1$ and $M_2$ are
equally sized, as well as transistors $M_3$ and $M_4$.
Switches $S_1, \ldots, S_n$ are controlled by a digital n-bit word
$b_1, \ldots, b_n$ that sets the capacitance connected to node $V_m$.
When no address is being received, Ack and Rqst are
both low and hence Rqstm is also low. Transistor $M_5$ is
cut off so that node $V_{out}$ is low. Node $V_m$ is also set low
through those transistors $M_{bi}$ with a high $b_i$ value. If all
bits $b_i$ are low $V_m$ will always be high (by $M_8$) and no
pulse will be generated. When an input pulse arrives
signal Rqst and hence Rqstm become high. As soon as
Rqstm goes high node $V_{out}$ goes high. Current $I_T$ begins
to flow through the switch formed by transistors $M_6$ and
$M_7$, charging node $V_m$ at the rate set by bits $b_i$. When
node $V_m$ reaches voltage value $V_{thm}$, the current
through transistor $M_2$ becomes higher than the current
supplied by $M_1$ so that the output node $V_{out}$ flips from
high to low. The length of the pulse at $V_{out}$ is the time
taken by current $I_T$ to charge node $V_m$ up to a voltage of
$V_{thm}$. This time is given by
$$T = \frac{C_{mono}}{I_T}\, V_{thm} \tag{15}$$
where $C_{mono}$ is the total capacitance present at node $V_m$
and is set by the bits $\{b_1, \ldots, b_n\}$ stored in the
corresponding RAM word $Rx_l$ or $Ry_s$. With this
scheme, the length of the monostable pulses is linearly
controlled between 0 and $(2^n - 1) C_u V_{thm} / I_T$, with $n$
being the number of bits controlling each monostable
pulse length, and $C_u$ the unit capacitance in Fig. 9. Fig.
10 depicts the pulse widths obtained with Hspice versus
the value of the digital control word, for a monostable
controlled by $n = 5$ bits. For this simulation, values of
$C_u = 0.2$ pF, $I_T = 75$ µA and $V_{thm} = 2.5$ V were used.
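Eq. (15) can be evaluated directly for the simulation values quoted above. The sketch assumes ideal binary-weighted capacitors ($2^0 C_u$ to $2^{n-1} C_u$), so that $C_{mono}$ equals the control word times $C_u$, and ignores parasitic capacitance at node $V_m$; the function name is hypothetical.

```python
def monostable_width(word, n_bits=5, c_u=0.2e-12, i_t=75e-6, v_thm=2.5):
    """Ideal pulse length in seconds from eq. (15), T = C_mono * V_thm / I_T,
    for an n-bit control word selecting binary-weighted unit capacitors."""
    if not 0 <= word < 2 ** n_bits:
        raise ValueError("control word out of range")
    c_mono = word * c_u                  # sum of the selected capacitors
    return c_mono * v_thm / i_t

t_unit = monostable_width(1)             # about 6.67 ns per LSB
t_max = monostable_width(31)             # about 206.7 ns for n = 5
```

With these values each step of the control word adds about 6.67 ns, and the longest pulse, $(2^5 - 1) C_u V_{thm} / I_T$, is about 206.7 ns.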
C. The Selection Cell
Fig. 11 depicts the schematic of the selection cell
$Cx_{i-l,l}$ (or $Cy_{j-s,s}$) used to select the neighborhood of
cells where the monostable pulses have to be sent. Each
selection cell consists of two NAND gates controlling the
gates of two PMOS transistors ($M_{P+}$ and $M_{P-}$) that
behave like switches, and two NMOS pull down
transistors ($M_{N+}$ and $M_{N-}$ with a constant gate voltage
$V_{PD}$). Each selection cell (for example, $Cx_{i-l,l}$ in Fig. 6)
has two control signals (the decoder output $x_i$ and the sign
bit $Sx_l$ from RAM X), one input signal (the monostable
output $Tx_l$) and two outputs ($Px^{+}_{i-l}$ and $Px^{-}_{i-l}$). When a
pulse arrives with address $(x_i, y_j)$, it activates the
decoder outputs $x_i$ and $y_j$, respectively. The decoder
output $x_i$ controls all the selection cells $Cx_{i-l,l}$ with
$l \in [-L, \ldots, L]$. When $x_i$ is high, if the sign bit $Sx_l$ is ‘1’,
the selection cell $Cx_{i-l,l}$ connects the monostable output
line $Tx_l$ to the positive line $Px^{+}_{i-l}$. If the sign bit $Sx_l$ is
‘0’, line $Tx_l$ is connected to the negative line $Px^{-}_{i-l}$. The
same is valid for the Y coordinate selection cells.
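At the logic level, one selection cell routes the monostable pulse to exactly one of its two output lines, and only while its decoder line is active. The sketch below captures that routing; treating an unselected cell as driving neither line is an assumption of this abstraction (each physical line is shared by several cells), and the function name is hypothetical.

```python
def selection_cell(x_i, sign_bit, t_pulse):
    """Levels that one selection cell places on (P_plus, P_minus).

    x_i      : decoder output selecting this cell (0 or 1)
    sign_bit : sign bit of the stored weight ('1' means positive)
    t_pulse  : current level of the monostable output line (0 or 1)
    """
    if not x_i:
        return (0, 0)                    # cell not selected: drives nothing
    if sign_bit == 1:
        return (t_pulse, 0)              # pulse goes to the positive line
    return (0, t_pulse)                  # pulse goes to the negative line
```

For example, `selection_cell(1, 0, 1)` returns `(0, 1)`: the cell is selected, the stored weight is negative, so the pulse appears on the negative line only.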
Fig. 8: Schematic of the C-element used for the
arbitration of the address-event inputs
Fig. 9: Schematic of a Monostable Cell
Fig. 10: Monostable pulse length expressed in nanoseconds
versus value of controlling digital word obtained through
Hspice simulation (vertical axis: Pulse Width (ns))
D. The Core Cell
The schematic of cell $c_{ij}$ of Fig. 6 is shown in Fig. 12.
It consists of two diode-capacitor integrators [3]. The
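positive integrator integrates the ANDED pulses that
arrive in row and column lines with the same sign, that is,
$P_{x_i}^+$ and $P_{y_j}^+$ (or $P_{x_i}^-$ and $P_{y_j}^-$). The negative integrator
integrates the ANDED pulses that arrive in row and
column lines of opposite sign, that is, $P_{x_i}^+$ and $P_{y_j}^-$ (or
$P_{x_i}^-$ and $P_{y_j}^+$). Each diode-capacitor integrator consists
of two transistors $M_1^+$ and $M_2^+$ ($M_1^-$ and $M_2^-$) operating in
the subthreshold region, a capacitor $C$ and a transistor
$M_w^+$ (or $M_w^-$) acting as a current source of value $I_w$
(controlled by bias voltage $V_w$) with its source pulsed by
the output of the NOR gate. The input and output currents
$I_{in}^+$ and $I_{ij}^+$ of the positive integrator are related to the
voltage at node $v_g^+$ through the following differential
equations (the treatment for the negative integrator would
be the same for currents $I_{in}^-$, $I_{ij}^-$ and voltage $v_g^-$) [3],
$$I_{in}^+ = -C \frac{dv_g^+}{dt} + I_{op} \exp\left(\frac{V_A - \kappa v_g^+}{v_t}\right) \tag{16}$$
$$I_{ij}^+ = I_{op} \exp\left(\frac{V_{dd} - \kappa v_g^+}{v_t}\right) \tag{17}$$
where $v_t$ is the thermal voltage and $I_{op}$, $\kappa$ are model
parameters of the MOS transistor operating in the
subthreshold region. From eqs. (16) and (17) we can get
an expression that relates the output and input currents
of the diode
$$Q_T \frac{dI_{ij}^+}{dt} = I_{ij}^+ \left( I_{in}^+ - \frac{1}{A} I_{ij}^+ \right), \tag{18}$$
where
$$A = \exp\left(\frac{V_{dd} - V_A}{v_t}\right), \qquad Q_T = \frac{C v_t}{\kappa}. \tag{19}$$
Note that current mirror gain A is controlled by voltage
$V_A$. During the time in which $P_{x_i}^+$ and $P_{y_j}^+$ (or $P_{x_i}^-$
and $P_{y_j}^-$) are simultaneously high, the source of
transistor $M_w^+$ is low, and this transistor is acting as a
current source sinking a constant current $I_w$ from the
integration node $v_g^+$. In this case $I_{in}^+ = I_w$ in eq. (18).
Suppose that a train of pulses of constant frequency
$1/T$, pulse width $T_h$ and interspike interval $T_l$ (as
depicted in Fig. 13) is applied simultaneously to lines
$P_{x_i}^+$ and $P_{y_j}^+$ (or $P_{x_i}^-$ and $P_{y_j}^-$). Integrating equation
(18) from $t_1$ to $t_2 = t_1 + T_h$ with $I_{in}^+ = I_w$, results in
$$\frac{1}{I_{ij}^+(t_1 + T_h)} = \frac{1}{A I_w} + \left( \frac{1}{I_{ij}^+(t_1)} - \frac{1}{A I_w} \right) \exp\left(-\frac{T_h}{\tau}\right), \tag{20}$$
where the integrator time constant is given by
$\tau = C v_t / \kappa I_w$. When the ANDED pulses are zero, the
source of transistor $M_w^+$ becomes high and $I_{in}^+ = 0$. If
the pulses go low at time $t_2$ and stay low for an
interspike time $T_l$ (see Fig. 13), the output current at
time $t_2 + T_l$, just before a new pulse is applied, is given
by
$$\frac{1}{I_{ij}^+(t_2 + T_l)} = \frac{1}{I_{ij}^+(t_2)} + \frac{T_l}{A Q_T}. \tag{21}$$
When pulses of width $T_h$ are applied at a constant
frequency $1/T$ as shown in Fig. 13, a steady state is
reached in which the charge injected by the diode during
the inactive periods equals the charge sunk by the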
current source during the pulse. In this steady state, the
two following equations must hold,
$$I_{ij}^+(t_1) = I_{ij}^+(t_2 + T_l) = I_{ij}^{+L}(\infty) \tag{22}$$
and
$$I_{ij}^+(t_2) = I_{ij}^+(t_1 + T_h) = I_{ij}^{+H}(\infty). \tag{23}$$
Solving eqs. (20)-(23),
$$I_{ij}^{+L}(\infty) \cong A I_w \frac{T_h}{T_l} \cong A I_w \frac{T_h}{T}, \tag{24}$$
and the steady state ripple will be
$$\frac{\Delta I_{ij}^+(\infty)}{I_{ij}^+(\infty)} = \frac{I_{ij}^{+H}(\infty) - I_{ij}^{+L}(\infty)}{I_{ij}^{+L}(\infty)} \cong \frac{T_h}{\tau}, \tag{25}$$
where the assumption $\tau, T, T_l \gg T_h$ has been made.
Fig. 11: Schematic of a neighborhood selection cell
Fig. 12: Schematic of the Core Cell with the two
diode-capacitor integrators
Fig. 13: Timing diagram of the pulses applied to lines
$P_{x_i}^+$, $P_{x_i}^-$, $P_{y_j}^+$ or $P_{y_j}^-$
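The steady state of eqs. (22)-(25) can be checked numerically by applying eq. (20) and eq. (21) alternately until the current settles. The sketch below does this on the inverse current $1/I_{ij}^+$, which both equations update linearly. The numerical values ($C$, $I_w$, $A$, $T_h$, $T$, and in particular $\kappa = 0.7$ and $v_t = 25.8$ mV) are illustrative assumptions chosen so that $T_h \ll \tau$; they are not the values of the Hspice simulation or of the measurements.

```python
import math


def steady_state(gain, i_w, q_t, t_h, t_l, n_cycles=5000, i_start=1e-12):
    """Iterate eq. (20) (pulse high) and eq. (21) (pulse low) n_cycles times.

    Returns (I_low, I_high): the output current just before a pulse and
    right at the end of a pulse, after n_cycles periods.
    """
    tau = q_t / i_w                    # tau = C*v_t/(kappa*I_w) = Q_T/I_w
    inv_floor = 1.0 / (gain * i_w)     # 1/(A*I_w)
    inv_low = 1.0 / i_start            # state variable is 1/I
    inv_high = inv_low
    for _ in range(n_cycles):
        inv_high = inv_floor + (inv_low - inv_floor) * math.exp(-t_h / tau)  # eq. (20)
        inv_low = inv_high + t_l / (gain * q_t)                              # eq. (21)
    return 1.0 / inv_low, 1.0 / inv_high


# Illustrative (assumed) parameter values.
V_T, KAPPA = 0.0258, 0.7
C, I_W, GAIN = 0.5e-12, 10e-9, 2000.0
T_H, PERIOD = 20e-9, 10e-6
Q_T = C * V_T / KAPPA

if __name__ == "__main__":
    i_low, i_high = steady_state(GAIN, I_W, Q_T, T_H, PERIOD - T_H)
    print("simulated I_L :", i_low, " eq. (24):", GAIN * I_W * T_H / PERIOD)
    print("simulated ripple:", (i_high - i_low) / i_low, " eq. (25):", T_H * I_W / Q_T)
```

With these assumed values the iterated current and ripple agree with eqs. (24) and (25) to within about one percent; the agreement degrades as $T_h$ approaches $\tau$, as expected from the assumption stated after eq. (25).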
According to equation (24), each integrator outputs a
current which is proportional to the frequency and
width of the input pulses. Supposing the AER input
image pixel intensity is linearly encoded as the
frequency of the arriving pulses, and the convolutional
kernel is encoded as the pulse width, the output currents
of the positive integrators would be the input image
filtered with the positive terms of the filter. Equivalently, the
negative integrator output currents would be the input
image filtered with the negative terms of the filter.
Hence, the result of subtracting the output current of the
negative integrator from the output current of the
positive one is the non-rectified filter output.
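The argument above can be illustrated with a one-dimensional toy example. The sketch assumes that the contributions of different input pixels to one integrator simply add, each contributing $A I_w T_h / T$ as in eq. (24); this superposition is a simplification made here for illustration, and the kernel, the pulse rates and the function name are hypothetical.

```python
import math


def filter_1d(freqs, widths, signs, gain, i_w):
    """Return (I_plus, I_minus, I_plus - I_minus) for every cell of a 1-D array.

    freqs:  pulse rate 1/T of each input pixel (Hz), encoding its intensity
    widths: pulse width T_h of each kernel tap (s), encoding |kernel value|
    signs:  +1 or -1 for each kernel tap
    Assumption: contributions add linearly, each following eq. (24).
    """
    n, half = len(freqs), len(widths) // 2
    i_plus, i_minus = [0.0] * n, [0.0] * n
    for i, rate in enumerate(freqs):                # pixel i emits events at this rate
        for k, (width, sign) in enumerate(zip(widths, signs)):
            p = i - (k - half)                      # cell reached through tap k
            if not 0 <= p < n:
                continue
            contribution = gain * i_w * width * rate  # A * I_w * T_h / T
            if sign > 0:
                i_plus[p] += contribution
            else:
                i_minus[p] += contribution
    diff = [a - b for a, b in zip(i_plus, i_minus)]
    return i_plus, i_minus, diff


if __name__ == "__main__":
    step = [10e3] * 4 + [50e3] * 4                  # a step edge in pulse rate
    _, _, out = filter_1d(step, [100e-9, 0.0, 100e-9], [+1, +1, -1], 2000.0, 10e-9)
    print(out)                                      # non-zero only around the edge
```

In this example the difference of the two integrator currents vanishes in the flat regions and is non-zero only next to the step, which is the non-rectified edge response described in the text.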
Fig. 14 shows an Hspice transient simulation for one
of the integrator cells in Fig. 12. Transistor sizes are
$W = 12$ µm and $L = 12$ µm, integrating capacitor is
$C = 0.1$ pF, pulse amplitude is $I_w = 13.5$ nA, pulse width
is $T_h = 100$ ns, frequency of pulse stream is
$1/T = 80$ KHz, and voltage $V_A$ was set to 4.67 V (which
yields a current gain from transistor $M_1^{+/-}$ to $M_2^{+/-}$ of
around 2000).
To verify the operation of the diode-capacitor
integrator a small prototype was integrated in a CMOS
2.5 µm double-poly single-metal technology. The sizes of
transistors $M_1$ and $M_2$ of the integrator were set to
$W/L = 10$ µm $/$ 10 µm and the total capacitance at node
$v_g$ was approximately 0.5 pF. To verify the linear
dependence of the output current in the steady state versus
the frequency of the arriving pulses and the width of the
pulses (see eq. (24)), measurements of the output current
in the steady state were performed where the frequency of
the input pulse stream $1/T$ and the width of the pulses $T_h$
were swept separately. The results are shown in Fig. 15.
Fig. 15(a) shows the measured steady-state current level
as a function of frequency. During the measurement
voltage $V_A$ was set to 4.7 V, the current bias $I_w = 10$ nA
and the width of the arriving pulses was $T_h = 500$ ns. Fig.
15(b) shows the steady-state current level as a function of
pulse width, while maintaining the frequency constant at
100 KHz, for $V_A = 4.6$ V and $I_w = 10$ nA.
E. The Scan Out Cell
A random access scanning circuit can read the
rectified output current of any cell selected by the Random
Scan Bus of Fig. 6. The output decoder X (see Fig. 6)
selects a column i. When a column is not selected, the
output currents $I_{ij}^+$ and $I_{ij}^-$ of all cells $c_{ij}$ in that column
flow to a line of constant voltage $V_{REF}$. If column $i$ is
selected, currents $I_{ij}^+$ and $I_{ij}^-$ of all cells $c_{ij}$ in this
column flow to lines $I_j^+$ and $I_j^-$, respectively, of the scan
out cell $Scan_j$ shown in Fig. 16. Each scan out cell $Scan_j$
receives two input currents $I_j^+$, $I_j^-$ and provides an output
current $I_o = |I_j^+ - I_j^-|$.
At the input of each scan out cell, current $I_j^-$ is
mirrored through a PMOS current mirror and subtracted
from current $I_j^+$. The PMOS current mirror has an active
input [8] clamped to a voltage $V_{REF}$. This maintains a
constant voltage $V_{REF}$ at output nodes $I_{ij}^-$ of cells $c_{ij}$
when they are selected, thus speeding up the read out of
currents. Current $I_j^+ - I_j^-$ enters the current comparator
composed of transistors $M_1$, $M_2$ and OPAMP1 [9]. It
consists of an opamp (OPAMP1) fed back with
transistors $M_1$ and $M_2$ in a diode configuration. This
feedback loop maintains the comparator input node (and
output $I_{ij}^+$ of all selected cells $c_{ij}$) clamped to voltage
$V_{REF}$. If current $I_j^- - I_j^+$ is positive, transistor $M_2$ will sink
this current. Transistor $M_4$ shares its gate with $M_2$ and its
source is connected to a voltage reference of value $V_{REF}$,
thus transistor $M_4$ mirrors the current passing through
$M_2$,
$$[I_j^- - I_j^+]^+ = \begin{cases} I_j^- - I_j^+ & \text{if } I_j^- - I_j^+ > 0 \\ 0 & \text{otherwise} \end{cases} \tag{26}$$
The precision of this current reflection depends on how
tightly the source of $M_2$ is clamped to voltage $V_{REF}$. To
achieve a good precision a high gain opamp is needed. If
current $I_j^+ - I_j^-$ is positive, transistor $M_1$ sources this
current, which is mirrored by transistor $M_3$ because its
source is clamped to $V_{REF}$ by the current comparator
composed of transistors $M_5$, $M_6$ and OPAMP2.
Therefore, the current through $M_3$ and $M_5$ is,
$$[I_j^+ - I_j^-]^+ = \begin{cases} I_j^+ - I_j^- & \text{if } I_j^+ - I_j^- > 0 \\ 0 & \text{otherwise} \end{cases} \tag{27}$$
This current is again reflected by the PMOS transistor
pair $M_5$, $M_7$. At the output node, the currents through
transistors $M_4$ and $M_7$ are added together to get the
rectified current $I_o = |I_j^+ - I_j^-|$. Since transistors
$M_1$-$M_7$ operate in weak inversion, increasing the
source voltage of transistors $M_4$ and $M_7$ with respect
to $V_{REF}$ will make the current mirrors $M_2$, $M_4$ and
$M_5$, $M_7$ have a gain higher than one (actually the
gain will be exponentially controlled by this voltage
difference).
Fig. 14: Hspice Simulation of Integrator Cell
Fig. 15: Experimentally measured results of a
diode-capacitor integrator, (a) Steady-State Current vs. frequency
of Input Pulse Stream, (b) Steady-State Current vs. $T_h$
Fig. 17 shows an Hspice simulation of the DC
characteristic of a scancell. In this simulation, current
$I_j^+$ was set to 80 nA and current $I_j^-$ was swept from 0 nA to
160 nA. Two traces are shown in Fig. 17. The dotted line
shows the current $[I_j^- - I_j^+]^+$ flowing through transistor
$M_4$. The solid line corresponds to current $[I_j^+ - I_j^-]^+$
flowing through transistor $M_7$. Note that a current
amplification of about 150 has been applied from the
input to the output currents. This allows speeding up the
current read out process.
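The behavior of the scan out cell can be summarized by the idealized sketch below, which evaluates the two half-wave rectified branches of eqs. (26) and (27) and their sum. The mirrors are modeled as ideal, and the amplification is represented by a single gain factor applied to both branches; this is an assumption made for illustration (the circuit sets the gain through a source-voltage difference), and the function name is hypothetical.

```python
import math


def scan_out(i_plus, i_minus, gain=1.0):
    """Idealized scan out cell.

    Returns (branch_m4, branch_m7, i_out):
      branch_m4 = gain * [I_j- - I_j+]^+   (eq. (26), through M4)
      branch_m7 = gain * [I_j+ - I_j-]^+   (eq. (27), through M7)
      i_out     = branch_m4 + branch_m7 = gain * |I_j+ - I_j-|
    """
    branch_m4 = gain * max(i_minus - i_plus, 0.0)
    branch_m7 = gain * max(i_plus - i_minus, 0.0)
    return branch_m4, branch_m7, branch_m4 + branch_m7


if __name__ == "__main__":
    # Same sweep as Fig. 17: I_j+ fixed at 80 nA, I_j- from 0 nA to 160 nA.
    for step in range(0, 161, 40):
        print(step, "nA ->", scan_out(80e-9, step * 1e-9, gain=150.0))
```

At most one of the two branches is non-zero for any input pair, so their sum is the absolute value of the current difference scaled by the gain.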
  VI. System Level Operation Behavioral
Simulations
So far electrical (Hspice) simulations and
experimental measurements of some of the circuit
components have been presented. However, to validate
the functionality of the proposed architecture, some
system level (behavioral) simulations are mandatory. In
this section we provide such simulations using MATLAB
on the architecture of Fig. 6 for a system of 128×128 cells.
The input image fed to the system is shown in Fig. 18(a).
Using MATLAB the AER stream of addresses that this
image could generate was computed. The stream of
pulses flowing through the bus is characterized by a
sequence $\{(x_i(t_n), y_j(t_n), t_n)\}$, $n = 0, 1, 2, \ldots$, where
$(x_i(t_n), y_j(t_n))$ is the address present on the bus at time $t_n$.
This stream of addresses was then used to control the
mathematical model of the architecture of Fig. 6. Each
one of the 128×128 cells $c_{ij}$ is characterized by the state
of two integrators: the positive integrator $I_{ij}^+$ and the
negative one $I_{ij}^-$. The state of the integrators is controlled
by the following differential equations (see eq. (18))
$$\begin{aligned} Q_T \frac{dI_{ij}^+}{dt} &= I_{ij}^+ \left( I_{in}^+ - \frac{1}{A} I_{ij}^+ \right) \\ Q_T \frac{dI_{ij}^-}{dt} &= I_{ij}^- \left( I_{in}^- - \frac{1}{A} I_{ij}^- \right) \end{aligned} \tag{28}$$
whose solution is of the form given by eq. (20) (to
compute its charging during the presence of a pulse) or
by eq. (21) (to compute its discharge during the absence
of pulses). These equations were used to update the state
of the integrators in the following manner: for each
address $(x_i(t_n), y_j(t_n))$ present on the bus all cells $c_{pq}$ in
the range $\{p \in [i-L, i+L],\ q \in [j-L, j+L]\}$ were
accessed. For each accessed cell the pulse width
$T_h(p,q)$ was computed using the approximation of eq.
(10) and the simulation results of Fig. 10. Depending on
the resulting sign, either the positive or the negative
integrator was updated. After an integrator has been
updated the present time was stored for it, so that the
next time it needs to be updated the simulator can
properly compute its discharge amount with eq. (21).
For each cell $c_{ij}$ its output is given by $I_{ij}^+ - I_{ij}^-$. Using
this method until all integrators have reached their
steady state within 1% tolerance results in the system
output depicted in Fig. 18(b). In this case, addresses
were not pre-rotated, so that the system is extracting
vertical edges. As can be seen, pixels around vertical
edges result in a very high output value, while as the
edge angle around a pixel deviates from vertical its
output value smoothly decreases to zero.
Fig. 16: Schematic of a cell to scan out the absolute
value of the difference of two currents
Fig. 17: Hspice DC input-output Characteristics of a
Scancell
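The update procedure described in this section can be sketched as an event-driven loop. The version below is a minimal Python illustration, not the MATLAB code used for the results: it assumes that the pulse width seen by a cell is the minimum of the horizontal and vertical monostable widths (the signed-minimum approximation), with the sign given by the product of the two tap signs, and it uses hypothetical function names and dictionary-based kernels.

```python
import math


def charge(i_out, t_h, gain, i_w, q_t):
    """Eq. (20): current after a pulse of width t_h (I_in = I_w, tau = Q_T/I_w)."""
    floor = 1.0 / (gain * i_w)
    return 1.0 / (floor + (1.0 / i_out - floor) * math.exp(-t_h * i_w / q_t))


def discharge(i_out, t_l, gain, q_t):
    """Eq. (21): current after t_l seconds without pulses."""
    return 1.0 / (1.0 / i_out + t_l / (gain * q_t))


def simulate(events, size, h_taps, v_taps, gain, i_w, q_t, i_start=1e-12):
    """Process AER events (i, j, t), sorted by time t.

    h_taps, v_taps: dict {offset: signed pulse width in seconds}.
    Returns dict {(p, q, k): (current, end time of last pulse)}, where
    k = 0 is the positive integrator and k = 1 the negative one.
    """
    state = {}
    for i, j, t in events:
        for l, h in h_taps.items():
            for s, v in v_taps.items():
                p, q = i - l, j - s
                width = min(abs(h), abs(v))          # assumed AND of both pulses
                if width == 0.0 or not (0 <= p < size and 0 <= q < size):
                    continue
                k = 0 if (h > 0) == (v > 0) else 1   # same sign -> positive integrator
                current, t_last = state.get((p, q, k), (i_start, t))
                current = discharge(current, max(t - t_last, 0.0), gain, q_t)
                current = charge(current, width, gain, i_w, q_t)
                state[(p, q, k)] = (current, t + width)   # store the update time
    return state


if __name__ == "__main__":
    q_t = 0.5e-12 * 0.0258 / 0.7                     # assumed C, v_t and kappa
    events = [(2, 2, n * 1e-5) for n in range(5000)]  # one pixel firing at 100 kHz
    result = simulate(events, 5, {0: 20e-9}, {0: 20e-9}, 2000.0, 10e-9, q_t)
    print(result[(2, 2, 0)][0], "vs eq. (24):", 2000.0 * 10e-9 * 20e-9 / 1e-5)
```

Storing the end time of the last pulse with each integrator is what allows the discharge of eq. (21) to be applied lazily, only when the integrator is accessed again.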
  VII. Conclusions and Future Work
An architecture that implements a pseudo-Gabor
filter for edge extraction has been presented. The
architecture allows the implementation of any 2D filter F(p,q)
decomposable into x-axis and y-axis components,
$F(p,q) = H(p)V(q)$, such that the product can be
approximated by a signed minimum. Positive and
negative values of $H(p)$ and $V(q)$ can be programmed.
The architecture requires an AER input. This allows the
2D convolution kernel to be rotated by any angle.
A VLSI circuit implementation that realizes the
proposed architecture is provided. Circuit simulation
results and experimental measurements of critical
components were given. System-level behavioral
simulations of a 128×128 array have been included which
validate the proposed approach. Cell size is
67.2 µm × 72.6 µm if no AER output is available and
75 µm × 90.6 µm if AER output is included, for a 1.2 µm
double-poly double-metal CMOS process. This would
allow, for a 1 cm² die, the implementation of a 2D filter with
approximately 128 × 128 pixels for no AER output, and
120 × 100 pixels if AER output is provided.
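As a rough plausibility check of these array sizes, the sketch below computes the silicon area taken by the cell array alone for both cell sizes. It ignores decoders, RAM, monostables, routing and pads, so it only shows how much of the 1 cm² die the cells themselves would occupy; the function name is illustrative.

```python
UM2_PER_CM2 = 1e8   # 1 cm = 1e4 um, so 1 cm^2 = 1e8 um^2


def array_area_cm2(n_pixels, cell_w_um, cell_h_um):
    """Area of the cell array alone, in cm^2 (periphery not included)."""
    return n_pixels * cell_w_um * cell_h_um / UM2_PER_CM2


if __name__ == "__main__":
    print("no AER output  :", array_area_cm2(128 * 128, 67.2, 72.6), "cm^2")
    print("with AER output:", array_area_cm2(120 * 100, 75.0, 90.6), "cm^2")
```

In both cases the cells occupy roughly 0.8 cm², which leaves the remainder of the die for the peripheral circuitry.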
Future work includes the fabrication of a test
prototype, its interface to a retina chip with AER output,
and the implementation of more processing layers of the
vision model system described in Section II. Note that the
present architecture can be used to implement the
processing of Stages 4, 5, 7, and 8 as well. For the
implementation of Stage 6 the present architecture can be
used if it is possible to substitute the kernel of eq. (7) by
another one decomposable into horizontal and vertical
components and if the product can be approximated by a
signed minimum.
  VIII. References
[1] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti,
and D. Gillespie, “Silicon Auditory Processors as
Computer Peripherals,” IEEE Transactions on Neural
Networks, May, 1993.
[2] M. Mahowald, An Analog VLSI System for Stereoscopic
Vision, Kluwer Academic Publishers, 1994.
[3] Kwabena Boahen, “Retinomorphic Vision Systems,”
Microneuro’96: Fifth Int. Conf. on Neural Networks and
Fuzzy Systems, Lausanne, Switzerland, February 1996.
[4] Gordon M. Shepherd, The Synaptic Organization of the
Brain, Oxford University Press, 3rd Edition, 1990.
[5] S. Grossberg, E. Mingolla, and J. Williamson, “Synthetic
Aperture Radar Processing by a Multiple Scale Neural
System for Boundary and Surface Representation,”
Neural Networks, vol. 8, No. 7/8, pp. 1005-1028, 1995.
[6] Z. Kalayjian, J. Waskiewicz, D. Yochelson, and A. G.
Andreou, “Asynchronous Sampling of 2D Arrays using
Winner-Takes-All Arbitration,” Proceedings of the 1996
IEEE Int. Symp. on Circuits and Systems (ISCAS’96),
Atlanta, vol. 3, pp. 393-396, 1996.
[7] A. Mortara, E. A. Vittoz, and P. Venier, “A
Communication Scheme for Analog VLSI Perceptive
Systems,” IEEE Journal of Solid-State Circuits, vol. 30,
No. 6, pp. 660-669, June 1995.
[8] D. G. Nairn and C. A. T. Salama, “Current-Mode
Algorithmic Analog-to-Digital Converters,” IEEE
Journal of Solid-State Circuits, vol. 25, pp. 997-1004,
August 1990.
[9] A. Rodríguez-Vázquez, R. Domínguez-Castro, F.
Medeiro, and M. Delgado-Restituto, “High Resolution
CMOS Current Comparators: Design and Applications
to Current-Mode Function Generation,” Analog
Integrated Circuits and Signal Processing, vol. 7, pp.
149-165, 1995.
Fig. 18: System-Level Behavioral Simulations of a
128×128 Array. (a) Input Image, (b) Output Image of
Pseudo-Gabor Filter extracting Vertical Edges
