I. Introduction
Human beings can recognize objects, figures, and shapes even when they are embedded in noise, partially occluded, or distorted. To achieve this, the human visual processing system is structured into a number of massively interconnected neural layers with feedforward and feedback connections among them. Neurons communicate by means of streams of electrical pulses. Each neuron broadcasts its output to a large number of other neurons, which can be in the same layer or in different layers, through physical connections called synapses [1]. One major problem engineers face when implementing bio-inspired (vision) processing systems is how to handle these massive interconnections. The Address Event Representation (AER) [2]-[5] approach is a possible solution.
Fig. 1 shows a schematic outlining the essence of AER. Suppose we have an "emitter" chip containing a large number of neurons or cells D1, D2, D3, ... whose activity changes in time with a "relatively slow" time constant. For example, if Chip 1 is a retina chip and each neuron's activity represents the illumination sensed by a pixel, the time constant with which this activity changes can be equivalent to frame rate (i.e., 25-30 changes per second, or a time constant of about 30-40 ms; see footnote 1). The purpose of an AER-based communication scheme is to reproduce the time evolution of each neuron's activity inside a second or "receiver" chip, using a fast digital bus with a small number of pins. In the "emitter" chip the activity of each pixel has to be transformed into a pulse stream such that the pulse width is minimal and the spacing between pulses is large enough to time-multiplex the activity of a relatively large number of neurons. Every time a neuron produces a pulse, its address or code is written on the bus.
When several neurons produce pulses simultaneously, a classical arbitration tree can be introduced [2]-[4], or one based on Winner-Takes-All (WTA) row-wise competitions [6], or no neuron is allowed to access the bus when a "collision" occurs [7]. Whichever method is used, the result is a sequence of addresses or codes on the digital bus that one or more receiver chips can read. Each receiver chip must contain decoding circuitry so that a pulse reaches the neuron (or neurons) specified by the address read on the bus. If each neuron integrates the sequence of pulses properly, the original activity of the neurons in the emitter chip will be reproduced.
AER makes it easy to add more complicated processing. For example, input images can be translated or rotated by remapping the addresses while they travel from one chip to the next. By properly programming an EEPROM as a look-up table, any address remapping can be implemented simply by inserting the EEPROM between the two chips. Furthermore, many EEPROMs can be connected in parallel, each performing, for example, a rotation by a specific angle, and each delivering the remapped addresses to a set of specialized processing chips. In the architecture proposed in this paper, we implement a synaptically weighted projection field for each address read on the bus. This can be done either by having a hard-wired kernel in the filtering chip [8], or by implementing a programmable one, as proposed in this paper.
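The emitter, look-up table, and receiver roles described above can be illustrated with a minimal Python sketch. All names here (`emit_events`, `build_translation_lut`, `receive`) are hypothetical, activity is assumed to be given as an integer number of pulses per frame, and simultaneous pulses are simply ordered by address as a stand-in for a real arbitration scheme.

```python
from collections import Counter

def emit_events(activity):
    """Emitter side: turn per-neuron activity (pulses per frame) into a
    time-multiplexed sequence of addresses on the bus."""
    timed = []
    for addr, n_pulses in activity.items():
        for k in range(n_pulses):
            # Pulses of one neuron are spread evenly over the frame.
            timed.append(((k + 0.5) / n_pulses, addr))
    # Sorting by time multiplexes the neurons; ties are ordered by address,
    # which stands in for an arbiter (an assumption of this sketch).
    timed.sort()
    return [addr for _, addr in timed]

def build_translation_lut(width, height, dx, dy):
    """Look-up table (the EEPROM role) that shifts every address by (dx, dy).
    Addresses that would leave the array are left out of the table."""
    lut = {}
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                lut[(x, y)] = (nx, ny)
    return lut

def receive(events, lut=None):
    """Receiver side: each neuron integrates (here: counts) the pulses
    addressed to it, optionally after remapping through the table."""
    counts = Counter()
    for addr in events:
        if lut is not None:
            addr = lut.get(addr)
            if addr is None:
                continue  # remapped outside the receiver array
        counts[addr] += 1
    return counts

activity = {(0, 0): 3, (1, 0): 1, (2, 1): 5}
events = emit_events(activity)
direct = receive(events)
shifted = receive(events, build_translation_lut(4, 4, dx=1, dy=1))
```

In this sketch the receiver recovers the emitter activity exactly because it counts pulses over a whole frame; a lossy analog integrator would recover it only approximately.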
II. The Programmable Filter
The programmable filter described in this paper is intended to be used in a vision model system known as the Boundary-Contour-System (BCS) and Feature-Contour-System (FCS) [9]. This vision model consists of a set of layers computing convolutions. The convolutional kernels $F(x, y)$ used in most of these layers are decomposable into x-axis and y-axis components, $F(x, y) = H(x)V(y)$, for some rotated coordinate system $\{x, y\}$. Using AER allows us to implement a filtering chip only for the coordinate system $\{x, y\}$ in which $F(x, y)$ is decomposable. To do the filtering for another coordinate system $\{x', y'\}$, rotated by an arbitrary angle $\alpha$ with respect to $\{x, y\}$, we can use the same chip but provide it with addresses that have previously been rotated by $-\alpha$.
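As an illustration of the address rotation described above, the following Python sketch rotates an incoming address by $-\alpha$ before it would be passed to the filtering chip. The function name `rotate_address`, the choice of rotation center, and the rounding to the nearest pixel are assumptions made for this example only.

```python
import math

def rotate_address(p, q, alpha_deg, center=(0, 0)):
    """Rotate address (p, q) by -alpha (degrees) about `center` and round
    the result to the nearest pixel address."""
    a = math.radians(-alpha_deg)
    cx, cy = center
    dx, dy = p - cx, q - cy
    rx = cx + dx * math.cos(a) - dy * math.sin(a)
    ry = cy + dx * math.sin(a) + dy * math.cos(a)
    return (round(rx), round(ry))

# Filtering along axes rotated by alpha = 90 degrees: the address (1, 0)
# is presented to the chip as if it had been rotated by -90 degrees.
remapped = rotate_address(1, 0, 90)
```

For angles that are not multiples of 90 degrees, rounding can map two source addresses to the same target, so a practical mapping would presumably be precomputed and checked as a look-up table.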
In the filtering chip, the convolutional kernel is implemented as follows. Let us call $P_o(p, q)$ the sequence of pulses the AER bus receives for address $(p, q)$. Every time a pulse for address $(p, q)$ is received, pulses are sent to all pixels in its vicinity. This way, a lossy integrator at pixel $(x, y)$ of the receiver chip will integrate the sequence of pulses
$$\sum_{p, q} F(x - p, y - q) \, P_o(p, q), \tag{1}$$
which are all pulses coming in from its vicinity, weighted by the convolutional kernel $F(x, y)$. The weighting is performed by modulating the width of each incoming pulse. Thus, every time a pulse is received for pixel $(p, q)$, a pulse of width proportional to $|F(x - p, y - q)|$ is sent to each pixel $(x, y)$ in its vicinity.
Pulse width modulation is done as follows. When a pulse for coordinate $(p, q)$ is received, all columns $x$ in the vicinity of column $p$ receive a pulse of width $|H(x - p)|$, and all rows $y$ in the vicinity of row $q$ receive a pulse of width $|V(y - q)|$. The values of $H(\cdot)$ and $V(\cdot)$ are stored in a small on-chip RAM. The integrator at coordinate $(x, y)$ receives a pulse of width equal to the minimum of $|H(x - p)|$ and $|V(y - q)|$. Consequently, the convolutional kernel the system implements is an approximation to $F(x, y)$,
$$F(x, y) \approx \mathrm{sgn}(H(x)V(y)) \, \min(|H(x)|, |V(y)|), \tag{2}$$
the signed minimum of the vertical and horizontal components.
1. In this paper we consider "Real-Time" a processing that is performed at frame rate (30-40 ms) or faster.
0-7695-0619-4/00 $10.00 © 2000 IEEE
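To make the signed-minimum approximation concrete, the following NumPy sketch builds both the exact separable kernel $H(x)V(y)$ and its signed-minimum counterpart, and applies the latter in an event-driven way. The function names and the sample values of `h` and `v` are hypothetical, and the kernel is assumed to be square with an odd side length.

```python
import numpy as np

def product_kernel(h, v):
    """Exact separable kernel F(x, y) = H(x) V(y); rows index y, columns x."""
    return np.outer(v, h)

def signed_min_kernel(h, v):
    """Approximation sgn(H(x) V(y)) * min(|H(x)|, |V(y)|)."""
    H = np.asarray(h, dtype=float)[np.newaxis, :]
    V = np.asarray(v, dtype=float)[:, np.newaxis]
    return np.sign(H * V) * np.minimum(np.abs(H), np.abs(V))

def project_events(events, kernel, shape):
    """Event-driven filtering: each event at address (p, q) adds the kernel,
    centered on (p, q), to the pixels in its vicinity."""
    out = np.zeros(shape)
    L = kernel.shape[0] // 2
    for p, q in events:
        for dy in range(-L, L + 1):
            for dx in range(-L, L + 1):
                x, y = p + dx, q + dy
                if 0 <= x < shape[1] and 0 <= y < shape[0]:
                    out[y, x] += kernel[dy + L, dx + L]
    return out

h = [-0.5, 1.0, -0.5]   # hypothetical horizontal component H
v = [0.5, 1.0, 0.5]     # hypothetical vertical component V
K = signed_min_kernel(h, v)
F = product_kernel(h, v)
image = project_events([(2, 2), (2, 2)], K, shape=(5, 5))
```

With these sample values the two kernels agree in sign everywhere and in magnitude along the central row and column, but they differ in the corners, which is where the approximation departs from the exact product.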
III. Circuit Description
The address bus provides the coordinates $(x_0, y_0)$ of the neuron (or pixel) around which the convolutional kernel should be applied. Pulses will be applied to all rows with y-coordinate in [...] The following $n$ bits indicate the absolute value $|H(x)|$ (or $|V(y)|$). These $n$ bits linearly control the length of the pulse triggered by monostables $Mx_l$ (or $My_l$). The monostables achieve this by charging, with a constant current, a programmable capacitor controlled by the $n$ bits in $Rx_l$ or $Ry_l$. The pulses generated by the monostables are sent through lines $Tx_l$ (or $Ty_l$) and are triggered whenever an external pulse arrives at the system. When an external pulse arrives, the input decoders activate lines $x_i$ and $y_j$ corresponding to the address of the arriving pulse. This way, pulses $Tx_l$ (or $Ty_l$) are sent through lines $Px^{+}_{i-l}$ or $Px^{-}_{i-l}$ ($Py^{+}_{j-l}$ or $Py^{-}_{j-l}$), depending on the sign of the weight stored in $Rx_l$ (or $Ry_l$).
Each neuron $c_{ij}$ has two integrators. The positive integrator accumulates charge when pulses are simultaneously arriving through horizontal and vertical lines of the same sign. That is, it integrates a pulse when lines $Px^{+}_i$ and $Py^{+}_j$ (or lines $Px^{-}_i$ and $Py^{-}_j$) are simultaneously high, or equivalently it performs the operation $(Px^{+}_i \cap Py^{+}_j) \cup (Px^{-}_i \cap Py^{-}_j)$. Similarly, the negative integrator accumulates charge when pulses arriving through horizontal and vertical lines of opposite sign, $Px^{+}_i$ and $Py^{-}_j$ (or $Px^{-}_i$ and $Py^{+}_j$), are simultaneously high; that is, it performs the operation $(Px^{+}_i \cap Py^{-}_j) \cup (Px^{-}_i \cap Py^{+}_j)$.
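The logic performed at each cell can be checked with a small time-stepped Python sketch. It assumes that the horizontal and vertical pulses start at the same instant (both monostables being triggered by the same external pulse) and that widths are given as integer numbers of time steps; the function name `integrate_event` is hypothetical.

```python
def integrate_event(sign_x, width_x, sign_y, width_y):
    """Return (positive_time, negative_time): the number of time steps
    during which the positive and negative integrators of one cell
    receive charge for a single incoming event."""
    pos_time = 0
    neg_time = 0
    for t in range(max(width_x, width_y)):
        # State of the four lines seen by the cell at time step t.
        px_plus = sign_x > 0 and t < width_x
        px_minus = sign_x < 0 and t < width_x
        py_plus = sign_y > 0 and t < width_y
        py_minus = sign_y < 0 and t < width_y
        # Positive integrator: lines of the same sign are both high.
        if (px_plus and py_plus) or (px_minus and py_minus):
            pos_time += 1
        # Negative integrator: lines of opposite sign are both high.
        if (px_plus and py_minus) or (px_minus and py_plus):
            neg_time += 1
    return pos_time, neg_time

same_sign = integrate_event(+1, 5, +1, 3)
opposite_sign = integrate_event(+1, 5, -1, 2)
```

In both cases the active integrator is driven for a time equal to the shorter of the two pulses, which is how the minimum of the two widths arises.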
Fig. 3(a) depicts the schematic of the selection cell $Cx_{i-l,l}$ (or $Cy_{j-l,l}$) used to select the neighborhood of cells where the monostable pulses have to be sent. It consists of two NAND gates controlling the PMOS switches $MP^{+}$ and $MP^{-}$, and two NMOS pull-down transistors $MN^{+}$ and $MN^{-}$. Each selection cell $Cx_{i-l,l}$ has two control signals (the decoder output $x_i$ and the sign bit $Sx_l$ from RAM X), one input signal (the monostable output $Tx_l$) and two outputs ($Px^{+}_{i-l}$ and $Px^{-}_{i-l}$). When a pulse arrives with address $(x_i, y_j)$, it activates the decoder outputs $x_i$ and $y_j$, respectively. The decoder output $x_i$ controls all the selection cells $Cx_{i-l,l}$, with $l \in [-L, ..., L]$. When $x_i$ is high, if the sign bit $Sx_l$ is '1', the selection cell $Cx_{i-l,l}$ connects the monostable output line $Tx_l$ to the positive line $Px^{+}_{i-l}$. If the sign bit $Sx_l$ is '0', line $Tx_l$ is connected to the negative line $Px^{-}_{i-l}$. The same is valid for the Y coordinate selection cells.
Each synaptic cell $c_{ij}$ has two integrators, the positive and the negative. Fig. 3(b) shows the circuit diagram for the positive integrator. The negative one is identical, except for labelling. The integrator is based on the Capacitor-Diode integrator concept for subthreshold MOS operation [4]. This integrator has some interesting properties with respect to a conventional linear RC integrator:
- Steady state current is proportional to pulse stream frequency
- Steady state current is proportional to pulse width
- Steady state current ripple is independent of current level
- Steady state is reached, for a given precision, after a constant number of pulses
In Fig. 3(b), the two AND gates and the NOR gate provide a pulse of width equal to the minimum of the pulse widths coming in horizontally and vertically. This pulse turns ON a current source, providing a current pulse of amplitude $I_p$ (controlled by a bias voltage). Since the integrator transistors are biased in subthreshold, the integrator input and output currents, $I^{in}_{ij}$ and $I_{ij}$, are related by [4]
[Eq. (3)]
where $A = \exp((V_{dd} - V_a)/v_t)$, $Q_0 = C v_t/\kappa$, $v_t$ is the thermal voltage, and $\kappa$ is a characteristic subthreshold dimensionless technology parameter whose value may range from 0.60 to 0.98 [10]. When a train of pulses of width $T_p$ and frequency $1/T$ ($T \gg T_p$) is applied to this integrator, the steady-state output current and ripple are [4]
[Eq. (4)]
where $\tau = Q_0/I_p \gg T_p$. Equation (4) expresses that the relative resolution in the integrator output is constant, independent of the signal level, and that each integrator outputs a current which is proportional to the frequency $1/T$ and width $T_p$ of the input pulses. If the AER input image pixel intensity is linearly encoded as the frequency of the arriving pulses, and the convolutional kernel is encoded as the pulse width, the output current of the integrators is the input image filtered by the convolutional kernel.
Fig. 4(a) shows an Hspice transient simulation for one of the integrator cells in Fig. 3(b). Transistor sizes are W = 12 µm and L = 12 µm, the integrating capacitor is C = 0.1 pF, the pulse amplitude is $I_p$ = 13.5 nA, the pulse width is $T_p$ = 100 ns, the frequency of the pulse stream is $1/T$ = 80 kHz, $V_{dd}$ = 5 V, and voltage $V_a$ was set to 4.67 V (which yields a current gain between the two integrator transistors of around 2000).
Similar simulations were performed by sweeping the frequency of the input pulse stream $1/T$ and the width of the pulses $T_p$. The results are shown in Fig. 5. Fig. 5(a) shows the steady-state current level as a function of frequency, while maintaining $T_p$ = 10 ns. Fig. 5(b) shows the steady-state current level as a function of pulse width, while maintaining the frequency constant at 4 kHz.
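The proportionality of the steady-state current to pulse frequency and pulse width can be explored with a behavioral Python sketch. The model used here is an assumption, not the paper's own equation: it takes the integrator to obey $Q \, dI/dt = I (I_{in} - I/A)$, a first-order form that follows from a generic subthreshold diode-capacitor analysis, and all parameter values are normalized and hypothetical.

```python
import math

def steady_state_current(freq, t_p, i_p=1.0, gain=100.0, q0=1.0, n_pulses=5000):
    """Output current just before a pulse, after n_pulses input pulses,
    for the assumed model q0 * dI/dt = I * (I_in - I / gain).
    The model is linear in u = 1/I, so each phase is solved exactly."""
    period = 1.0 / freq
    u = 1.0e6                          # start from a very small current
    u_inf = 1.0 / (gain * i_p)         # asymptote of u while the pulse is ON
    decay = math.exp(-i_p * t_p / q0)  # relaxation of u during one pulse
    for _ in range(n_pulses):
        u = u_inf + (u - u_inf) * decay       # pulse ON for t_p
        u += (period - t_p) / (gain * q0)     # pulse OFF for the rest
    return 1.0 / u

base = steady_state_current(freq=1.0, t_p=0.01)
double_freq = steady_state_current(freq=2.0, t_p=0.01)
double_width = steady_state_current(freq=1.0, t_p=0.02)
```

Under this assumed model, and for pulses much shorter than `q0 / i_p`, the steady-state level is close to `gain * i_p * t_p * freq`, so doubling either the frequency or the pulse width roughly doubles the output current.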
Sometimes, in 2D image filtering, a rectification operation has to be performed. This is the case, for instance, when doing orientation extraction with Gabor-like kernel filters: the output of the filter is rectified for each pixel [9]. Because of this, the chip scan-out circuitry, which brings the state of a $c_{ij}$ cell out of the chip, has been designed so that it can add a rectification operation. The random access scanning circuitry can read the rectified output current of any cell selected by the Random Scan Bus of Fig. 2. The output decoder X (see Fig. 2) selects a column $i$ through pin $Scx_i$. When a column is not selected, the output currents $I^{+}_{ij}$ and $I^{-}_{ij}$ of all $c_{ij}$ cells in that column flow to a line of constant voltage $V_{REF}$ (see Fig. 3(b)). If column $i$ is selected, currents $I^{+}_{ij}$ and $I^{-}_{ij}$ of all $c_{ij}$ cells in this column flow to lines $I^{+}_j$ and $I^{-}_j$, respectively, of the scan-out cell $Scan_j$ shown in [...] a gain higher than one (actually the gain will be exponentially controlled by this voltage difference). This allows a current gain such that the output current is of the order of hundreds of µA or even a few mA, making it possible to drive this current directly off-chip at high speeds. Fig. 4(b) shows an Hspice simulation of the DC characteristic of a scan cell. In this simulation, current $I^{+}$ was set to 80 nA and current $I^{-}$ was swept from 0 nA to 160 nA. Two traces are shown in Fig. 4(b). The [...]
Using the analytical solution of eq. (3) and the Hspice simulation results, behavioral system-level simulations were performed on the architecture of Fig. 2 for a 128 x 128 array, doing Gabor filtering for vertical line extraction. Fig. 7(b) shows the output obtained for the input image in Fig. 7(a).
IV. Conclusions
An architecture that implements a programmable 2D image filter has been presented. The architecture allows implementing any [...]
