Abstract-This paper describes the electronic implementation of a four-layer cellular neural network architecture implementing two components of a functional model of neurons in the visual cortex: linear orientation selective filtering and half-wave rectification. Separate ON and OFF layers represent the positive and negative outputs of two-phase quadrature Gabor-type filters, whose orientation and spatial-frequency tunings are electronically adjustable. To enable the construction of a multichip network to extract different orientations in parallel, the chip includes an address event representation (AER) transceiver that accepts and produces two-dimensional images that are rate encoded as spike trains. It also includes routing circuitry that facilitates point-to-point signal fan in and fan out. We present measured results from a 32 × 64 pixel prototype, which was fabricated in the TSMC 0.25-µm process on a 3.84 by 2.54 mm die. Quiescent power dissipation is 3 mW and is determined primarily by the spike activity on the AER bus. Settling times are on the order of a few milliseconds. In comparison with a two-layer network implementing the same filters, this network results in a more symmetric circuit design with lower quiescent power dissipation, albeit at the expense of twice as many transistors.
A functional model of neurons in the visual cortex consists of a linear spatio-temporal filtering stage and three nonlinear mechanisms: half-wave rectification, expansive exponentiation, and contrast normalization [1]-[3]. Linear spatio-temporal filtering determines the neural selectivity along different stimulus dimensions. Half-wave rectification conserves metabolic energy by mapping mean levels to a low quiescent spike rate. Expansive exponentiation sharpens selectivity. Contrast normalization enables neurons to retain stimulus selectivity over a wide input contrast range.
This paper describes a VLSI chip that implements two components of this model: linear orientation selective spatial filtering followed by half-wave rectification. Orientation selectivity is a predominant characteristic of neurons in the primary visual cortex [4]. The implementation of orientation selective neurons is an appropriate starting point in building a silicon model of the selectivity of neurons in the visual cortex, since orientation selective neurons are used in neural models of selectivity along other stimulus dimensions such as direction of motion [5], [6] and binocular disparity [7].
This chip implements neurons with spatial receptive field (RF) profiles that are similar to a Gabor function. In the functional model, the RF profile is the filter's impulse response reflected around the x and y axes. Gabor functions fit the RF profiles measured from orientation selective cortical neurons well [8]-[10]. A Gabor function is a sinusoidal grating with frequency and orientation modulated by a Gaussian envelope (1) where represents the original coordinate function translated by and rotated by . The parameter determines the spatial phase of the sinusoid with respect to the center of the Gaussian. The RF profiles of the neurons on this chip are Gabor-type, since their modulating function is not Gaussian.
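The verbal definition of a Gabor function above can be sketched numerically. The following is only an illustration under stated assumptions: the parameter names (`omega`, `theta`, `sigma`, `phi`, `x0`, `y0`) and the isotropic Gaussian envelope are hypothetical choices for this sketch, not the notation or the exact form of (1), and the chip's own profiles are Gabor-type rather than true Gabor functions.

```python
import math

def gabor(x, y, omega, theta, sigma, phi, x0=0.0, y0=0.0):
    """Sinusoidal grating under a Gaussian envelope (illustrative form only).

    omega: spatial frequency in radians/pixel, theta: orientation in radians,
    sigma: envelope width in pixels, phi: spatial phase of the sinusoid.
    All names are assumptions made for this sketch.
    """
    # Translate by (x0, y0), then rotate by theta.
    dx, dy = x - x0, y - y0
    xr = dx * math.cos(theta) + dy * math.sin(theta)
    yr = -dx * math.sin(theta) + dy * math.cos(theta)
    envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
    return envelope * math.cos(omega * xr + phi)

# phi = 0 gives an even symmetric profile, phi = -pi/2 an odd symmetric one.
params = dict(omega=0.5, theta=math.pi / 4, sigma=4.0)
even_a = gabor(3, -2, phi=0.0, **params)
even_b = gabor(-3, 2, phi=0.0, **params)
odd_a = gabor(3, -2, phi=-math.pi / 2, **params)
odd_b = gabor(-3, 2, phi=-math.pi / 2, **params)
```

Reflecting the sample point through the center leaves the even profile unchanged and flips the sign of the odd profile, which is the symmetry the two filter phases rely on.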
The chip described here processes a 32 by 64 pixel input image with two orientation selective filters using a continuous time analog processing network. The filters have even and odd symmetric impulse responses that are said to be in phase quadrature, since they differ in phase by π/2. Physiological measurements in cortex indicate that neighboring neurons often differ in phase by π/2 [11], with the distribution of phases clustering around even and odd symmetric RF profiles [12]. Energy models of motion and binocular disparity selectivity rely heavily on the existence of neurons with phase quadrature RF profiles [5], [7].
The differential ON-OFF channels used to represent all input, output, and internal signals differentiate this chip from previous electronic implementations of orientation selective filtering networks, which mostly used single-ended representations [13]-[16]. Serrano-Gotarredona et al. propose an architecture that uses an internal differential representation to accumulate signals, but the input and output are single ended [17]. Liu et al. constructed orientation selective neurons from the output of a silicon retina that included both ON and OFF channels, but only used the OFF channels [18]. Although the ON-OFF circuit architecture requires twice as many transistors as a functionally equivalent single-ended design, it has several compelling advantages, as we describe in the latter part of the paper.
1057-7122/04$20.00 © 2004 IEEE
Section II describes the four-layer cellular neural network (CNN) architecture used to establish the orientation selectivity of the neurons on the chip. Section III derives the spatial transfer functions of the orientation selective filters and proves stability. Section IV describes the chip architecture, with a high-level description of the address event representation (AER) communication circuits used for input and output. Section V describes in detail the pixel level processing circuits, including the analog circuits for the orientation selective filtering network, as well as the circuits converting between the digital asynchronous spike train representation used at the periphery and the continuous time current-mode representation used internally. Section VI reports experimental measurements from the chip, and compares the design here with a previous two-layer design. Section VII concludes with a summary and discussion of future directions. Preliminary reports of this work have appeared in [19]-[21].
II. NETWORK ARCHITECTURE
Biological systems use ON and OFF channels to encode signal variations around a background level efficiently. While single neurons can encode positive and negative signals as variations around a quiescent firing rate, this representation is inefficient as each spike consumes metabolic resources. With separate ON and OFF channels, background signals correspond to low quiescent spike rates on both channels. Positive signals are encoded by increases in the ON-channel spike rate, negative signals by increases in the OFF-channel spike rate. For example, ON-center and OFF-center retinal ganglion cells respond to positive and negative contrast of the center with respect to the surround. Diffuse illumination applied to both center and surround elicits little response from either cell.
ON-OFF signal representations prevail in computational models of visual cortical neurons. Most cortical neurons exhibit low spontaneous spike rates. Hubel and Wiesel proposed that the oriented excitatory and inhibitory regions of the RF profiles arise from linear summation of feedforward input from corresponding ON-center and OFF-center cells in the lateral geniculate nucleus [4]. This basic model has been preserved in most subsequent work, which has extended it to include push-pull inhibition to account for contrast invariance or cortical feedback to sharpen orientation selectivity. Ferster and Miller give a review of recent work in [22].
Our network also adopts an ON-OFF signal representation. Each pixel in the image is associated with four neurons. Two neurons carry the positive (ON) and negative (OFF) half-wave rectified outputs of the even symmetric filter. The other two carry the corresponding outputs of the odd symmetric filter. We refer to the neurons as EVEN ON, EVEN OFF, ODD ON, and ODD OFF.
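The ON-OFF representation can be illustrated with a short sketch. It assumes a signed filter output is already available as a list of numbers; the function name `on_off_split` is hypothetical and the example values are arbitrary.

```python
def on_off_split(signal):
    """Split a signed signal into nonnegative ON and OFF channels.

    ON carries the positive half-wave rectified part, OFF the negative part.
    """
    on = [max(s, 0.0) for s in signal]
    off = [max(-s, 0.0) for s in signal]
    return on, off

filter_output = [0.0, 1.5, -2.0, 0.25]   # arbitrary example values
on, off = on_off_split(filter_output)

# The signed signal is recovered as the difference of the two channels,
# and at most one channel is active for any sample.
recovered = [p - n for p, n in zip(on, off)]
```

A zero (background) input leaves both channels at zero, which is the property that keeps quiescent activity low in this representation.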
Our network establishes orientation selectivity through local recurrent interconnections between neurons, which facilitate implementation in VLSI while enabling the resulting RF profiles to extend over many pixels. We model the network as a four-layer CNN whose layers are indexed by . Each layer consists of an by array of cells, each with real valued input , state , and output , where indexes the array location. We drop the when referring to an entire layer. We assume that the input to all layers is strictly positive. The outputs of the ON and OFF layers are equal to the difference between their states positive and negative half wave rectified (2) where and . The state evolves according to the differential equation (3) where the summation is over all layers. The denotes the elementwise product of two arrays. The denotes correlation, e.g.,
. The coefficient matrices , and are the state feedback, output feedback and feedforward cloning templates.
This network differs from the classical multilayer CNN [23] in three ways. First, it adds the elementwise product with the state. Second, it contains an additional state feedback template , through which each cell's state influences its neighboring cells' states. Third, the output of each cell is a nonlinear function of the states in cells of two layers, which introduces additional coupling between layers.
The nonzero cloning templates for orientation selective filtering are where In each template matrix, the central element indicates the (0, 0) term. The parameters are nonnegative reals and determine the 
III. NETWORK ANALYSIS
This section derives the spatial transfer function of the network that relates a constant input image with the steady state output and examines the stability of the network. The first part summarizes the main results. The subsections contain the detailed proofs, which can be skipped without loss of continuity.
Because the analysis of this network is complicated by the element-wise product, we consider a simplified network without the product, given in (4). Fig. 2(a) shows a block diagram of the interactions between the layers. This network is easier to analyze, but has much in common with the original network in (3). First, the two networks share a unique common equilibrium point where the state of all cells is positive. Second, any additional equilibrium point in the original network is unstable. Finally, we conjecture that stability of the common equilibrium point in the simplified network implies its stability in the original network.
We derive the transfer function from a constant input image to the common equilibrium by expressing (4) in terms of the sum and difference of the ON and OFF input and state variables, e.g., and . We find that both the sum and difference components evolve according to linear differential equations, but that the sum components are driven by a nonlinear function of the difference components. The difference components evolve independently, and are related to the input difference components by the transfer function (5) where and are the discrete Fourier transforms of and and are spatial-frequency variables, and This transfer function reaches its maximum value of at , corresponding to orientation and spatial frequency . Since the transfer function drops by approximately one half at and , we refer to and as the 6-dB half bandwidth in the and directions.
Although there are five parameters that determine the filter shape, only four can be specified independently by the parameters. The filter gain is fixed by the choice of , and according to (6). Because the parameters are positive, both and are nonnegative, corresponding to orientations between 0 and . Orientations outside this range can be obtained by reflecting the templates around the horizontal and/or vertical axes. Alternatively, we can flip the image from left to right before input to the network. To relate this complex valued filter to the real valued Gabor filter described previously, observe that where and are the transforms of (1) with and . The modulating function can be approximated by a Laplacian function in 1-D and by a Bessel function in two dimensions (2-D) [24].
Because the network connections are spatially invariant, the Fourier modes evolve independently [25]. We establish the stability of the sum and difference components by showing that the eigenvalues of the feedback matrices lie in the left half plane for all parameters corresponding to valid filter parameters. For unstable parameters, the network exhibits spatially oriented Turing patterns [26]. Our implementation uses a transistor analog of a network of conductances, which Poggio and Koch suggested for solving problems in computational vision [27], [28]. The stability of conductance networks can be established by viewing the dynamics as gradient descent on a suitably defined cost function [29]-[33], and a similar approach can be taken for this network [24]. Incorporating the half-wave rectifying nonlinearity into the feedback is critical in ensuring network stability.
A. Original Versus Simplified Network
Clearly, any equilibrium point of (4) is also an equilibrium point of (3). The existence of the spatial transfer function indicates that the equilibrium point of (4) is unique. The state of all cells is positive at this equilibrium point because the feedback loops containing the blocks and correspond to a lossy diffusion process driven by a strictly positive input. Formally, assume that assumes its minimum at pixel . The first equation in (4) evaluated at equilibrium gives where
The term since is a minimum. The next two terms since the outputs and the parameters are nonnegative. The last term by assumption. In the electronic implementation, this input is represented by the current through a transistor in saturation, which will always be positive due to leakage. Thus, . Similar arguments hold for the other layers.
Any additional equilibrium point in the original network is unstable. First note that the state of any neuron at equilibrium must be nonnegative. If the minimum state is nonzero, it must satisfy (7), which implies that it must be positive. Any additional equilibrium point must have for some . Linearizing around this equilibrium and letting denote a small perturbation around it, we have that where by the arguments above.
It seems reasonable that stability of the common equilibrium point in the simplified network should imply stability for the original network, since the element-wise product operation does not change the slope of the derivative at the equilibrium point, as the state is strictly positive at equilibrium. In addition, our numerical simulations and experimental measurements from the chip have not revealed any unexpected instability.
B. Spatial Transfer Function
Expressing (4) in terms of the sums and differences of the ON and OFF variables (8) where and denotes the element-wise absolute value of . Correlation by is a discrete approximation to a directional derivative. The template is a combination of even impulse pairs in the horizontal and vertical directions. Fig. 2(b) illustrates that both the sum and difference components evolve linearly. The difference components are unaffected by the sum components, but the sum components are driven by the absolute value of the difference components.
In [24] , we studied the dynamics of the differential components. For completeness, we recapitulate that analysis here. The steady state response and the stability can be analyzed easily in the spatial-frequency domain. Assume a doubly infinite array and define , and to be the 2-D discrete Fourier transforms of the input, state, and output, e.g.,
. Taking the discrete Fourier transform of the first two equations in (8), correlation by and correspond to multiplication by where . Thus
We suppress the dependence on and to avoid clutter. If we define and , then . Letting and defining to be the spatial transfer function at temporal steady state, we obtain (5).
C. Stability
Stability of the difference components is guaranteed for any set of parameters corresponding to valid filter parameters. Equation (6) implies , which implies for all . However, the network is unstable for combinations of parameters that imply . It is impossible for , because the parameters are nonnegative. Fig. 3 shows that the network is unstable if the cross coupling between the even and odd layers, which is determined by the parameters, is large enough. The sum components evolve according to a discrete approximation to a lossy diffusion equation driven by the sum of the full-wave rectified difference component and the input sum component. In the spatial-frequency domain where . and denote the discrete Fourier transforms of the absolute values of the even and odd difference components. Stability is ensured since for all . Incorporating the half wave rectifying nonlinearity into the dynamics is essential to ensure network stability. To see this, suppose that we remove the nonlinearity and instead let in (4) and make similar substitutions for , and . We find that Fig. 2(c) shows that the evolution of the difference components is identical to that in (8) , but the sum component now evolves independently of the difference component.
The sum components are unstable for some parameters that correspond to valid filter parameters. In the spatial-frequency domain For stability of the eigenvalues of the feedback matrix, , must be negative for all . Since the parameters are nonnegative, the largest eigenvalue occurs for . This implies that for stability, we must have . Fig. 3 shows that the stable parameter region is only a subset of the stable parameter region for the difference components. 
IV. CHIP ARCHITECTURE
The visual cortex processes each region of the visual field with neurons selective to many different orientations, which are grouped into a hypercolumn. To replicate this organization, we require a set of chips, each processing the same image but tuned to different orientations. Fig. 4(a) depicts a three-chip network.
To enable the construction of this multichip system, each chip is a transceiver, containing both a receiver to receive input images and a transmitter to transmit output images. Each chip also includes asynchronous routing circuits to facilitate signal fan out and fan in, which will be described in detail in a forthcoming publication. Briefly, the split replicates its input, sending one copy into the processing array via a receiver and sending the other off chip. By fanning the output of a silicon retina (e.g., [34]) out to a set of chips, cascading the split output of one with the split input of the next, we can build an array of orientation selective hypercolumns. The merge circuit combines its input with the array output from the transmitter and sends the combined stream off chip. The encoding we use enables us to distinguish the different images at the merge output, but we can also use the merge to combine images additively, implementing signal fan in. Combining input signals from a retina with output signals from other chips, we can implement intracortical interconnections, as well as feedback interconnections from later processing stages. The vast majority of inputs to cortical neurons come from other nearby cortical neurons, i.e., neurons tuned to similar orientations [35], [36]. Feedback from extrastriate areas appears to modulate the responses of neurons in V1 [37].
Input and output images are rate encoded as arrays of spike trains, which are communicated using the AER protocol [38] . The AER protocol communicates continuous time spike activity from an array of silicon neurons in one chip to another chip over an asynchronous digital bus. It is more efficient than scanning when the spike activity within the array is sparse, as we expect here since only a few image locations will contain edges near the orientation selected by each chip.
The transmitter signals a spike occurrence by placing the location (address) of the spiking neuron onto the bus. The receiver takes the address that appears on the bus and feeds a spike to the corresponding neuron in its array. The protocol is asynchronous, with the time that the address appears on the bus encoding the spike time directly. Collisions between simultaneous spikes from two neurons are handled by arbitration.
Addresses are placed onto the bus in "bursts," where each burst encodes all of the simultaneous spikes from neurons within a given row and a given chip. We use a word serial format, where each burst is a sequence of addresses. As shown in Fig. 4(b), the transmitter signals the start of a burst by placing an address identifying the source chip onto the address lines (Addr) and taking the request signal ReqY high. Subsequent addresses are signalled by taking _ReqX low. The second address identifies the row. Each of the remaining addresses identifies one of the columns containing a neuron that spiked. The transmitter signals the end of the burst by taking ReqY low. The receiver acknowledges receipt of each address by a transition on the Ack line.
We use absolute addressing to identify rows and columns within a chip, but relative addressing to identify each chip. Each chip signals its own activity with bursts whose chip addresses are set to zero. Every time a chip relays a burst from its split or a merge input, it increments the chip address. For example, a chip address of 1 at the merge output of Chip B in Fig. 4(a) indicates the spikes in the burst come from Chip A.
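The word serial burst format and the relative chip addressing can be sketched as plain lists of address words. This is only an illustration of the ordering described above: the function names are hypothetical, and the request/acknowledge handshaking and arbitration are not modeled.

```python
def encode_burst(row, columns):
    """Build one burst as [chip address, row address, column addresses...].

    A chip tags its own activity with chip address 0.
    """
    return [0, row] + list(columns)

def relay_burst(burst):
    """Relay a burst through a split or merge: increment the chip address."""
    return [burst[0] + 1] + list(burst[1:])

# Chip A emits a burst for row 5 with spikes in columns 3 and 10.
burst_from_a = encode_burst(5, [3, 10])

# Chip B relays it; at B's merge output the chip address reads 1,
# indicating the spikes came from the chip one hop upstream.
at_b_merge_output = relay_burst(burst_from_a)
```

Because the chip address counts hops rather than naming a chip, identical chips can be cascaded without assigning each one a unique identifier.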
For each pixel, the four neurons are addressed using the least significant bit (LSB) of the row and column addresses. EVEN and ODD neurons are indexed by row addresses with LSB 0 and 1. ON and OFF neurons are indexed by column addresses with LSB 0 and 1. Thus, the network for processing an by pixel image actually contains a by array of neurons arranged into 2 × 2 blocks.
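The LSB convention can be written out as a pair of helper functions. The sketch assumes that the remaining upper bits of each address hold the pixel row or column index, which follows from the 2 × 2 block arrangement but is not stated explicitly; the function names are hypothetical.

```python
def neuron_address(pixel_row, pixel_col, phase, polarity):
    """Map a pixel and neuron type to a (row, column) neuron address.

    Row LSB selects EVEN (0) or ODD (1); column LSB selects ON (0) or OFF (1).
    Upper bits are assumed to be the pixel indices.
    """
    row = 2 * pixel_row + {"EVEN": 0, "ODD": 1}[phase]
    col = 2 * pixel_col + {"ON": 0, "OFF": 1}[polarity]
    return row, col

def decode_address(row, col):
    """Inverse mapping: recover pixel indices and neuron type."""
    phase = "ODD" if row & 1 else "EVEN"
    polarity = "OFF" if col & 1 else "ON"
    return row >> 1, col >> 1, phase, polarity

address = neuron_address(17, 32, "ODD", "OFF")
decoded = decode_address(*address)
```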
V. PIXEL PROCESSING CIRCUITS
Each pixel in the array contains the circuits necessary for processing four neurons. This includes four leaky integrators that convert input spike trains to continuous currents, current-mode analog processing circuits that implement the filtering/rectification network and four spiking neuron circuits that convert the current outputs of the network to spike trains.
We represent each state variable array as the drain currents in an array of nMOS transistors with fixed gate voltage , as shown in Fig. 5(a). The sources are connected through capacitors to ground. We assume all transistors operate in weak inversion and are saturated, so the drain currents representing the layer are given by (10) where and are the gate and source voltages referenced to the bulk node, is the thermal voltage, is a process and geometry dependent current and is a process dependent constant. Representative parameters for the TSMC 0.25-µm process are pA and . Differentiating with respect to time, , where is the current entering the capacitor . To implement the network, we equate the current flowing out of the capacitor with the sum on the right hand side of the elementwise product operator in (3). Each sum can be grouped as three currents , and . For the first equation, describing the intralayer state feedback, describing the cross coupling between layers and describing the input. In the fabricated design, we do not implement the capacitor explicitly. However, the source nodes will invariably have some parasitic capacitances associated with them. The remainder of this section describes the circuits generating these three currents, as well as the spiking neuron circuit that converts each output current into a spike train.
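The differentiation step can be spelled out as a short worked derivation. It assumes the standard weak-inversion saturation law for the drain current; the symbols $I$, $I_0$, $\kappa$, $V_g$, $V_s$, $U_T$, $C$, and $I_C$ are labels chosen here for the quantities named in the text, not necessarily the notation of (10).

```latex
% Assumed weak-inversion, saturated drain current (gate voltage V_g fixed):
I = I_0 \exp\!\left(\frac{\kappa V_g - V_s}{U_T}\right)

% Differentiate with respect to time; only the source voltage V_s varies:
\frac{dI}{dt} = -\frac{I}{U_T}\,\frac{dV_s}{dt}

% The current entering the source capacitor is I_C = C\, dV_s/dt, so
\frac{dI}{dt} = -\frac{I}{C\,U_T}\, I_C
             = \frac{I}{C\,U_T}\,\bigl(-I_C\bigr)
```

Under these assumptions the rate of change of each state current is the state itself multiplied by the net current flowing out of the capacitor, scaled by $1/(C\,U_T)$. This is one way to see where the elementwise product with the state in (3) comes from.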
A. Layer Self-Feedback
We use the diffuser/pseudoresistor network [39] , [40] in Fig. 5(a) to implement . In total, the circuit contains four diffuser networks, one for each layer.
In weak inversion, the drain current flowing through the horizontal nMOS transistor from node into node is Substituting (10), we get where . The total current flowing out of the capacitor at each node due to the five transistors connected in the diffuser network implements where .
B. Cross-Coupling Circuits
Layers are coupled through the cell outputs. The mapping from state to output in (2) can be specified by the implicit equations
The ON-OFF circuit [34] in Fig. 5(b) implements a similar mapping (11)
where is a small current set by . Kirchhoff's current law applied at the sources of transistors and gives . Rearranging the left and right sides gives (11). The translinear principle applied to the loop gives where (13) Combining these equations with which yields (12). The upper bound in (12) is equal to the zero input quiescent output current in both and . Each spatial shift operator is implemented by a tilted current mirror shown in Fig. 5(c). The difference in the source voltage controls the gain:
. The shift is implemented by connecting the drain voltage of the output transistor to the appropriate node of the diffuser network. The entire cross coupling between the even and odd arrays, requires one diode connected transistor with source voltage and four mirror transistors, two with source voltage and two with .
C. Current-Mode Integrator
Four current-mode integrators at each pixel convert the incoming spike trains to input currents . Fig. 6(a) shows the schematic of one integrator [41] . The inputs _RSelX and _RSelY are shared by one row or column of cells. The receiver takes both inputs low when an address event with the corresponding row and column address is received. This injects a charge packet into the diode-capacitor integrator formed by and , and pulls the acknowledge signal _Ack low, signalling the receiver that the spike has been delivered. The bias voltage controls the magnitude of the current pulse and the communication cycle-time determines its duration. The difference between the source voltages of the current mirror, , controls the gain and the time constant of the integrator. 
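The conversion from a spike train to a continuous current can be illustrated with a simplified model. Note the substitution: this sketch uses a linear first-order leaky integrator, not the diode-capacitor circuit of Fig. 6(a), whose dynamics are nonlinear. The function name, charge packet size, and time constant are hypothetical values chosen only for illustration.

```python
import math

def leaky_integrate(spike_times, t_end, dt, tau, charge):
    """Linear leaky integrator (a simplification of the on-chip circuit).

    Each spike adds charge/tau amperes, which then decays with time
    constant tau. Returns the current sampled every dt seconds.
    """
    decay = math.exp(-dt / tau)
    spikes = sorted(spike_times)
    current, idx, trace = 0.0, 0, []
    for n in range(int(round(t_end / dt))):
        t = n * dt
        current *= decay
        while idx < len(spikes) and spikes[idx] <= t:
            current += charge / tau
            idx += 1
        trace.append(current)
    return trace

RATE_HZ = 50e3     # input spike rate (matches the 50-kHz test stimulus)
CHARGE_C = 1e-15   # hypothetical charge packet per spike
TAU_S = 0.2e-3     # hypothetical integrator time constant

spike_times = [k / RATE_HZ for k in range(250)]          # 5 ms of spikes
trace = leaky_integrate(spike_times, t_end=5e-3, dt=1e-6,
                        tau=TAU_S, charge=CHARGE_C)
steady_state = sum(trace[-1000:]) / 1000                 # mean of last 1 ms
```

In this linear model the steady-state mean current is approximately the spike rate times the charge per spike, so a faster input train produces a proportionally larger input current with smaller relative ripple.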
D. Spiking Neuron
Four spiking neuron circuits at each pixel convert the four output currents, which are obtained by mirroring the diode connected transistor of the spatial shift circuit, into spike trains. We use the design shown in Fig. 6(b), which is similar to that in [42]. The voltage is initially high, and decreases as discharges . Once reaches a threshold value, the inverter switches and the neuron fires, bringing the row request line, _TReqY, low to signal a spike. Once a row has been selected, the AER transmitter takes TSelX high. All of the neurons in the selected row that have generated a spike then reset and pull the column request lines, _TReqX, low. Once a row has been selected, no new neurons in that row can spike.
Positive feedback through the current mirror minimizes the inverter switching time, saving power. The bias voltage controls the amount of feedback. If it is too high, current feedback is small and power consumption increases. If it is too low, the background firing rate is high and obscures the signal. The _RESET signal is a global signal which resets all neurons.
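The relation between output current and spike rate can be sketched with an idealized integrate-and-fire model. It assumes a constant current discharging a fixed capacitance from a start voltage to a threshold, and it ignores the positive feedback, the reset time, and the AER handshake. All names and component values are hypothetical.

```python
def interspike_interval(current_a, capacitance_f, v_start, v_threshold):
    """Time for a constant current to discharge the node to threshold."""
    if current_a <= 0.0:
        return float("inf")   # no current, no spikes
    return capacitance_f * (v_start - v_threshold) / current_a

def spike_rate_hz(current_a, capacitance_f, v_start, v_threshold):
    """Idealized firing rate: the reciprocal of the interspike interval."""
    return 1.0 / interspike_interval(current_a, capacitance_f,
                                     v_start, v_threshold)

# Hypothetical values: 100 fF node, 1 V swing between reset and threshold.
C_F, V_START, V_TH = 100e-15, 2.5, 1.5
rate_1na = spike_rate_hz(1e-9, C_F, V_START, V_TH)
rate_2na = spike_rate_hz(2e-9, C_F, V_START, V_TH)
```

In this idealization the spike rate is proportional to the output current, which is what makes rate coding of the filter outputs possible.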
VI. EXPERIMENTAL RESULTS
We designed and fabricated an array of 32 × 64 pixels in the TSMC 0.25-µm mixed signal/RF process available through MOSIS. This process contains five metal layers and one poly layer, uses nonepitaxial wafers, and is intended for 2.5-V applications. Chip characteristics are summarized in Table I.
We generated the array layout by tiling metapixels, which contain the circuits required for two pixels stacked vertically. Fig. 7 shows the layout of the top half of the metapixel. The bottom half is mirrored vertically so that the analog filter circuits are adjacent.
We laid out the metapixels to minimize cross talk from the digital communication circuits (the spiking neurons and the current-mode integrators) to the analog spatial filtering circuits. The analog filtering circuits lie in the middle of the metapixel, with the digital circuits on the top and bottom. Within the digital parts, the integrators lie next to the analog circuits. The spiking neurons, which contain the most switching transistors, lie at the top and bottom, farthest from the analog processing circuits. Guard rings, which are inserted between the Gabor cells, the integrators and the spiking neurons, provide low impedance paths to collect the minority carriers injected by digital transistors, which would otherwise lead to variations in the bulk voltage when they reach a well or substrate. The digital and analog circuits use separate power and ground lines. Bias lines connected to source voltages controlling current mirror gains run wide on the top metal layer to reduce impedance.
A. Steady-State Response
With the Gabor-type filtering circuits turned off by setting , the spiking neuron circuit maintains a background spike rate because the gate node of transistors and in Fig. 6(b) is not fully discharged to ground during the reset of and the residual current through discharges the gate of . Increasing decreases the quiescent spike rate by reducing this residual current. However, it also increases power consumption per spike by decreasing the gain of the current feedback. In a tradeoff between these two effects, we set mV to minimize total power consumption. At this point, the average spike rate per neuron is 5.8 Hz with a standard deviation of 6.6 Hz. We computed these statistics using a total of 392 256 spikes collected from the merge output during an 8.2-s time window. When the Gabor-type filter circuits are turned on but with no input applied, the background spike rate and its variance increase due to the quiescent output current of the ON-OFF circuit. For the array tuned to vertical orientations, the average quiescent spike rate computed across the array was 15.1 Hz, with a standard deviation of 9.0 Hz. We computed these statistics using a total of 392 714 spikes collected from the merge output during a 3.2-s time window. A similar increase is observed for other filter parameters.
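The reported average rates can be cross-checked against the spike counts and window lengths. The sketch assumes that all 32 × 64 × 4 neurons contribute to the count and that the average is the total count divided by the number of neurons and the window length; the function name is hypothetical.

```python
ROWS, COLS, NEURONS_PER_PIXEL = 32, 64, 4
NUM_NEURONS = ROWS * COLS * NEURONS_PER_PIXEL   # 8192 neurons

def mean_rate_hz(spike_count, window_s, neurons=NUM_NEURONS):
    """Average spike rate per neuron over the measurement window."""
    return spike_count / (neurons * window_s)

# Filters off: 392 256 spikes in 8.2 s.
background_rate = mean_rate_hz(392256, 8.2)

# Filters on, no input, vertical tuning: 392 714 spikes in 3.2 s.
quiescent_rate = mean_rate_hz(392714, 3.2)
```

Under these assumptions the first figure comes out at about 5.8 Hz and the second at about 15 Hz, both consistent with the reported averages to within roughly 1%.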
To test the spatial impulse response of the array, we excited the " " input of pixel (17, 32) with a 50-kHz spike train from a pattern generator. All other inputs were silent. A logic analyzer connected to the merge output collected the output spike train, which was digitally processed for analysis. Fig. 8 shows the four outputs of the array. We computed the spike statistics using 390 680 spikes collected over a 3.0-s window.
To show the tunability of the array, Fig. 9 shows the differences between the ON and OFF outputs for a spatial impulse input, when the array is tuned to vertical, diagonal, and horizontal orientations and different spatial scales. For vertical tuning, filter parameters which fit the response predicted by (1) in the least squares sense were radians/pixel and radians/pixel. The signal-to-noise ratio, defined as the energy in the ideal filter output with the best fit parameters divided by the energy in the difference between the actual and ideal filter outputs, was 11.2 dB. For the diagonal orientation tuning, best fit parameters were , corresponding to a spatial frequency and orientation , and . The signal-to-noise ratio was 11.8 dB. For the horizontal orientation tuning, best fit parameters were and . The signal-to-noise ratio was 8.8 dB.
B. Temporal Response
We measured the temporal response of the arrays by applying a step change in the spike rate applied to the input of pixel (17, 32) from 0 Hz to 25 kHz (stimulus onset) and vice versa (stimulus offset). Using a fast input spike rate better indicates the response of the current-mode processing array, since it minimizes temporal ripple in the output current of the current-mode integrator. More than 10 input spikes are integrated per output spike, so the output spike response is not influenced significantly by the temporal characteristics of the input spike train. These experiments revealed a temporal asymmetry between the response to stimulus onset and offset. Fig. 10 shows the output at pixel (17, 32). The response to stimulus onset is essentially instantaneous. The steady state response is a spike rate of 1.6 kHz. This corresponds to an average interspike interval of 0.625 ms, which is approximately the delay before the first spike. On the other hand, at stimulus offset the response took about 1 ms to die away. Fig. 11 shows a similar asymmetry in the response of the neuron at pixel (17, 33). In this case the steady state firing rate to the stimulus is 360 Hz, corresponding to an interspike interval of 2.8 ms. The temporal asymmetry is primarily due to the nonlinearity introduced by the elementwise multiplication in (3), which slows down the network at low input levels, but speeds it up at high input levels. The dynamics of the current-mode integrator have similar characteristics [41], so part of the asymmetry can be attributed to this stage.
Despite the asymmetry, the settling times for onset and offset are both on the order of a few milliseconds, which implies that for the intended applications, the temporal dynamics of the array are negligible. For reference, consider that each frame in a video sequence occupies 30-40 ms and that the temporal bandwidth of cortical neurons is on the order of tens of hertz.
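The interspike intervals quoted above follow directly from the measured steady-state rates. The short Python sketch below reproduces this arithmetic and also checks the number of input spikes integrated per output spike at pixel (17, 32). The helper name is hypothetical; the rates are those reported in the text.

```python
def interspike_interval_ms(rate_hz):
    """Average interspike interval in milliseconds for a given spike rate."""
    return 1000.0 / rate_hz

INPUT_RATE_HZ = 25_000.0      # input spike rate after stimulus onset
RATE_PIXEL_17_32_HZ = 1600.0  # steady-state output rate, pixel (17, 32)
RATE_PIXEL_17_33_HZ = 360.0   # steady-state output rate, pixel (17, 33)

isi_a = interspike_interval_ms(RATE_PIXEL_17_32_HZ)
isi_b = interspike_interval_ms(RATE_PIXEL_17_33_HZ)
inputs_per_output = INPUT_RATE_HZ / RATE_PIXEL_17_32_HZ

print(f"pixel (17, 32): ISI = {isi_a:.3f} ms")
print(f"pixel (17, 33): ISI = {isi_b:.1f} ms")
print(f"input spikes per output spike at (17, 32): {inputs_per_output:.1f}")
```

The ratio of roughly 16 input spikes per output spike is consistent with the statement that more than 10 input spikes are integrated per output spike.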
C. Power Dissipation
The power consumption is dominated by the activity of the communication circuits, rather than the processing circuits. We measured the power dissipation of the chip while stimulating pixel (16, 32) with spike trains ranging in frequency from 0 Hz to 100 kHz and plot the results in Fig. 12 as a function of average output activity per neuron, which is much higher than the input activity. The power increases linearly with the output activity. The quiescent power consumption with no input, but an average output activity of 14 Hz, is about 3 mW. The buffer circuits that drive the pads account for around 75% of the total power consumption. The digital spike communication circuits account for around 24%. The analog circuits consume less than 1%.
D. Comparison With Single-Ended Architecture
An implementation of the same filter kernels using a single-ended representation, described in [43], requires half as many transistors for implementing the analog filtering network. If connected to an AER interface, each pixel would require half as many integrators and spiking neurons. However, the ON-OFF implementation described here has several advantages that outweigh the additional hardware cost.
First, the filtering network has reduced quiescent (zero input) power dissipation. The single-ended implementation encodes positive and negative signals as variations around a quiescent bias current, which dissipates power even if the output of the filter is zero. In the ON-OFF implementation, the analogous bias currents are the quiescent output currents of the ON-OFF circuit. To compare the power dissipation across a range of operating conditions, we assume that the maximum absolute signal current in the two cases is the same. Since the bias current limits the maximum negative signal excursion, the power dissipation of the single-ended implementation is where is a constant of proportionality depending upon the supply voltage and the array tuning.
An upper bound on the quiescent power dissipation of the ON-OFF implementation is , where the factor 2 arises because the ON-OFF implementation has twice as many paths from to ground. We estimate this by assuming that the output currents of the ON-OFF circuit are and computing the total current flowing from , ignoring the reduction in the output current due to the feedback which supplies current to the input. Since and should be in saturation, the source voltages of and must be several multiples of below . Thus, the maximum output current of the ON-OFF circuit is . Combining this with (13) gives , which implies . Typical parameters for the TSMC 0.25-µm process are 3.1 pA and . Choosing nA and , we obtain .
However, there is a tradeoff between latency and power. In the ON-OFF implementation, weak signals are processed more slowly than strong signals because the elementwise product with the state slows the dynamics of the array when the current levels are smaller. The signal gain is independent of signal strength. Since this chip takes input from a contrast-sensitive silicon retina, weak signals correspond to areas with little contrast. The slower response improves the signal-to-noise ratio for weak signals by increasing temporal smoothing. Biological systems exploit the same strategy. In the retina, rods, which are sensitive to dim light, respond more slowly than cones, which are sensitive to bright light [44]. The response of cat retinal ganglion cells speeds up for higher contrast signals [45].
Second, the ON-OFF network exhibits reduced fixed pattern noise in the output. The primary source of fixed pattern noise in the single-ended architecture is mismatch in the transistors supplying the bias current, which adds spatial noise to the filter input. By reducing the quiescent bias current, the ON-OFF network reduces the fixed pattern noise. In [43], the standard deviation of the fixed pattern noise was 26-38% of the bias current. In this network, the standard deviation of the quiescent spike rate in Fig. 8 with no input (9.1 Hz) was 1.2% of the peak spike rate (752 Hz).
Third, the ON-OFF signal representation includes half-wave rectification of the filter output, while the single-ended architecture does not. Although this could be added as a separate circuit to the single-ended architecture, its design is complicated by the large fixed pattern noise in the output, which means that the reference point around which to rectify varies from pixel to pixel.
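The ON-OFF representation with half-wave rectification can be illustrated with an idealized Python sketch. It assumes perfect rectification with zero quiescent output, which the real circuit only approximates, and the function name is hypothetical. The ON channel carries the positive part of the filter output, the OFF channel carries the negative part, and their difference recovers the signed output, as in the ON minus OFF responses plotted in Fig. 9.

```python
def on_off(x):
    """Idealized ON/OFF encoding of a signed filter output x.

    Returns (on, off): both are non-negative, and at most one is
    nonzero. This ignores the small quiescent currents of the circuit.
    """
    on = x if x > 0 else 0.0
    off = -x if x < 0 else 0.0
    return on, off

# Hypothetical signed filter outputs (illustrative values only).
signal = [-2.0, -0.5, 0.0, 0.5, 2.0]
pairs = [on_off(x) for x in signal]
recovered = [on - off for on, off in pairs]

for x, (on, off) in zip(signal, pairs):
    print(f"x = {x:+.1f} -> ON = {on:.1f}, OFF = {off:.1f}")
```

Because each channel rectifies about zero rather than about a bias level, the rectification point does not depend on a reference that varies from pixel to pixel.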
Fourth, the ON-OFF output representation conserves bandwidth on the AER bus. The low quiescent output currents map to near-zero quiescent spike rates at the output of the spiking neuron circuit. In Fig. 8, the average quiescent spike rate (15.1 Hz) was 2.0% of the peak spike rate (752 Hz). If the output current of the single-ended architecture is fed into a spiking neuron circuit, the quiescent spike rate must be 50% of the maximum spike rate, assuming that the maximum positive and negative signal excursions are identical. Given that power dissipation is dominated by the communication circuits, this would significantly increase power consumption as well.
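The bandwidth argument above can be made concrete with a rough Python estimate. This is a sketch under stated assumptions rather than a measurement: it assumes the single-ended neuron would have the same 752 Hz peak rate as the ON-OFF neurons, and it accounts for the ON-OFF design using twice as many spiking neurons per pixel. The variable names are hypothetical.

```python
PEAK_RATE_HZ = 752.0          # peak spike rate from Fig. 8
ON_OFF_QUIESCENT_HZ = 15.1    # measured average quiescent rate (ON-OFF)
ON_OFF_QUIESCENT_STD_HZ = 9.1 # measured std. dev. of quiescent rate

# Assumption: a single-ended neuron idles at 50% of the same peak rate.
single_ended_quiescent_hz = 0.5 * PEAK_RATE_HZ

quiescent_fraction = ON_OFF_QUIESCENT_HZ / PEAK_RATE_HZ
noise_fraction = ON_OFF_QUIESCENT_STD_HZ / PEAK_RATE_HZ

# Quiescent bus traffic ratio per neuron, and per pixel once the
# doubled neuron count of the ON-OFF design is taken into account.
per_neuron_ratio = single_ended_quiescent_hz / ON_OFF_QUIESCENT_HZ
per_pixel_ratio = per_neuron_ratio / 2.0

print(f"ON-OFF quiescent rate: {100 * quiescent_fraction:.1f}% of peak")
print(f"ON-OFF quiescent std. dev.: {100 * noise_fraction:.1f}% of peak")
print(f"single-ended / ON-OFF quiescent traffic per neuron: {per_neuron_ratio:.1f}x")
print(f"single-ended / ON-OFF quiescent traffic per pixel: {per_pixel_ratio:.1f}x")
```

Under these assumptions, the single-ended representation would place roughly an order of magnitude more quiescent traffic on the AER bus per pixel, even after the doubled neuron count of the ON-OFF design is included.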
Finally, the ON-OFF circuit design is more symmetric. First, all current gains and in the ON-OFF architecture are positive, and are implemented using nMOS current mirrors. The single-ended architecture requires positive and negative current gains. Negative gains require an extra mirroring step through a pair of pMOS transistors, which increases mismatch. Second, the positive and negative signal excursions have the same limit in the ON-OFF architecture, both being limited by . For the single-ended architecture, the maximum negative signal is limited by the bias current, while the maximum positive signal is limited by the largest current before the transistors leave weak inversion.
VII. CONCLUSION
Inspired by the functionality of visual cortical neurons, we have designed an orientation selective image filtering chip that uses an ON-OFF signal representation. The resulting circuit architecture has compelling engineering advantages over previous single-ended feedback circuit architectures for orientation selective filtering.
Our current work seeks to incorporate this chip into a multichip functional model of the primary visual cortex. Each chip contains an array of neurons, all selective for the same orientation but different image locations. Sets of chips implement hypercolumns of neurons selective for different orientations. Because both input and output are AER encoded spike trains, this network will be able to include feedback interactions, such as competition between orientations to enhance orientation selectivity [46] . The orientation selective neurons may also be used in building neurons selective along other stimulus dimensions, such as binocular disparity and direction of motion. Because the network includes a rectifying nonlinearity, it may also be useful in modeling responses to second-order stimuli using a "filter-rectify-filter" model [47] .
