Abstract-The architecture of an active resistive mesh containing both positive and negative resistors to implement a Gaussian convolution in two dimensions is described. With an embedded array of photoreceptors, this may be used for image detection and smoothing.
The convolution width is continuously variable by 2:1 under user control. Analog circuits implement a 45 × 40 mesh on a 2-μm CMOS IC, and perform an entire convolution in 20 μs on applied images.
I. INTRODUCTION
HARDWARE capable of sensing an input in two dimensions and processing it in parallel to obtain results in real time is of great interest in applications such as low-power compact image recognition systems. In digital signal processors today, a 2D input from a sensor is first scanned and quantized, and subsequently processed using pipelined parallel algorithms to obtain a fast throughput rate [1]. The data at each grid point in the 2D input, corresponding to one pixel in the case of a sampled image, serially enter this signal processor and flow through it at some usually fast clock rate. A substantial increase in throughput may be obtained over this signal flow rate by using simultaneous processing per pixel, particularly if the signal fan-out is eliminated by not digitizing the input but retaining it as an analog quantity. This is how signal processing takes place in natural biological systems [2], [3], [4]. synthesis is guided by experience, ingenuity, and taste, the approach is ad hoc and limited in its generality, but when successfully executed, it may offer a savings in power and enhancement in speed by orders of magnitude over the digital approach [5]. The input to an analog signal processor is some current or voltage, the output some other voltage or current determined by the laws of physics governing the circuit. The early analog computers were built on this principle, but being composed of building blocks with quite general functions, they were not very efficient in hardware for massively parallel tasks.
Translinear integrated circuits are one well-known example of an efficient use of hardware to embody complex nonlinear algorithms, although usually for scalar or one-dimensional array inputs. They achieve hardware efficiency by exploiting transistor device physics rather than relying on complex building blocks such as operational amplifiers; they are also hardwired to accomplish a specific task [6], [7]. Our work deals with a class of circuits suited to simultaneous signal processing in two dimensions, also using processing at the transistor level.
II. IMAGE SMOOTHING USING SIMULTANEOUS 2D SIGNAL PROCESSING
This section will discuss the algorithm and architecture of a particular image processing function we have implemented for potential use in compact machine vision systems [8]. problems, the contours in a sheet which also has a continuous leakage to ground will decay in a characteristic fashion in response to a voltage applied at a single point.
The spatial rate of decay depends on the leakage conductivity to ground relative to the lateral conductivity. This decay function may be thought of as the spatial impulse response of the leaky resistive sheet, or, equivalently, its convolution kernel; the potential contours in response to multiple-point stimuli will then be determined by linear superposition. Consider, for example, a one-dimensional discrete version of the leaky resistive sheet composed of a uniform linear mesh of resistors $R_1$ with resistors $R_0$ from every node to ground (Fig. 1). In response to a current excitation at one node, the resulting voltage distribution on the mesh decays n nodes away from the excitation according to an exponential function $\exp(-n\sqrt{R_1/R_0})$ [16]. This convolution kernel differs from a Gaussian in two important ways: it has a slower decay at its tails, and the exponentials on either side of the excitation meet at the center to produce a cusp (Fig. 1). The discontinuity in derivative at this point would produce undesirable results when this function is applied to a noisy image and then followed by edge enhancement. The mesh must therefore be modified to produce a characteristic function which better resembles the flat-topped Gaussian at the point of excitation.
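The exponential decay and the cusp of the passive mesh can be reproduced with a few lines of nodal analysis. The following is a minimal sketch, not the authors' simulation: the function name, the mesh length, and the resistor values are illustrative assumptions. It solves Kirchhoff's current law for a finite ladder and compares the per-node attenuation with the infinite-ladder decay constant, $\cosh\alpha = 1 + R_1/(2R_0)$, which approaches $\sqrt{R_1/R_0}$ when $R_0 \gg R_1$.

```python
import numpy as np

def leaky_ladder_response(n_nodes, r1, r0, source_node, current=1.0):
    """Node voltages of a 1D ladder (r1 between neighbors, r0 from every node
    to ground) driven by a current injected at a single node."""
    g1, g0 = 1.0 / r1, 1.0 / r0
    G = np.zeros((n_nodes, n_nodes))
    for j in range(n_nodes):
        G[j, j] += g0                      # leakage to ground
        for k in (j - 1, j + 1):           # nearest neighbors, if present
            if 0 <= k < n_nodes:
                G[j, j] += g1
                G[j, k] -= g1
    i_in = np.zeros(n_nodes)
    i_in[source_node] = current
    return np.linalg.solve(G, i_in)        # Kirchhoff's current law: G v = i

# Illustrative (assumed) values; only the ratio R1/R0 matters for the shape.
R1, R0 = 1.0, 25.0
N, C = 401, 200
v = leaky_ladder_response(N, R1, R0, C)

ratio = v[C + 1] / v[C]                          # attenuation per node
alpha_exact = np.arccosh(1.0 + R1 / (2.0 * R0))  # infinite-ladder decay constant
alpha_approx = np.sqrt(R1 / R0)                  # limit for R0 >> R1
slope_left = v[C] - v[C - 1]    # rising toward the excitation node
slope_right = v[C + 1] - v[C]   # falling away from it

print("attenuation per node:", ratio, "vs", np.exp(-alpha_exact))
print("slopes on either side of the peak:", slope_left, slope_right)
```

The two slopes are equal in magnitude and opposite in sign, so the first difference changes abruptly at the excitation node. That abrupt change is the cusp described above.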
Obtaining a practical realization of this mesh was one of the key contributions of our work.
C. An Active Resistive Mesh Implementing Gaussian Convolution
We first qualitatively examine why the resistive mesh in the previous example produces a cusped convolution kernel, and how it must be modified.
An indirect procedure for synthesizing the desired network is then described, followed by methods to extend it to two dimensions.
The spatial derivative of voltage at a point in a resistive sheet or discrete mesh specifies the potential gradient or the electric field there. According to the point form of Ohm's law, $J = \sigma E$, a current injected at a point (assuming the point has nonzero extent, so that the current density there is not infinite) on a resistive sheet with leakage to ground will produce some nonzero electric field (E) there, and therefore a nonzero potential gradient. A nonzero J may produce a zero E only if $\sigma \to \infty$, which implies that the sheet must appear perfectly conductive at the point of injection.
If a negative resistance is introduced to locally neutralize the dissipation in the sheet, while maintaining the dissipation across the large scale, a convolution function may be obtained with a flat top and decaying tails. It is plausible to achieve this in a discrete resistive mesh by introducing negative resistors not between every node, because that would simply modify the value of $R_1$, but between every other node, or perhaps even straddling several nodes. Investigating this numerically, we found that a mesh implementing a convolution of the desired shape could be obtained using negative resistors of a certain value connecting nodes with their second nearest neighbors.
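A numerical check of this kind can be sketched as follows. It is a hypothetical reconstruction, not the authors' code. The 5-kΩ and 320-kΩ values echo figures quoted elsewhere in the text, while the negative resistor is assumed to be four times the nearest-neighbor resistor, in line with the 1:4 ratio stated for the chip. The sketch compares the drop between successive nodes near the peak for the passive and the active mesh.

```python
import numpy as np

def mesh_kernel(n_nodes, r0, r1, r2=None):
    """Impulse response (volts per unit injected current) of a 1D mesh with
    r0 to ground, r1 between nearest neighbors and, if r2 is given, a
    negative resistor -r2 between second-nearest neighbors."""
    G = np.eye(n_nodes) / r0
    links = [(1, 1.0 / r1)]
    if r2 is not None:
        links.append((2, -1.0 / r2))       # negative conductance
    for step, g in links:
        for j in range(n_nodes - step):
            k = j + step
            G[j, j] += g
            G[k, k] += g
            G[j, k] -= g
            G[k, j] -= g
    c = n_nodes // 2
    i_in = np.zeros(n_nodes)
    i_in[c] = 1.0
    return np.linalg.solve(G, i_in), c

# Assumed values: R2 = 4 * R1, and R0 chosen for a fairly wide kernel.
R0, R1, R2 = 320e3, 5e3, 20e3
N = 201
passive, c = mesh_kernel(N, R0, R1)
active, _ = mesh_kernel(N, R0, R1, R2)

for name, k in (("passive", passive), ("active", active)):
    first_drop = (k[c] - k[c + 1]) / k[c]
    second_drop = (k[c + 1] - k[c + 2]) / k[c]
    print(name, "drop 0->1:", first_drop, "drop 1->2:", second_drop)
```

In the passive mesh the steepest drop occurs right at the peak (the cusp), whereas in the active mesh the first drop is the smaller one, so the kernel is flat-topped with its inflection away from the center.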
We also came upon an alternative procedure for synthesizing the same mesh, based on theoretical work on the optimal smoothing of images. This is now described.
Poggio et al. [9] have analyzed how to smooth samples $d_j$, $-\infty < j < \infty$, of a noisy function to best estimate the derivative if the noise were not present.
They seek a fitting function U(x) with continuous first derivative which interpolates the sample points $d_j$ with a least-mean-square difference, but with the constraint that the derivatives of U(x) are not allowed to fluctuate excessively, so as to obtain the least noisy estimate of the actual derivatives of the sampled function. This is expressed as the problem of minimizing an energy functional E, defined as the mean square difference between the interpolating function and the samples, subject to a penalty on excessively large second derivatives.
The strength of the penalty is controlled by a parameter $\lambda$, called the regularization parameter:
$$E = \sum_j \bigl(U(j) - d_j\bigr)^2 + \lambda \int \left(\frac{d^2 U}{dx^2}\right)^2 dx. \tag{1}$$
It is shown that the U(x) minimizing E in (1) is obtained by convolving $d_j$ with an almost exactly Gaussian kernel, and the width of this kernel increases with $\lambda$. We may use this result by exploiting a fundamental connection between the minimum of an energy functional and the operating point of a circuit. It is known from circuit theory that Kirchhoff's laws and the constitutive relations of the components drive a network to a state of minimum energy dissipation, so it is reasonable to construct a network whose energy dissipation is described by (1). The network equations may be obtained directly by setting the derivative of the right-hand side of (1) to zero.
Using a discrete estimate of the second derivative in (1), we get
$$E = \sum_j (U_j - d_j)^2 + \lambda \sum_j (U_{j+1} + U_{j-1} - 2U_j)^2$$
where $U_j = U(x = j)$. This is a quadratic form, and therefore has a unique minimum where $dE/dU_j = 0$ for all j, so
$$0 = 2(U_j - d_j) + \lambda \frac{\partial}{\partial U_j} \sum_k (U_{k+1} + U_{k-1} - 2U_k)^2 \quad \text{for all } j.$$
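The step from the discrete energy to a resistor network can be spelled out. The following is a sketch under stated assumptions: $d_j$ denotes the samples and $U_j$ the node voltages (the symbol $d_j$ is a notational choice here), and the coefficients are matched to a mesh with $R_0$ from each node to its input, $R_1$ to the nearest neighbors, and $-R_2$ to the second nearest neighbors.

```latex
\begin{align*}
E &= \sum_k (U_k - d_k)^2 + \lambda \sum_k (U_{k+1} + U_{k-1} - 2U_k)^2, \\
\frac{\partial E}{\partial U_j}
  &= 2(U_j - d_j)
   + 2\lambda \bigl[\, 6U_j - 4(U_{j+1} + U_{j-1}) + (U_{j+2} + U_{j-2}) \,\bigr] = 0, \\
6U_j - 4(U_{j+1} + U_{j-1}) + (U_{j+2} + U_{j-2})
  &= 4 \sum_{\pm} (U_j - U_{j \pm 1}) - \sum_{\pm} (U_j - U_{j \pm 2}), \\
\frac{d_j - U_j}{R_0}
  &= \sum_{\pm} \frac{U_j - U_{j \pm 1}}{R_1}
   - \sum_{\pm} \frac{U_j - U_{j \pm 2}}{R_2},
  \qquad R_1 = \frac{R_0}{4\lambda}, \quad R_2 = \frac{R_0}{\lambda} = 4R_1 .
\end{align*}
```

The last line reads as Kirchhoff's current law at node j when the sample $d_j$ drives the node through $R_0$. Under these assumptions the negative resistor is four times the nearest-neighbor resistor, and $\lambda = R_0/(4R_1)$, so the kernel width is set by the ratio $R_0/R_1$.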
, the desired 2D convolution may be obtained by driving an array of 1D meshes parallel to the y axis with the matrix of sampled photovoltages, and an identical array of 1D meshes along the x axis with the matrix of buffered outputs from the first array. This is not very efficient in hardware, because each mesh must have independent active circuits to produce the negative resistances, and an intermesh buffer must be used at every node.
Another possible implementation on a 2D rectangular grid is to connect every node to its four nearest neighbors oriented 90° apart with resistors $R_1$, and the four second nearest neighbors at the same orientations with resistors $-R_2$. grid, but not so on a hexagonal grid which inherently possesses a circular symmetry. The image must also be sampled on a hexagonal grid for compatibility with the mesh, which now consists of equal resistive connections 60° apart in orientation to nearest and second nearest neighbors.
A hexagonal grid affords the greatest spatial sampling efficiency in the sense that the fewest photoreceptor sites will attain a desired coverage of the image [18], and the fewest network elements will yield the desired circular symmetry (Fig. 3(a)). The latter was verified in the simulated convolution kernel of this 2D network (Fig. 3(b)). We required the kernel width to be variable by a factor of 2 under user control.
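A small simulation of the hexagonal mesh illustrates the symmetry argument. This is a sketch only: the axial-coordinate layout, the patch size, and the resistor values are assumptions, and the check is limited to the sixfold symmetry of the impulse response on the first two rings of neighbors, not the full circular symmetry reported in Fig. 3(b).

```python
import numpy as np

# Six lattice directions 60 degrees apart, in axial (q, r) hexagonal coordinates.
HEX_DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def hex_mesh_response(n, r0, r1, r2):
    """Impulse response at the center of an n-by-n patch of a hexagonal mesh:
    r0 to ground, r1 to the six nearest neighbors, and -r2 to the six
    second-nearest neighbors lying along the same six directions."""
    def index(q, r):
        return q * n + r

    G = np.eye(n * n) / r0
    for q in range(n):
        for r in range(n):
            a = index(q, r)
            for dq, dr in HEX_DIRS:
                for step, g in ((1, 1.0 / r1), (2, -1.0 / r2)):
                    qq, rr = q + step * dq, r + step * dr
                    if 0 <= qq < n and 0 <= rr < n:
                        G[a, a] += g                  # this node's share of the link
                        G[a, index(qq, rr)] -= g
    c = n // 2
    i_in = np.zeros(n * n)
    i_in[index(c, c)] = 1.0
    return np.linalg.solve(G, i_in).reshape(n, n), c

# Assumed values: R2 = 4 * R1, moderate R0 so the kernel fits well inside the patch.
v, c = hex_mesh_response(41, r0=80e3, r1=5e3, r2=20e3)
ring1 = np.array([v[c + dq, c + dr] for dq, dr in HEX_DIRS])
ring2 = np.array([v[c + 2 * dq, c + 2 * dr] for dq, dr in HEX_DIRS])

print("center:", v[c, c])
print("nearest ring:", ring1)
print("second ring:", ring2)
```

Each ring should show six nearly identical values; any residual spread comes from the finite, rhombus-shaped patch used here.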
That the convolution width depends on the ratio $R_0/R_1$ was known from the synthesis procedure, but the strength of this dependence was not. Simulations of the network showed a weak dependence (Fig. 4).
The mean network voltage at a given level of photosensor illumination will change with the convolution width: for example, when the convolution width is increased by making all $R_0$ large, the mean voltage will also increase because the buffered photocurrents will flow into larger resistors. This will impose the unnecessary demand of a large common-mode range of operation in active circuits such as $R_0$. We used a scheme to normalize the network inputs by slaving the buffer transconductance of the logarithmic photoreceptor proportionally to $R_0$, so as to maintain a constant mean network voltage at all illuminations.
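Both effects, the weak dependence of kernel width on $R_0$ and the shift in mean voltage, can be illustrated on a one-dimensional version of the active mesh. The sketch below is not the simulation behind Fig. 4: it is one-dimensional rather than hexagonal, and the lower $R_0$ value and the photocurrent are assumed for illustration.

```python
import numpy as np

def active_mesh(n, r0, r1, r2):
    """Conductance matrix of a 1D mesh: r0 to ground, r1 between nearest
    neighbors, negative resistor -r2 between second-nearest neighbors."""
    G = np.eye(n) / r0
    for step, g in ((1, 1.0 / r1), (2, -1.0 / r2)):
        for j in range(n - step):
            k = j + step
            G[j, j] += g
            G[k, k] += g
            G[j, k] -= g
            G[k, j] -= g
    return G

def fwhm(kernel, c):
    """Full width at half maximum, by linear interpolation between samples."""
    half = 0.5 * kernel[c]
    m = c
    while kernel[m + 1] > half:
        m += 1
    frac = (kernel[m] - half) / (kernel[m] - kernel[m + 1])
    return 2.0 * (m - c + frac)

N, C = 201, 100
R1, R2 = 5e3, 20e3            # assumed 1:4 ratio
I_PHOTO = 1e-6                # hypothetical uniform input current per node

widths, means = {}, {}
for r0 in (20e3, 320e3):      # a 16:1 span of R0 (assumed endpoints)
    G = active_mesh(N, r0, R1, R2)
    impulse = np.zeros(N)
    impulse[C] = 1.0
    widths[r0] = fwhm(np.linalg.solve(G, impulse), C)
    means[r0] = np.linalg.solve(G, np.full(N, I_PHOTO)).mean()

width_ratio = widths[320e3] / widths[20e3]
mean_ratio = means[320e3] / means[20e3]
print("FWHM (nodes):", widths, "ratio:", width_ratio)
print("mean voltage ratio under uniform input:", mean_ratio)
```

In this sketch a 16:1 change in $R_0$ changes the width by only about 2:1, while the mean voltage under uniform input scales directly with $R_0$, which is the common-mode shift that the normalization scheme is meant to remove.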
C. Network Resistors
The 5-kΩ resistors for the nearest-neighbor internode connections in the network were implemented using p-well diffusions.
A Gaussian convolution kernel would be obtained in spite of tolerances in the p-well resistivity as long as the relative magnitude of the positive and negative resistors remains 1:4. To make this ratio on the chip depend only on geometry, both $R_1$ and $R_2$ were implemented in the same material, p-well diffusion, and a negative impedance converter (NIC) was attached to $R_2$ to invert its polarity.
Our NIC implementation (Fig. 7) consists of the combination of a voltage follower and current inverter. The op-amp-based followers at each end of $R_2$ impose across it the potential difference at their inputs, and the resulting current flow, forced through the Class-B type output stages, is sourced from or sunk into the positive or negative power supply. Current mirrors in series then apply the same current at the input leads of the followers, inverting the sense of current flow as perceived at the network nodes. A negative resistance $-R_2$ is presented to the network.
Six negative resistors converge on every node in this hexagonal mesh. Six different NIC's are, however, not required at each node; instead, a single NIC placed at the node after the confluence of the resistors will simultaneously make them all negative (Fig. 7(b)). The dc gain in a simple five-FET op amp was large enough to obtain accurate inversion of the resistor I-V characteristics and eliminate the crossover nonlinearity in the Class-B stage. The NIC at every node thus contained only 11 FET's.
D. Layout Considerations
A key concern in the implementation of this network as an IC is whether the usual two layers of metal and one of polysilicon can implement the starlike fan-out of interconnections emanating from every node. We proved to ourselves at the outset of this work that this was possible. A hexagonal grid was obtained by horizontally staggering successive rows of cells, and their interconnections implemented on a Manhattan geometry (Fig. 8(a)). All three available layers of interconnect were used to create abuttable cells. The power, ground, control, and output rails ran parallel to these rows from edge to edge of the chip.
A unit cell, including its portion of interconnect, measured 170 × 200 μm in 2-μm CMOS (Fig. 8(b)). The area of the photoreceptor collector-base junction, the blank rectangle in the cell layout at the lower left, measured 56 × 24 μm. No wires were allowed to traverse the photosensor because metal would absorb the incident light. Parasitic photocurrents generated in the source/drain junctions of other active circuits would have negligible effect on the voltages at the low-impedance nodes there. We observe finally that the active circuits occupied only 57% of the cell area, a measure of the toll exacted by the richness of interconnect in this circuit.
E. Output Means
This convolution network accepts a 2D input in the form of an incident image, does 2D signal processing across the resistive mesh, but on a standard IC is restricted to 1D output at the pins along the periphery. The output therefore must be read at the pins (Fig. 9) by accessing one row of nodes at a time, and, at least in this implementation, becomes the bottleneck to the throughput rate. Addressable MOS switches were used to connect every node to output lines, and on-chip vertical bipolar transistors connected as emitter followers served as analog buffers at the pads. The speed of signal processing was determined by the relaxation time of this unclocked network, but a clock was introduced at the output to scan out the rows. To relieve this bottleneck, one can envisage connecting several 2D computational IC's performing a cascade of low-level vision tasks, with micro solder balls joining together matrices of pads on their surfaces, or through via holes on the back sides of the chips. This technique, originally developed for "flip-chip" mounting, is used at very high densities today to mate 2D focal plane array sensors to active substrates [22]. Once the desired data reduction has taken place at the output of such a cascade of chips, a few high-level outputs containing image features could be scanned out in parallel on pins with no loss in throughput speed.
[Figure caption: The network has 2D input, accomplishes 2D signal processing, but is forced to output results in 1D.]
IV. EXPERIMENTAL RESULTS
We were able to fit a 45 × 40 array of unit cells on a 7.9 × 9.2-mm die, the largest die size available to us through the MOSIS foundry service. Power supplies of +5 and -5 V were used, mainly for convenience in circuit design; the circuits could be modified with a minor effort for operation on a single 5-V supply. The fabricated chip (Fig. 10) contained more than 100 000 transistors and was fully functional.
The network response to optical input was measured by shining light on the exposed chip, and reading the outputs using a specially developed interface board under control of a personal computer. An array of analog column voltages along an addressed row was digitized and stored, and the smoothed output image reconstructed on the computer screen after all rows had been scanned. to 320 kΩ (Fig. 11(b)). a mask used in place of the lid on the cavity of the ceramic PGA package. We had also made provision on the IC to measure the actual compressed signal driving the network, so that the true network function could be obtained by deconvolving it from the measured output. The convolution kernel was thus deduced from measurements of the network input and output (Fig. 12). It was difficult at this sampling resolution to accurately ascertain that it was a Gaussian function, but the characteristic inflection in the function as it approaches the peak value was evident. This would not appear unless the network contained negative resistors. We were able to change the full width at half maximum of the kernel by a factor of 2, from 4.7 to 9.4 pixels wide, by changing $R_0$ across its full span with the control current. The network output was most noisy at its tails at minimum $R_0$, and we had to use smoothing in the sense of a least-mean-square fit to deduce the kernel function. Light through the pinhole nominally sampled only a small neighborhood on the chip; we moved the pinhole to points on the chip on either side of the center, and found an acceptable uniformity in the response (Fig. 13(a)), which is determined here by MOSFET matching across the extent of the chip surface [23]. The slight uptilt of the output at the ends of the measured response was caused by the edge effect when the network terminates at the chip boundary. The uniformity across three chips was also acceptable at this sampling resolution (Fig. 13(b)), except for one chip where a particularly large uptilt appears. The smoothing effected by the network on a character "T" was also measured (Fig. 14(a)), and its symmetry after rotations relative to the chip axis verified (Fig. 14(b)).
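The idea of deducing a kernel from recorded input and output, and then reading off its full width at half maximum, can be sketched on synthetic data. This is an illustration only and not the authors' procedure: it uses a one-dimensional circular convolution, a made-up three-sample stimulus, a Gaussian stand-in for the network, and regularized division in the Fourier domain in place of the least-mean-square fit mentioned above. All names and values are assumptions.

```python
import numpy as np

def deconvolve(measured_out, measured_in, eps=1e-9):
    """Estimate a circular convolution kernel from input and output records
    by regularized division in the Fourier domain."""
    out_f = np.fft.fft(measured_out)
    in_f = np.fft.fft(measured_in)
    kernel_f = out_f * np.conj(in_f) / (np.abs(in_f) ** 2 + eps)
    return np.real(np.fft.ifft(kernel_f))

def fwhm(kernel, c):
    """Full width at half maximum of a kernel that is symmetric about index c,
    using linear interpolation between samples."""
    half = 0.5 * kernel[c]
    m = c
    while kernel[m + 1] > half:
        m += 1
    frac = (kernel[m] - half) / (kernel[m] - kernel[m + 1])
    return 2.0 * (m - c + frac)

# Synthetic stand-ins (assumed): a Gaussian kernel and a narrow stimulus.
n, sigma = 64, 2.0
x = np.arange(n)
dist = np.minimum(x, n - x)                      # circular distance from index 0
true_kernel = np.exp(-dist**2 / (2.0 * sigma**2))
true_kernel /= true_kernel.sum()

stimulus = np.zeros(n)
stimulus[[n - 1, 0, 1]] = [0.2, 0.6, 0.2]        # small blob standing in for the pinhole
response = np.real(np.fft.ifft(np.fft.fft(stimulus) * np.fft.fft(true_kernel)))

estimate = deconvolve(response, stimulus)
width = fwhm(np.fft.fftshift(estimate), n // 2)  # shift the peak to the array center
print("estimated FWHM (pixels):", width)
print("Gaussian FWHM 2*sqrt(2 ln 2)*sigma:", 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma)
```

With measured data the division would amplify noise wherever the input spectrum is small, which is one reason some form of smoothing or fitting is needed before a width can be quoted.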
A. Component Characteristics

Precautions were required in making the measurement to compensate for the effects of the 2-W power dissipation when no heat sink was mounted on the package. This large power dissipation produced a thermal gradient across the IC, peaked at the center with circularly symmetric isotherms spreading out towards the chip boundary. We deduced this from a corresponding pattern in photoreceptor dark currents, which appeared as a stimulus to the network in the absence of an optical input. This had to be calibrated and subtracted from all measurements to obtain the true optical response. We emphasize that this relatively large power dissipation was not fundamental to the network; 75% of it was due to an unnecessarily large bias current in one building block, the control circuit for the variable resistor. A further reduction in quiescent power could be obtained by devising a voltage drive to the network nodes, because the current sources in the present implementation produce some steady power dissipation through $R_0$, even when the chip is not illumi-
