I. INTRODUCTION
The boundary contour system (BCS) and feature contour system (FCS) combine models for processes of image segmentation, feature filling, and surface reconstruction in biological vision systems [1], [2]. They provide a powerful technique to recognize patterns and restore image quality under excessive fixed-pattern noise, such as in synthetic aperture radar (SAR) images [3].
The BCS model encompasses visual processing at different levels, including several layers of cells in visual cortex interacting through shunting inhibition, long-range cooperative excitation, and renormalization. The implementation architecture, shown schematically in Fig. 1, partitions the BCS model into three levels: simple cells; complex and hypercomplex cells; and bipole cells. A simpler model, which does not require bipole cells and only involves cells in V1, but otherwise preserves much of the structure and properties of BCS as implemented here, is presented in [4].
Simple cells compute unidirectional gradients of normalized intensity obtained from the photoreceptors. Complex (hypercomplex) cells perform spatial and directional competition (inhibition) for edge formation. Bipole cells perform long-range cooperation for boundary contour enhancement and exert positive feedback (excitation) onto the hypercomplex cells. Our present implementation does not include the FCS model, which completes and fills features through diffusive spatial filtering of the image blocked by the edges formed in BCS.
The motivation for implementing a relatively sophisticated model such as BCS on the focal plane is twofold. First, as argued in [5], complex neuromorphic active pixel designs become viable engineering solutions as the feature size of the VLSI technology shrinks significantly below the optical diffraction limit and more transistors can be fit in each pixel. The pixel design that we present contains 88 transistors, making it likely the most complex active pixel imager put on silicon to date. Second, our motivation is to extend the functionality of previous work on analog VLSI neuromorphic and cellular image processors for image boundary segmentation, e.g., [5], [6], and [8]-[12], which are based on simplified physical models that do not include directional selectivity and/or long-range signal aggregation for boundary formation in the presence of significant noise and clutter. The analog VLSI implementation of BCS reported here is a first step toward this goal, with the additional objectives of real-time low-power operation, as required for demanding target recognition applications. As an alternative to focal-plane operation, the input image can be loaded electronically through random-access pixel addressing.
From the perspective of cellular neural networks (CNNs) [12], [13], the architecture implementing BCS described here is interesting in two aspects, which extend the capability of conventional cellular structures with nearest neighbor connectivity. First, long-range connectivity across the bipole cells is achieved through the use of (a variant on) diffusor elements [7], [15], which implement a diffusive kernel extending across several cells with just nearest neighbor coupling between cells. Second, directional selectivity in the response to image edge contours is achieved by mapping a vector field onto a cellular structure on a hexagonal grid, with components in three directions for each cell. More specifically, the BCS model integrates three layers of such vector fields, including simple cells, (hyper)complex cells, and bipole cells, with bottom-up and top-down interactions between layers. The full-blown BCS model [1] is sufficiently complex to make a scalable focal-plane VLSI implementation impractical, if not impossible. Algorithmic and architectural simplifications, which preserve much of the original functionality of BCS, are the subject of Section II. An analog VLSI cellular implementation in current-mode CMOS technology is presented in Section III, and experimental results from a small (12 × 10 pixel) prototype are included in Section IV.
II. MODIFIED BCS ALGORITHM AND CELLULAR ARCHITECTURE
We adapted the BCS algorithm, as described in detail in [1] and [2], for analog continuous-time implementation on a hexagonal grid, extending in three directions u, v, and w on the focal plane as indicated schematically in Fig. 2.
In the implemented circuit model, a pixel unit consists of a photosensor (or random-access analog memory) sourcing a current indicating light intensity, gradient computation and rectification circuits implementing simple cells in three directions, and one complex (hypercomplex) cell and one bipole cell for each of the three directions.
A. Definition and Notation
The BCS equations and architecture involve the notion of vector fields, mapped on a cellular architecture and discretized both in space and angular resolution.
For notational convenience, let subscript zero denote the center pixel and $\pm u$, $\pm v$, and $\pm w$ its six neighbors, depicted in Fig. [...] Similarly, the bipole vectors are denoted by $B_i$ at grid locations $i$, or componentwise as $B_i^j$ in the three directions $j$. Nonvector (i.e., scalar) fields, such as input intensity, only take a subscript index for location.
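The notation above can be made concrete with a small data-structure sketch. It is only an illustration: the axial coordinates (q, r), the particular offsets assigned to u, v, and w, and the helper names `neighbors`, `to_cartesian`, and `new_vector_field` are assumptions introduced here, not the convention of the paper's figures.

```python
import math

# Assumed axial coordinates (q, r). The offsets below are one possible
# assignment of the three grid axes; they are not taken from the paper.
DIRECTIONS = {"u": (1, 0), "v": (0, 1), "w": (-1, 1)}


def neighbors(q, r):
    """Return the six neighbors of cell (q, r), keyed '+u', '-u', '+v', ..."""
    result = {}
    for name, (dq, dr) in DIRECTIONS.items():
        result["+" + name] = (q + dq, r + dr)
        result["-" + name] = (q - dq, r - dr)
    return result


def to_cartesian(q, r):
    """Map axial coordinates to the plane with unit spacing between cells."""
    return (q + 0.5 * r, math.sqrt(3) / 2 * r)


def new_vector_field(cells):
    """Store one component per direction at each cell (e.g. a bipole field)."""
    return {cell: {name: 0.0 for name in DIRECTIONS} for cell in cells}


center = (0, 0)
ring = neighbors(*center)
bipole = new_vector_field([center, *ring.values()])
bipole[center]["u"] = 1.0  # component of B at the center pixel, direction u
```

With this layout a scalar field such as the input intensity would be a plain mapping from cell to value, while each vector field carries three components per cell, matching the subscript (location) and superscript (direction) indices used in the text.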
B. Simple, Complex, and Hypercomplex Cells
The photosensors generate a current $I_i$ that is proportional to intensity. Through current mirrors, the currents $I_i$ propagate in the three directions $u$, $v$, and $w$ as noted in [...]
The bipole cell-resistive grid (Fig. 4) implements a three-fold cross-coupled directionally polarized long-range diffusive kernel, formally expressed as follows: [...]
where $K_u^u$, $K_v^u$, and $K_w^u$ represent spatial convolutional kernels implementing bipole fields symmetrically polarized in the $u$, $v$, and $w$ directions. Diffusive kernels can be efficiently implemented with a distributed representation using resistive diffusive elements termed diffusors [7], [15]. One key advantage of diffusor elements is that they preserve a cellular nearest neighbor topology of cell interconnectivity while implementing a long-range diffusive kernel extending across the entire array of cells. Furthermore, a linear kernel is obtained in the current domain, even though the device characteristics of the MOS-transistor circuit elements used are highly nonlinear. The capability to construct linear transfer functions through the use of exponential transconductance devices is one of the advantages offered by translinear circuits, implemented with MOS transistors operated in the subthreshold region [7].
A 1-D linear diffusive network spanning one of the three directions, $u$, is shown in Fig. 3 [7]. This network forms the basis for constructing the [...] (6) where the lateral and cross-coupling coefficients are the conductance ratios $g_{lat}/g_{vert}$ and $g_{cross}/g_{vert}$, respectively.
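To illustrate how nearest neighbor coupling alone yields a long-range kernel, the following minimal sketch solves an idealized linear 1-D resistive network: each node receives an input current, leaks to ground through a vertical conductance, and couples to its two neighbors through a lateral conductance. The function name `diffuse_1d`, the symbol `lam` for the lateral-to-vertical conductance ratio, and the open-ended boundary treatment are assumptions made for this example. It models the linear equivalent network only, not the subthreshold MOS circuit.

```python
def diffuse_1d(inputs, lam):
    """Output currents of a 1-D resistive ladder (idealized linear model).

    Each node i satisfies
        out[i] + lam * (2*out[i] - out[i-1] - out[i+1]) = inputs[i],
    with lam = g_lat / g_vert; end nodes have a single neighbor.
    The tridiagonal system is solved with the Thomas algorithm.
    """
    n = len(inputs)
    if n == 1:
        return [float(inputs[0])]
    diag = [1.0 + 2.0 * lam] * n
    diag[0] = diag[-1] = 1.0 + lam  # open ends: one lateral link only
    off = -lam                      # sub- and super-diagonal entries

    # Forward elimination.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off / diag[0]
    d_prime[0] = inputs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (inputs[i] - off * d_prime[i - 1]) / denom

    # Back substitution.
    out = [0.0] * n
    out[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        out[i] = d_prime[i] - c_prime[i] * out[i + 1]
    return out


# Impulse response: a unit current injected at the center node.
impulse = [0.0] * 5 + [1.0] + [0.0] * 5
kernel = diffuse_1d(impulse, lam=2.0)
```

In this model the kernel decays smoothly away from the injection point and the total output current equals the total input current. Increasing `lam` widens the kernel, which is consistent with the lateral bias setting the spatial extent of the interactions.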
III. ANALOG VLSI IMPLEMENTATION
The simplified circuit diagram of the BCS cell, including simple, complex, and bipole cell functions on a hexagonal grid, is shown in Fig. 5.
A. Photosensor and Simple Cells
The image is acquired either optically from phototransistors on the focal plane, or in direct electronic format through random-access pixel addressing [Fig. 5(a)]. The advantage of including a random-access electronic interface is modularity and expandability in the architecture. This allows, for instance, interfacing the chip with other stages of processing, such as a silicon retina [8] or shunting-neuron [14] preprocessing stage for contrast enhancement and dynamic range compression of the input image to BCS. Several BCS chips can be combined in parallel to increase the available image size, and a separate imager with elementary preprocessing allows a higher fill factor.
The simple cell portion in Fig. 5(b) combines the local intensity $I_0$ with intensities $I_v$ and $I_w$, received from neighboring cells, to compute the rectified gradient in (1) using distributed current mirrors and an absolute value circuit [7]. A PMOS load converts the complex cell output into a voltage representation $C_0^u$ for distribution to neighboring nodes and complementary orientations.
B. Complex Cells
A complex cell, performing local inhibition for spatial and directional competition according to (1), is shown in Fig. 5(c) for one of [...]
C. Bipole Cells
Long-range cooperation is performed in the bipole layer, of which one cell, for a single direction, is shown in Fig. 5(d). The directionally selective diffusive kernel (6) is implemented in current mode using subthreshold MOS transistors by extending the MOS equivalence between Fig. 3(a) and (b) to the structure of Fig. 4, with three families extending in each direction with crosslinks for angular dispersion.
Voltage biases control the spatial extent and directional selectivity of the interactions, as well as the level of renormalization for the interaction between complex and bipole cells. The values for $g_{vert}$, $g_{lat}$, and $g_{cross}$ controlling the bipole kernel are set externally by applying gate bias voltages $V_{vert}$, $V_{lat}$, and $V_{cross}$, respectively. Likewise, the constant in (1) is set by an applied source voltage. Global normalization and thresholding of the bipole response for improved stability of edge formation is achieved through an additional diffusive network, which acts as a localized Gilbert-type current normalizer between complex and bipole cells [only partially shown in Fig. 5(e)].
IV. EXPERIMENTAL RESULTS
A prototype 12 × 10-pixel array has been fabricated and tested. A micrograph of the 2.2 × 2.2 mm² chip, fabricated through MOSIS in 1.2-µm CMOS technology, is shown in Fig. 6. The pixel unit, illustrated in Fig. 7, has been designed for testability and has not been optimized for density. The pixel contains 88 transistors including a phototransistor, a large sample-and-hold capacitor, interface circuitry, and three networks of interconnections in each of the three directions, requiring a fan-in/fan-out of 18 node voltages across the interface of each pixel unit.
We have tested the BCS chip both under focal-plane optical inputs and random-access direct electronic inputs. Input currents from optical input under ambient room lighting conditions are around 30 nA. The experimental results reported here are obtained by feeding test inputs electronically. The three components of complex and bipole outputs of the array, together with the acquired input image, are multiplexed out using a separate address decoder.
The response of the BCS chip to test images of interest is shown in Figs. 8-10. For graphical clarity, the simple, complex, and bipole fields are reproduced as bars in three directions, of which the thickness indicates the measured activity in each of the three orientations.
The processing through different stages in the BCS chip is illustrated in Fig. 8, showing the reconstructed image, the rectified gradient field, the inhibition by the complex interactions, and the excitation by the bipole interactions feeding back onto the complex cells. Fig. 9 illustrates the interpolating directional response to a curved edge in the input, varying in direction between two of the principal axes (u and w in the example). Interpolation between quantized directions is important, since implementing more axes on the grid incurs a quadratic cost in complexity.
The third example image contains a bar with two gaps of different width for the purpose of testing BCS's capacity to extend contour boundaries across clutter. The response in Fig. 10 demonstrates, to a certain extent, the bipole property in which short-range discontinuities are bridged, but large ones are preserved.
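The bipole property described above can be illustrated with a toy 1-D example along a contour. This is a sketch under stated assumptions, not the chip's circuit: the exponentially decaying kernel stands in for the diffusive kernel of a long uniform chain, and the decay factor `r`, the threshold value, and the function names `bipole_support` and `bridge` are hypothetical choices for this illustration. The test pattern is likewise made up here and is not the image used in the measurements.

```python
def bipole_support(edge, r):
    """Sum contributions from all cells, weighted by r ** distance.

    The exponential weighting is an assumed stand-in for a long-range
    diffusive kernel along one direction of the grid.
    """
    n = len(edge)
    return [sum(r ** abs(i - j) * edge[j] for j in range(n)) for i in range(n)]


def bridge(edge, r=0.5, threshold=1.5):
    """Mark a cell as part of the contour if its support exceeds the threshold."""
    return [1 if s > threshold else 0 for s in bipole_support(edge, r)]


# A bar with a narrow gap (1 cell) and a wide gap (4 cells).
edge = [1] * 6 + [0] + [1] * 6 + [0] * 4 + [1] * 6
completed = bridge(edge)
```

With these particular parameters the single-cell gap receives strong support from both flanking segments and is filled in, while the cells inside the four-cell gap stay below threshold. The outcome depends on the assumed decay factor and threshold, in the same way that the bias settings determine the spatial extent of the interactions on the chip.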
V. CONCLUSIONS
An analog VLSI cellular architecture implementing the BCS on the focal plane has been presented. A diffusive kernel with distributed resistive networks has been used to implement long-range interactions of bipole cells without the need of excessive global interconnects across the array of pixels. The cellular model is fairly easy to implement and succeeds in selecting boundary contours in images with significant clutter.
One area for improvement of the cellular architecture is the angular resolution, which is quantized to multiples of 60° ($\pi/3$) in a hexagonal arrangement. Interpolation effectively improves resolution, as shown in Fig. 9, but at some expense in directional selectivity of the bipole cells. In principle, it is possible to extend the resolution, without compromising directional selectivity, by incorporating additional cells tuned at different orientations in each pixel, although the number of interconnects rises sharply (quadratically) with the number of cells per pixel. One possible idea to extend the present approach to a continuous angular resolution is the use of tunable directional filters, e.g., as described in [10]. Nevertheless, the quantization effects in orientation appear to be minor in typical imagery, when viewed at a larger scale, for spatial frequencies below the Nyquist limit.
Experimental results from a 12 × 10 pixel prototype demonstrate expected BCS operation on simple examples. While this size is small for practical applications, the analog cellular architecture is fully scalable toward higher resolutions. Based on the current design, a 10 000-pixel array in 0.5-µm CMOS technology would fit a 1-cm
