We describe the implementation of a vision system based on a hardware neural processor. The architecture of the neural network processor has been designed to exploit the computational characteristics of electronics and the communication characteristics of optics in an optimal manner; it is therefore based on an optical broadcast of input signals to a dense array of processing elements. The vision system has been built by use of a prototype implementation of a neural network processor with discrete optic and optoelectronic devices. It has been adapted to work as a Hamming classifier of the images taken with a 128 × 128 complementary metal-oxide semiconductor image sensor. The results, the performance characteristics of the image classification system, and an analysis of its scalability in size and speed as the optoelectronic neural processor is improved are presented.
Introduction
Neural networks can be widely used in industrial vision systems as they can deal with many image classification and pattern recognition tasks. 1,2 A neural network consists of a set of simple processing elements with a high degree of interconnectivity between them. The processing elements or neurons are connected by weights that can be adapted to improve performance. The computation strength of a single processing element is small, but when huge numbers of them work together, the result is a powerful machine that could be suitable for tasks such as pattern recognition and classification. 3 In most cases neural networks are emulated by software, but software implementations are often insufficient to meet the real-time requirements of many industrial vision applications. To alleviate this problem, hardware platforms are used that are capable of increasing the speed compared with conventional digital processors based on the Von Neumann architecture. 2 Most of the commercial neurocomputing hardware, embedded neurosystems for special applications, PC accelerator cards, and neurocomputers are built by use of neural network chips. These chips have been designed to carry out the basic operations performed by neural network algorithms in parallel; in other words, broadcast the input, multiply, and add. 3 The operation speed increases as the number of neural operations that can be done in parallel increases, along with the number of processing elements that make up the neural processor architecture. In a neuroprocessing hardware system, the number of possible interconnections increases with the square of the number of processing elements within the neural system. It is desirable to build architectures composed of hundreds or thousands of processing elements with massive interconnections between them. It can be inferred that one of the key issues for the construction of hardware architectures for neural networks is interconnection and parallel processing. 
4 The main drawback of wired microelectronic neural network chips is their low interconnection capacity, which makes them difficult to scale up in the number of processing elements. The use of optical techniques to implement interconnections in neural network hardware architectures appears to be an attractive alternative. 5 The advantages that optics promises over electronic interconnections are massive parallelism, speed, and cross-talk-free interconnections. 6 Despite the potential benefits of optical interconnections for hardware neural networks, 5, 6 only a few optoelectronic neural processors appear to have been demonstrated in applications, and none are in use. Most of them are based on an optical vector-matrix multiplier, such as the first optoelectronic Hopfield neural network proposed by Farhat et al. 7 In these systems, input is introduced into the optical processor by a modulated one-dimensional or two-dimensional source of light. The input beam intensities are individually multiplied by the weight matrix mask [usually accomplished by use of a spatial light modulator (SLM)], and the resulting optical signals are distributed to the output plane, in which an array of optical detectors adds their contributions to form the output. 5, 7, 8 Some problems encountered with these systems are optical alignment and the reconfiguration and assignment of interconnection weights with an SLM. Although SLM technology has advanced over the past few years, 9 many demonstration systems implement associative memories in which the weight mask is fixed.
As mentioned above, the basic operations performed by neural network algorithms are to broadcast the input, multiply, and add. 3 Multiplication and addition can be carried out much more easily by electronics than by an optical system, as all present-day computing systems demonstrate. On the other hand, optics performs better at massive communication at high speed, as demonstrated by present-day communication systems. To our knowledge, only one recent application demonstration of an optoelectronic neuroprocessor exploits this division, using optics to communicate and electronics to process: the optoelectronic neural-network scheduler described in Ref. 10. The optoelectronic architecture basically consists of a set of winner-take-all (WTA) networks with a regular interconnection pattern. The particular characteristic of the system is that there is a constant weight associated with each interconnection, so the presence of the SLM is not necessary. The authors have improved the system by modifying the electronic part, 11 its main disadvantage being that the optical system has been designed particularly for optimization tasks.
Lamela et al. 12 described a novel optical broadcast architecture for neural networks. It is a hybrid optoelectronic architecture that does not attempt to compute in the optical domain. Our intention is to exploit the communication strength of optics and the computational strength of electronics in an optimal manner. Interconnections are carried out in the optical domain whereas interconnection weight is assigned to the electronic domain. Thus, with this approach, reconfigurable optical interconnections are not required and the high fan-out and massive parallelism of optical beams are exploited.
Here we focus on the description of a vision system that uses an optical broadcast neural network as a processing core. In Section 2 we describe the conception of optical broadcast hardware. In Section 3 we give a description of the design of the optoelectronic neurons that make up the architecture. In Section 4 we describe the neural network model (a Hamming classifier) used in the vision system and its implementation based on an optical broadcast neural network architecture. In Section 5 we focus on the whole image classification system comprised of an image sensor, a memory to store the sample pattern, a controller, and a core optoelectronic neural processor. Results are presented in Section 6. In Section 7 we present an evaluation of the projected performance of the system, compared with pure electronic implementations. Conclusions and further research are presented in Section 8.
Optical Broadcast Architecture for Neural Networks
The basic operation of one neuron in a neural network is to provide one output that is a nonlinear function of weighted inputs. 3 Our proposed optoelectronic hardware architecture 12 is composed of a set of weight up and accumulate neurons that implement that operation in a time-multiplexing scheme. The neurons are grouped in cells, and all the neurons in a cell share the same time-distributed input. The whole architecture ( Fig. 1 ) is made up of K cells with M neurons in each.
The main feature of our architecture, compared with electronic architectures, is the global interconnection performed by use of a special holographic diffuser that efficiently broadcasts the input to all the neurons in a cell. By attaching these cells one on top of another into noninterfering planes, parallel processing of the input data is possible.
The main difference with this system compared with other optoelectronic neural network hardware implementations is the time multiplexing of interconnection weights as opposed to spatial multiplexing, i.e., there is one input in one time slot that is broadcast to all the neurons in a cell. Figure 2 is a block diagram of one optical broadcast cell. The cell works as follows. First, all the neurons are cleared. The operation cycle of a cell is divided into time slots; in the first time slot the first input is introduced and optically distributed to all the neurons. Each neuron executes the product of the first input with the corresponding interconnection weight and stores the result. In the next time slot the second input is broadcast, multiplied by the interconnection weight and added to the previous result. At the end of the operation cycle all the inputs have been introduced and the output of the cell is the product of the input vector and the interconnection weight matrix. These outputs can be connected to different hardware blocks, such as threshold electronic circuits or WTA circuits, which suppress all output other than that whose initial input was the maximum.
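The operation cycle of a cell described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware: the function name `broadcast_cell` and the example values are hypothetical, and the optical broadcast is represented simply by every neuron seeing the same input in the same time slot.

```python
def broadcast_cell(inputs, weights):
    """Model one operation cycle of a broadcast cell (illustrative only).

    inputs  : list of N input values, one per time slot
    weights : M rows (one per neuron), each holding N interconnection weights
    Returns the M accumulated outputs, i.e. the vector-matrix product.
    """
    acc = [0.0] * len(weights)            # CLR: all neurons are cleared first
    for slot, x in enumerate(inputs):     # one input is broadcast per time slot
        for m, row in enumerate(weights): # every neuron receives the same input
            acc[m] += x * row[slot]       # multiply by own weight and accumulate
    return acc


# Hypothetical example: 4 time slots, 3 neurons with binary weights.
inputs = [1, 0, 1, 1]
weights = [
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
outputs = broadcast_cell(inputs, weights)
print(outputs)  # [1.0, 2.0, 3.0]
```

Each neuron needs only its own row of weights and a single accumulator, which is why one detector per neuron and one emitter per cell suffice in this scheme.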
This architecture is essentially a hybrid (optical-electronic) vector-matrix multiplier in which communication fan-out is done optically and the interconnection weight assignment and other neuron operations are done electronically. The optical time-multiplexing scheme minimizes the number of detectors (only one per neuron is needed, i.e., M detectors in each of the K cells), minimizes the number of optical emitters (only one per cell is needed, i.e., K in total), and simplifies the design of the optical interconnection device, which in turn facilitates optical alignment. Neural system synchronization could also be carried out by optical broadcast of the clock signal.
Optoelectronic Neuron Designs
The input vector is distributed sequentially to all the neurons in the cell by means of an optical signal. The optoelectronic neuron must multiply each input with the corresponding interconnection weight and accumulate the result until the entire input vector has been introduced at the end of the operation cycle. The basic neuron scheme that implements that operation 12, 13 is presented in Fig. 3 . The light pulses that reach each neuron are converted into current pulses by the photodetector. An analog multiplexor, controlled by a corresponding interconnection weight, connects the detector or not (binary weights) to a capacitor that acts as the storage element. In this way, the product operation is implemented by the analog multiplexor and the accumulation function of the neuron is implemented by the capacitor [Eq. (1)]. Additionally, there is a switch controlled by the clear (CLR) signal that resets the capacitor at the beginning of an operation cycle.
Equation (1) summarizes how a neuron works. It represents the increase in the voltage of the capacitor in a time slot, which is proportional to the product of the input (I) and the interconnection weight (W) and to the design parameters of the system. These are the value of the storage capacitor (C), the optical power that reaches the detector (P), the responsivity of the detector (R), and the time slot (Δt). For binary unipolar inputs and interconnection weights, if input I is 0 the voltage in the capacitor does not increase because no light hits the detector and no photocurrent is generated; if input I is 1, the voltage in the capacitor increases only if the interconnection weight is 1 because in this case the detector is connected to the capacitor:

$$\Delta V_C = \frac{R\,P\,\Delta t}{C}\, I\, W. \tag{1}$$

Figure 4 shows temporal processing of a neuron in the architecture. The input vector is introduced during an operation cycle. The operation cycle is divided into time slots and each element of the input is introduced in one time slot. Figure 4 shows an example of the evolution of the signals for binary input and binary interconnection weights; the input and the interconnection weights of one processing element are shown. The result of the accumulation can be read at the end of an operation cycle.
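As a rough numerical illustration of this accumulation, the sketch below steps a capacitor voltage through one operation cycle for binary inputs and weights. It assumes that the per-slot increment of Eq. (1) has the form R·P·Δt·I·W/C, which is what the listed parameters imply; the component values used as defaults are hypothetical and are not taken from the prototype.

```python
def capacitor_trace(inputs, weights, R=0.5, P=1e-6, dt=1e-6, C=1e-9):
    """Capacitor voltage after each time slot (illustrative model).

    R  : detector responsivity in A/W   (hypothetical value)
    P  : optical power at the detector in W (hypothetical value)
    dt : duration of a time slot in s   (hypothetical value)
    C  : storage capacitance in F       (hypothetical value)
    """
    step = R * P * dt / C          # voltage increment when I = W = 1
    v = 0.0                        # CLR resets the capacitor
    trace = []
    for i, w in zip(inputs, weights):
        v += step * i * w          # charge is added only if input and weight are both 1
        trace.append(v)
    return trace


# Hypothetical 4-slot cycle: the products are 1, 0, 0, 1.
trace = capacitor_trace([1, 1, 0, 1], [1, 0, 1, 1])
print(trace)
```

With these assumed values each contributing slot adds 0.5 mV, so the final voltage is two increments, one for each slot in which input and weight are both 1.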
The neuron design described above allows only binary input patterns and binary interconnection weights with values {0,1}. Higher resolution of inputs and interconnection weights can be obtained by proper modulation inside a time slot; inputs can be coded as pulse-frequency modulated signals and interconnection weights as pulse-width modulated signals. 14 In this way we can exploit the high bandwidth of the optical emitters and interconnections. Figure 5 shows the waveforms for a high-resolution operation of the optoelectronic weight up and accumulate neurons. In Fig. 5 there are six oscillograms that represent the behavior of one neuron with different inputs and interconnection weights. Inputs and weights are in the range (0,1). The maximum resolution of inputs is limited by the ratio between the width of the pulses of the optical emitter (laser diode) and the time slot associated with one input; this is 6 bits in the example given. The resolution of interconnection weights is limited by the weight storage memory, which is a digital memory with an 8-bit word length in this first prototype. The oscillograms at left represent the evolution of the voltage (V_C) in the accumulation capacitor in an operation cycle with four inputs (four time slots); the oscillograms at right span the duration of a time slot. We observed that the final voltage in the capacitor is proportional to the accumulated product of the inputs and their corresponding interconnection weights (see Fig. 5); the increase of the voltage in a time slot is

$$\Delta V_C = \frac{R\,P}{C}\, T_{X \times W}, \tag{2}$$

where C is the capacitance of the storage capacitor, P is the optical power that reaches the neuron detector, R is the responsivity of the photodetector, and T_{X×W} is the time during which both the input (I) and weight (W) signals are at a high level, which is proportional to the product of the input and the interconnection weight. With the basic neuron design described in Fig. 3 and the input and weight data represented as pulse-frequency modulated and pulse-width modulated signals, respectively, we obtain high and variable resolution multiplication without the use of electronic multipliers.
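The way the overlap time of the two modulated signals approximates a product can be illustrated with a discrete-time sketch. It rests on several assumptions that are not in the text: the time slot is divided into T equal ticks, the pulse-frequency coded input places its pulses evenly across the slot, and the pulse-width coded weight is a gate that is high from the start of the slot. The function name `overlap_ticks` is hypothetical.

```python
def overlap_ticks(k, w, T=64):
    """Count the ticks in which input pulse and weight gate are both high.

    k : input level, number of pulse ticks in the slot (0..T), spread evenly
    w : weight level, number of ticks the gate stays high (0..T)
    T : ticks per time slot (64 mirrors a 6-bit input resolution; assumed)
    """
    # Pulse-frequency coded input: k single-tick pulses spread evenly over the slot.
    pulses = [((t + 1) * k) // T > (t * k) // T for t in range(T)]
    # Pulse-width coded weight: gate high during the first w ticks of the slot.
    gate = [t < w for t in range(T)]
    return sum(p and g for p, g in zip(pulses, gate))


# Input 0.5 (32 of 64 pulses) times weight 0.75 (gate high for 48 of 64 ticks).
ticks = overlap_ticks(32, 48)
print(ticks, ticks / 64)  # 24 0.375
```

In this model the overlap equals floor(k·w/T) ticks, so the overlap fraction tracks the product of the normalized input and weight to within one tick, which is the property the neuron exploits to multiply without an electronic multiplier.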
It is also possible to design the neuron circuitry to allow bipolar weights. This feature is of particular interest in comparison with traditional optical vector-matrix multipliers based on the modulation of input beams by means of an SLM. Because of the unipolar nature of light, such systems must be duplicated for excitatory and inhibitory interconnections 7 or resort to more complicated techniques such as polarization control. 15 In our optical broadcast architecture, as interconnection weight assignment is implemented in the electronic domain, we can modify the neuron circuitry to allow for bipolar interconnection weights. One example of this is the scheme presented in Fig. 6. The XOR function computes the sign for excitatory or inhibitory contributions and the capacitor, as a storage element, increases or decreases its voltage according to the result of the multiplication operation. Oscillograms showing how this scheme works are given in Section 4.
Description of the Optoelectronic Hamming Network

A. Neural Network Model
The neural network hardware model we chose to implement the image identification system is a Hamming classifier. It has been shown to be the most efficient classifier for binary patterns when compared with other content-addressable memories such as the Hopfield network. 16 It comprises two layers, 3 as represented in Fig. 7. The first layer consists of as many nodes or neurons as the number of different classes that can be classified (P); one pattern is assigned to each processing node. The inputs are the input image pixels (N). The sample pattern pixel values are stored in the interconnection weights of each processing node. The output of each processing node is a matching score that measures how close the input image is to the stored pattern: the stronger the response of a neuron, the closer the input is to its stored pattern. The second layer of the Hamming classifier receives the matching scores from the first layer. Its function is to suppress the values at the output nodes except for the output node of the first layer that was initially the maximum.
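The two layers can be sketched in software as follows. This is a minimal model under stated assumptions: the matching score is taken as the number of pixels that agree, and the second layer is modeled with the textbook iterative MAXNET rule (mutual inhibition with a small constant eps), which may differ from the circuit used in the prototype. The 8-pixel patterns are hypothetical.

```python
def matching_scores(image, patterns):
    """First layer: one score per stored pattern (number of equal pixels)."""
    return [sum(1 for x, w in zip(image, p) if x == w) for p in patterns]


def maxnet(scores, eps=None, max_iter=1000):
    """Second layer: iterative mutual inhibition until one node stays positive."""
    y = [float(s) for s in scores]
    if eps is None:
        eps = 1.0 / (2 * len(y))          # inhibition strength, kept below 1/P
    for _ in range(max_iter):
        if sum(1 for v in y if v > 0) <= 1:
            break
        total = sum(y)
        # Each node is inhibited by the summed activity of all other nodes.
        y = [max(0.0, v - eps * (total - v)) for v in y]
    return y


# Hypothetical 8-pixel binary patterns (P = 4 classes).
patterns = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
]
image = [1, 0, 1, 1, 0, 0, 1, 1]          # first pattern with its last pixel flipped

scores = matching_scores(image, patterns)
final = maxnet(scores)
winner = max(range(len(final)), key=final.__getitem__)
print(scores, winner)  # [7, 1, 5, 6] 0
```

Only the node with the highest initial score survives the inhibition, so the index of the remaining positive output identifies the class.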
B. Optoelectronic Hamming Classifier
The block diagram of the first prototype implementation of the optoelectronic Hamming classifier is presented in Fig. 8. The layer that calculates the matching scores was implemented based on our optoelectronic architecture. The first prototype is composed of four processing elements, so it allows a choice to be made among four different classes (P = 4). The MAXNET layer has been implemented by use of an electronic circuit based on the circuit proposed in Ref. 17.
The neural network model we chose to implement the image classification system works with binary patterns for which input and interconnection weight values are {−1, +1}. To calculate the matching score between the input and each of the sample patterns we must determine the scalar product of the input pattern and the reference pattern for each processing element. This operation can be carried out by use of the neuron circuitry presented in Fig. 6.
The system works as follows: the input image pixels are multiplexed in time and optically distributed to all neurons. Each neuron receives each input as an optical signal; the photodetector provides a photocurrent proportional to the optical power, which is converted into a voltage level by a transimpedance amplifier and then thresholded, so that the input to the neuron is binary. The multiplication of input and interconnection weight is carried out by a logical XOR that gives a result of 0 if the input and interconnection weight (corresponding pixel value of the sample) are equal and 1 if they are different. The storage element is a capacitor that increases its voltage if the product is 0 or decreases it if the product is 1. The amount of charge that increases the voltage in the capacitor is controlled by the current source I_char (Fig. 6), which is connected when the product is 0. The amount of charge that decreases the voltage in the capacitor is controlled by the current source I_dis, which is connected when the product is 1. This neuron scheme provides excitatory and inhibitory connections, allows the value of the interconnection weights to be set through the current sources, and yields the product of an input pattern and a reference pattern. At the end of an operation cycle, when the whole input vector has been sequentially presented (N time slots), the voltage in the capacitor is proportional to the number of matching elements between the input and the corresponding reference pattern.
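The charge and discharge behavior of this neuron can be mimicked with a short sketch. It is a behavioral model only: the voltage is in arbitrary units, the two current sources are represented by hypothetical step sizes `i_char` and `i_dis`, and the pixel values are coded as 0 and 1 at the XOR input. With equal step sizes the final value equals matches minus mismatches, which grows linearly with the number of matching pixels.

```python
def xor_neuron(inputs, reference, i_char=1.0, i_dis=1.0):
    """Accumulate one operation cycle of the XOR-based bipolar neuron.

    inputs    : thresholded input pixels (0 or 1), one per time slot
    reference : stored reference pattern pixels (0 or 1)
    i_char    : voltage step when the pixels match     (arbitrary units, assumed)
    i_dis     : voltage step when the pixels differ    (arbitrary units, assumed)
    """
    v = 0.0                          # CLR resets the capacitor
    for x, w in zip(inputs, reference):
        product = x ^ w              # XOR: 0 if equal, 1 if different
        if product == 0:
            v += i_char              # charging source connected
        else:
            v -= i_dis               # discharging source connected
    return v


# Hypothetical 8-pixel example.
pattern_a = [1, 0, 1, 1, 0, 0, 1, 0]
negative_a = [1 - p for p in pattern_a]
image = [1, 0, 1, 1, 0, 0, 1, 1]     # pattern A with its last pixel flipped

v_a = xor_neuron(image, pattern_a)
v_neg = xor_neuron(image, negative_a)
print(v_a, v_neg)  # 6.0 -6.0
```

Seven matches and one mismatch give +6, while the negative pattern gives the opposite sign, which is the bipolar behavior a purely optical intensity-based multiplier cannot provide directly.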
To observe the waveforms of the neurons we use the patterns presented in Fig. 9 as examples (N = 8 × 8). Each of the four diagrams presented in Fig. 10 corresponds to a different neuron. Waveform 1 is the CLR signal, common to all neurons. Waveform 2 is the input pattern, common to all neurons. Input pixel values for the optoelectronic neural network are presented sequentially, so input images are read from left to right and from the top down. In all four oscillograms the input pattern is A. Waveform 3 is the reference pattern: A for the oscillogram in Fig. 10(a), E for the oscillogram in Fig. 10(b), C for the oscillogram in Fig. 10(c), and negative A for the oscillogram in Fig. 10(d). Waveform 4 is the output product. Waveform 5 is the voltage in the storage capacitor, which increases when the input and reference patterns match and decreases when they do not. We observed that at the end of the operation cycle the voltage in the capacitor is proportional to the number of pixels in the input pattern that are equal to those in the reference pattern.
The WTA electronic circuit is presented in Fig. 11. It is similar to the circuit proposed in Ref. 17, but off-the-shelf bipolar discrete transistors are used instead of a complementary metal-oxide semiconductor (CMOS) integrated circuit. The circuit has four inputs M_i (analog voltage levels) and provides four digital outputs O_i: 0 for the highest input voltage and 1 for the others. The circuit is made up of four identical sections, one per processing node. Each section i, which is basically a voltage buffer, consists of a differential amplifier (Q_i1 and Q_i2) with a mirror active load (Q_i3 and Q_i4) and a voltage follower (Q_i5). As all the voltage followers share the same output, the voltage (V_max) always follows the maximum input voltage at the input nodes (M_i). Transistor Q_i5 of the winner section remains ON, while the other three are OFF. Each comparator (A_i) provides a level of 0 at its output (O_i) if its section is the winner. If not, the output level is 1. In the diagrams presented in Fig. 10, we observed that at the end of the operation cycle node 1 is the winner.
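The input and output behavior of this circuit can be summarized by a small behavioral model. It is not a circuit simulation: the function name `wta_outputs` and the example voltages are hypothetical, and a tie between inputs (which the sketch would report as several winners) is not addressed by the description above.

```python
def wta_outputs(m):
    """Behavioral model of the WTA stage.

    m : list of analog input voltages M_i
    Returns (v_max, outputs), where v_max follows the largest input and
    outputs[i] is 0 for the winning section and 1 for all the others.
    """
    v_max = max(m)                                 # shared follower output tracks the maximum
    outputs = [0 if v == v_max else 1 for v in m]  # comparator A_i per section
    return v_max, outputs


# Hypothetical capacitor voltages at the end of an operation cycle.
v_max, outputs = wta_outputs([2.1, 0.4, 1.3, 1.8])
print(v_max, outputs)  # 2.1 [0, 1, 1, 1]
```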
Vision System Description
The vision system has been designed to capture an image with a 128 × 128 pixel CMOS image sensor, to compare it with a set of stored patterns, and to provide an output that indicates which pattern best matches the input. The block diagram of the system is presented in Fig. 12.
A. Optoelectronic Hamming Classifier
The core of the vision system is the optoelectronic neural network described in detail in Section 4. The number of different classes we can classify depends on the number of optoelectronic processing elements implemented in the system, which is four in this prototype (P = 4). The size of the input vector can be configured simply by changing the number of time slots within an operation cycle. We have tested input images of up to N = 64 × 64 pixels, limited by the size of the address bus in the control system.
B. Memory Block
The reference patterns that are to be compared with the input image are stored in a static random-access memory. The reference patterns that define each class are presented in Fig. 13(a). These gray-level patterns are thresholded and their resolution is reduced to 64 × 64 pixels. The resulting images [Fig. 13(b)] are stored in the memory of the vision system.
C. Image Sensor
The input image is captured by a CMOS image sensor ULL128. 18 The sensor specifications are a resolution of 128 × 128 pixels, an array size of 4 mm × 4 mm, a supply voltage of 5 V, an output swing of 3.5 V, and a responsivity of 17 V/(lx s). The CMOS sensor was designed with the photosensors separate from the storage elements so that it also works as a random readout analog memory with a storage time of the order of tens of seconds.
The sensor array has three operation modes, selected by the controller. The reset mode must be activated before a new image is taken. The discharge mode must be activated to take a new image; the time spent in this mode sets the exposure time. In the readout mode, the CMOS sensor stores the last captured image and the image pixels can be read randomly, so the sensor works as a random-access analog memory of the image pixel values.
Because the optoelectronic Hamming network classifies binary patterns, the analog output voltage from the CMOS sensor is thresholded using a comparator. The binary output is the input for the optoelectronic Hamming classifier.
An example of a 64 × 64 pixel input image captured by the camera is presented in Fig. 14(a). The resolution of the image has been set to 64 × 64 because only 12 pins are available in the microcontroller (8051F226) to provide the pixel address. The camera has been connected so that we read the pixels in the even rows and columns of the sensor array. Figure 14(b) shows the thresholded image that is the input for the optoelectronic Hamming classifier.
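The readout described here, taking every second row and column of the 128 × 128 array and thresholding each pixel, can be sketched as follows. Twelve address bits select 2^12 = 4096 pixels, which matches 64 × 64. The frame contents, the threshold level, and the polarity (a voltage above the threshold gives 1) are assumptions made only for illustration.

```python
def subsample_and_threshold(frame, threshold):
    """Read the even rows and columns of a frame and binarize each pixel.

    frame     : 2-D list of analog pixel voltages (e.g. 128 x 128)
    threshold : comparator reference voltage (assumed polarity: above -> 1)
    """
    return [
        [1 if frame[r][c] > threshold else 0 for c in range(0, len(frame[r]), 2)]
        for r in range(0, len(frame), 2)
    ]


# Hypothetical frame: a horizontal ramp from 0 V to the 3.5 V output swing.
frame = [[3.5 * c / 127 for c in range(128)] for _ in range(128)]
binary = subsample_and_threshold(frame, threshold=1.75)

address_bits = 12
print(len(binary), len(binary[0]), 2 ** address_bits)  # 64 64 4096
```

The resulting 64 × 64 binary image is what would be presented, one pixel per time slot, to the classifier.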
D. Control and User Interface
The system works in two modes. In the configuration mode the controller communicates with a PC by means of the RS-232 standard. The user can select the resolution of the input image and the reference patterns; the reference patterns can be images captured by the CMOS sensor or any binary image stored in the computer. Once the patterns are selected they are stored in the memory block. In the operation mode, first the image is captured, then its pixel values are addressed sequentially along with the content of the memories. At the end of the operation cycle, when all the image pixels have been presented, the controller reads the output provided by the optoelectronic classifier. We observed that the control signals and the information managed by the controller do not depend on the size of the Hamming classifier.
Vision System Results
Here we describe the results obtained with the vision system described in Section 5. A picture of the whole system is shown in Fig. 15. Figure 16 shows the results obtained by the vision system when the reference patterns were the ones presented in Fig. 13 and the input image was the one presented in Fig. 14(b). The first waveform shows the beginning of the operation cycle and the second waveform its end. The next four waveforms represent the evolution of the neurons' activation throughout an operation cycle for the input image presented in Fig. 14(b). The operation cycle is divided into N = 64 × 64 (4096) time slots, one slot per input image pixel. The next four waveforms correspond to the output of the WTA circuit. At the end of the operation cycle, the highest activation corresponds to the pedestrian crossing traffic signal, which means that the image captured by the camera has been correctly recognized.
Projected Performance of the System Versus Electronics
Here we present an evaluation of the projected performance advantages of our system compared with pure electronic neural systems. Electronics is a mature technology that has provided many implementations of specific neural network processors. Many commercial neural chips emerged in the early 1990s 19; most of them have been discontinued, 19-21 but new chips, evolved from previous implementations, have appeared recently. 22, 23 Table 1 summarizes the characteristics of these chips. The parameters shown are the number of neurons (N_N) or processing elements (PE) that make up the hardware neural network architecture; the number of connections allowed per processing element (N_W per PE); and the speed, measured in connections per second (CPS), for which a connection means a multiplication and an addition. These commercially available neural chips have been

Comparing these systems, we can determine how electronic neural system implementations have evolved in the past decade. However, this evolution is not surprising if we compare it with the evolution of conventional digital processors. 25 Their operation speed has increased by approximately 2 orders of magnitude, following the same path as conventional digital processors. Other important parameters for neural networks, such as the number of neurons or processing elements and the number of weighted interconnections per neuron, have not increased significantly over the past decade. Pure electronic neural products have been designed to perform specific tasks faster and at lower power than neural network algorithms running on general-purpose hardware. Digital chips, such as the VindAx processor, 23 are good candidates for the implementation of small neural network classifiers, for which portability and real-time operation are application specifications.
An analog neural network, such as the ACE16K chip, 22 is a good solution for real-time, low-level vision applications, but it is not useful for high-level vision applications because of its limited interconnection architecture. It seems that pure electronic neural hardware cannot cope with large networks composed of a large number of neurons and a large number of interconnections per neuron.
It is also important to mention that, as CMOS technology improves, transistor speeds and densities improve with it, but this is not the case for chip interconnections. 26 As neural network hardware architectures need to be composed of a large number of interconnections, pure electronic neural network implementations will be more difficult to scale up than other electronic processing architectures. Optical interconnections would help electronic neural systems to scale up the number of processing elements, the number of interconnections per processing element, and the speed. In particular, optical interconnections have been shown to be the leading approach for long-distance (chip-scale) signal distribution in the multigigahertz frequency range, such as clock distribution networks. 6, 27 The main advantages are the reduction of jitter, skew, and power consumption. 6 The optoelectronic architecture we propose for neural networks is based on these principles; each cell in our system is composed of a signal source that is broadcast to many different processing nodes, so communication will be faster and use less power than in a wired network.
Our system can readily be scaled up to a large number of neurons by simply increasing the number of detectors and replicating their attached electronics. A foreseeable system would comprise 100 cells with 100 processing elements per cell. In a scaled integrated system of this nature, the memory of interconnection weights should be physically distributed among the neurons. Thus, local connections would be electronic, as their performance appears to be sufficient, 27 whereas global interconnections would be optical. Preliminary work 28 centered on the integration of electronic circuits in a standard CMOS process, and the fabrication of an efficient optical interconnection element with volume holograms, suggests that we would be able to package such a system-comprising 100 cells with 100 neurons each-into a cube with sides measuring 15 mm. 28 Besides this, the processing speed is now limited by the response time of the large area detector used in the prototype and the discrete component electronic pixels. Faster speed has been achieved in a recent optoelectronic neuron design with a bandwidth of up to 150 MHz. 29 Higher bandwidths, in the gigahertz range, are currently being tested. Table 2 shows a summary of performance parameters of existing commercial neural systems and our optoelectronic system. The parameters in bold type are those that have already been demonstrated.
Conclusions and Further Studies
We have built a vision system based on a CMOS image sensor and an optoelectronic broadcast neural network, configured as a Hamming classifier. We have shown the ability of the system to capture an image, 64 × 64 pixels in size, compare it with four reference patterns, and provide an output that indicates which of the reference patterns best matches the input image. The number of classes into which the input pattern can be classified depends only on the size of the optoelectronic neural network, as the control signals remain the same if the system is scaled up. As shown by the evaluation of the projected performance, our system could be used for large neural network implementations that cannot be carried out by purely electronic systems. The main advantage of optical interconnections in an optoelectronic classifier is that our system is potentially scalable to a large number of neurons and interconnections, as discussed in Section 7. In Table 2 we have compared the performance parameters of our system with recent electronic neural network implementations. Although our prototype implementation is composed of four processing elements, we have shown a number of interconnections per neuron equal to 4096 (for a 64 × 64 pixel pattern). This number is limited by the memory size and not, as in the ACE16K chip, 22 by the hardware architecture. The operation speed of our optoelectronic classifier for a 150-MHz bandwidth 29 is 9 × 10^6 classifications per second for a 16-element pattern; we used this pattern size for comparison as it is the maximum size of the patterns that can be managed by the VindAx processor in its current implementation. 23 If the size of the patterns is increased, the classification speed is reduced proportionally. For patterns of 128 × 128 elements (maximum image size for the ACE16K chip), our vision system would classify them at a rate of 8800 frames/s.
Our system classifies faster than both electronic processors in Table 2, but its main advantages are that (1) it implements a higher number of neurons than the digital neural processor, so the classification is made among a higher number of classes, and (2) it implements a higher number of connections than the analog neural processor, so high-level vision applications can be implemented. Assuming a scaled system based on the optical broadcast architecture with 100 cells and 100 neurons per cell 28 and the operation speed for a 150-MHz bandwidth, 29 we obtain 1.5 × 10^12 CPS, much faster than the electronic neural processors presented in Table 1. Our current efforts focus on the implementation of a scaled and compact prototype 28 and on the improvement of the vision system capacity.
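The projected figures quoted in this section can be checked with simple arithmetic. The sketch below assumes an idealized model in which one input element is processed per time slot and the slot rate equals the 150-MHz bandwidth, with no overhead for clearing or readout; the variable names are hypothetical.

```python
bandwidth_hz = 150e6           # slot rate assumed equal to the neuron bandwidth

# Classification rate: one operation cycle needs one time slot per pattern element.
rate_16 = bandwidth_hz / 16                 # 16-element patterns
rate_128x128 = bandwidth_hz / (128 * 128)   # 128 x 128 element patterns

# Connections per second: every neuron performs one connection per time slot.
cells, neurons_per_cell = 100, 100
cps = cells * neurons_per_cell * bandwidth_hz

print(f"{rate_16:.3e} classifications/s for 16 elements")
print(f"{rate_128x128:.0f} frames/s for 128 x 128 elements")
print(f"{cps:.1e} CPS for the scaled system")
```

Under these assumptions the 16-element rate is about 9.4 × 10^6 per second and the scaled system reaches 1.5 × 10^12 CPS, in line with the quoted values. The idealized 128 × 128 figure comes out slightly above the quoted 8800 frames/s, which presumably reflects per-cycle overhead that this simple model does not include.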
