During the period of this report, we completed construction of what we believe to be the first neural network of biology-like pulsating neurons. The network consists of N = 32 PUTONs (programmable unijunction transistor neurons). The coupling between neurons is realized with a JPL programmable synaptic weight chip. Methods for monitoring the collective firing activity and characterizing the spiking behavior of neurons in the network were also developed. A simplified circuit diagram of the network, with only one PUTON shown, is given in Fig. 1. In this first embodiment, the neurons are coupled, and mutually control one another's firing activity, through modulation of the grid voltage of each PUTON by the net input from its presynaptic neurons. Other modes of coupling, e.g., modulation of the extinction voltage, will be studied in the future.
the neurons from the network, and then re-inserting it. This is accomplished by temporarily setting the weights between a specified neuron and each of the remaining neurons to zero, and then returning them to their original values.
One way to let the system develop in a specific order is to start with only two neurons in the network and then add the remaining neurons one at a time. This procedure can be realized by setting to zero the weights of those neurons not yet included in the network, and then gradually changing the relevant weights to the desired values. The order in which the neurons are added can, however, influence the steady-state solutions of the system.
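The staged inclusion described above can be sketched as a sequence of masked weight matrices. This is a minimal illustration only: the function names `masked_weights` and `staged_schedule` are hypothetical, the weights are held in a plain list of lists rather than on the synapse chip, and each stage switches the new neuron's weights on in one step instead of ramping them gradually.

```python
def masked_weights(weights, included):
    """Return a copy of `weights` in which every connection to or from a
    neuron outside `included` is set to zero."""
    n = len(weights)
    return [
        [weights[i][j] if (i in included and j in included) else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def staged_schedule(weights, order):
    """Yield one masked weight matrix per stage, starting with the first
    two neurons in `order` and adding one neuron per stage."""
    for k in range(2, len(order) + 1):
        yield masked_weights(weights, set(order[:k]))
```

Temporarily removing a single neuron and re-inserting it corresponds to one call to `masked_weights` with that neuron left out of `included`, followed by restoring the original matrix.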
In our experiments on the network, we focused on the phase-locking phenomenon as a possible mechanism for solving optimization problems. There are, however, other dynamics of the network that can be exploited. From the experiments performed on single neurons, we know that our neuron models can exhibit chaotic dynamics, so other possible applications of such behavior remain to be explored. Further work on networks of pulsed neurons will include building networks that implement other mechanisms of coupling among the neurons. It can be expected that such networks will exhibit more complicated dynamics than those exhibited so far by our current network.
One goal of our work with the spiking network is to gain insight into how the order of firing of neurons can be controlled by the weight matrix and initial conditions. This facet of the work is important because it is related to the more challenging task of finding a rule or algorithm for learning in pulsating neural networks. A learning rule for pulsating neural networks would have far-reaching implications because it implies the ability to learn, recognize, or generate spatio-temporal patterns more naturally and directly than with the limited methods developed for sigmoidal networks.
During the period of this report, two reports of our research under this grant were published:
1. Z. Wen, A. Baek, and N.H. Farhat, "Optoelectronic neural dendritic tree processing with electron-trapping materials", Optics Letters, vol. 20, pp. 614-616, March, 1995.
2. J.R. Tower and N. Farhat, "The transversal imager: a photonic neurochip with programmable synaptic weights", IEEE Trans. on Neural Networks, vol. 6, pp. 248-251, Jan. 1995.
Copies of these publications are included with this report.
[Figure: block labels "JPL 32x32 Synapse Chip", "Output Pulses", "Computer Control"]
Electron-trapping materials (ETM's) have received much attention in optical information processing, erasable three-dimensional optical memory, and optical neural networks 1-3 because of their many attractive properties, such as large linear dynamic range, high usable resolution, and fast response times. Recently we initiated a study of the dynamics of ETM's under simultaneous illumination by blue light and IR light 4 and uncovered a new set of dynamics that makes ETM's uniquely suited for use in optoelectronic implementation of large-scale biology-oriented spiking (or pulsating) neural networks. In a previous Letter we demonstrated the utility of ETM's in producing dense arrays of optoelectronic spiking neurons. 5 In this Letter we describe the use of ETM's in producing optically controlled dendritic responses for optoelectronic spiking neural networks. The importance of neurocomputing with spiking neurons, in particular the role played by the neurons' dendrites, is first discussed. Then computer simulation results of dendritic responses in biological neurons and experimental results of ETM dynamics are presented. It is shown that ETM's are well suited for implementing optically controlled dendritic responses, i.e., postsynaptic responses or potentials, of the kind needed in optoelectronic implementation of large-scale biology-oriented pulsating neural networks.
In contrast to most conventional neural network models studied today, which assume that computational power stems from the massive interconnections (synaptic weights) among large numbers of functionally simple processing elements (sigmoidal or binary neurons) and which explicitly ignore the spiking nature of biological neurons as well as the role of relative timing between action potentials, biological neurons are both functionally and structurally complex processing elements. It is reasonable to expect that this functional complexity would be reflected in the higher-level computational power of the networks they form, and therefore that development of artificial neuron models that emulate the functional complexity of biological neurons would lead to a new generation of neural networks with vastly enhanced processing power. Recent physiological experimental results strongly support these conjectures (see, e.g., Refs. 6 and 7). The functional complexity of biological neurons stems from two properties: (a) the complex dynamics of the excitable biological membrane and (b) the preprocessing conducted by dendrites on incident spike trains from other neurons. Our study in modeling the excitable biological membrane, based on a limit-cycle oscillator employing an S-shaped nonlinearity that emulates the excitable biological membrane to form what we call the bifurcation neuron, shows that the bifurcation neuron model preserves much of the functional complexity of biological neurons and, particularly under periodic activation potential, exhibits complex firing modalities such as integer harmonic and subharmonic phase-locked firing, period-m firing, quasi-periodic firing, and period-doubling routes to chaos, depending on the parameters of its periodic activation. 8 A crucial assumption in the above study is that dendritic tree processing produces a periodic activation potential when the impinging incident spike trains are coherent.
Such coherent incident spike activity occurs whenever a network of spiking neurons phase locks. Therefore dendritic tree processing plays a central role in spiking neural networks in their abilities to phase lock and to exhibit rich functional complexity.
In biological neural networks, the arrival of an action potential at a synapse site on the dendrite of a target neuron triggers a chain of electrochemical events. The result is the production of a postsynaptic potential (PSP) that alters the activation potential (membrane potential) at the target neuron's hillock, which is the trigger zone for action potential generation. The polarity, amplitude, and shape of the PSP waveform depend on the type of neurotransmitter involved in synaptic transmission, on the location of the synapse on the dendritic tree, and on the propagation from the synapse to the hillock. In an actual situation, the activation potential at the hillock of a neuron is the superposition of a large number of such PSP's produced by the barrage of action potentials from other neurons arriving at any time at the synapse sites on its dendrites. The importance of dendritic tree processing was studied recently. 9,10 By assuming the dendrite to be passive, it is possible for one to derive a transient diffusion differential equation (the cable equation) and to determine the response of a dendrite to current pulses injected at different positions along its length. 9 Injected current pulses represent the effect of arriving action potentials. Using commonly listed dendritic parameters, 9 we have carried out such a computation for the simplified arrangement shown in Fig. 1(a) for current pulses of uniform amplitude injected at three synapses along the dendrite. The results, given in Fig. 1(b), show how the shape of the PSP and its rise time, peak value, latency (peak position), and decay time all depend on synapse position (i.e., position of injected current). In reality, the peak value also depends on the synaptic efficacy. We also carried out simulations that confirm that when the incident spike trains are correlated, the smoothing responses of dendrites lead to PSP's that sum at the neu[...]
[Eqs. (1) and (2)]
where n(t) denotes the trapped electron density; I_B(t), I_IR(t), and I_O(t) represent the blue light, the IR light, and the emitted orange light intensities; α and κ are coefficients for trap creation and fluorescence by the blue light; and γ and β are coefficients for IR-stimulated emission and trap erasing by the IR light, respectively. The dynamics of interest here is identical to the situation considered in Ref. 5, i.e., I_IR is static and I_B(t) consists of a static term and a pulse-modulated term as follows:
[Eq. (3)]
where u(t) denotes the unit step function, I_0 and I_1 are the static and pulsed blue light intensities, and T and t_w are the pulse period and the pulse width, respectively. The solution for the emitted orange light I_O(t) during the Nth period is
I_O(t) = [...] (t - NT) [...]   for NT < t < NT + t_w,
I_O(t) = [...]                  for NT + t_w < t < (N + 1)T.   (4)
From Eq. (4) we observe that in the emitted orange light the contributions that are due to IR-stimulated emission and fluorescence coexist. Fluorescence is manifested by sudden jumps at the onset and termination of the blue light pulses. On the other hand, IR-stimulated emission first undergoes a linear increase during the blue light pulse in the charging process and then decays exponentially after the pulse in the discharging process. An experiment has been carried out to study ETM dynamics in the above situation by using the same experimental setup used in Ref. 5. Figure 2 shows the experimental results. In all these plots, I_1 = 3.98 mW/cm^2, t_w = 25 ms, and T = 250 ms. Figure 2(a) shows the ETM responses for three different levels of the static blue light intensity I_0, whereas the IR light intensity I_IR is constant.
In Figs. 2(b) and 2(c), I_0 remains unchanged, whereas I_IR is varied. From Fig. 2 one observes the fluorescence jumps at the blue light pulse edges, the linear buildup during the pulse, and the trailing exponential decay after it, as predicted by Eq. (4). Equation (4) indicates that varying I_0 shifts the ETM response up and down, and the results shown in Fig. 2(a) generally agree with this. The results in Fig. 2 show, however, that the heights of the fluorescence jumps, as well as the magnitude of the exponential portion of the ETM response, vary with the I_IR and I_0 used. Furthermore, it appears that the upward jump and the downward jump have different heights. The first-order linear model of ETM dynamics given in Eqs. (1) and (2) fails to predict these subtle effects because it neglects nonlinear effects such as trapped electron density saturation and the dependence of electron-trapping efficiency on existing trapped electron density. Clearly, to utilize ETM's effectively in applications that make use of their unique dynamics, it is necessary to develop an improved theoretical model that takes into account the neglected effects mentioned above. We are in the process of developing such a model and verifying it experimentally; the results will be reported elsewhere. Nonetheless, we draw attention now to the similarity between the PSP's in Fig. 1, which represent biological neurons, and the ETM responses in Fig. 2. By varying the values of I_0 and/or I_IR, one can control the ETM response in various ways, e.g., up and down shifts, different heights of fluorescence jumps and magnitude of exponential response, and faster or slower exponential decays. So we have a unique method for crudely emulating dendritic response in biological neurons in a fashion useful for realizing a dense optically controlled dendritic response array (or two-dimensional dendritic trees) for use in large-scale biology-oriented optoelectronic spiking neural networks.
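The qualitative behavior described above (fluorescence jumps at the pulse edges, buildup during the pulse, exponential decay after it) can be reproduced with a small numerical sketch. The model form used here is an assumption inferred only from the stated roles of the coefficients, namely dn/dt = alpha * I_B - beta * I_IR * n and I_O = kappa * I_B + gamma * I_IR * n; it is not a transcription of Eqs. (1) and (2). The function name `simulate_etm` is hypothetical, and all coefficient and intensity values are arbitrary illustrative numbers. Only the pulse width and period follow the text.

```python
def simulate_etm(alpha, beta, kappa, gamma, i_ir, i0, i1,
                 period, width, n_periods, dt=1e-4):
    """Euler-integrate an assumed first-order linear ETM model.

    Returns the emitted intensity I_O sampled every `dt` seconds.
    """
    steps_per_period = round(period / dt)
    steps_in_pulse = round(width / dt)
    # Start from the equilibrium trapped density under static light only.
    n = alpha * i0 / (beta * i_ir)
    emitted = []
    for k in range(n_periods * steps_per_period):
        in_pulse = (k % steps_per_period) < steps_in_pulse
        i_b = i0 + (i1 if in_pulse else 0.0)             # blue light intensity
        emitted.append(kappa * i_b + gamma * i_ir * n)   # fluorescence + IR-stimulated emission
        n += dt * (alpha * i_b - beta * i_ir * n)        # trap creation minus IR erasing
    return emitted


# t_w = 25 ms and T = 250 ms as in the text; everything else is illustrative.
trace = simulate_etm(alpha=1.0, beta=1.0, kappa=0.5, gamma=1.0,
                     i_ir=2.0, i0=1.0, i1=4.0,
                     period=0.250, width=0.025, n_periods=1)
```

In this linear sketch the upward and downward jumps both equal kappa * I_1, so it cannot reproduce the unequal jump heights seen experimentally, which is the shortcoming of the first-order model noted above.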
The spiking nature of biology-oriented neural networks makes studying their behavior by computer simulation impractical because of the lengthy computing time involved. In a recent paper Prange and Klar 11 argued convincingly that the best way of studying and realizing biology-oriented neural networks is through analog complementary metal-oxide semiconductor technology rather than digital hardware. However, they showed that the number of neurons one can accommodate on a VLSI chip is limited to a hundred or so, even when submicrometer complementary metal-oxide semiconductor technology is used, because of the relatively large size of the neuron/dendrite cell. By combining ETM's with state-of-the-art technologies such as smart pixel spatial light modulators and image intensifiers to form dense arrays of spiking neurons and dendritic responses, it is feasible that large-scale optoelectronic spiking neural networks of 10^3-10^4 neurons can be realized. Such networks would be invaluable tools for modeling, studying, and demonstrating the role of synchronicity, bifurcation, and chaos in higher-level cortical functions and exploring their use in enhancing the performance of artificial neural systems.
Brief Papers
Abstract-A photonic neural processor implemented in NMOS/CCD integrated circuit technology is described. The processor performs outer-product processing utilizing optical input of the synaptic weights and electrical input of the state vector, or vice versa. The performance of the 32-neuron, 1024-synapse processor is presented.
I. INTRODUCTION
This paper describes the design, fabrication, and performance of an NMOS/CCD integrated circuit which performs outer-product neural network computations. The transversal imager integrated circuit utilizes a unique architecture which permits (1) the synaptic weights to be introduced electrically to the complete neural network in parallel, and the one-dimensional state vector to be introduced optically, or (2) the synaptic weights to be introduced optically to the complete neural network in parallel, and the one-dimensional state vector to be introduced electrically. In both cases, the computation of the outer-product between the two-dimensional synaptic weight matrix and the one-dimensional state vector is performed simultaneously across the integrated circuit.
The optical input to this electro-optical neural network can thus be reduced to the one-dimensional state vector. This simplification of the optical input requirements from a two-dimensional pattern to a one-dimensional pattern has a significant impact on system hardware performance and system computational throughput.
II. DESCRIPTION OF PROCESSING ALGORITHM
As background for understanding the transversal imager architecture, it is useful to discuss the motivation for the architecture, namely, the efficient implementation of the Hopfield neural network. In the basic Hopfield neural network [1], the synaptic matrix is of dimension N x N. The state of the network is updated according to (1) the vector-matrix multiplication:
and (2) a transfer function:
where V_i is the updated state of the ith neuron.
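The two-step update can be sketched in a few lines. This is a generic illustration under stated assumptions rather than the paper's own equations: the transfer function is taken to be a hard threshold producing binary (1, 0) states, and the names `hopfield_update`, `weights`, and `threshold` are hypothetical.

```python
def hopfield_update(weights, state, threshold=0.0):
    """One synchronous update of an N-neuron Hopfield-style network."""
    n = len(state)
    # (1) Vector-matrix multiplication: net input to each neuron.
    net = [sum(weights[i][j] * state[j] for j in range(n)) for i in range(n)]
    # (2) Transfer function: assumed hard threshold giving binary states.
    return [1 if u > threshold else 0 for u in net]
```

For example, with the ternary weight matrix `[[0, 1, -1], [1, 0, -1], [-1, -1, 0]]` the state `[1, 1, 0]` maps to itself, so it is a fixed point of this update.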
III. PRIOR CCD NEURAL NETWORK APPROACHES
All of the charge coupled device neural processing integrated circuits reported to date (1) exploit the analog, rather than digital, capability of CCD's, and (2) use charge domain processing to reduce the area required for interconnections and computational circuitry. In the work reported by Sage et al. of MIT Lincoln Laboratory [2], surface channel CCD technology and metal nitride oxide semiconductor (MNOS) technology are combined to produce circuits capable of outer-product processing. In this approach, state vectors are introduced electrically, and the analog synaptic weights are produced electrically and stored in MNOS gates. The MNOS-CCD gates function to store the synaptic weight and to control the flow of charge, thereby producing the product of the input state and the stored weight. The sum of products is produced in the charge domain and then converted to the voltage domain to produce, in a Hopfield net application, the desired outer-product result.
In the CCD neural network processor work by Chiang et al. of MIT Lincoln Laboratory [3], buried channel CCD structures are utilized for a variety of functions including the implementation of multiplying digital-to-analog converters (MDAC's). In this architecture, when applied to the Hopfield algorithm, the state vector is introduced electrically to an analog CCD register. The synaptic weights are introduced electrically to 6-bit CCD digital memories. The MDAC's perform the outer-product multiplication of the analog state vector and the 6-bit digital synaptic weights, with the summation of products being achieved in the charge domain. The output charge packets are then converted into the voltage domain to produce the final outer-product result. The CCD neural processor which is most similar to the transversal imager is that reported by Agranat et al. of the California Institute of Technology [4]. In this work a Hopfield network is realized utilizing buried channel CCD technology. The state vector input is binary (+1, 0) and is introduced electrically. The synaptic weights are input electrically or optically, and are analog. The outer-product multiplication of the binary state vector and each analog synaptic weight is performed by controlled (+1 charge flow, 0 no charge flow) gating of replicas of the recirculating synaptic weights. The gated products are added together in the charge domain, forming the outer-product sum-of-products. The outer-product terms are then serially read out in the charge domain to an output where the final voltage output is produced. This architecture therefore utilizes a two-dimensional optical input to introduce the analog synaptic weights, and the binary state vector is introduced electrically.
IV. TRANSVERSAL IMAGER ARCHITECTURE
The transversal imager concept grew out of work conducted in the late 1970's and early 1980's in the field of programmable CCD analog-binary transversal filters [5], [6]. The transversal imager architecture we use is shown in Fig. 1. This signal processing imager [7], [8] was designed to perform vector-matrix multiplication by (1) imaging the one-dimensional input state vector (V) spread across the rows of the N x N image area, (2) performing the interconnect synaptic weighting (W) at each pixel location in parallel, (3) performing the summation along each column in parallel, and (4) reading out the sum-of-products terms serially. The architecture was chosen to realize the following desirable features:
1) Optical input of the one-dimensional state vector.
2) Electrical input of the matrix weighting at each vector-matrix multiplication.
3) Simultaneous ternary weighting (-1, 0, +1) at each pixel.
4) High vector-matrix multiplication frame rate, with frame rate dominated by serial output register readout rate.
5) High optical integration time duty cycle, with integration time a very high percentage of the frame period.
Each pixel of the transversal imager consists of two photo-capacitors which see the same optical input, two stages of an NMOS digital shift register, control gates, and summation busses (Σ+ and Σ-). The ternary weighting is achieved by controlling the transfer of the photocharge integrated in the two photo-capacitors. If both photo-capacitors contribute charge to the summation plus (Σ+) bus the pixel weight is +1, if both contribute to the summation minus (Σ-) bus the pixel weight is -1, and if one contributes to the Σ+ bus while the other contributes to the Σ- bus, the weight is 0.
The NMOS digital shift registers control the selective transfer of photocharge to the Σ+ bus based on the data in the shift register. Once the selective transfer to the column Σ+ busses is completed, all of the photo-capacitors transfer the remaining charge to the respective column Σ- busses. In this fashion the imaged photocharge is weighted in parallel by selective charge transfer.
The column Σ+ and Σ- busses receive the photocharge and perform a nearly instantaneous summation of the column products in the charge domain. The weighted and summed photocharge from the N columns is pulled off of the N Σ+ and N Σ- busses and transferred into a serial output register by charge primed couplers [9]. The serial output register switches the Σ+ and Σ- signals to respective floating diffusion output amplifiers after appropriate stage delays. The Σ+ and Σ- output amplifiers are differentially sensed to produce the desired outer-product result.
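The pixel-level weighting and column summation described above can be modeled with a short behavioral sketch. It is an idealized illustration, not a description of the device physics: both photo-capacitors of a pixel are assumed to integrate the same charge, charge transfer is assumed lossless, and the function name `transversal_imager` is hypothetical.

```python
def transversal_imager(weights, state):
    """Behavioral model of ternary weighting by selective charge transfer.

    weights[r][c] is -1, 0, or +1; state[r] is the light level on row r.
    Returns the differentially sensed output of each column.
    """
    n_cols = len(weights[0])
    plus = [0.0] * n_cols    # charge collected on each column's plus bus
    minus = [0.0] * n_cols   # charge collected on each column's minus bus
    for r, row in enumerate(weights):
        q = state[r]  # photocharge per photo-capacitor (assumed equal for both)
        for c, w in enumerate(row):
            to_plus = {1: 2, 0: 1, -1: 0}[w]   # capacitors routed to the plus bus
            plus[c] += to_plus * q
            minus[c] += (2 - to_plus) * q      # remaining charge goes to the minus bus
    # Differential sensing; halved so that a weight of +1 contributes +q.
    return [(p - m) / 2 for p, m in zip(plus, minus)]
```

With this routing, the output of column c equals the sum over rows of `weights[r][c] * state[r]`, which is the sum-of-products term read out serially by the device.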
The primary operating configuration is shown schematically in Fig. 2(a). The optical input of the state vector is provided by a display, or more commonly an N-element LED array and anamorphic optics, which spreads the light across the rows of the transversal imager. The state vector may be analog, but is usually confined to a binary on or off. Each of the N (LED) elements illuminates all of the photo-capacitor pairs in its associated row. The N x 2N elements of the ternary synaptic weight matrix are introduced through multiple parallel electrical input ports.
An alternate operating configuration is shown schematically in Fig. 2(b). In this configuration, the synaptic weights are input optically. A display with N x 2N resolution elements or a spatial light modulator with at least N x 2N elements projects the weights. The state vector is introduced electrically with either unipolar binary, bipolar binary, or ternary weights. The power of this approach is that bipolar analog synaptic weights may be achieved with this configuration. Thus, a continuous range of both inhibitory and excitatory weights may be introduced to the outer product.
V. OPERATING CYCLE
To maximize the speed of the transversal imager, the device was designed to perform simultaneous operations. The device operating cycle starts with the initiation of the electrical weight read-in through multiple digital input ports (T1). At the completion of this operation, the weighting and summation are performed by selective transfer of the photocharge (T2). The sum-of-product terms are then transferred into the serial output register (T3). This is followed by serial readout and development of the final outer-product terms (T4).
The optical input and the integration of the photocharge may occur continually. The only time period in which an erroneous signal can be produced is during the selective transfer (weighting) process. This time period is a very small portion of the complete cycle time. Therefore, a choice can be made between either a small error in the signal with the optical input always present or an operational procedure whereby the optical input is turned off during the selective transfer process.
The high optical duty cycle and the rapid weighting operation are complemented by a design which permits a rapid transfer into the serial output register. These characteristics contribute to a high sensitivity and a total latency which is dominated by the serial output rate.
To assure that the weight introduction process does not adversely affect the transversal imager cycle time, the digital register design incorporates multiple input ports and a digital clock rate such that the weight introduction time (T1) is less than the serial readout time (T4). Furthermore, the weighting/summation process (T2) and the transfer into the serial register (T3) are rapid compared to the serial readout time (T4). Thus, the device may be run in the manner shown in Fig. 3, assuring a cycle time essentially equal to the serial output register readout time (T4).
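The effect of overlapping the phases can be illustrated with a simple timing model. This is a sketch under assumptions: the overlapped case is modeled by letting the next frame's weight read-in (T1) proceed during the current serial readout (T4), which may not match the exact schedule of Fig. 3, and the four durations below are hypothetical values chosen only so that the non-overlapped total is 400 microseconds, i.e. the 2500 cycles/s figure quoted in the performance section.

```python
def cycle_time(t1, t2, t3, t4, overlapped):
    """Cycle time in seconds.

    t1: weight read-in, t2: weighting/summation,
    t3: transfer into the serial register, t4: serial readout.
    """
    if not overlapped:
        return t1 + t2 + t3 + t4
    # Assumed overlap: T1 for the next frame runs in parallel with T4.
    return max(t1, t4) + t2 + t3


# Hypothetical durations (seconds), not measured values.
T1, T2, T3, T4 = 100e-6, 20e-6, 10e-6, 270e-6
sequential = cycle_time(T1, T2, T3, T4, overlapped=False)
pipelined = cycle_time(T1, T2, T3, T4, overlapped=True)
```

With T1 shorter than T4 and T2 and T3 small, the overlapped cycle time approaches T4, as the text states.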
An approximation of analog synaptic weights can be achieved with this processor architecture. By performing the above-described operating cycle on a series of synaptic weight "bit planes," accumulating the outer-product terms, and then applying the appropriate 1/2^n weighting, the synaptic weights can be implemented to n bits. The penalty of this increased resolution in the weights is the increase from a single computational cycle to n cycles.
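The bit-plane scheme can be sketched as follows. This is one possible decomposition, assumed for illustration: each weight in [-1, 1] is quantized to n bits of magnitude, and plane k holds the sign times the k-th most significant bit, so every plane is ternary and is scaled by 1/2^k when accumulated. The function names are hypothetical.

```python
def ternary_bit_planes(weights, n_bits):
    """Split weights in [-1, 1] into n_bits ternary planes (-1, 0, +1),
    most significant first, so that w is approximately sum_k plane_k / 2**k."""
    scale = 2 ** n_bits
    planes = []
    for k in range(1, n_bits + 1):
        plane = []
        for row in weights:
            plane_row = []
            for w in row:
                m = min(round(abs(w) * scale), scale - 1)  # quantized magnitude
                bit = (m >> (n_bits - k)) & 1
                plane_row.append(bit if w >= 0 else -bit)
            plane.append(plane_row)
        planes.append(plane)
    return planes


def column_sums(weights, state):
    """Sum-of-products for each column: one ternary processing cycle."""
    return [sum(weights[r][c] * state[r] for r in range(len(state)))
            for c in range(len(weights[0]))]


def bit_plane_product(weights, state, n_bits):
    """Accumulate n_bits ternary cycles with 1/2**k weighting."""
    total = [0.0] * len(weights[0])
    planes = ternary_bit_planes(weights, n_bits)
    for k, plane in enumerate(planes, start=1):
        partial = column_sums(plane, state)
        total = [t + p / 2 ** k for t, p in zip(total, partial)]
    return total
```

Because the largest representable magnitude is 1 - 1/2^n, a weight of exactly 1.0 is clipped; with 4 bits it becomes 0.9375.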
VI. PERFORMANCE RESULTS
A 32 x 32 transversal imager has been fabricated in a 3-μm NMOS/buried channel CCD process. The device has a square 200 x 200-μm pixel pitch. (The pixel size is primarily determined by coarse metal design rules, since the host device on the wafer is a back-side illuminated, thinned visible imager.) Each pixel consists of two photo-capacitors, two stages of digital shift register, and the associated circuitry to perform ternary weighting.
The results shown are for a serial output register rate of 1.0 MHz. The various T1, T2, T3, T4 portions of the cycle are being run non-overlapped to simplify the operating electronics requirements. The resulting cycle rate is 2500 cycles/s. In this operating mode, 8 x 10^4 differentially sensed outer-product terms are produced per second. One of the advantages of the transversal imager architecture is that the operating cycle does not change if the operating mode is changed. The same operating cycle is utilized in the optical state vector and electrical synaptic weights mode as is used in the optical synaptic weights and electrical state vector mode. The transversal imager therefore sustains the same outer-product rate in either mode.
In the results presented below, the actual photocharge weighting and summation in the charge domain is completed in parallel across the complete array in a time T2 = 20 μs.
[...] utilizing a 200 x 200-μm cell pitch, 1.5-MHz clocking, and a non-overlapping operating cycle achieves 5 000 000 analog binary operations/s (2 500 000 multiplications and 2 500 000 additions). As a neural network processor the current implementation achieves 32 fully interconnected neurons with an interconnection update rate of 2.5 x 10^6 interconnection updates/s. This architecture, the overlapped operating cycle discussed, and today's silicon IC technology would permit scaling to a 75 x 75-μm cell pitch and 10-MHz clocking. This would permit realization of a 128 x 128 processor with 128 fully interconnected neurons processing at an update rate of 7 x 10^8 interconnection updates per second. Furthermore, the architecture permits, even at these high processing rates, the optical input and the electrical input to both be updated during each cycle. This would be very useful for learning algorithms where it is desirable to update the synaptic weights frequently.
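The quoted throughput figures can be cross-checked from the array size and the cycle rate. The sketch below assumes that each interconnection update counts as one multiplication plus one addition; the variable names are illustrative.

```python
N = 32                    # neurons (columns) in the array
CYCLES_PER_SECOND = 2500  # non-overlapped cycle rate quoted in the text

# One differential sum-of-products term per column per cycle.
terms_per_second = N * CYCLES_PER_SECOND

# One interconnection update per pixel per cycle.
updates_per_second = N * N * CYCLES_PER_SECOND

# Assumed accounting: one multiply and one add per interconnection update.
operations_per_second = 2 * updates_per_second
```

This gives 80 000 terms/s, 2 560 000 interconnection updates/s, and 5 120 000 operations/s, in line with the rounded values of 8 x 10^4, 2.5 x 10^6, and 5 000 000 given in the text.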
The architecture permits extension to larger neural networks by tiling of the transversal imager devices in 2 x 2 or 4 x 4 mosaics. In the configuration with optical input of the state vector, to achieve these larger networks, an optical input device with twice or four times the number of elements would be required. The optical input would be spread across the rows of the mosaic and the column partial outer-product outputs from each device would be combined within the columns of the mosaic to produce the desired outer-product result. The transversal imager devices would be run in parallel with common clocking signals. In this manner the architecture can be extended to realization of larger electro-optical neural processors.
The preliminary results shown in Fig. 4 show the final differential output of 18 columns for a uniform light input and a complete column weighting of (left-to-right): -i 1-+0000 0000 -+ -I. (A portion of the small non-uniformity in response is due to non-uniform illumination.) Fig. 5 shows the final transversal imager output for a uniform negative weight and a wedge-shape optical input pattern.
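The mosaic tiling described above amounts to a block decomposition of the vector-matrix product: each device computes partial column sums for its own rows, and the partial sums from devices stacked in the same mosaic column are added. A minimal sketch, with the hypothetical names `column_sums` and `tiled_product` and with each device idealized as an exact sum-of-products unit:

```python
def column_sums(weights, state):
    """Sum-of-products for each column of one device."""
    return [sum(weights[r][c] * state[r] for r in range(len(state)))
            for c in range(len(weights[0]))]


def tiled_product(weights, state, tile):
    """Column sums of a large square weight matrix computed from
    tile x tile devices arranged in a mosaic."""
    size = len(state)
    total = [0.0] * size
    for r0 in range(0, size, tile):        # mosaic row
        for c0 in range(0, size, tile):    # mosaic column
            block = [row[c0:c0 + tile] for row in weights[r0:r0 + tile]]
            partial = column_sums(block, state[r0:r0 + tile])
            # Combine partial outputs within the mosaic column.
            for k, p in enumerate(partial):
                total[c0 + k] += p
    return total
```

For a 2 x 2 mosaic of 32 x 32 devices, `tile` would be 32 and `state` would have 64 elements, matching the doubled optical input mentioned in the text.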
VII. CONCLUSION
A novel electro-optical neural network processor has been developed. The processor is based on MOS/CCD integrated circuit technology. The 32 x 32 transversal imager demonstrated performs ternary weighting. This first implementation
