A novel, compact optoelectronic hardware neural network architecture based on an optical broadcast scheme is proposed and demonstrated. The basic cell in the system is composed of electronic neurons that share the same time-distributed, optical broadcast input. The system combines the computational strengths of electronics and the communication strengths of optics, and employs a modular reconfigurable architecture that is potentially scalable to a very large number of neurons while maintaining compactness. These characteristics are realized in an architecture that combines the integration of the processing elements in complementary metal-oxide semiconductor (CMOS) technology with the construction of efficient optical interconnection elements with focusing properties based on multiplexed volume holograms.
Introduction
A neural network consists of a set of very simple processing elements with a very high degree of interconnectivity between them. The computational power of a single processing element is almost nil, but the massive cooperation among many of them results in a powerful machine that may be suitable for tasks such as pattern recognition, speech recognition, classification, and many others that may be computationally intensive for conventional digital electronic computers. [1] It is believed that the construction of specific hardware for neural networks will allow wide application of neural networks to real-world problems. [2] A dedicated hardware architecture should be composed of many neurons or processing elements; it should have an even higher number of synapses or weighted interconnections between neurons (e.g., fully interconnected); it should be capable of implementing a variety of neural network algorithms; and it should be endowed with a mechanism for updating interconnection weights to allow learning and training. Furthermore, compactness and power efficiency are also necessary. [3] In a fully interconnected neural network, the number of interconnections grows with the square of the number of processing elements, so a hardware architecture for neural networks should be interconnection centric. Optoelectronic neural network hardware implementation has been an active area of research since the first model was proposed by Farhat et al. [4] The key motivation is the high parallelism provided by optical interconnections, which promises larger capacity than electronic counterparts. [5] Most proposed optoelectronic systems are based on vector-matrix or matrix-tensor multipliers.
In these systems, an input is introduced into the optical processor by a modulated 1-D or 2-D source of light, the input beam intensities are individually multiplied by the weight-matrix mask [usually accomplished with a spatial light modulator (SLM)], and the resulting optical signals are distributed to the output plane, where an array of optical detectors adds their contributions to form the output. [4-8] These systems are often bulky and difficult to scale up to high numbers of processing elements. Some problems encountered in the realization of large-scale optoelectronic systems are optical alignment, interconnection weight reconfiguration and assignment by the SLM, and the construction of the associated optomechanical subsystems with reduced dimensions.
Optical interconnects have also been shown to be the leading approach in multigigahertz-frequency-range clock distribution networks; the main advantages are the reduction of jitter, skew, and power consumption. [9] As in clock distribution networks, one possible architecture for neural network optical interconnects is that of a signal source that broadcasts to many different nodes in the system. [10] Based on these principles, we have previously reported a first prototype implementation of an optical broadcast architecture for neural networks based on discrete optics and optoelectronic devices, which allowed us to study its performance parameters as a neural processing architecture. [11] [12] [13] [14] In this work, we further analyze the advantages of an optical broadcast (interconnection centric) architecture for hardware neural networks. We also focus on the two main tasks involved in the construction of our compact neural network system: 1. the integration of the optoelectronic processing elements, and 2. the construction of a high-efficiency interconnection element based on multiplexed volume holograms.
Optical Broadcast Architecture

Architecture Description
A new optoelectronic architecture for neuroprocessors has been proposed and tested. [11] [12] [13] [14] The basic cell in the system is composed of electronic neurons that share the same time distributed input; the input is introduced by means of optical broadcast. The system combines the computational strength of electronics and the communication strength of optics, and is potentially scalable to a very large number of neurons.
The main feature of our system compared with other optoelectronic systems is that it uses global interconnections implemented with a holographic diffuser (see Fig. 1). That is, any neuron in one plane can be connected to any other neuron in the following plane at any time, instead of through a sparse, previously determined, fixed interconnection path. Our system, shown in Fig. 1, contains K cells that work in parallel. The basic constituent cell is composed of an optical emitter, a holographic diffuser, and an array of "weight-up and accumulate" neurons (optical detectors with their associated electronics). The detailed architecture of the cell is presented in Fig. 2. The emitter introduces the N-element input sequentially as light pulses, and the holographic diffuser broadcasts the information to an array of optically addressable optical detectors. There is one detector per neuron, which generates the signal to the activation circuit of the corresponding weight-up and accumulate neuron. By stacking these cells one on top of the other into noninterfering planes, higher parallelism and compactness are obtained. The optical time-multiplexing scheme employed allows us to: 1. minimize the number of detectors so that only one per neuron is needed (M detectors per cell); 2. minimize the number of optical emitters, as only one per cell is needed (K cells); and 3. simplify the design of the optical interconnection device in addition to facilitating optical alignment.

Figure 2 is a block diagram of an optoelectronic cell. The cell works as follows. First, all neurons are cleared. The operational cycle of a cell is divided in time into N slots. In the first time slot, the first input is presented and optically distributed to all neurons in that cell. Each neuron computes the product of the first input and the corresponding interconnection weight, and stores the result.
In the next time slot, the second input is broadcast, multiplied by the corresponding interconnection weight, and added to the previous result. At the end of the cycle, composed of N time slots, all inputs have been presented and the outputs are the product of the input vector and the interconnection weight matrix. These outputs (O_i in Fig. 2) can be connected to different hardware blocks, for example, threshold electronic circuits or winner-take-all (WTA) circuits that suppress all outputs other than the one with the initially maximum input.
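The operation cycle described above can be sketched in a few lines of Python. This is a behavioral illustration only, not the circuit itself: the function name `broadcast_cell` and the example input and weight values are assumptions made for the sketch.

```python
def broadcast_cell(inputs, weights):
    """Simulate one operation cycle of a broadcast cell.

    inputs  : list of N input values, one per time slot
    weights : list of M rows, weights[m][n] is the weight of neuron m in slot n
    Returns the list of M accumulated outputs.
    """
    acc = [0.0] * len(weights)           # first, all neurons are cleared
    for n, x in enumerate(inputs):       # one time slot per input element
        for m, row in enumerate(weights):
            # the same input x reaches every neuron (optical broadcast);
            # each neuron multiplies by its own weight and accumulates
            acc[m] += row[n] * x
    return acc


# Hypothetical binary example: N = 4 inputs, M = 3 neurons
inputs = [1, 0, 1, 1]
weights = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
outputs = broadcast_cell(inputs, weights)
```

After N time slots, each entry of `outputs` equals the dot product of the input vector with one row of the weight matrix, which is the vector-matrix product the cell is meant to deliver.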
This cell is essentially a hybrid (optical-electronic) vector-matrix multiplier, in which communication fan-out is done optically, and the interconnection weight assignment and the other neuron operations are done electronically. The size of the input vector is determined by the number of time slots within an operation cycle (N). The maximum size of the output vector is determined by the number of neurons. The multicell system (Fig. 1) allows us: 1. to increase the number of output neurons (up to K×M) if all cells are fed with the same input; 2. to implement a parallel multilayer neural network (up to K layers composed of up to M neurons each), in which case the output of one cell would be the input to the next after being stored electronically and sequenced to the next cell (the multilayer system works in a pipeline fashion); 3. additionally, to provide local wired interconnections in the detector plane between neurons that belong to different cells, which could improve neuron performance and provide feedback connections in a multilayer neural network.
Prototype Implementation
We previously implemented a first prototype of a broadcast cell composed of 16 neurons. The system was built using off-the-shelf discrete electronic and optoelectronic devices. The elements of this system are (see Fig. 2) an optical emitter, a diffuser, an array of detectors with their associated neuron circuitry, and a digital memory to store interconnection weights. Additionally, a controller provided timing control.

Ruiz-Llata, Lamela, and Warde: Design of a compact neural network . . .

Figure 3 shows the optoelectronic circuitry of one neuron for the first prototype implementation of the optical broadcast neural network architecture. [11] It is composed of a photodetector that converts input light pulses (I) into current pulses; an analog multiplexer controlled by the weight signal (W), which implements the multiplication operation; and a capacitor that implements the accumulation function. The analog switch controlled by signal CLR resets the capacitor at the beginning of the operation cycle.

Figure 4 shows the operation cycle of one neuron when binary unipolar inputs and binary interconnection weights are allowed. The first waveform is signal CLR. When it is activated, it shorts the storage capacitor so that the output V_c (the last waveform in the oscillogram) is set to 0 V; CLR must be deactivated for the operation of the system. The second waveform (CLK) is for synchronization: inputs and interconnection weights are presented sequentially; in this case, the input vector is composed of 16 elements, so the operation cycle is divided into 16 time slots. The third waveform is the interconnection weight signal (W), with a value of 0 or 1 in each time slot. If W is 1, the analog multiplexer connects the photodetector to the capacitor; if W is 0, the detector is not connected to the capacitor. The fourth waveform is the input signal (I).
If signal I is 1, the laser driver is connected to the diode laser emitter and a light pulse arrives at the detector; if I is 0, no light is emitted. When both input (I) and interconnection weight (W) are 1, there is a photocurrent that increments the charge on the storage capacitor, and the increment of the voltage on the capacitor (ΔV_c) in a time slot is:
055401-2 Optical Engineering
$$\Delta V_c = I\,W\left(\frac{1}{C}\,P\,R\,\Delta t\right) \qquad (1)$$
where I is the input, W is the interconnection weight, and the terms in the brackets are the design parameters of the system. Here C is the capacitance of the storage capacitor, P is the optical power that reaches the detector, R is the responsivity of the detector, and Δt is the duration of the light pulse. At the end of the operation cycle, the voltage across the capacitor (V_c) is proportional to the product of the input vector and the corresponding weight vector (see the fifth waveform in Fig. 4). In the first prototype implementation based on discrete optoelectronic devices, [11] the storage capacitor was set to C = 10 nF, the maximum operation speed corresponded to Δt = 1 μs, and the responsivity of the detector at the emitter wavelength (λ = 685 nm) was R = 0.45 A/W. For the oscillogram presented in Fig. 4, Δt was set to 10 μs and P was set to 270 μW, so we obtain ΔV_c = 120 mV.

May 2005/Vol. 44 (5)
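As a quick numerical check of Eq. (1), the short sketch below evaluates the voltage step with the parameter values quoted for the oscillogram (C = 10 nF, P = 270 μW, R = 0.45 A/W, Δt = 10 μs). The function name `delta_v` and its argument names are illustrative choices, not part of the original work.

```python
def delta_v(i, w, c_farads, p_watts, r_amps_per_watt, dt_seconds):
    """Voltage step on the storage capacitor in one time slot, per Eq. (1)."""
    photocurrent = p_watts * r_amps_per_watt     # detector current in amperes
    charge = photocurrent * dt_seconds           # charge delivered in the slot
    return i * w * charge / c_farads             # zero unless both I and W are 1


# Values quoted for the discrete-component prototype
step = delta_v(i=1, w=1, c_farads=10e-9, p_watts=270e-6,
               r_amps_per_watt=0.45, dt_seconds=10e-6)
print(f"voltage step per slot: {step * 1e3:.1f} mV")
```

The result is about 0.12 V, in agreement with the 120 mV step reported for Fig. 4, and the step vanishes whenever either the input or the weight is 0.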
The use of standard digital memory to store interconnection weights increases the flexibility of the system, as interconnection weights can be easily updated, thus allowing learning and training. Although for the description of the neuron behavior we have limited the resolution to 1 bit (binary interconnection weights), higher resolution of inputs and interconnection weights can be achieved, using the neuron circuit in Fig. 3, by proper coding of the input and interconnection weight signals. It has been shown that inputs and interconnection weights can represent analog values if they are coded as pulse signals to be transmitted with binary values. [15] In this sense, the high bandwidth of optics could be exploited by coding the inputs as pulse-rate-modulated (PRM) signals and coding the interconnection weights as pulse-width-modulated (PWM) signals. [15] As in the binary case, the product (AND) would be implemented by the analog switch and the accumulation function by the capacitor.
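To illustrate why an AND of a pulse-rate-coded input and a pulse-width-coded weight approximates an analog product, here is a minimal behavioral sketch. It is an assumption-laden toy model, not the coding scheme of Ref. 15: the tick-based discretization, the evenly spaced pulse generator, and the names `prm_pulses`, `pwm_gate`, and `coded_product` are all hypothetical.

```python
def prm_pulses(x, ticks):
    """Evenly spaced binary pulse train with roughly x*ticks pulses (0 <= x <= 1)."""
    return [1 if int((t + 1) * x) > int(t * x) else 0 for t in range(ticks)]


def pwm_gate(w, ticks):
    """Binary gate that stays high for the first w*ticks ticks (0 <= w <= 1)."""
    high = round(w * ticks)
    return [1 if t < high else 0 for t in range(ticks)]


def coded_product(x, w, ticks=1000):
    """AND the two binary signals tick by tick and accumulate, like switch + capacitor."""
    pulses = prm_pulses(x, ticks)
    gate = pwm_gate(w, ticks)
    charge = sum(p & g for p, g in zip(pulses, gate))
    return charge / ticks                 # normalized accumulated charge


estimate = coded_product(0.5, 0.4)        # expected to be close to 0.5 * 0.4
```

Only pulses that arrive while the gate is open contribute charge, so the accumulated count is roughly the pulse rate times the gate width, that is, x times w.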
Comparison with Wired Electronic Implementations
Hardware optoelectronic neural systems could have their place in applications that require very high operation speed or that manage huge amounts of data (e.g., where software simulation of a neural network model running on a sequential digital computer cannot be optimized). Examples are packet routing in optical communication systems [8] and vision applications. [13, 14] Electronic neural networks are classified as digital or analog. [2] Recent trends in digital neural networks are implementations of relatively small neural networks using general-purpose commercial chips such as digital signal processors (DSPs) and field-programmable gate arrays (FPGAs). We can also find digital application-specific integrated circuits (ASICs) that implement specific neural network models. [2] The performance of these systems for high-throughput applications is constrained by their hardware architecture.
Analog neural networks can offer higher parallelism and speed than digital implementations, so they are potential candidates for computationally demanding tasks. Their main disadvantage is considered to be their lower accuracy, but it is also argued that collective computation makes them suitable for many practical applications. [2] A typical architecture for analog neural networks consists of an array of synapse circuits interconnected such that inputs are distributed in rows, and outputs from synapse circuits are added in the columns. Defining the connectivity of a neural network as the number of synapses per neuron, the connectivity of a neural network chip is related to the size of the synapse circuit array. Typical array sizes are on the order of 100×100 synapses, which means a limit of 100 synapses per output neuron. To our knowledge, this value for the connectivity, related to the synapse array size, has not increased significantly in the last decade. This seems to suggest the limited scalability of purely electronic neural network chips.
The optical broadcast architecture is a new concept for optoelectronic neural networks that uses optics for interconnects and electronics for computing. Compared with analog electronic neural network implementations, [2] our architecture exploits the key advantages of optical interconnects. Optical interconnections have the benefit that they can be easily scaled up in number and that they can broadcast information at higher speeds, over longer distances (if necessary), and to a higher number of nodes than electrical wires. [16] Additionally, as input pattern elements are introduced sequentially in our broadcast architecture design, the neuron array size does not restrict the number of synapses per output neuron. In this regard, we have recently demonstrated a scalable four-neuron system that allows 4096 time-distributed synapses per neuron, for processing 64×64 (= 4096) element patterns. [14] In that work, we also addressed the problem of implementing a faster prototype, in which the electronic circuitry limits the bandwidth to 150 MHz. This implies a time slot Δt equal to 7 ns, which is an improvement of 3 orders of magnitude over the processing time of our first system demonstrator.
Our neural optoelectronic prototype has been conceived for the construction of arrays of densely populated, massively interconnected processing elements that operate in parallel and at high speed (wires for input broadcast are to be substituted by free-space optical beams). Massively interconnected optical beams are able to broadcast to higher numbers of processing elements. [16] Although the inputs are distributed in time, neural operations are executed in parallel, as K input signals are each broadcast to M detectors. The high operation speed of the system is a benefit of using optical communication devices. The combination of these two features (optical broadcast parallelism and high-speed optoelectronic devices) is responsible for the high speed of our neural architecture. If we measure speed as the number of connections per second (CPS), where a connection is defined as a multiplication and an addition, then, taking the time slot Δt = 7 ns and assuming K = 100 cells and M = 100 neurons per cell, we see that a speed of 1.4×10^12 CPS is possible.
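The throughput figure follows from simple arithmetic: in each time slot, every one of the K×M neurons performs one multiply-and-accumulate. A minimal sketch of that calculation, with an illustrative function name, is:

```python
def connections_per_second(cells, neurons_per_cell, slot_seconds):
    """One connection (multiply + add) per neuron per time slot."""
    return cells * neurons_per_cell / slot_seconds


# Values assumed in the text: K = 100 cells, M = 100 neurons per cell, slot of 7 ns
cps = connections_per_second(cells=100, neurons_per_cell=100, slot_seconds=7e-9)
print(f"{cps:.2e} connections per second")
```

With these values the result is slightly above 1.4×10^12 CPS, matching the figure quoted above.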
Integration of the Processing Elements
CMOS Integrated-Circuit Processing Elements Array
The structure of a processing element in our neuroprocessor architecture is very similar to the structure of a pixel in a CMOS sensor. [17] Based on this technology, we have implemented the first integrated-circuit prototype of an array of processing elements. It was implemented in 0.6-μm CMOS technology (three metal layers and two polysilicon layers) and consists of an array of 6×6 pixels. Each of the 4×4 central pixels is controlled by an independent weight signal, while the pixels around the perimeter share the same weight signal. Reset is common to all pixels. Figure 5 shows the floor plan of the circuit and a magnified view of one pixel. The pixel pitch is 150 μm, and the circuit area including pads is 2200×2200 μm.
The structure of each pixel is presented in Fig. 6. It works in a way similar to the processing element of the previous prototype, which was based on discrete optoelectronic devices. At the beginning of an operation cycle, signal "set" (see Fig. 6) is activated at a low level. The signal "set" is equivalent to signal CLR in the previous prototype; it is common to all pixels and it sets the voltage of the capacitor to its maximum. Following the principle of operation of the optical broadcast architecture, the input elements are presented sequentially and optically distributed to all neurons. The input light signal (I) is detected by an integrated phototransistor, which generates a photocurrent proportional to the input. In one time slot, one input is presented and signal W is activated according to the corresponding interconnection weight. Thus the voltage on the capacitor is decremented in proportion to the product of the input and the corresponding interconnection weight [Eq. (1)]. At the end of an operation cycle, when all inputs have been presented, we have an array of capacitors whose voltage changes are proportional to the product of the input vector and the interconnection weight matrix.
Performance Evaluation of the CMOS Integrated-Circuit Processing Element
Figure 7 shows an operation cycle of one neuron of the system for binary unipolar inputs and interconnection weights. Equation (1) summarizes how the neuron works.
Here, it represents the decrement in the voltage of the capacitor in a time slot, which is proportional to the product of input (I) and interconnection weight (W), and to the design parameters of the system [storage capacitor (C), the optical power that reaches the detector (P), the responsivity of the detector (R), and the time slot (Δt)]. For binary unipolar inputs and interconnection weights, if input I is 0, the voltage on the capacitor does not decrease because no light arrives at the detector and no photocurrent is generated. If input I is 1, the voltage on the capacitor decreases only if the interconnection weight is 1, because in this case the detector is connected to the capacitor. At the end of the operation cycle, the change in the output voltage is proportional to the product of the input vector and the interconnection weights.
In the integrated-circuit prototype implementation of the neurons for the optical broadcast architecture, the value of the storage capacitor has been reduced by 4 orders of magnitude relative to that of the discrete-device prototype, specifically from 10 nF to 0.45 pF. Keeping in mind Eq. (1), which relates the operation of the neurons to their hardware design parameters, we can study how the system can be improved. The possible modifications of the design parameters are summarized in Table 1. Assuming similar responsivity of the optical detectors, the reduction of C can be compensated by decreasing the optical power emitted in each cell by 1 order of magnitude, increasing the speed of the system by decreasing the time slot by 2 orders of magnitude, and increasing the number of processing elements by 1 order of magnitude. As the optical power needs to be broadcast to more optical detectors, the optical power that reaches one detector will be reduced by 1 further order of magnitude.
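The compensation argument can be tallied in orders of magnitude using Eq. (1), where the voltage step scales as P·Δt/C. The sketch below is only a bookkeeping aid under the assumptions stated in the text (equal responsivity, power split evenly among detectors); the dictionary labels and variable names are illustrative.

```python
import math

c_discrete = 10e-9        # storage capacitor of the discrete prototype, in farads
c_integrated = 0.45e-12   # storage capacitor of the integrated prototype, in farads

# Reducing C increases the voltage step by this many orders of magnitude
gain_from_c = math.log10(c_discrete / c_integrated)

# Changes that reduce the voltage step, in orders of magnitude (from the text)
changes = {
    "emitted optical power per cell": -1,
    "power shared among 10x more detectors": -1,
    "time slot duration": -2,
}

net_orders = gain_from_c + sum(changes.values())
print(f"gain from C: {gain_from_c:.2f} orders, net change: {net_orders:+.2f} orders")
```

Under these assumptions the net change stays within one order of magnitude, so the voltage step per time slot remains comparable to that of the discrete prototype.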
All the improvements in the design parameters of the integrated prototype were obtained, except for the operation speed. The operational speed of the integrated-circuit prototype was slower than that of the discrete-element prototype. This is due to the integrated phototransistor, which has not been optimized in CMOS technology for this prototype. Further improvements on CMOS devices are under development.
Optical Interconnection Element for the Broadcast Architecture
Holographic Optical Interconnections
The use of broadcast optical interconnections has several advantages. 1. It is a regular architecture, so the optical interconnection device is identical for all the cells that comprise the architecture; no special changes in angle or orientation are needed if an array of holographic diffusers is used (see Fig. 1). 2. Optical interconnections allow the distribution of the optical signal at high speed, because negligible delay occurs between different neurons in the distribution of the input signal. [16] 3. No spatial light modulators (characterized by a narrow optical frequency response [18]) are needed, only emitters and detectors; thus different wavelengths can be used to distribute the inputs (e.g., input and clock signal). 4. The optical broadcast architecture allows the construction of a fully interconnected, compact, and reconfigurable (in size and speed) neural network.
Among the different methods of fabricating holographic interconnection elements for the optical broadcast architecture, we chose the optically recorded volume holographic approach. Optically recorded volume holograms provide high diffraction efficiency, exhibit frequency and angular selectivity, and offer the possibility of superposing different holograms in the same volume. An additional and important advantage of an optically recorded interconnection hologram, in contrast with a computer-generated phase-only hologram (recorded by techniques such as photoreduction [19] or lithographic [20] processes), is that larger diffraction angles with higher efficiency are possible with optically fabricated holograms. This means that the distance between the emitter plane and the detector plane can be made much shorter, or that scaling up the system, measured in terms of the number of optical transmitters and receivers, will not result in a prohibitively large optomechanical system.
To optimize power consumption, the spot that reaches each detector should be the same size as, or smaller than, the area of the photodetector. This condition also avoids unwanted photoelectron generation in the nondetector regions of the integrated circuit. An optical system for interconnections is composed of three elements: collimating or imaging lenses after the optical emitter, a diffractive optical element, and focusing lenses to concentrate the light onto the photodetectors. [8, 21] In this work, we began testing our technique to produce optical interconnection elements that eliminate both the emitter lenses and the detector lenses. That is, we designed optical interconnection elements with built-in focusing properties.
Design of Object and Reference Beams for the Desired Interconnections
The optical broadcast architecture is composed of cells, where each cell has one optical emitter that interconnects all neurons in that cell. In the first prototype, each emitter was connected to 16 detectors by a diffuser that spreads the optical power from the emitter over the whole detector plane, as represented in Fig. 1, whereas now (in Fig. 8), each emitter is connected directly to the 16 detectors with a holographic interconnection element (Fig. 8 shows only eight interconnections for simplicity).
To optimize the holographic recording process, we need to take into consideration the thickness of the holographic recording material. For plane waves (see Fig. 9), the condition for complete power transfer is:

$$\frac{\kappa d}{\sqrt{\cos\theta_1 \cos\theta_2}} = \frac{\pi}{2}$$

and

$$\kappa = \frac{\beta}{4}\,\frac{\epsilon_{r1}}{\epsilon_{r0}},$$

where d is the material thickness, θ_1 is the reference beam (input) angle, θ_2 is the object beam (output) angle, β = 2π/λ, and ε_r1/ε_r0 is the relative modulation amplitude (index modulation). Assuming θ_1 = 0 deg and θ_2 = 20 deg, λ = 890 nm, and ε_r1/ε_r0 = 4·10^-3 (typical of the Aprilis photopolymer we used), we obtain a required material thickness of d = 216 μm. Some of the tests reported later were done with visible light, at λ = 633 nm, which requires an optimum thickness of about 150 μm.
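The quoted thickness can be checked numerically. The sketch below assumes the standard coupled-wave result for a lossless transmission phase grating, κd/√(cos θ_1 cos θ_2) = π/2 with κ = (β/4)(ε_r1/ε_r0) and β = 2π/λ. That form is an assumption adopted here because it reproduces the 216 μm figure with the stated parameters; the function name is illustrative.

```python
import math


def full_transfer_thickness(wavelength_m, theta1_deg, theta2_deg, modulation):
    """Grating thickness (in meters) for complete power transfer, assumed coupled-wave form."""
    beta = 2 * math.pi / wavelength_m            # propagation constant, beta = 2*pi/lambda
    kappa = beta * modulation / 4                # coupling constant
    obliquity = math.sqrt(math.cos(math.radians(theta1_deg)) *
                          math.cos(math.radians(theta2_deg)))
    return (math.pi / 2) * obliquity / kappa


d_ir = full_transfer_thickness(890e-9, 0, 20, 4e-3)    # readout at 890 nm
d_red = full_transfer_thickness(633e-9, 0, 20, 4e-3)   # readout at 633 nm
print(f"d(890 nm) = {d_ir * 1e6:.0f} um, d(633 nm) = {d_red * 1e6:.0f} um")
```

The optimum thickness scales linearly with the readout wavelength, which is why the value for 633 nm comes out near 150 μm, smaller than the value for 890 nm.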
Second, to build focusing and beam steering into the interconnection holograms, we must pay special attention to the recording and readout geometries and wavelengths. In particular, if we define a planar coordinate system (X, Y) where the hologram is centered at position (0, 0), the single grating specifications can be summarized as follows (see Fig. 10):
• the hologram will be read by a beam coming from a point source located at coordinates (X_sr, Y_sr)
• we want the output beam to be focused on the detector located at (X_dr, Y_dr)
• the readout wavelength is λ_r.
Another important parameter of the recording material is its recording wavelength sensitivity. For the Aprilis photopolymer material, it is 532 nm. This write wavelength is different from the emitter wavelength (readout wavelength) we planned to use in our neural processing system. For the first prototype, the readout wavelength was λ = 633 nm (from a red He-Ne laser), and for future systems, we will use a linear vertical-cavity surface-emitting laser (VCSEL) array (see Fig. 1) with an emission wavelength of 890 nm.
The consequence of using different wavelengths for writing the hologram (λ_w) and for reading the hologram (λ_r) is that the orientation and geometry of the object and reference beams that write the hologram are very different from the orientation and geometry of the readout and desired output (diffracted) beams. It can be shown that for plane waves, a change in the recording wavelength does not affect the hologram efficiency if the material is thick enough.
A simplified ray-optics method based on the Bragg condition for determining the relationship between writing and reading beams with different wavelengths was used to design the holograms. According to this method, there is only one way to write the desired hologram with a given wavelength. Let us assume the optical emitter of the optoelectronic neural network to be a point source located at position (X_sr, Y_sr) with wavelength λ_r, and that we want to focus it on the detector located at position (X_dr, Y_dr), as shown in Fig. 10. Then the beams used to write the hologram with wavelength λ_w will be a diverging reference beam that originates from position (X_sw, Y_sw) and an object beam that focuses on position (X_dw, Y_dw). Figure 10 shows the write and read beam geometries when λ_r > λ_w.

We could apply these principles to design a hologram for the first prototype implementation of the optical broadcast architecture. Here, the aperture of the hologram (D_h) would be set to 5 mm. Assuming a laser diode with a beam divergence of around 20 deg is used as the optical emitter, the emitter would be placed at a distance of 13 mm from the hologram to illuminate only the hologram aperture (Y_sr = 13 mm).
The detector plane would be located 130 mm from the hologram (Y_dr = −130 mm). As we have 16 detectors spaced 4 mm apart, the closest detector would be located at X_dr = −2 mm, and the furthest detector at X_dr = −(16×4)/2 mm for an on-axis system (Fig. 8) [or X_dr = −(16×4) mm for an off-axis system, where the 16 detectors are on the right side viewed from the emitter]. These numbers give a minimum diffraction angle of 1 deg and a maximum diffraction angle of 13 deg for an on-axis interconnection system, or 26 deg for an off-axis system.
To design the volume holographic interconnection element, up to 16 holograms with different diffraction angles must be superimposed. The system parameters used were: D_h = 5 mm, (X_sr, Y_sr) = (0, 13) mm, (X_dr, Y_dr)_min = (−2, −130) mm, and (X_dr, Y_dr)_max = (−62, −130) mm. With these characteristics, the shapes of the reference and object beams in the writing step were calculated for each single hologram (Fig. 10), obtaining a different pair of focusing positions (X_sw, Y_sw), (X_dw, Y_dw) for each single hologram.
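The readout geometry quoted above can be reproduced with elementary trigonometry. The sketch below is a plain geometric check, not the Bragg-matching design method itself; it assumes the diffraction angle is measured from the hologram normal and that the 20-deg divergence is a full angle. Function and variable names are illustrative.

```python
import math


def diffraction_angle_deg(x_mm, y_mm):
    """Angle from the hologram normal to a detector at (x, y), hologram at the origin."""
    return math.degrees(math.atan2(abs(x_mm), abs(y_mm)))


detector_distance_mm = 130.0
n_detectors, spacing_mm = 16, 4.0

nearest = diffraction_angle_deg(-2.0, -detector_distance_mm)
on_axis_far = diffraction_angle_deg(-(n_detectors * spacing_mm) / 2, -detector_distance_mm)
off_axis_far = diffraction_angle_deg(-(n_detectors * spacing_mm), -detector_distance_mm)

# Spot diameter on the hologram for an emitter 13 mm away (full divergence assumed 20 deg)
spot_mm = 2 * 13.0 * math.tan(math.radians(20.0 / 2))

print(f"min {nearest:.1f} deg, on-axis max {on_axis_far:.1f} deg, "
      f"off-axis max {off_axis_far:.1f} deg, spot {spot_mm:.1f} mm")
```

The computed angles are close to the 1, 13, and 26 deg quoted in the text, and the illuminated spot stays within the 5-mm hologram aperture.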
Fabrication of Optically Multiplexed Volume Holographic Interconnections
The procedure for making holograms with focusing properties was verified experimentally. We used the Massachusetts Institute of Technology (MIT) hologram writer system [23] to record the desired holographic interconnection elements. The interconnection holograms were recorded on Aprilis HMC-050-G-15-A-200 photopolymer plates. The thickness of the photopolymer material was 200 μm. This thickness sets the limit on the maximum diffraction angle we can obtain. We used an argon laser tuned to 514.5 nm to generate the object and reference beams. The photopolymer plate was placed on the computer-controlled translation and rotation stages of the MIT hologram writer, which allows recording the holograms at any position on the plate and with any orientation. The exposure time was also computer controlled.
To simplify the fabrication process of the interconnection holograms, the reference beam was kept identical for all the different holograms we made. The reference beam was focused on the position (0, 15). The writing object beam was a plane wave whose angle of incidence varied between 0 and 24 deg from the vertical direction. The number of multiplexed holograms recorded into one interconnection element varied from 1 to 16. This was done by changing either the orientation or the diffraction angle.

Figure 11 shows the output beam for different interconnection elements when they are read out with a He-Ne laser (λ_r = 633 nm). In particular, Fig. 11(a) shows the reading of a single hologram. The input beam is bigger than the hologram, so the undiffracted readout beam can be seen on the left with the shadow of the hologram in it. The spot on the right is the diffracted beam, which focuses at position (−20, −110) mm from the hologram. The diffraction angle is 11 deg and the diffraction efficiency, measured as the ratio of the optical power of the first diffracted order to the total power at the output of the hologram, is 80%. Figure 11(b) shows the readout of an interconnection element with eight multiplexed holograms where the angle and orientation have been changed. Figure 12 shows the readout of an interconnection element formed with 16 multiplexed holograms with different angles (from 1 to 26 deg when read with λ_r = 633 nm). The output beams focus at an average distance of 120 mm from the hologram and the spots are spaced 4 mm apart in the focusing plane. Thus, we get the desired interconnection scheme presented in Fig. 8 with 16 interconnection beams. As expected from the theoretical calculations, the maximum diffraction efficiency for each single hologram is achieved at a different position, because we wrote all the holograms with the same reference beam. Higher efficiency can be achieved if the reference beam is modified for each single hologram.
The single holograms were also read with an infrared semiconductor laser diode (λ_r = 850 nm). As expected, we observed that the focusing plane was closer, at an average distance of 40 mm from the hologram, with a spot spacing of 1.6 mm.
Conclusions
We show that the optical broadcast architecture for neural networks can benefit significantly from the use of CMOS integration of the processing elements, and directed, high diffraction efficiency holographic optical interconnections. Its hardware architecture is composed of multiple cells that can be stacked one on top of the other, and whose interconnection patterns are identical. At the same time, the architecture exploits both the processing capabilities of electronics and the communication strengths of optics.
The implemented CMOS integrated-circuit prototype of the optoelectronic processing elements uses a 6×6-element array of detectors with attached electronics. We show that the integrated prototype is able to perform the basic neural operations of multiplication and addition it was designed for. However, the processor speed is slow compared with the discrete-element prototype. The slow operation is a result of the phototransistor not being optimized in the CMOS technology used. Much faster devices can be built using CMOS technology. [24] We also designed, built, and tested an optical setup to fabricate holographic optical interconnection elements with built-in focusing properties, so that no additional emitter or detector optics will be necessary. In particular, we show the feasibility of directly connecting and focusing one laser emitter onto 16 detectors.
Finally, the work presented allows us to evaluate how the size of the array of neurons of the integrated prototype system scales up with an increasing number of processing elements. It is shown that a pixel pitch of 150 μm and diffraction angles up to 27 deg are feasible for this optical broadcast neural processor. So a 100-cell system with 100 processing elements in each cell can be packaged in a cube 15 mm on a side. Related work on a fully integrated multilayer system is also ongoing. [23]
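The 15-mm figure is consistent with a simple geometric estimate. The sketch below assumes, as an illustration only, that the 100 processing elements of a cell lie in a row at the 150-μm pitch, that the 100 cells are placed side by side at the same pitch, and that an on-axis emitter must reach the end of the row within the 27-deg maximum diffraction angle. The layout and the variable names are assumptions, not a description of the actual package.

```python
import math

pitch_mm = 0.150            # pixel pitch from the integrated prototype
elements_per_cell = 100
cells = 100
max_angle_deg = 27.0        # largest feasible diffraction angle

row_length_mm = elements_per_cell * pitch_mm    # extent of one detector row
stack_width_mm = cells * pitch_mm               # extent of the cells side by side

# Emitter-to-detector distance needed for an on-axis emitter to reach the row ends
depth_mm = (row_length_mm / 2) / math.tan(math.radians(max_angle_deg))

print(f"row {row_length_mm:.1f} mm, stack {stack_width_mm:.1f} mm, depth {depth_mm:.1f} mm")
```

Under these assumptions all three dimensions come out at or just below 15 mm, which matches the cube size stated above.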
