In this paper, we present theoretical details and the underlying architecture of a hybrid optoelectronic correlator (HOC) that correlates images using spatial light modulators (SLMs), detector arrays, and field programmable gate arrays (FPGAs). The proposed architecture bypasses the need for nonlinear materials such as photorefractive polymer films by using detectors instead, while the phase information is preserved by interfering the images with plane waves. However, the output of such an HOC has four terms: two convolution signals and two cross-correlation signals. By implementing a phase stabilization and scanning circuit, the convolution terms can be eliminated, so that the behavior of an HOC becomes essentially identical to that of a conventional holographic correlator (CHC). To achieve the ultimate speed of such a correlator, we also propose an integrated graphic processing unit, which would perform all the electrical processes in parallel. The HOC architecture along with the phase stabilization technique would thus be as good as a CHC, capable of high-speed image recognition in a translation-invariant manner.
INTRODUCTION
Target identification and tracking are important in many defense and civilian applications. Optical correlators provide a simple technique for fast verification and identification of data. Over recent years, we have been investigating the feasibility of realizing an all-optical high-speed automatic target recognition correlator system using the inherent parallelism of optical techniques [1-7]. Other groups have also pursued the development of such correlators [8, 9]. The simplest form of such a system is the basic Vander Lugt [10] optical correlator, which is illustrated schematically in Fig. 1. Here, each lens has a focal length of L. The process starts with an image [e.g., the reference image, possibly retrieved from a holographic memory disc or with a spatial light modulator (SLM) connected to a computer] in the input plane $P_1$. The reference image passes through the lens, producing the Fourier transform (FT) of the image at plane $P_M$. A plane wave is then applied, at an angle φ in the y-z plane, to interfere with the Fourier transformed image in the plane $P_M$. The interference is recorded in a thin photographic plate, which produces a transmission function that is proportional to the interference pattern. Once the recording is fixed, the query image (e.g., from a camera connected to an SLM) is presented in the input plane $P_1$. After passing through the first lens, the FT of the query image passes through the photographic plate. After passing through the second lens, the cross-correlation signal is observed at the output plane, $P_2$. The amplitude of this cross-correlation signal is high when the reference image and the query image are matched. If the query image is shifted with respect to the reference image in the x-y plane, the correlation spot will also appear shifted.
The other signals produced during this process (for example, the convolution between the two images) do not overlap with the cross-correlation signal if φ is chosen to be sufficiently large. The limitation of such an architecture is that the recording process is very time consuming. This constraint is circumvented in a joint transform correlator (JTC), where a dynamic nonlinear material, such as the photorefractive polymer thin film produced by Nitto-Denko [1, 2], is used so that the recording and correlation take place simultaneously. The primary limitation of this system is the poor quality of the material used for making the JTC. First, it is very fragile, and gets destroyed rather easily for reasons not well understood. Second, the diffraction efficiency is rather small, and it produces a lot of scattering, leading to a very poor signal-to-noise ratio (SNR). Third, after carrying out some correlations, the residual gratings generated in the medium have to be erased by applying a high voltage; this process takes quite a bit of time. To our knowledge, all dynamic holographic films suffer from similar limitations.
The scheme presented in this paper gets around this problem by making use of concepts developed recently in the context of digital holography [10]. Namely, the nonlinearity provided by the JTC medium is replaced by the nonlinearity of high-speed detectors (since detectors measure intensities, they are inherently nonlinear). Of course, this requires some modification of the architecture, as well as post-processing of signals. In this paper, we specify an explicit and novel architecture that enables the process of correlating images using detectors only, while the phase information is preserved by interference with plane waves. We would like to point out that Javidi and Kuo [11] demonstrated a JTC that also makes use of detectors, in a manner similar to what we describe here. However, their approach has some key limitations. The reference and the query images have to be placed in the same field of view. In order to do so, it is necessary to convert each image to a digital format first, and then create a composite image, which is sent to an SLM, for example. This process precludes a scenario, necessary for very rapid search, where the reference images are retrieved directly from a holographic memory disc. For this scenario, it would be necessary to use a beam splitter to combine the reference image, which is in the optical domain, with the query image, which is also in the optical domain after generation from an SLM. When this is done, the correlation will depend sinusoidally on the relative phase difference between the two optical paths, with no obvious option for stabilizing or optimizing this phase difference. The approach presented here circumvents both of these constraints.
The paper is organized as follows: Section 2 presents the theoretical details of the proposed hybrid optoelectronic correlator (HOC). The phase stabilization and scanning technique, which eliminates the convolution terms, is also described in this section. Section 3 presents possible future work on realizing an optoelectronic processor that would speed up the whole process of correlation. Section 4 describes simulation results of the proposed HOC using MATLAB, illustrating its performance for various cases of interest.
PROPOSED HYBRID OPTOELECTRONIC CORRELATOR
The overall architecture of the HOC proposed here is summarized in Figs. 2 and 3. Briefly, the reference image, $H_1$, is retrieved from the database and transferred to an optical beam using an SLM (SLM-1), and is Fourier transformed using a lens. The Fourier transformed image ($M_1$) is split into two identical ports. In one port, the image is detected by an array of detectors, which could be a high-resolution focal plane array (FPA) or a digital CMOS camera. As an explicit example, we consider the USB 2.0 CMOS camera (DCC1545M), which has 1280 (H) × 1024 (V) pixels and sends 10-bit data for each pixel at a 48 MHz clock rate, thus requiring about 27 ms to send an image. The signal array produced by the camera is denoted as $B_1$. The camera is interfaced with a field programmable gate array (FPGA-1) via a USB cable, and $B_1$ can be stored in the built-in memory of this FPGA. In the other port, $M_1$ is interfered with a plane wave $C$ and detected with another CMOS camera, producing the digital signal array $A_1$, which is also stored in the memory of FPGA-1. $A_1$ and $B_1$ can be expressed as
In addition, the intensity profile of the plane wave ($|C|^2$) is measured by momentarily blocking the image path with a shutter (not shown), and the information is stored in the memory of FPGA-1. FPGA-1 then computes and stores $S_1$, which can be expressed as
Here, $\phi_1(x, y)$ is the phase of the Fourier transformed image, $M_1$, and $\Psi_1$ is the phase of the plane wave, $C$. It should be noted that $\phi_1$ is a function of $(x, y)$, assuming that the image is in the x-y plane. The subtraction has to be done pixel by pixel using one or more subtractors available in the FPGA. Consider, for example, the Virtex-6 ML605 made by Xilinx as a candidate FPGA, which has an oscillator frequency of 200 MHz, so that each subtraction takes about 5-10 ns depending on the implementation of the adder circuit. Thus, the total subtraction process for an image of 1280 × 1024 pixels would take about 6 ms when the subtraction is done with one subtractor. This process can be sped up by using multiple subtractors operating in parallel, which are available in many FPGAs.
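The timing figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming (this is not stated in the text) that the camera transfers one pixel per clock cycle and that a single subtractor completes one subtraction per FPGA clock cycle, i.e., the 5 ns end of the quoted 5-10 ns range. The helper name `serial_time_s` and the choice of eight parallel subtractors are purely illustrative.

```python
# Rough serial-timing estimate for one 1280 x 1024 frame.
# Assumptions (illustrative, not from the text): one pixel per camera
# clock cycle, one subtraction per FPGA clock cycle.

WIDTH, HEIGHT = 1280, 1024
CAMERA_CLOCK_HZ = 48e6   # pixel clock quoted for the camera
FPGA_CLOCK_HZ = 200e6    # oscillator frequency quoted for the FPGA


def serial_time_s(n_ops: int, seconds_per_op: float, n_units: int = 1) -> float:
    """Time to finish n_ops identical operations shared by n_units parallel units."""
    return n_ops * seconds_per_op / n_units


pixels = WIDTH * HEIGHT
readout_s = serial_time_s(pixels, 1.0 / CAMERA_CLOCK_HZ)
subtract_s = serial_time_s(pixels, 1.0 / FPGA_CLOCK_HZ)
subtract_8_s = serial_time_s(pixels, 1.0 / FPGA_CLOCK_HZ, n_units=8)

print(f"camera readout         : {readout_s * 1e3:.1f} ms")
print(f"subtraction, 1 unit    : {subtract_s * 1e3:.2f} ms")
print(f"subtraction, 8 units   : {subtract_8_s * 1e3:.2f} ms")
```

Under these assumptions the readout comes to roughly 27 ms and the single-subtractor pass to roughly 6.5 ms, in line with the estimates in the text; the parallel case simply divides the subtraction time by the number of units.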
The captured query image, $H_2$, is transferred to an optical beam using another SLM (SLM-2), and split into two paths after being Fourier transformed with a lens. The resulting image in each path is designated as $M_2$. In a manner similar to what is described above for the reference image, the signal $S_2 = A_2 - B_2 - |D|^2$ is produced using two cameras and an FPGA (FPGA-2) and stored in FPGA-2 memory. Here, $D$ is the amplitude of an interfering plane wave, and the other quantities are given as follows:
As before, $\phi_2(x, y)$ is the phase of the Fourier transformed image, $M_2$, and $\Psi_2$ is the phase of the plane wave $D$.
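The detected arrays and the derived signals for both channels can be written in one compact form. The following is a sketch reconstructed from the definitions in the text, assuming the field at each interference port is the sum of the Fourier transformed image and the plane wave. The shorthand $R_j$ (with $R_1 = C$, $R_2 = D$) is introduced here only for brevity, with $M_j = |M_j| e^{j\phi_j(x,y)}$ and $R_j = |R_j| e^{j\Psi_j}$.

```latex
\begin{align}
A_j &= |M_j + R_j|^2 = |M_j|^2 + |R_j|^2 + M_j R_j^* + M_j^* R_j, \\
B_j &= |M_j|^2, \\
S_j &= A_j - B_j - |R_j|^2 = M_j R_j^* + M_j^* R_j
     = 2\,|M_j|\,|R_j| \cos\!\big(\phi_j(x,y) - \Psi_j\big), \qquad j = 1, 2.
\end{align}
```

The subtraction therefore removes the two intensity terms and leaves only the interference term, which carries the phase $\phi_j(x,y)$ of the Fourier transformed image relative to the plane-wave phase $\Psi_j$.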
In the final stage of the hybrid correlator (as shown in Fig. 3), these two signals ($S_1$ and $S_2$), described in Eqs. (3) and (6), are multiplied together using the multiplier in FPGA-3. Four-quadrant multiplication can easily be implemented using an FPGA. The resulting signal array, $S$, is stored in FPGA-3 memory. This can be expressed as
where $\alpha \equiv |C||D| e^{j(\Psi_1 + \Psi_2)}$ and $\beta \equiv |C||D| e^{j(\Psi_1 - \Psi_2)}$. This signal array, $S$, is now transferred to another SLM (SLM-3) from FPGA-3 through the digital visual interface (DVI) port. For the X-Y series SLM made by Boulder Nonlinear Systems (BNS), for example, this image update would take about 65 ms. Since $S$ can be positive or negative, the SLM should be operated in a bipolar amplitude mode. The optical image produced by SLM-3 is Fourier transformed using a lens, and detected by an FPA. The output of the FPA will provide the main correlation signal. The final signal can thus be expressed as
Here, $F$ stands for the Fourier transform, and $\alpha$ and $\beta$ are constants. Since $M_j$ is the FT of the real image $H_j$ ($j = 1, 2$), we can use the well-known relations between the FT of a product of functions and convolutions and cross-correlations to express the final signal as the sum of four terms:
where ⊗ indicates two-dimensional convolution, and ⊙ indicates two-dimensional cross-correlation. We can now make the following observations:
• $T_1$ represents the two-dimensional convolution of the images $H_1$ and $H_2$.
• $T_2$ represents the two-dimensional convolution of the images $H_1$ and $H_2$, but with each conjugated and inverted along both axes.
• $T_3$ represents the two-dimensional cross-correlation of the images $H_1$ and $H_2$.
• $T_4$ represents the two-dimensional cross-correlation of the images $H_2$ and $H_1$. (Cross-correlation is noncommutative; hence, $T_3$ is not necessarily equal to $T_4$.)
• If the images $H_1$ and $H_2$ are not symmetric in both the x and y directions, we have, in general, $T_1 \neq T_2 \neq T_3 \neq T_4$.
• If the images $H_1$ and $H_2$ are symmetric in both the x and y directions, we have $T_1 = T_2 = T_3 = T_4$.
The cross-correlation technique is usually used to find matches between two objects. In our final result we have convolution terms ($T_1$ and $T_2$) in addition to cross-correlation terms ($T_3$ and $T_4$). The convolution terms can be washed out by using the phase stabilization and scanning technique described next.
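As a worked form of the product signal and its Fourier transform, the following sketch expands $S = S_1 S_2$ using $S_1 = M_1 C^* + M_1^* C$ and $S_2 = M_2 D^* + M_2^* D$, which follow from $S_1 = A_1 - B_1 - |C|^2$ and $S_2 = A_2 - B_2 - |D|^2$ when $C = |C| e^{j\Psi_1}$ and $D = |D| e^{j\Psi_2}$. It is a reconstruction from the definitions in the text rather than a quotation, and the pairing of individual terms with $T_1, \ldots, T_4$ is stated only up to the sign and scaling conventions chosen for the transform.

```latex
\begin{align}
S   &= S_1 S_2
     = \alpha^* M_1 M_2 + \alpha M_1^* M_2^* + \beta^* M_1 M_2^* + \beta M_1^* M_2, \\
S_f &= F(S)
     = \alpha^* F(M_1 M_2) + \alpha F(M_1^* M_2^*)
     + \beta^* F(M_1 M_2^*) + \beta F(M_1^* M_2), \\
\alpha &\equiv C D = |C||D|\, e^{j(\Psi_1 + \Psi_2)}, \qquad
\beta \equiv C D^* = |C||D|\, e^{j(\Psi_1 - \Psi_2)}.
\end{align}
```

The two $\alpha$ terms contain products of the form $M_1 M_2$ and transform into the convolution terms ($T_1$ and $T_2$), while the two $\beta$ terms contain $M_1 M_2^*$ and its conjugate and transform into the cross-correlation terms ($T_3$ and $T_4$). Only the $\alpha$ terms depend on the phase sum $\Psi_1 + \Psi_2$.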
A. Phase Stabilization and Scanning Circuit
From Eq. (9), it is obvious that the final signal $S_f$ depends nontrivially on $\Psi_1$ and $\Psi_2$. To make this dependence more transparent, we can rewrite Eq. (7) as follows:
where $M_1 M_2 = |M_1||M_2| e^{j\phi_M(x,y)}$ and $M_1 M_2^* = |M_1||M_2| e^{j\varphi_M(x,y)}$. Here, the first term corresponds to the convolution and the second term corresponds to the cross-correlation. To eliminate the convolution term, we continuously scan $(\Psi_1 + \Psi_2)$ over a range of $2\pi$ at a certain frequency $\omega_s$, while keeping $(\Psi_1 - \Psi_2)$ at zero. The convolution term varies as we scan $(\Psi_1 + \Psi_2)$, whereas the cross-correlation term remains constant (since $\Psi_1 - \Psi_2 = 0$). While the scan is in progress, we pass the signal $S$ through a low-pass filter (LPF) with a bandwidth less than $\omega_s$, so that only the last term, which corresponds to the cross-correlation, is passed. Such a filter can be easily implemented with the FPGA. The low-pass filtered version of $S$ is Fourier transformed using a lens and detected by an FPA, producing only the cross-correlation signals.
Figure 2 shows the architecture for phase stabilization and scanning, where $(\Psi_1 - \Psi_2)$ is kept at a constant value using a simple interferometer along with a feedback loop, and $(\Psi_1 + \Psi_2)$ is scanned over a range of $2\pi$ using a piezoelectric transducer (PZT). To start with, the output of the laser is split into two paths. One path is used for generating images via reflections from the SLMs (as already shown in Fig. 2). The other path is used for generating both reference beams, $C$ and $D$. Mounting a PZT (PZT-1) on a mirror placed before the beam is split into $C$ and $D$, and applying a sawtooth ramp voltage to it, results in a repeated linear scan of $(\Psi_1 + \Psi_2)$ over a range of $2\pi$. Pieces of $C$ and $D$ are now split off and interfered with each other. The resulting interference pattern, along with a feedback signal applied to another PZT (PZT-2) mounted on a mirror in the path of $C$, can be used to control the value of $(\Psi_1 - \Psi_2)$. To be specific, note that,
The interference signal is detected by a pair of matched detectors ($D_3$ and $D_4$). The voltages from these detectors are subtracted from each other using a subtractor circuit.
The resultant voltage from the subtractor is added to a bias voltage using an adder circuit, and the output of this adder is fed to PZT-2. The feedback loop can lock the interference pattern at any desired position. By performing several correlation operations on two identical images with this setup at different bias voltages, we find the bias voltage that gives the maximum peak value of the cross-correlation signal. At this position,
With the system locked at this position, $(\Psi_1 + \Psi_2)$ is varied over a range of $2\pi$ by applying a ramp voltage to PZT-1, as mentioned above. The response of the feedback loop should be faster than $\omega_s$, in order to ensure that the servo can hold $(\Psi_1^0 - \Psi_2^0)$ at a constant value. While the scan is in progress, the signal is passed through an LPF with a bandwidth less than $\omega_s$. This low-pass filtered version of $S$ is then processed to yield cross-correlation signals only, as discussed in Section 2.
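The effect of the scan can be checked numerically at the level of individual pixels. The following is a minimal sketch, assuming that the low-pass filter is modeled as a plain average over equally spaced scan steps, that $\Psi_1 = \Psi_2$ at every step (so that $\Psi_1 - \Psi_2 = 0$), and that $|C| = |D| = 1$. The random arrays standing in for $M_1$ and $M_2$, and the function name `product_signal`, are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (8, 8)
# Random complex arrays standing in for the Fourier transformed images.
M1 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
M2 = rng.normal(size=shape) + 1j * rng.normal(size=shape)


def product_signal(M1, M2, psi1, psi2):
    """S = S1 * S2 built from detected intensities only (|C| = |D| = 1)."""
    C = np.exp(1j * psi1)
    D = np.exp(1j * psi2)
    S1 = np.abs(M1 + C) ** 2 - np.abs(M1) ** 2 - np.abs(C) ** 2
    S2 = np.abs(M2 + D) ** 2 - np.abs(M2) ** 2 - np.abs(D) ** 2
    return S1 * S2


N_STEPS = 20
# Scan (psi1 + psi2) over [0, 2*pi) while holding psi1 - psi2 = 0.
psi_sum = 2 * np.pi * np.arange(N_STEPS) / N_STEPS
S_avg = np.mean([product_signal(M1, M2, s / 2, s / 2) for s in psi_sum], axis=0)

# Expected survivor: the cross-correlation part, M1 M2* + M1* M2.
cross_only = 2 * np.real(M1 * np.conj(M2))
print("largest deviation:", np.max(np.abs(S_avg - cross_only)))
```

The averaged array should agree with $M_1 M_2^* + M_1^* M_2$ to within floating-point error, since the terms that depend on $(\Psi_1 + \Psi_2)$ sum to zero over a full, uniformly sampled $2\pi$ scan.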
While the image detection process is going on, the stability of the phases should be checked after some characteristic time, $T_c$. This characteristic time is defined as the time during which $(\Psi_1 - \Psi_2)$ can drift within a certain allowable range, for example, a few milliradians. After time $T_c$, we have to adjust the bias voltage again and perform several correlations of two known images with the HOC to get the highest correlation peaks. The characteristic stability time would depend on the stability of the optical mounts, and can easily exceed hundreds of seconds in a well-designed system.
POSSIBLE FUTURE WORK TO SPEED UP THE OPERATION OF THE HOC
In describing the architecture of the HOC in Section 2, we have considered the use of commercially available components, such as cameras, FPGAs, and SLMs. However, it is obvious from the analysis that the overall process is severely slowed down during the serial communication between these devices. In order for the HOC to achieve its ultimate operating speed, it is thus necessary to resort to novel components that operate in parallel.
Consider the set of steps (shown in Fig. 2) whereby signals captured by the cameras are processed to produce the signal $S$, which then appears as an optical field at the output of the final SLM (SLM-3). This process serves as a conduit between signals that start in the optical domain (i.e., inputs for the cameras) and end in the optical domain (i.e., output of SLM-3). With proper use of current technologies, it should be possible to combine these tasks in an integrated graphic processing system for high-speed operation, as shown in Fig. 4(a). Figure 4(b) shows the block diagram of the specialized integrated graphic processing unit (IGPU) that combines, in parallel, the FPA, the application-specific integrated circuit (ASIC) chip for signal processing, and the high-speed SLM.
Briefly, FPA-1A, FPA-1B, and FPA-1C would capture, respectively, the signals $A_1 \equiv |M_1 + C|^2$, $B_1 \equiv |M_1|^2$, and $|C|^2$, corresponding to the reference channel [12]. Similarly, FPA-2A, FPA-2B, and FPA-2C would capture, respectively, the signals $A_2 \equiv |M_2 + D|^2$, $B_2 \equiv |M_2|^2$, and $|D|^2$, corresponding to the query channel. The FPAs would be connected to the ASIC chip through indium bonding, for example. The ASIC chip would consist of 512 × 512 signal-processing units, one for each pixel and geometrically matched to the FPA. Each unit of the ASIC chip would have two analog subtractor circuits, one analog multiplier circuit, and an analog LPF. The signals from the matching pixels in the three FPAs (FPA-1A, FPA-1B, and FPA-1C) would be processed by a single subtractor element (denoted as SUB-1) in the ASIC chip, producing the signal $S_1 = A_1 - B_1 - |C|^2$.
Similarly, FPA-2A, FPA-2B, and FPA-2C, along with SUB-2, would generate the signal array $S_2$, corresponding to the query channel. These two signals, $S_1$ and $S_2$, would be applied in parallel to an analog multiplier, producing the signal array $S$. The signal $S$ is passed through an LPF to get rid of the convolution terms, as discussed in Section 2. On the back end of the ASIC chip array would be the SLM array, connected via indium bonding.
The speed of operation of such an IGPU would, of course, depend on the specific technology employed. In what follows, we describe a specific example of the technology that could be employed to realize each part of the IGPU, thus enabling us to reach a definitive estimate of the speed of operation.
Consider first the FPA. One possible choice for this would be an array of nanoinjection detectors [13, 14]. These detector elements operate at a low bias voltage (∼1 V) at room temperature, and have a response time of 1 ns. Consider next the subtractor elements. Each of these could be implemented easily with a simple operational amplifier consisting of as few as six transistors [15]. The response time of such an operational amplifier is expected to be similar (∼0.3 μs) to that of a bulk operational amplifier chip, such as the LM741C. The multiplier could be realized with a Gilbert cell, which requires only six transistors [13]. The properties of such a multiplier should be similar to those of a bulk multiplier chip, such as the AD835, which is a complete four-quadrant, voltage-output analog multiplier. It generates the linear product of its X and Y voltage inputs with a rise time of 2.5 ns. The LPF, implemented with a 0.1 nF capacitance and a 10 Ω resistance, would have a response time of 1 ns.
Finally, the SLM could be realized with an array of high-speed stepped quantum wells [16-20]. An SLM based on these elements has a response time of ∼16 ns, which is orders of magnitude faster than the conventional SLM discussed in Section 2.
Therefore, the IGPU would take less than 0.4 μs to perform the whole process of capturing the optical signals, computing the signal $S$, and converting it back to the optical domain. In contrast, for performing the same operation on a 512 × 512 image, the technology discussed in Section 2 takes about 22 ms, since the data communication is done serially. Thus, the IGPU would lead to a speed-up of operation by a factor of nearly $5 \times 10^4$. Of course, we have simply outlined a sketch of the type of IGPU that is required to make the HOC achieve its ultimate speed. We are working with collaborators to design and implement this chip, and results from these efforts will be reported in the future.
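The sub-0.4 μs estimate can be cross-checked by adding up the quoted response times. The following is a minimal sketch, assuming (as a simplification not stated in the text) that the stages act one after another and that each stage is traversed once per pixel unit. The dictionary keys are descriptive labels only.

```python
# Per-stage response times quoted in the text, in seconds.
stage_latency_s = {
    "detector (nanoinjection FPA)": 1e-9,
    "subtractor (op-amp)": 0.3e-6,
    "multiplier (Gilbert cell)": 2.5e-9,
    "low-pass filter (R*C)": 10.0 * 0.1e-9,  # 10 ohm * 0.1 nF = 1 ns
    "SLM (stepped quantum wells)": 16e-9,
}

# Assumption: stages in series, each counted once.
total_s = sum(stage_latency_s.values())

SERIAL_PIPELINE_S = 22e-3   # serial estimate quoted for a 512 x 512 image
IGPU_BOUND_S = 0.4e-6       # upper bound quoted for the IGPU

speedup_at_bound = SERIAL_PIPELINE_S / IGPU_BOUND_S

print(f"sum of stage latencies : {total_s * 1e9:.1f} ns")
print(f"speed-up at 0.4 us     : {speedup_at_bound:.2e}")
```

Under these assumptions the stages add up to about 320 ns, dominated by the operational amplifier, which is consistent with the stated bound of 0.4 μs, and the ratio of 22 ms to 0.4 μs gives a speed-up of about $5.5 \times 10^4$.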
RESULTS OF NUMERICAL SIMULATIONS
As discussed in Section 2, the convolution terms can be washed out by continuously scanning $(\Psi_1 + \Psi_2)$ over a range of $2\pi$ at a certain frequency $\omega_s$, while keeping $(\Psi_1 - \Psi_2)$ at zero. This can be verified through the simulation results shown in Fig. 5. Two identical but shifted images [as shown in Fig. 5(a)] are first applied as inputs to the HOC architecture without the phase stabilization circuit. When the phase stabilization and scanning technique is not implemented and $\Psi_1$ and $\Psi_2$ are both kept at zero, the FPA detects a signal $|S_f|^2$ that contains both convolution ($T_1$ and $T_2$) and cross-correlation ($T_3$ and $T_4$) terms, as shown in Fig. 5(b). Another case without the phase stabilization and scanning technique is shown in Fig. 5(c), where $\Psi_1$ and $\Psi_2$ are both set to $\pi/3$. In this case also, the convolution terms appear in $|S_f|^2$ along with the cross-correlation terms. Next we vary $(\Psi_1 + \Psi_2)$ from 0 to $2\pi$ in 20 intervals, and average the values of $S$ for these 20 cases. This is, of course, equivalent to low-pass filtering with a bandwidth less than the inverse of the duration of a linear scan of $(\Psi_1 + \Psi_2)$ over a range of $2\pi$, as described in Section 2. As shown in Fig. 5(d), this averaging eliminates the convolution terms, and $|S_f|^2$ contains only the cross-correlation terms. In what follows, we will assume that such averaging/filtering is carried out, and plot only the cross-correlation terms.
Next, we illustrate the behavior of the HOC architecture for various types of query and reference images. In Fig. 5(d) we have already shown the result for identical but shifted images. Figure 6(a) shows the case where the reference image, $H_1$, and the query image, $H_2$, are identical and not shifted. The cross-correlation signals ($T_3$ and $T_4$) are shown in Fig. 6(b). In this case, the cross-correlation signals have a peak at the center of the detector plane. It is important to note that if the query and reference images are not shifted with respect to one another, then the cross-correlation peaks will be at the center of the detection plane, overlapping each other. If the images are shifted with respect to one another, the cross-correlation peaks will be shifted symmetrically around the center by a distance corresponding to the shift.
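The behavior described above can be reproduced qualitatively with a small numerical model. The following is a minimal sketch in Python/NumPy (not the MATLAB code used for the figures), assuming that lenses are modeled by discrete FFTs, that the plane waves have unit amplitude, that the low-pass filter is an average over 20 scan steps with $\Psi_1 = \Psi_2$, and that the image shift is circular. The test object, the shift, and the function name `hoc_output` are illustrative choices.

```python
import numpy as np


def hoc_output(h1, h2, n_steps=20):
    """Return |S_f|^2 after averaging S over a full scan of (psi1 + psi2)."""
    M1 = np.fft.fft2(h1)  # lens modeled as a discrete Fourier transform
    M2 = np.fft.fft2(h2)
    S_sum = np.zeros(h1.shape)
    for k in range(n_steps):
        psi = np.pi * k / n_steps          # psi1 = psi2 = psi, sum spans [0, 2*pi)
        C = np.exp(1j * psi)               # unit-amplitude plane waves (assumption)
        D = np.exp(1j * psi)
        S1 = np.abs(M1 + C) ** 2 - np.abs(M1) ** 2 - 1.0
        S2 = np.abs(M2 + D) ** 2 - np.abs(M2) ** 2 - 1.0
        S_sum += S1 * S2
    S = S_sum / n_steps                    # averaging stands in for the LPF
    S_f = np.fft.fftshift(np.fft.fft2(S))  # final lens, zero shift at the center
    return np.abs(S_f) ** 2


N = 64
h1 = np.zeros((N, N))
h1[28:34, 30:32] = 1.0   # vertical bar
h1[32:34, 30:36] = 1.0   # foot of an L shape (asymmetric object)

shift = (10, 6)
h2 = np.roll(h1, shift, axis=(0, 1))       # shifted copy of the reference

center = N // 2
out_same = hoc_output(h1, h1)
out_shifted = hoc_output(h1, h2)

peak_same = np.unravel_index(np.argmax(out_same), out_same.shape)
peak_a = out_shifted[center + shift[0], center + shift[1]]
peak_b = out_shifted[center - shift[0], center - shift[1]]

print("unshifted peak index:", peak_same)
print("shifted peaks equal max:", np.isclose(peak_a, out_shifted.max()),
      np.isclose(peak_b, out_shifted.max()))
```

In this model the unshifted pair produces a single peak at the center of the output array, while the shifted pair produces two equal peaks placed symmetrically about the center at plus and minus the applied shift, matching the description in the text.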
CONCLUSION
We have presented theoretical details and the underlying architecture of the HOC that correlates images using SLMs, detector arrays, and FPGAs. The HOC architecture bypasses the need for photorefractive polymer films by using detectors, while the phase information is preserved by interfering the images with plane waves. The output signal of such an HOC has four terms: two convolution signals and two cross-correlation signals. The convolution terms can be eliminated by implementing a phase stabilization and scanning circuit, so that the behavior of an HOC becomes essentially identical to that of a conventional holographic correlator (CHC). To achieve the ultimate speed of such a correlator, we also propose an optoelectronic chip that would perform all the electrical processes in parallel. The HOC architecture along with the phase stabilization technique would thus be as good as a CHC, capable of high-speed image recognition in a shift-invariant manner. In addition to the shift-invariant property of the HOC, rotation- and scale-invariant correlation can also be achieved by applying the Polar Mellin Transform (PMT) to both query and reference images [21]. With the future implementation of an optoelectronic chip and the PMT, the HOC architecture holds the promise of a practical, versatile, and high-speed image recognition system.
In this paper, we have not considered what the typical SNR would be for the HOC architecture. The SNR would depend on the details of the implementation. Experimental efforts are underway in our group to demonstrate the operation of the HOC, and the SNR, as well as other possible practical constraints, will be addressed when reporting the outcome of this experimental effort.
