We discuss the architecture, subsystems, and applications of a multichannel digital correlator that we constructed by using a combination of optical and electrical delays. Our system is capable of a 1-ns bin width in normal operation, when a derandomizer is used at the input stage, and of a 0.25-ns bin width when the derandomizer is bypassed. When switchable delay lines are used, the 16 real-time channels can access a 160-ns delay range, providing up to 160 channels at 1-ns bin width or 640 channels at 0.25-ns bin width.
Introduction
Clipped digital correlators for photon-counting correlation 1,2 have been commercially available for over two decades. For an electronic correlator capable of real-time correlation, the minimum bin width reported to date is 10 ns in a 256-channel instrument 2 designed for photon correlation, with AND gates used to perform the multiplication operation and a clocked shift register to implement the delays. This corresponds to a Nyquist frequency of 50 MHz. External clocking of a Malvern K7026 correlator to a bin width of ~8 ns has been reported, 3 and a small number of 5-ns units have been developed from it.
Other schemes, in which scalers 4 are used, produce histograms of time intervals between counts or histograms of arrival times, 5 which provide an indirect way of evaluating the correlation function. 6 Bin widths of 5 ns, 7 2.5 ns, 5 and even 0.5 ns 8 have been reported.
An alternative correlator architecture in which passive fiber-optic delays and return-to-zero format signals are used was proposed 9 to overcome the bandwidth limitations due to the difficulty in achieving synchronous clock distribution. A 10-ns, 8-channel prototype that uses this architecture was previously implemented in our group. 10, 11 In this paper we present a higher-speed version, with 16 real-time channels that use switchable passive-delay lines to multiply the effective number of channels, for stationary signals, up to 160 channels spaced by 1 ns. This corresponds to a Nyquist frequency of 500 MHz. Fiber-optic passive delays are used for the incremental-delay tree, but an electronic splitter and cable delays are used in the equal-delay tree.
In addition, four steps of 0.25-ns increment are introduced within the 1-ns time lag. As the derandomizer cannot work at the equivalent clock rate of 4 GHz, this signal is not derandomized. In this mode of operation we have demonstrated the potential to retrieve signals of up to 2 GHz from noise. Section 2 of this paper gives a brief description of the evolution toward a faster equivalent of the electronic Malvern correlator. Section 3 is dedicated to the use of different configurations of switchable delay lines to increase the number of channels. Bandwidth considerations are mentioned briefly in Section 4, and the subsequent sections are devoted to the AND gates (Section 5), counters (Section 6), derandomizer (Section 7), front-end interface (Section 8), distributor (Section 9), and hardware implementation (Section 10). Section 11 refers to laboratory demonstrations and applications.
Evolution of Correlator Architecture

A. Theory of Clipped Digital Correlation
Sampling and quantization are necessary to implement digital correlation. Assuming a regular delay interval $\tau$, the discrete correlation function $C_{xy}(n)$ of two signals $x(t)$ and $y(t)$ is given by
where p and P are integers, as is n, the channel number.
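For reference, one common form of the discrete correlation estimate that is consistent with the symbols defined here is sketched below. The normalization by $P$, the summation limits, and the sign convention of the lag are assumptions made for illustration and are not taken from the text.

```latex
C_{xy}(n) = \frac{1}{P} \sum_{p=1}^{P} x(p\tau)\, y(p\tau - n\tau)
```

In a counting implementation the factor $1/P$ would typically be omitted, so that each channel simply accumulates the sum of products.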
For high-bandwidth real-time correlation functions to be obtained, the input functions $x(t)$ and $y(t)$ are commonly quantized to single-bit values (clipped). This can be achieved with a discriminator, which gives two possible levels. If these are treated such that logic 1 represents +1 and logic 0 represents -1, the multiplication table is identical to the truth table of an exclusive NOR (XNOR) logic gate. If instead logic 1 is treated as +1 and logic 0 is treated as 0, the multiplication can be implemented with an AND gate. The former case (XNOR) is known as bipolar processing and the latter (AND) as unipolar processing.
For trains of photons, no amplitude information is present, and logic 1 and logic 0 represent merely the presence or the absence, respectively, of a photon. Thus, in this case, unipolar (AND) processing is appropriate and single-bit quantization provides a good representation of the input signals.
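The difference between unipolar and bipolar processing can be illustrated with a minimal software sketch. The function names, the sample data, and the choice to start each sum at p = n (so that no samples before the start of the record are needed) are assumptions made for this illustration; they do not describe the hardware.

```python
def clip(samples, threshold):
    """Single-bit quantization: 1 if the sample exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in samples]


def correlate_unipolar(x_bits, y_bits, n_channels):
    """Channel n accumulates x[p] AND y[p - n]; logic 0 is treated as 0."""
    counts = []
    for n in range(n_channels):
        counts.append(sum(x_bits[p] & y_bits[p - n] for p in range(n, len(x_bits))))
    return counts


def correlate_bipolar(x_bits, y_bits, n_channels):
    """Channel n accumulates the XNOR of x[p] and y[p - n], mapped to +1 or -1."""
    sums = []
    for n in range(n_channels):
        total = 0
        for p in range(n, len(x_bits)):
            xnor = 1 - (x_bits[p] ^ y_bits[p - n])
            total += 1 if xnor else -1
        sums.append(total)
    return sums


# Hypothetical photon train: one photon every third bin.
photons = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
unipolar = correlate_unipolar(photons, photons, 4)
bipolar = correlate_bipolar(photons, photons, 4)
print(unipolar)  # coincidence counts per channel
print(bipolar)   # signed sums per channel
```

In the unipolar result only bins in which both inputs contain a photon contribute, which is why AND processing suits sparse photon trains; in the bipolar result the many empty bins also count as agreement.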
B. Evolution of Functional Implementation
Electronic Shift-Register-Based Correlator
The fastest commercially available clipped digital correlators, such as the Malvern K7026, implement the delays with a digital shift register clocked at the sampling rate. The architecture, shown schematically in Fig. 1 (top), uses AND gates to perform the multiplication. The output of each AND gate is averaged by that channel's counter.
The shift-register architecture has given rise to commercially available 10-ns correlators constructed with emitter-coupled logic devices. A significant increase in speed is prevented by the need to route the clock to all the shift-register elements synchronously.
Fiber-Optic Passive-Delay Correlator
To overcome the difficulty of synchronous clock distribution, it was proposed 9 that the shift register be replaced with a passive fiber-optic delay tree, as shown schematically in Fig. 1 (second diagram). In this case, the clock is used only in the derandomizing process.
An optoelectronic AND gate was suggested 9 and later implemented 10, 11 in our group. We performed the AND operation by first detecting the optical signals from the two input fibers on separate areas of the same photodiode, effectively performing incoherent summation, followed by discrimination to detect only the presence of simultaneous pulses.
The use of this photodiode summator-based AND gate requires input optical signals with minimal variation in amplitude from each of the input fibers for maintaining high coincidence efficiency.
Fiber-Optic Architecture with Electronic Logic
When the number of channels N is increased, it becomes increasingly time consuming and expensive to ensure approximately equal amplitudes in both inputs of each photodiode summator. In addition, for a multigigahertz bandwidth, small-area photodiodes are needed, which are commonly smaller than the separation of the two fiber cores (typically 125 μm). In such a case, the fibers cannot be simply butt coupled to the photodiode, so the use of additional directional couplers may be necessary.
A better solution is to use separate photodiode-amplifier combinations for each input fiber and conventional electronic AND gates, as shown in Fig. 1 (third diagram). This provides greater immunity to amplitude variations and noise compared with the case of the summator-based AND.
Hybrid Fiber and Electronic Passive-Delay Correlator
In the architecture described above, the tree providing equal delays performs an unnecessary transfer from electrical format to optical format and back to electrical format. In the delayed branch, however, the superior bandwidth of fiber optics is required for the implementation of large delays. The equal-delay tree has only short delays, suggesting that it can be implemented with an electrical distributor and moderate-quality, inexpensive microwave cable, as shown schematically in Fig. 1 (bottom diagram). This is the architecture adopted 12, 13 in the correlator presented here and has properties similar to other passive-delay architectures. As an added benefit, the use of electronic delays for the equal-delay tree allows delay adjustments to be made by the adjustment of the cable length, thus relaxing the need for precise fiber-length adjustment, which is considerably more difficult to achieve.
Expansion of the Number of Channels
Because of the relatively high cost of each channel in the fiber-optic digital correlator, it has been proposed 9, 11 that the number of channels could be effectively increased by use of an optical switchable delay line. This approach means that the correlation function is no longer generated in real time but is built up sequentially at a rate reduced by a factor $N_e/N_r$, where $N_e$ is the effective number of channels and $N_r$ is the real number of channels. Concomitantly, the time to achieve the same accuracy is increased by the factor $N_e/N_r$. A second disadvantage of this approach is that it restricts the correlator's application to statistically stationary signals. This restriction is not inherent to fixed-delay correlators, however; it arises only from the switching, which is merely an inexpensive way to avoid the high cost of implementing a large number of hardware channels.
A. Multiplying the Channel Delay Range
In a correlator in which the number of real channels $N_r = 16$ and the bin width $\tau = 1$ ns, the correlation function is evaluated at delays of 0, 1, 2, ..., 15 ns. Through the insertion of an additional delay, equal to $N_r \tau = 16$ ns, at the input of the incremental-delay tree, the measurements are repeated to obtain the correlation values at delays of 16, 17, 18, ..., 31 ns. When similar incremental delays are added, the effective number of correlator channels may be increased at marginal extra cost. For example, when 5 switchable delays are used, 80 effective channels are achievable, providing correlation values for 0, 1, 2, ..., 79 ns.
B. Interleaving Switchable Delay Lines
The same range of effective channels can be achieved if the optical delay increment is made equal to 5 ns and the switchable delay increment is made equal to 1 ns (giving a choice of 0, 1, 2, 3, or 4 ns). In this case, the initial pass collects the 16 correlation values for 0, 5, 10, ..., 75 ns. Repeating this at 1-ns increments, 4 other sets of 16 points each are interleaved among the points that were initially collected. For example, introducing 1 ns extra in the delayed branch provides the 16 values for 1, 6, 11, ..., 76 ns, and, after all 5 switchable delays are used, the same collection of 80 correlation terms for 0, 1, 2, ..., 79 ns is obtained.
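The equivalence of the two 80-channel schemes can be checked with a short sketch. The function names and the representation of each measurement pass as a list of delays in nanoseconds are assumptions introduced for illustration.

```python
def range_multiplying(n_real, tree_step_ns, n_switch):
    """Tree spaced by tree_step_ns; each switch position adds one full tree span."""
    span = n_real * tree_step_ns
    return [[s * span + k * tree_step_ns for k in range(n_real)]
            for s in range(n_switch)]


def interleaving(n_real, tree_step_ns, switch_step_ns, n_switch):
    """Coarse tree spacing; each switch position shifts all taps by a fine step."""
    return [[s * switch_step_ns + k * tree_step_ns for k in range(n_real)]
            for s in range(n_switch)]


# Scheme A: 1-ns tree with 5 switchable delays of 16 ns each.
passes_a = range_multiplying(16, 1, 5)
# Scheme B: 5-ns tree with 5 switchable delays of 1 ns each.
passes_b = interleaving(16, 5, 1, 5)

delays_a = sorted(d for p in passes_a for d in p)
delays_b = sorted(d for p in passes_b for d in p)
print(delays_a == delays_b == list(range(80)))
```

Both schemes need the same number of passes and cover the same 80 delays; they differ only in which physical delays must be long, which is what motivates the choice of technology for each stage.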
If all the delays are implemented optically, there is practically no difference between the two versions, but the last arrangement suggests that for the short delays in the switchable delay line unit, microwave cables could be used instead of optical fiber. Microwave cables are generally not suitable for the longer delays because of higher losses, low bandwidth, and cross talk.
The advantage of using microwave cables for short switchable delays is that microwave switches currently available are less expensive than integrated optical switches and provide a higher extinction ratio.
In summary, the optimum structure consists of a combination 14 -16 of microwave and optical components. The longer delays can be realized in optical fiber because of the superior bandwidth, volume, and cross-talk immunity, whereas the shorter delays can be fabricated with moderate-quality microwave cable.
In the correlator presented here, an optical tree with a 5-ns delay increment is used, in series with a 5-step × 1-ns switchable delay (to interleave delay values) and a 2-step × 80-ns switchable delay (to multiply the delay range).
The 5-step × 1-ns electronic switchable delay line, shown in Fig. 2 as ESDLc, consists of two single-pole five-throw (SP5T) microwave switches with five coaxial cables that give incremental differential delays from 0 to 4 ns. The states for both switches are selected from the same electronic control lines to allow only one cable to make the connection from input to output at any time. The coaxial delays have successive increments of 1 ns.
Fig. 2. Switchable delay block (SDB): ESDLf, electronic switchable delay line (fine), 4 steps of 0.25 ns, implemented with microwave cables; ESDLc, electronic switchable delay line (coarse), 5 steps of 1 ns, implemented with microwave cables; cable numbering shows differential delay in nanoseconds; SP4T, SP5T, 4-GHz pin-diode switches; SPDT, GaAs, 3-GHz switch; SODL, switchable optical delay line, 2 steps of 80 ns; E1, laser driver; L1, 1-mW, 3-GHz, 1300-nm laser pigtailed to single-mode fiber; PD0, PD1, 2-GHz Ge avalanche photodiode; DC, single-mode directional coupler; 80-ns differential delay between the fiber outputs.
In addition, we constructed a fine-incremental switchable delay, ESDLf, to provide a 4-step × 0.25-ns delay; we used the same construction as for the 5-step × 1-ns delay.
The 2-step × 80-ns switchable optical delay line, SODL, was implemented as shown in Fig. 2. Here, two fiber-optic delay lines, whose delays differ by 80 ns, are illuminated by laser L1 by means of a single-mode 1 × 2 fiber-optic directional coupler, DC. Optical fiber is preferred here because of the large delay. The transmitted signals are detected by photodetectors PD0 and PD1, and the output from one of them is selected by means of a microwave single-pole double-throw switch, SPDT.
In normal operation only the ESDLc and the SODL are switched and the effective number of channels is 16 × 5 × 2 = 160, which determines the correlation value in 1-ns steps from 0 to 159 ns. When all three switchable delay line blocks are used, the effective number of addressable channels is 16 × 5 × 2 × 4 = 640, which provides correlation values from 0 to 159.75 ns in 0.25-ns steps.
In the final version, under computer control, the number of channels and the bin width can be selected from any row in Table 1.
The detailed configuration of the correlator is shown in Fig. 3. Using the switchable delay block, SDB, in the equal-delay tree meant that it was necessary to insert a compensating optical fiber delay, FD, before the input of the 1 × 16 tree, IDT.
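The channel counts quoted here follow from combining one setting of each delay stage with each tap of the 16-way tree. A minimal sketch, with the stage delay lists written out from the values given in the text and the helper name `effective_delays` chosen for illustration:

```python
from itertools import product

TREE_NS = [5 * k for k in range(16)]   # 16 real channels on a 5-ns tree
ESDLC_NS = [0, 1, 2, 3, 4]             # coarse electronic steps, 1 ns each
SODL_NS = [0, 80]                      # optical steps, 80 ns apart
ESDLF_NS = [0.0, 0.25, 0.5, 0.75]      # fine electronic steps, 0.25 ns each


def effective_delays(*stages):
    """All total delays reachable by picking one setting from each stage."""
    return sorted(sum(combo) for combo in product(*stages))


normal = effective_delays(TREE_NS, ESDLC_NS, SODL_NS)
fine = effective_delays(TREE_NS, ESDLC_NS, SODL_NS, ESDLF_NS)

print(len(normal), normal[0], normal[-1])
print(len(fine), fine[0], fine[-1])
```

Each combination of switch settings corresponds to one sequential pass of the 16 real channels, so the normal regime needs 10 passes and the fine regime 40, in line with the $N_e/N_r$ factor discussed above.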
Bandwidth and Rise-Time Considerations
The rise times of analog components in the system contribute to jitter in the timing of the pulses. For our target of a 1-ns correlator, we chose to design our transmission system rather conservatively, with an overall rise time of roughly one-quarter of the sampling period, i.e., $t_r = 0.25$ ns, giving an overall analog half-power bandwidth $f_{3\mathrm{dB}} \approx 0.35/t_r = 1.4$ GHz.
For correct operation, the transmission system in each tree must achieve the design bandwidth. This means that, in each case, the collective rise time of all the components between the digital signal input x or y and the AND gate input must be less than or equal to the maximum acceptable value of $t_r$. The collective rise time $t_r$ of a series of $m$ components results from the convolution of their individual responses, so that the rise times add in quadrature, $t_r^2 = t_1^2 + t_2^2 + \dots + t_m^2$. Assuming that the individual rise times are equal ($t_m$), the individual rise times required are $t_m = t_r/\sqrt{m}$. If we assume that there are $m = 5$ bandwidth-limiting units (e.g., derandomizer, laser driver, laser diode, photodiode, and amplifiers), then to obtain $t_r = 0.25$ ns, our target should be $t_m = 0.11$ ns, corresponding to a 3-GHz analog bandwidth. Where possible, components with at least a 3-GHz bandwidth were used. Because the fiber bandwidth throughout our correlator is far in excess of 3 GHz, as indicated below, it does not contribute significantly to the overall rise time.
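The rise-time budget above can be reproduced numerically. The sketch below uses only the two relations given in the text (the 0.35 bandwidth-rise-time product and the quadrature sum); the function names are chosen for illustration.

```python
import math


def bandwidth_ghz(rise_time_ns):
    """Half-power bandwidth from rise time, using f_3dB ~ 0.35 / t_r."""
    return 0.35 / rise_time_ns


def per_component_rise_time(total_rise_time_ns, m):
    """Rise time allowed per component when m equal components add in quadrature."""
    return total_rise_time_ns / math.sqrt(m)


def combined_rise_time(rise_times_ns):
    """Quadrature sum of individual rise times."""
    return math.sqrt(sum(t * t for t in rise_times_ns))


t_r = 0.25                                # ns, one-quarter of the 1-ns bin
t_m = per_component_rise_time(t_r, 5)     # five bandwidth-limiting units
print(round(bandwidth_ghz(t_r), 2))       # overall bandwidth in GHz
print(round(t_m, 3), round(bandwidth_ghz(t_m), 2))
print(round(combined_rise_time([t_m] * 5), 3))
```

The per-component figure comes out slightly above 0.11 ns, which corresponds to a bandwidth just over 3 GHz and is consistent with the component choice described in the text.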
A. Electrical Cable Bandwidth
The correlator required a high-bandwidth cable at various locations. For lengths less than 0.5 m, cable RG188A (attenuation 0.67 dB/m at 0.4 GHz) was used, while for longer lengths, cable such as RG402 (0.36 dB/m at 1 GHz) was employed.
B. Choice of Fiber
In choosing the fiber for the 16-way incremental-delay tree, IDT in Fig. 3, we needed to ensure that dispersion was sufficiently small to maintain the pulse rise time, balanced against cost and launched optical power. We selected 50/125-μm gradient-index multimode fiber for this tree and an 850-nm pigtailed laser because the power available was ~3 to 10 times greater than that from equivalent lasers pigtailed to a single-mode fiber at either 800, 1300, or 1550 nm.
As the maximum delay was 200 ns (40 m of fiber) in this tree, including delay FD, we estimate 11,17 that a bandwidth of 10 GHz was achieved for this delay. Another advantage of using a multimode fiber is that multimode fused couplers are significantly less expensive than single-mode equivalents. In block SODL (Fig. 2), a 1300-nm laser diode pigtailed to a single-mode fiber is used. This combination offers a higher bandwidth than an equivalent 800-nm delay.
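The correspondence between 200 ns and roughly 40 m of fiber follows from the propagation speed in glass. In the sketch below, the group index of 1.47 is an assumed typical value for silica fiber and is not taken from the text; the function name is likewise illustrative.

```python
C_M_PER_NS = 0.299792458       # speed of light in vacuum, metres per nanosecond
ASSUMED_GROUP_INDEX = 1.47     # assumption: typical value for silica fiber


def fiber_length_m(delay_ns, group_index=ASSUMED_GROUP_INDEX):
    """Fiber length that produces the requested propagation delay."""
    return delay_ns * C_M_PER_NS / group_index


for delay_ns in (5, 80, 200):
    print(delay_ns, "ns ->", round(fiber_length_m(delay_ns), 2), "m")
```

Under this assumption a 5-ns tree increment corresponds to about 1 m of fiber, which gives a sense of the length tolerance needed to hold a delay to a small fraction of the bin width.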
AND Gates
To implement unipolar multiplication for each of the 16 correlator channels with a suitably short coincidence time, we planned to use four Sony Quad AND gate integrated circuits (Model CXB1101Q, 3 GHz), providing 16 gates in total. Unfortunately, the high cross talk between gates on the same substrate prevented the use of all the gates. Hence, rather than purchasing 16 such integrated circuits and using only one AND gate per chip (a costly and inefficient approach), we adopted a low-cost solution 18 based on a double-gate GaAs modulation-doped field-effect transistor (NE25139 from NEC).
The time window of each channel is given by the transfer characteristic of the AND gates, which has a FWHM of less than 0.35 ns. Hence, when the incoming pulses have delays shifted by ±0.5 ns from the channel value, they contribute much less to the counts in that channel and in the adjacent channels, 1 ns away.
Counters and Interface
The counter block, C0, ..., C15, is identical for all 16 channels. It consists of a 3-GHz 4-bit CXB1106Q Sony counter, an emitter-coupled logic to transistor-transistor logic translator, followed by a 74F193, 125-MHz 4-bit counter and an Altera programmable logic device (EPM5032DC-2) designed to operate as a 16-bit counter and to handle the output lines from the previous counters to the interface. Thus the counter array for each channel is capable of 24-bit counting at up to 2 GHz. Counting the number of flags (counter overflow events) with the computer, the system can take data over an extremely long measurement time, limited only by the computer memory.
A digital input-output board was configured as the interface CI (see Fig. 3), to handle the counter contents and control the switchable delay lines and all the different operating regimes, including enabling and resetting the instrument.
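The way a 4-bit, 4-bit, and 16-bit stage combine into a 24-bit count, with overflow flags extending the range in software, can be modelled as a ripple of carries. The class below is a behavioural sketch only; its name, its methods, and the assumption that each stage simply carries into the next are illustrative and do not describe the actual logic design.

```python
STAGE_BITS = (4, 4, 16)   # fast prescaler, intermediate counter, 16-bit counter


class ChannelCounter:
    """Behavioural model of cascaded counters with a software overflow flag."""

    def __init__(self, stage_bits=STAGE_BITS):
        self.stage_bits = stage_bits
        self.stages = [0] * len(stage_bits)
        self.flags = 0   # overflow events, assumed to be tallied by the computer

    def pulse(self):
        """Register one coincidence; carry ripples while a stage wraps to zero."""
        for i, bits in enumerate(self.stage_bits):
            self.stages[i] = (self.stages[i] + 1) % (1 << bits)
            if self.stages[i] != 0:
                return
        self.flags += 1   # every stage wrapped: the hardware count overflowed

    def hardware_value(self):
        """Concatenate the stage contents, least significant stage first."""
        value, shift = 0, 0
        for count, bits in zip(self.stages, self.stage_bits):
            value |= count << shift
            shift += bits
        return value

    def total(self):
        """Full count reconstructed from the flags and the hardware contents."""
        return self.flags * (1 << sum(self.stage_bits)) + self.hardware_value()


counter = ChannelCounter()
for _ in range(1000):
    counter.pulse()
print(counter.stages, counter.total())
```

With 24 hardware bits the count wraps every 16 777 216 events, so at a rate of 5 × 10^8 counts per second a channel would raise a flag roughly 30 times per second.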
Derandomizer
The role of the derandomizer is to synchronize the data with a clock signal whose period is equal to the bin width. When a pulse comes within the clock pulse width, that clock pulse is transferred to the derandomizer output. When the pulse falls inside the next half-period of the clock pulse, the following clock pulse is sent to the output. The minimum time interval between successive pulses that may be processed is just half the bin width (0.5 ns in our case), so almost no dead time is introduced in the derandomization process. This is a dual-path procedure, and in this way no pulses are lost, but only their time interval statistics are altered by up to half of the bin width.
The derandomizer is the most critical part of the system in terms of speed. For a 1-ns correlator, it should produce 0.5-ns FWHM pulses synchronous with a 1-GHz clock. The rise and fall times should be kept as small as possible. Derandomizers DR1 and DR2 are built with GigaBit Logic circuits with 150-170-ps transition times: D flip-flops (10GO21A) and comparators (10GO13).
When the system is used to retrieve from noise a signal whose period is an integer multiple of the delay time, no derandomizer is needed. This represents an advantage of the passive-delay correlator over the electronic clipped correlator in terms of hardware, derandomization noise, and channel selectivity.
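One reading of this rule is that each pulse is moved to the nearest clock pulse, which can be sketched in a few lines. The function name, the convention that the clock pulse occupies the first half of each period, and the assignment of a pulse arriving exactly at the half-period boundary to the following clock pulse are assumptions. The sketch treats each pulse independently and does not model how the dual-path hardware handles two pulses assigned to the same clock pulse.

```python
import math


def derandomize(arrival_times_ns, clock_period_ns=1.0):
    """Return, for each pulse, the index of the clock pulse it is assigned to.

    Assumed convention: the clock pulse occupies the first half of each period.
    A pulse arriving during clock pulse k is assigned to k; a pulse arriving in
    the following half-period is assigned to clock pulse k + 1.
    """
    ticks = []
    for t in arrival_times_ns:
        phase = t / clock_period_ns
        k = math.floor(phase)
        if phase - k >= 0.5:
            k += 1
        ticks.append(k)
    return ticks


arrivals = [0.2, 0.7, 3.49, 3.5]
ticks = derandomize(arrivals)
shifts = [abs(k * 1.0 - t) for k, t in zip(ticks, arrivals)]
print(ticks)
print(max(shifts))   # largest timing change introduced, in ns
```

In this model no pulse is moved by more than half a bin width, in line with the statement that only the time interval statistics are altered.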
Front-End Interface
The instrument has been equipped with three photodetector modules to cover a large range of wavelengths and applications: a microchannel plate photomultiplier tube (Photek-PMT 213), capable of resolving 0.3-ns pulses, a Ge avalanche photodiode (APD) (GAV30 from Germanium Devices, with 2-GHz minimum bandwidth), and a Si APD (PD1002 from Mitsubishi, with 2-GHz minimum bandwidth). All three modules are stand-alone units, containing high-gain, 3-GHz, GaAs amplifiers.
Distributor
The distributor, 1:16-D in Fig. 3, has been organized in five blocks, each of which is a 1 × 4 GaAs distributor, which we built in-house. Coaxial cable is used to interconnect the 5 distributor blocks and to connect the 16 outputs to the AND gate inputs. The channel delays are adjusted by modification of the cable lengths.
Hardware Design and Implementation
The system is mounted in a S6II Vero 9U case [standard 19-in. (48.26-cm) rack]. Each channel is equipped with (1) an optical input with an APD (PD1002 Mitsubishi, 2-GHz minimum bandwidth), with a high-gain amplifier (10-kΩ transimpedance, 3 GHz, 18 dBm); (2) an electronic input; (3) the AND gate; and (4) the counter, fed by the output of the AND gate. Four channels are implemented in a module, and there are four such modules in the correlator. The distributor is placed in close proximity, to minimize the lengths of the coaxial cables to the AND gates in each channel.
The interface, CI, is placed inside the computer, PC, and the 1 × 16 multimode tree, IDT, in Fig. 3 is placed in a separate unit.
Laboratory Demonstrations
This section refers to performance tests of the correlator and experiments in different applications. Two different regimes were tested. In the normal regime, the ESDLf is not switched, so the bin width is 1 ns. When the ESDLf is switched, the bin width is 0.25 ns and the signal is applied directly to the two internal inputs, X and Y in Fig. 3, bypassing the derandomizer.
A. Performance Tests
For the tests below, the system was used to perform the autocorrelation function in the normal regime. Figure 4 shows the results produced for a 1-s data collection time, when a symmetric 500-MHz rectangular pulse train (the maximum signal frequency for the 1-ns bin width) was applied.
For perfect agreement with theory, the number of counts in the odd channels should be all zero and those in the even channels should be 5 × 10^8. The logarithmic scale highlights small discrepancies of fewer than 20 counts in channels 27, 29, and 37.
Figure 5 shows theoretical and experimental autocorrelations of a pair of pulses separated by 3 ns and repeated every 10 ns. The width of each rectangular pulse is 0.5 ns, which represents half of the correlator's bin width. The top sketch represents the original pulse train and below it is its theoretical autocorrelation. The bottom graph is the digital autocorrelation measured with the first 40 channels of the correlator for a 1-s integration time. The derandomizer allows pulses to be separated only by integer multiples of 1 ns, and, as expected, the triangular profile was not recovered. In principle, it is possible to retrieve the triangular shape of the autocorrelation only for pulse widths larger than the bin width. The linear display was chosen such that the experimentally obtained autocorrelation function can be readily compared with theory.
Figure 6 shows the correlograms for rectangular pulses of 4-ns width repeated every 10 ns. To show the correlogram shape, we have represented the results again on a linear scale. An interpolation procedure (thick line) now recovers the triangular shape. Slight deviations from the triangular shape originate both from the statistical nature of the derandomizer operation and from the ac coupling used at various stages in the correlator.
Figure 7 shows the autocorrelation of the signal used in Fig. 4, but now in the presence of noise. The noise was produced by amplifying the output from an APD driven above the breakdown voltage and added to the input signal. The noise level was selected such that the noise alone produced 10^7 counts/s, once it had passed through the derandomizer. Because of the noise, correlation values in the odd channels are nonzero.
The results in Figs. 4-7 confirm that the correlation function generated by the instrument agrees with the theoretical predictions.
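The ideal channel contents for the periodic test signals can be estimated with a short calculation. The sketch assumes an idealized model in which the derandomized signal is one 0/1 sample per 1-ns bin, the record contains a whole number of periods, and edge effects are ignored; the function name and the bit patterns are illustrative.

```python
def periodic_autocorrelation(pattern, n_channels, n_periods):
    """Ideal AND-gate counts per channel for a periodic 0/1 pattern.

    pattern holds one period of the signal, one sample per bin; the counts for
    one period are scaled by the number of periods in the measurement.
    """
    period = len(pattern)
    per_period = [
        sum(pattern[p] & pattern[(p - n) % period] for p in range(period))
        for n in range(n_channels)
    ]
    return [c * n_periods for c in per_period]


# 500-MHz symmetric pulse train in 1-ns bins, measured for 1 s (5e8 periods).
square = periodic_autocorrelation([1, 0], 8, 500_000_000)

# Pulse pair separated by 3 ns and repeated every 10 ns, counts per period.
pair = periodic_autocorrelation([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], 12, 1)

print(square)
print(pair)
```

For the 500-MHz train the model gives 5 × 10^8 counts in even channels and none in odd channels, as stated for Fig. 4. For the pulse pair it gives isolated peaks at delays of 0, 3, 7, and 10 ns with no triangular shoulders, as expected when the pulses are narrower than the bin width.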
B. Applications
For the tests below, the correlator was used with 0.25-ns bin width and without the derandomizer, in contrast to the tests illustrated by Figs. 4-7. Although the existing derandomizer cannot operate for signals in excess of 500 MHz (clock rate no higher than 1 GHz), we were interested in determining the response of the instrument beyond 500 MHz. Despite the lack of the derandomizer, useful information about a periodic signal buried in noise could be obtained. In this case the signal was applied by a fast comparator to both internal inputs X and Y shown in Fig. 3. Whenever the threshold is exceeded, the comparator produces a pulse of ~0.45-ns width. In this regime we use a bin width of 0.25 ns by means of the electronic switchable delay line block ESDLf.
For Fig. 8, we applied rectangular pulses of 4-ns width, repeated every 10 ns, to which noise was added in the same way as in Fig. 7. Comparing Figs. 6 and 8 shows that the derandomizer is not essential to retrieve the period of the signal, although the shape of the correlogram is distorted. The comparator threshold was adjusted such that without the noise, counts were recorded only in channels 0, 10, 20, 30 ns, and so on. As Fig. 8 illustrates, when noise is superposed on the signal, the comparator threshold may be exceeded at any time within the 4-ns pulse width, giving rise to coincidences at delays adjacent to 0, 10, 20, 30 ns, and so on.
The histograms in Fig. 9 demonstrate the ability of the system to retrieve periodic signals of 1 GHz (left) and 1.33 GHz (right) from noise. As shown in the case of the 1.33-GHz signal, the variation of the correlogram (top) over three channels is clearly discernible. The fact that large differences between adjacent channels are observed indicates that the correlator could recover signals with a period of 0.5 ns, although we did not have a 2-GHz generator available to confirm this performance. The Fourier transforms (bottom) of the two correlograms clearly recover the frequency of each sinusoid.
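The step from correlogram to frequency can be illustrated with a plain discrete Fourier transform. The correlogram below is synthetic (a cosine on a constant background), and the number of channels, the offset, and the modulation depth are assumed values chosen for the example; none of them are measured data.

```python
import cmath
import math


def dominant_frequency_ghz(correlogram, bin_width_ns):
    """Frequency of the strongest non-DC component of a correlogram, in GHz."""
    n = len(correlogram)
    mean = sum(correlogram) / n
    centered = [c - mean for c in correlogram]
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                  for i, v in enumerate(centered))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k / (n * bin_width_ns)


def synthetic_correlogram(freq_ghz, n_channels, bin_width_ns,
                          offset=1000.0, depth=200.0):
    """Cosine-modulated counts, standing in for a measured correlogram."""
    return [offset + depth * math.cos(2 * math.pi * freq_ghz * i * bin_width_ns)
            for i in range(n_channels)]


for f in (1.0, 1.33):
    g = synthetic_correlogram(f, 64, 0.25)
    print(f, "GHz ->", dominant_frequency_ghz(g, 0.25), "GHz")
```

With 64 channels of 0.25 ns the frequency grid is 62.5 MHz, so the 1.33-GHz tone is reported at the nearest grid point; using more channels refines the grid. The highest frequency on the grid is 2 GHz, the Nyquist frequency for a 0.25-ns bin width.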
Potential Applications
Apart from the well-known applications for correlators in laser Doppler velocimetry, 2,19 intensity fluctuation spectroscopy, 19 and radio astronomy, 20 another potentially important application has been identified, namely tomography. In the tomography of neonatal brains, 21 the temporal delays of a large number of spatially separate laser beams injected into soft (human) tissue are determined. Simple modification to the instrument described here would enable it to make the temporal measurements.
Conclusions
An optoelectronic hybrid correlator has been developed, with 1-ns bin width, capable of a 0.25-ns bin width when the derandomizer is bypassed.
As a main conclusion, the passive optical delay correlator is shown to be a feasible concept. The system, as implemented, with a hybrid architecture may represent an intermediate step toward further speed increase, but in our opinion it is limited to bin widths of 0.1-0.2 ns. Practically, we set up a system (0.25 ns) very close to the maximum possible speed compatible with this specific architecture. As the delay has an optical and an electrical component and the optical delays have low drifts and instabilities (less than 20-50 ps), the main limitation in this architecture comes from the instabilities of the electrical component of the delay, which are due to the transit times of comparators, photodetectors, and lasers and which, when compounded, could exceed 100 ps. We devoted special care to this issue in our implementation by using stable power supplies for the critical components and in this way ensuring low timing drifts between switching events.
A better correlator configuration could be implemented only after the development of all-optical AND gates and counters.
In summary, we have achieved a speed increase of at least an order of magnitude over existing electronic clipped correlators.
