Digital correlators play a significant role in dynamic light scattering (DLS) technology, which characterizes particle size distribution. We present a field programmable gate array (FPGA)-based digital correlator that can be applied to process DLS data. To satisfy the DLS requirements in FPGA logic with limited resources, a multiple lag time (multi-τ) method is employed that does not require storing the full dataset in memory. Moreover, the device directly accepts the transistor-transistor logic (TTL) signal from the photon counting detector, measures the time intervals between photon events, and calculates the autocorrelation functions in real time. Furthermore, we derive estimates for the error arising from the use of the multi-τ correlator. We implement all the necessary operations, including a highly optimized photon counter, in a single Xilinx FPGA chip with lag times from 10 ns to 45 min.
Introduction
Dynamic light scattering (DLS) is used to measure the size of nanoparticles. With this method, the diffusion coefficient of particles dispersed in a liquid is determined by measuring fluctuations in the intensity of scattered light and evaluating their correlation function. A digital correlator is a device that calculates correlation functions in a DLS experiment by digitally processing the pulse stream delivered by a photon detector. Commercial digital correlators provided by Laser Vertriebsgesellschaft mbH (ALV) (Langen, Germany), Brookhaven Instruments (Holtsville, New York), and Correlator.com (Bridgewater, New Jersey) have good performance with a high dynamic range.
However, these digital signal processor (DSP) devices have the disadvantages of inflexibility, high cost, and complexity of implementation, so alternative, complementary approaches are worth considering. Various customized digital correlators implementing this technique have been reported [1, 2, 4, 6]. In DLS experiments, correlation functions have to be computed in real time over a large range of lag times with nanosecond resolution, which is what is meant by dynamic range. At the same time, the sampling time interval for counting photoelectrons should be no more than 10 ns. To meet the required sampling rate, we implement a high-speed counter that operates at a frequency of 800 MHz. The multi-τ technique (τ is the lag time) makes it possible to balance the trade-off between resolution and dynamic range, which makes the correlator efficient in terms of performance. Data analysis is initiated only after the data collection is complete. Our aim in this paper is to perform measurements under these conflicting criteria on an efficiently implemented hardware correlator system.
Correlation theory and correlator architecture
With a DLS setup [11], the autocorrelation function of the intensity scattered by microparticles is used to retrieve their particle size distribution [13] through an algorithm such as CONTIN [16, 17], a Bayesian method [18], a regularization method [14, 15], or a neural network method [19]. The autocorrelation function g(τ) of the scattered light expresses the relationship between a signal I_1 and its copy I_2 shifted by the time interval τ.
where N is the total realization time.
The basic operating principle of a digital correlator is to calculate the autocorrelation function at certain lag times τ_j. In a discrete system, the autocorrelation function is expressed as:
where j is the index of the correlation channel (j = 0, 1, 2, ..., J − 1), t(i) is the photon arrival time of the i-th sample (detected event), and N is the total number of samples measured during the experiment. Across the whole correlation system, the lag times τ_j = j·δt are evaluated in parallel, one per channel, over the range τ_min < j·δt < τ_max, so that the discrete approximation is obtained in real time.
Since δt is the sampling time produced by the delay element δ, it is set by the clock frequency of the j-th channel, f_ch(j), that is, δt = 1/f_ch(j).
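As an illustration of the discrete form, a linear correlator can be sketched in Python. This is a minimal sketch under assumptions that are not taken from the text: the input is a list of photon counts per sampling interval δt (rather than arrival times), the estimator is the standard unnormalized average of lagged products, and the function name linear_autocorrelation is hypothetical.

```python
def linear_autocorrelation(counts, num_channels):
    """Unnormalized autocorrelation G(j) for lags j = 0 .. num_channels - 1.

    ASSUMPTION: counts[i] is the number of photons detected in the i-th
    sampling interval of width delta_t, so channel j corresponds to the
    lag time tau_j = j * delta_t.
    """
    n = len(counts)
    if num_channels > n:
        raise ValueError("more channels than samples")
    result = []
    for j in range(num_channels):
        pairs = n - j  # number of (i, i + j) pairs available at this lag
        acc = sum(counts[i] * counts[i + j] for i in range(pairs))
        result.append(acc / pairs)
    return result


# Toy count sequence (illustrative values only, not measured data).
counts = [1, 0, 2, 1, 0, 1, 3, 0]
g = linear_autocorrelation(counts, 3)
```

Every channel in this sketch uses the same δt, which is why covering a wide range of lag times with a linear correlator requires a very large number of channels.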
Dynamic range is a key parameter of a correlator and depends mainly on the number of correlation channels [5]. A linear correlator is a system consisting of a chain of correlation channels that all run at the same clock frequency, f_ch(0) = f_ch(J−1). Achieving the desired dynamic range with such a correlator requires a large number of correlation channels, which is not feasible for hardware with limited resources. A more appropriate algorithm for a digital correlator is the multi-τ correlation technique described previously [8, 3], which increases the dynamic range without increasing the number of correlation channels. The disadvantage of this technique is that averaging introduces a systematic error. Therefore, the data must be normalized by a symmetric normalization procedure [9].
Our photon correlator, implemented with the multi-τ technique, has S = 35 correlator blocks, each consisting of P = 8 equally spaced correlation channels, except for the first block, which has P = 16 correlation channels. These parameters are chosen to achieve high accuracy [2, 10]. The sampling time interval of the first block is δt(0) = 10 ns; it increases by a factor of n = 2 for every subsequent block and reaches δt(S−1) ≈ 2.9 min in the last block. The entire correlation function is calculated internally using 288 correlation channels.
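The channel count and sampling times quoted above can be checked with a few lines of Python. This is a sketch for verifying the arithmetic only; the constant and function names are hypothetical, and the estimate of the largest lag assumes that the last channel of the last block sits 16 sampling intervals from zero lag, which is a common multi-τ layout but is not stated in the text.

```python
S = 35          # number of correlator blocks
P_FIRST = 16    # channels in the first block
P_REST = 8      # channels in every other block
DT0_NS = 10     # sampling time of the first block, in ns
FACTOR = 2      # sampling time growth factor per block


def block_sampling_times_ns(blocks=S, dt0_ns=DT0_NS, factor=FACTOR):
    """Sampling time of each block in ns: dt0 * factor**s."""
    return [dt0_ns * factor ** s for s in range(blocks)]


def total_channels(blocks=S, p_first=P_FIRST, p_rest=P_REST):
    """Total number of correlation channels over all blocks."""
    return p_first + (blocks - 1) * p_rest


dt_ns = block_sampling_times_ns()
channels = total_channels()
dt_last_min = dt_ns[-1] * 1e-9 / 60

# ASSUMPTION: the largest lag is 16 sampling intervals of the last block.
max_lag_ns = P_FIRST * dt_ns[-1]
max_lag_min = max_lag_ns * 1e-9 / 60

# Channels a linear correlator with a fixed 10 ns step would need
# to reach the same largest lag.
linear_channels = max_lag_ns // DT0_NS
```

The sketch gives 288 channels and a last-block sampling time of about 2.86 min. Under the stated layout assumption the largest lag is about 45.8 min, for which a linear correlator with a 10 ns step would need 2^38 channels.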
System design and its implementation

Correlator structure and process control
Our hardware photon correlator is implemented on a general-purpose Nexys Video evaluation board (Digilent, USA) equipped with an Artix-7 FPGA chip (XC7A200T), as illustrated in Fig. 1.
As shown in Fig. 1, all the functionally interdependent design components (the counter, autocorrelation unit, random access memory (RAM), multiplexer (MUX), soft processing core, local memory, and universal asynchronous receiver-transmitter (UART)) receive their reset and clock synchronization signals simultaneously from the same sources (the reset system and the clock configurator), as indicated by the yellow arrows. We set the system clock configurator to generate a system clock frequency f_sys of 100 MHz.
In the correlator unit (overall correlator blocks), in addition to reset and clock synchronization signals, there are four more process control signals: clock enable, clear, start and stop.
According to the structure of the multi-τ correlator, the clock frequency of the first block is constant, f_blk(0) = f_sys, whereas the blocks from f_blk(1) to f_blk(S−1) must be clocked at different frequencies, as demonstrated in Fig. 2.
The clock enable signal e_s is connected to the clock cycle counter c_s, where s is the index of the correlator block (s = 0, 1, 2, 3, ..., S − 1). The clock cycle counter c_s counts from 1 to e_s. Whenever c_s equals e_s, the clock enable signal is asserted; it then remains inactive until the counter wraps around and reaches e_s again. This method reduces flip-flop usage, thereby improving performance and lowering power consumption in the design.
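The clock enable scheme can be modeled in software as follows. This is a behavioral sketch in Python, not the hardware description used in the design; the function name clock_enable_pulses is hypothetical, and the example assumes a terminal count of 2^s system clock cycles for block s, which follows from the factor of 2 between successive sampling times.

```python
def clock_enable_pulses(e_s, num_cycles):
    """Model the clock enable of one block over num_cycles system clocks.

    The counter c_s runs from 1 to e_s. The enable is active in the cycle
    where c_s == e_s, after which the counter wraps back to 1.
    """
    if e_s < 1:
        raise ValueError("e_s must be at least 1")
    pulses = []
    c_s = 1
    for _ in range(num_cycles):
        active = c_s == e_s
        pulses.append(active)
        c_s = 1 if active else c_s + 1
    return pulses


# Block 0 is enabled on every system clock cycle.
block0 = clock_enable_pulses(1, 4)

# ASSUMPTION: block s uses the terminal count 2**s, so block 2 is
# enabled on every fourth cycle.
block2 = clock_enable_pulses(2 ** 2, 8)
```

All blocks in this model share the single system clock and differ only in how often their enable is asserted.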
Another process control signal of the correlator unit is the clear signal, which resets the accumulator data in the correlation channels. Each accumulator is 64 bits wide, and all accumulator data in the autocorrelator unit must be read out to the internal block RAM every 20 s to avoid overflow.
In our design, the correlator system is controlled by finite state machines (FSMs), which send the start and stop signals to the autocorrelator unit (see Fig. 2). The correlator processing system has four states: 1, the initial idle state; 2, the ready for data processing state; 3, the data processing state; and 4, the end of operation state.
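The four states can be illustrated with a small software model. This is only a sketch: the text names the states but not the events that trigger the transitions, so the event names "configure", "start" and "stop" and the helper functions below are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = 1        # initial idle state
    READY = 2       # ready for data processing
    PROCESSING = 3  # data processing
    END = 4         # end of operation


# ASSUMPTION: the trigger events are illustrative, not taken from the design.
TRANSITIONS = {
    (State.IDLE, "configure"): State.READY,
    (State.READY, "start"): State.PROCESSING,
    (State.PROCESSING, "stop"): State.END,
}


def step(state, event):
    """Return the next state; events not valid in this state are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Feed a sequence of events to the state machine."""
    for event in events:
        state = step(state, event)
    return state


final_state = run(["configure", "start", "stop"])
```

In this model, a "start" event received while the machine is still idle leaves the state unchanged, which mirrors the requirement that processing begins only from the ready state.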
In FPGAs, interconnect systems are used to tie processors to peripherals.
The Advanced Extensible Interface (AXI) interconnect makes a connection between the processing system and programmable logic in the stream terminology by mapping to the processor memory, and it also provides a point-to-point unidirectional connection from the soft processing core to the UART peripheral interface for sending the correlator results.
The correlator results are printed using two nested loops. In the outer loop, the processing core selects the memory (RAM) row to print, and in the inner loop, it selects the memory column to print. The embedded software library of the processing core is located in local memory (see Fig. 1).
The soft processing core follows the machine states of the correlator and acts in accordance with its state changes, except for the final state. In state 4, the processor core starts to continuously send a data stream containing the autocorrelation results to a personal computer (PC) via a universal serial bus (USB) port. The PC interface then stores all the received data in a dedicated file.
The steps of the data flow through the system are indicated by the sequence numbers in Fig. 1.
Photon detection/counting module
This module connects the correlator to the photon detector and guarantees adequate photon detection efficiency. In our case, the scattered light is measured by a photon detector (Hamamatsu H10682-210) at four different scattering angles (θ = 15°, 30°, 45°, and 60°). In this way, photon pulses with the detector's 10 ns output pulse width are delivered to the FPGA peripheral module (PMOD) interface.
The time-correlated single photon counting (TCSPC) technique is effective because it counts detector pulses within defined time intervals. Since our correlator is designed for data acquisition with a TCSPC technique, we use a fast gate in front of the counter, which samples 1-bit data at 800 MHz.
A deserializer then converts the signal to 8-bit words and sends them to a counter, which outputs its data at 100 MHz. The counter operates by counting the number of clock cycles (at a frequency of 800 MHz) between two events in a data set. Fig. 3 presents an example simulation waveform of the counter module.
By its principle of operation, the counter treats each 1.25 ns clock cycle (10 ns/8) as one time unit, so its output is expressed in clock cycles rather than in nanoseconds. To recover the actual time intervals, the counter output must be multiplied by 1.25.
Multiplication by a fractional number in hardware multipliers can lose accuracy. To avoid this problem, the accumulated correlation functions are multiplied by 1.25² when they are collected by the PC.
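The counting and deferred scaling steps can be illustrated in Python. This is a behavioral sketch rather than the FPGA logic: it assumes that an event is marked by a rising edge in the 1-bit stream sampled at 800 MHz, and the function names and the toy bit pattern are hypothetical.

```python
TICK_NS = 1.25  # one 800 MHz clock cycle, i.e. 10 ns / 8


def rising_edges(bits):
    """Indices at which the sampled 1-bit signal goes from 0 to 1."""
    edges = []
    previous = 0
    for index, bit in enumerate(bits):
        if bit == 1 and previous == 0:
            edges.append(index)
        previous = bit
    return edges


def intervals_in_ticks(bits):
    """Clock cycles between successive events (ASSUMPTION: rising edges)."""
    edges = rising_edges(bits)
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


# Three 10 ns pulses (8 samples each) separated by gaps of 12 and 20 samples.
bits = [0] * 4 + [1] * 8 + [0] * 12 + [1] * 8 + [0] * 20 + [1] * 8
ticks = intervals_in_ticks(bits)

# Scaling each interval to nanoseconds needs a fractional multiplier.
intervals_ns = [t * TICK_NS for t in ticks]

# Deferred scaling: multiply integers first, apply 1.25**2 once at the end.
product_ticks = ticks[0] * ticks[1]
product_ns2 = product_ticks * TICK_NS ** 2
```

Because a correlation term is a product of two counter values, keeping the arithmetic in integer clock cycles and applying the factor 1.25² once on the PC gives the same result as scaling each value by 1.25 before multiplying.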
Experimental results
The correlator designed in Section 3 was mounted in a DLS setup designed by our research group [11] and was tested with several polystyrene sphere samples produced by Suzhou Nanomicro Technology Company. The nominal diameters of the particle samples were 240, 360, 530 and 805 nm with an accuracy on the order of 1% or less. To prepare the solution, we added one drop (concentration of 1% in 10 ml) of latex to dust-free water. The prepared samples were placed in a 9-mm diameter cylindrical quartz cell. The parameters used for the DLS data simulation and experiment were as follows: temperature T = 298.15 K, viscosity η = 0.89 mPa·s, a linearly polarized diode-pumped laser (CrystaLaser) with wavelength λ = 532 nm, and a particle refractive index n = 1.59. The laser beam illuminated the beads, and the scattered light was collected by detectors placed at scattering angles of θ = 15°, 30°, 45°, and 60°.
In 
where the baseline B, the amplitude β, and the decay rate Γ are the fitting parameters.
According to the principles of DLS, the decay of the generated autocorrelation curve is observed over lag times from 100 ns to 45 min (the interval from 10 to 100 ns is the time required for the device to warm up), as shown in Fig. 5.
The particle size was calculated from the decay rate of the correlation curve with the classical Einstein-Stokes equation [2]. The relative errors for the four particle diameters at the four scattering angles are between 1.2% and 5.7%, with an average deviation of 3%, as shown in Fig. 6.
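The conversion from decay rate to particle diameter can be sketched in Python using the standard relations Γ = D·q², q = (4πn/λ)·sin(θ/2), and d = k_B·T/(3πηD). Several points here are assumptions rather than statements from the text: Γ is taken to be the decay rate of the field correlation function, the refractive index entering q is that of the dispersant (about 1.33 for water, not the particle value of 1.59), and all function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def scattering_vector(theta_deg, wavelength_m=532e-9, n_medium=1.33):
    """Magnitude of the scattering vector q in 1/m.

    ASSUMPTION: n_medium is the refractive index of the dispersant (water).
    """
    half_angle = math.radians(theta_deg) / 2
    return 4 * math.pi * n_medium / wavelength_m * math.sin(half_angle)


def diameter_from_decay_rate(gamma, theta_deg, temperature_k=298.15,
                             viscosity_pa_s=0.89e-3):
    """Hydrodynamic diameter in m from the decay rate gamma in 1/s."""
    q = scattering_vector(theta_deg)
    diffusion = gamma / q ** 2  # translational diffusion coefficient, m^2/s
    return K_B * temperature_k / (3 * math.pi * viscosity_pa_s * diffusion)


def decay_rate_from_diameter(diameter_m, theta_deg, temperature_k=298.15,
                             viscosity_pa_s=0.89e-3):
    """Inverse relation, useful for simulating DLS data."""
    q = scattering_vector(theta_deg)
    diffusion = K_B * temperature_k / (3 * math.pi * viscosity_pa_s * diameter_m)
    return diffusion * q ** 2


# Round trip for a 240 nm particle at the four scattering angles.
recovered = [
    diameter_from_decay_rate(decay_rate_from_diameter(240e-9, angle), angle)
    for angle in (15, 30, 45, 60)
]
```

If the fitted Γ is instead the decay rate of the intensity correlation function, it must be halved before this conversion.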
Conclusions
A key part of a photon correlator is a real-time correlation operation over a large span of lag time with ns sampling resolution. Among the most recent and best publications related to the FPGA correlator, we note that the multi-τ 
