Abstract - The measurement of light scattering is a well-established scientific method for the analysis of particle sizes in macromolecular solutions [1]. Its applications range from physics, pharmacology, molecular and cell biology and immunology to environmental research. The underlying principle of this approach is the extraction of spectral information from scattered light using correlation techniques. The system, which is currently in the design phase, is a novel FPGA-based real-time multi-correlator system for multi-angle dynamic and static light scattering experiments.
I Introduction
Photon correlation techniques [2], based on low-intensity, high-precision single-photon event spectroscopy, have many applications in different scientific and technological areas, such as the characterisation of proteins, polymer latexes, paints and pigments, cosmetic formulations, macromolecular complexes, viruses and vaccines [3]. In the case of ink and toner products, the image quality, the viscosity and the tendency to aggregate and clog ink delivery nozzles are directly related to the particle size. In pigment research, knowledge of the particle size and its distribution is important for developing stable chemical formulations of pigments. The pigment colour and hiding power are strongly influenced by the size of the particles [4]. In light scattering experiments, the light is detected in the form of single-photon counting pulses, whose digital nature makes them well suited for processing by digital correlators. However, real-time computation of correlation functions has to cope with a wide spread of time scales (ns to minutes), corresponding to a dynamic range of 10^10 or more. At the same time, a resolution better than 10 ns is required to locate the photon events precisely on the time scale and to discriminate between different pulses. This paper outlines how these conflicting criteria can be balanced by the use of multiple sampling times (the Multiple Tau technique) and an efficient hardware integration. Furthermore, the Field Programmable Gate Array (FPGA) based hardware implementation, consisting of 32 parallel Virtual Multiple Tau correlator structures for use in angle-dependent light scattering measurements, is discussed.
II Dynamic Light Scattering
Particles suspended in a liquid are never stationary. They move constantly due to Brownian motion, which is defined as the movement of particles caused by random collisions with the molecules of the surrounding liquid [5]. The larger the particle, the slower its motion, and vice versa. Dynamic Light Scattering (DLS), also referred to as Photon Correlation Spectroscopy (PCS), measures this Brownian motion, also referred to as a 'random walk', and relates it directly to the diffusion coefficient and thereby to the hydrodynamic radius of the particle. This is done by illuminating the particles with a monochromatic light beam, such as a laser, and analysing the intensity fluctuations in the scattered light. If an arrangement of particles in constant motion is illuminated by a laser, the particles scatter the light in all directions. As illustrated in Fig. 1, the detected fluctuation of the scattering intensity depends on both time and angle. The light scattered from a small 'scattering volume' fluctuates around an average value such that the signal looks very noisy. This 'noise' evidently contains information about the dynamics in the liquid, the rate of diffusion, and whether there are correlated molecular motions. One way to extract this information is to calculate the autocorrelation function of the detected intensity using:
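Written out with the symbols used in the text, I(t) for the detected intensity and τ for the delay time, and assuming the standard time-averaged definition, (1) takes the following form. The averaging time T is a symbol introduced here for illustration only.

```latex
G(\tau) = \langle I(t)\, I(t-\tau) \rangle
        = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} I(t)\, I(t-\tau)\, \mathrm{d}t
\tag{1}
```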
At very short delay times, I(t) and I(t − τ) are nearly the same and therefore strongly correlated. For long delay times, however, the intensities become uncorrelated and the correlation function decays towards zero. Fig. 2 shows the fluctuation signals caused by large and by small particles together with their corresponding correlation functions. As can be seen, large particles cause the intensity to fluctuate more slowly than small ones. Consequently, the correlation function measured from a sample containing large particles takes longer to decay than one determined from a solution containing small particles. The shape of the correlation function can be used directly to extract information about the investigated sample. The time at which the curve starts to decay significantly is an indication of the mean size of the particles. The steeper the decay, the more monodisperse the sample is, i.e. the particles have approximately the same size. Accordingly, the more extended the decay becomes, the greater the sample polydispersity. After the correlation function has been calculated in real time with dedicated digital circuitry, parameters such as the particle size and the diffusion constant are extracted from the curve shape by various algorithms. This is done in software on the host system, in this case a personal computer. Determining the molecular weight of extended molecular specimens with high accuracy requires angle-dependent measurements. This is necessary because the particle form factor leads, for large particles, to markedly angle-dependent scattering intensities. The innovation of the presented system is its real-time multichannel architecture, which enables these spatially resolved measurements in parallel. The setup of a DLS experiment with 32 Single Photon Detectors (SPD) and 32 corresponding correlators is shown in Fig. 3.
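The link between slow fluctuations and a slowly decaying correlation function can be illustrated with a small numerical sketch. It does not model real scattering: the traces are synthetic first-order autoregressive signals, and the function names and the `memory` parameter are hypothetical choices made for this example. A `memory` value close to 1 stands in for large, slowly diffusing particles.

```python
import math
import random


def fluctuating_intensity(memory, n_samples, seed):
    """Synthetic intensity trace: a mean level plus AR(1) fluctuations.

    `memory` (between 0 and 1) is a hypothetical knob: values close to 1
    give slowly varying fluctuations, as expected for large particles.
    """
    rng = random.Random(seed)
    x, trace = 0.0, []
    for _ in range(n_samples):
        x = memory * x + rng.gauss(0.0, 1.0)
        trace.append(100.0 + x)
    return trace


def autocorrelation(trace, lag):
    """Normalised autocorrelation of the fluctuations around the mean."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((v - mean) ** 2 for v in trace) / n
    cov = sum((trace[i] - mean) * (trace[i - lag] - mean)
              for i in range(lag, n)) / (n - lag)
    return cov / var


def decay_lag(trace, max_lag=1000):
    """First lag at which the correlation has dropped below 1/e."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(trace, lag) < 1.0 / math.e:
            return lag
    return max_lag


slow = fluctuating_intensity(memory=0.99, n_samples=10000, seed=1)
fast = fluctuating_intensity(memory=0.80, n_samples=10000, seed=2)
print("decay lag, slow fluctuations:", decay_lag(slow))
print("decay lag, fast fluctuations:", decay_lag(fast))
```

The slowly fluctuating trace needs many more lags before its correlation drops below 1/e, which mirrors the qualitative behaviour described for large particles.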
III Correlator Architectures
As described in the previous section, a digital correlator compares the scattering intensity at successive time intervals to determine the rate at which the intensity varies. This is done by calculating, in real time, the discrete approximation of the correlation function stated in (1):
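A common discrete form, written with the symbols defined in the following paragraph (count rate n, sampling instants t_i, lag times τ_j = j·δ), is given below. The number of sampling periods N is a symbol assumed here for illustration.

```latex
G(\tau_j) \approx \frac{1}{N} \sum_{i=1}^{N} n(t_i)\, n(t_i - \tau_j),
\qquad \tau_j = j\,\delta
\tag{2}
```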
In (2), the detected photon count rate is denoted by n(t), and τ represents the time shift (also referred to as the lag time). The direct hardware implementation of (2) is shown in Fig. 4. The main parameters of such a system are the sampling time δ = t_{i+1} − t_i, which is the inverse of the system clock frequency, and the dynamic range of the correlator. The latter is determined by the spread of the discrete lag times τ_j: τ_min < j·δ < τ_max. Each value τ_j = j·δ corresponds to a particular correlation channel. In a stationary state of the specimen, correlation functions show variations over a wide spectrum of lags (ns to several hundreds of ms). Correlation functions with a dynamic range of 10^12 require a similar number of correlation channels. Since hardware resources are limited, the required dynamic range (10 orders of magnitude) is in direct conflict with a fine timing resolution for an accurate approximation of the time integral. There are various approaches to resolving this conflict [6]. Instead of using equally spaced time steps (a so-called linear correlator), the delays of succeeding channels may be multiplied by a constant factor (exponential correlator), with δ_m ∼ 2^m. This leads to an increased dynamic range. However, for large lag times τ, the sampling of the correlation function becomes coarse. To overcome this disadvantage, the Multiple Tau correlation technique [7] is used, which combines the advantages of linear and exponential structures. This technique, shown in Fig. 5, uses multiple sampling times, but instead of increasing δ for individual channels, blocks of eight or more correlation channels with a common sampling time are formed (referred to in the following as sampling time blocks, STB), and the sampling time is doubled from one block to the next. In a Multiple Tau structure, the input signals for the successive blocks are averaged over longer and longer periods.
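The saving in channels can be made concrete with a short sketch that lists the lag times of a Multiple Tau structure. The exact layout is an assumption, not taken from the paper: the first block is given 16 channels (lags 0 to 15), and every later block has eight channels at delays 8 to 15 in units of its own, doubled sampling time. The function name and constants are hypothetical.

```python
FIRST_BLOCK_CHANNELS = 16   # assumption: block 0 holds lags 0..15
CHANNELS_PER_BLOCK = 8      # eight channels in every later block


def multiple_tau_lags(n_blocks):
    """Lag times, in units of the base sampling time, for the assumed layout."""
    lags = list(range(FIRST_BLOCK_CHANNELS))
    first_delay = FIRST_BLOCK_CHANNELS - CHANNELS_PER_BLOCK
    for block in range(1, n_blocks):
        sampling_time = 2 ** block  # doubled from one block to the next
        lags.extend(k * sampling_time
                    for k in range(first_delay, FIRST_BLOCK_CHANNELS))
    return lags


lags = multiple_tau_lags(13)
print(len(lags), "channels reach a lag of", lags[-1], "base sampling times")
```

Under these assumptions, 13 blocks need 112 channels to reach a lag of 61440 base sampling times, whereas a linear correlator with the same base sampling time would need one channel per lag, i.e. 61441 channels.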
All events in the delayed and undelayed paths are added up for each block over two sampling periods and serve as the input photon count rates for the next block, which operates at half the clock frequency. For example, accumulating the original n-bit count rates evaluated in block 1 over two 200 MHz periods yields the effective (n + 1)-bit count rates for block 2, which is clocked at 100 MHz. The following sampling time block is again clocked at half the frequency of the preceding block, i.e. at 50 MHz. Accordingly, the input count rate (delayed and direct path) for that block grows to n + 2 bits. To normalise the calculated correlation function, a symmetrical normalisation as proposed in [8] is applied. This processing, which is performed on the host station, requires counting all events in the delayed and undelayed channels as well as the number of sampling time periods during the measurement.
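The block-wise accumulation and the normalisation described above can be sketched as a batch software model. This is an illustration under stated assumptions, not the hardware design: the channel layout (16 channels in the first block, delays 8 to 15 in later blocks) is assumed, and the normalisation is taken to have the common symmetric form in which the raw sum of products is divided by the event counts of the direct and delayed paths and multiplied by the number of sampling periods. All names are hypothetical.

```python
def multiple_tau_correlate(counts, n_blocks):
    """Return (lag, g) pairs; lags are in units of the base sampling time."""
    result = []
    block_counts = list(counts)
    for block in range(n_blocks):
        scale = 2 ** block                       # sampling time of this block
        first_delay = 0 if block == 0 else 8     # assumed channel layout
        for k in range(first_delay, 16):
            n_products = len(block_counts) - k   # number of sampling periods
            if n_products <= 0:
                break
            direct = block_counts[k:]
            delayed = block_counts[:n_products]
            raw = sum(a * b for a, b in zip(direct, delayed))
            monitor_direct = sum(direct)         # events in the direct path
            monitor_delayed = sum(delayed)       # events in the delayed path
            if monitor_direct == 0 or monitor_delayed == 0:
                continue
            g = n_products * raw / (monitor_direct * monitor_delayed)
            result.append((k * scale, g))
        # Add up pairs of samples: the sums are the input for the next block,
        # which runs at half the clock rate and needs one more bit per count.
        block_counts = [block_counts[i] + block_counts[i + 1]
                        for i in range(0, len(block_counts) - 1, 2)]
    return result


constant = multiple_tau_correlate([3] * 1024, n_blocks=4)
print(all(g == 1.0 for _, g in constant))
```

For a constant count rate every normalised channel equals 1, which is a convenient sanity check for the normalisation.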
IV System Design and Implementation
The underlying principle of the system is the Multiple Tau technique discussed in Section III. A direct implementation of this structure would demand an immense number of flip-flops and logic cells within an FPGA. A few years ago, FPGAs were built mainly for implementing random logic, flip-flops were scarce at the logic-cell level, and the large amount of global wiring required by correlator designs strongly degraded the achievable data rates. Hence, correlator structures were only implemented as semi-custom ASIC designs. However, both the logic cell density and the speed of FPGAs have risen enormously in recent years. Xilinx, one of the two main manufacturers of programmable logic devices, increased the number of logic cells by 56 percent annually between 1994 and 2004 [9]. Another reason why FPGAs are now suitable for correlator designs is the large number of additional embedded hardware features. High-performance DSP applications are supported by hardwired embedded multipliers and accumulator cells, which work at higher data rates while consuming less power than conventional cell logic. Storage-intensive implementations have been made possible by on-chip memory resources. These new features led to the development of the so-called Virtual Correlator architecture, which replaces the hardwired one-to-one implementation of correlation algorithms with an intelligent use of the distributed hardware resources on the FPGA device. Instead of implementing explicit register structures for the respective channels, virtual registers are implemented in the form of specifically arranged memory sections. The overall processing structure is divided into two main sections. First, the data is pre-processed to organise the sampling time block structure known from the Multiple Tau technique. This procedure is illustrated in Fig. 6.
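The idea of virtual registers described above can be sketched in software as a delay line held in a memory block and addressed through a moving pointer, instead of a chain of explicit registers. The class and attribute names are hypothetical, and the sketch covers a single block with a single multiplier-accumulator loop; it is not the authors' memory arrangement.

```python
class VirtualDelayLine:
    """Delay line kept in a memory block and addressed by a write pointer."""

    def __init__(self, n_channels):
        self.memory = [0] * n_channels         # stands in for on-chip RAM
        self.write_ptr = 0                     # next memory address to overwrite
        self.channel_store = [0] * n_channels  # accumulated channel results

    def process(self, sample):
        n = len(self.memory)
        for channel in range(n):
            # Channel j correlates the new sample with the one stored
            # j + 1 sampling periods ago; no data is shifted in memory.
            read_ptr = (self.write_ptr - 1 - channel) % n
            self.channel_store[channel] += sample * self.memory[read_ptr]
        self.memory[self.write_ptr] = sample
        self.write_ptr = (self.write_ptr + 1) % n


line = VirtualDelayLine(n_channels=3)
for s in [1, 2, 3, 4, 5]:
    line.process(s)
print(line.channel_store)  # [40, 26, 14]
```

Only the pointer moves when a new sample arrives, so the storage maps onto a RAM block rather than onto one register per channel.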
All events in the delayed and undelayed path of each sampling time block are added up over two sampling periods and serve as the input photon count rate for the next block, which in turn operates at half the clock frequency. This is continued for all direct and delayed sampling time blocks to form the respective STB data structures. The next step in the processing chain is the processing of the previously arranged data blocks, in which the multiplication and accumulation are performed, as illustrated in Fig. 7. The first multiplier-accumulator module implements the actual channel. The second accumulator adds the respective channel results to dedicated locations in a channel store memory block. Different experiments show different count rates, which can range from 10k up to 1M counts per second. Multiple photon events during one sampling period, which arise in experiments with high photon rates, are taken into account by counting the incoming pulses from the detector. Another task of this so-called 'derandomiser stage' is the synchronisation of the random input signal. To decouple this first stage from the processing part of the system, a dual-port FIFO is used to buffer the incoming data. The input data rate is limited by the detector performance and is assumed to be approximately 20 MHz. The subsequent data arrangement and processing of the respective blocks take place at up to 200 MHz. While higher sampling time blocks are being generated, lower STBs are already being processed. The scheduling of the system is done by a finite state machine, which generates the required memory pointers and control signals to direct the data flow. Finally, the content of the channel store memory is regularly transferred via the processor to the host system. Besides transferring the correlation data to the host PC, the processor is also used for the calculation of sampling time blocks higher than the 12th.
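The derandomiser stage described above can be sketched as a function that turns randomly arriving detector pulses into one count per sampling period, followed by a FIFO that hands the counts to the processing stage. The function names, the timestamps in nanoseconds and the use of a Python deque are assumptions for illustration. The deque models only the first-in, first-out ordering, not the two clock domains of a dual-port FIFO.

```python
from collections import deque


def derandomise(pulse_times_ns, sampling_time_ns, n_periods):
    """Count the detector pulses that fall into each sampling period."""
    counts = [0] * n_periods
    for t in pulse_times_ns:
        period = t // sampling_time_ns
        if 0 <= period < n_periods:
            counts[period] += 1   # several pulses per period are allowed
    return counts


def buffer_counts(counts):
    """Write the counts into a FIFO, as the input stage would."""
    fifo = deque()
    for c in counts:
        fifo.append(c)
    return fifo


def drain(fifo):
    """Read the buffered counts in arrival order, as the processing stage would."""
    out = []
    while fifo:
        out.append(fifo.popleft())
    return out


pulses = [5, 12, 130, 250, 260, 270]   # hypothetical arrival times in ns
counts = derandomise(pulses, sampling_time_ns=100, n_periods=3)
print(drain(buffer_counts(counts)))    # [2, 1, 3]
```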
Assuming an initial sampling time of 100 ns, the processor would have approximately 4 µs to fetch the data and perform the calculation of the 13th STB for 32 correlation channels. The acquisition capabilities of currently available embedded processors are in that range, so such a processor can be used. The simplified block diagram of a Virtual Correlator structure is shown in Fig. 8. Not shown in Fig. 8 are the multiplexers for the data selection and the system control structure. An initial resource estimation showed that the critical part of the design is the efficient use of the available memory. The number of STBs depends directly on the memory available in the device. Furthermore, the program and data memory of the processor as well as the temporary channel store memory have to be taken into account. Besides the actual correlator structure, further on-chip components will be various PLL modules for the generation of precise time bases, a built-in self-test (BIST) module to test the circuit in the field, and the monitor channels described in Section III. One of the major design challenges is the reduction of the power dissipation of the system. Modern FPGAs are realised in deep-submicron technology (90 nm process) and therefore show a significant power consumption (approx. 1 W) even in stand-by. High clock frequencies, which lead to high switching activity on the routing tracks, may also cause significant power dissipation. To optimise the power dissipation, accurate estimations and analyses of the static and dynamic power consumption have to be carried out throughout the design concept and implementation phases.
V Conclusion
The availability of high-performance FPGAs makes the integration of the proposed system in programmable logic very attractive. The newly developed Virtual Correlator architecture utilises the dedicated arithmetic and memory blocks available on up-to-date programmable logic devices. This allows for power reduction as well as performance increases that were previously possible only with semi-custom circuits. Since correlation measurements take several minutes to achieve the necessary statistical accuracy, the sample could change dramatically during the measurement from one scattering angle to another. Consequently, such time-dynamic samples cannot be investigated sequentially at different angles. To obtain coherent results, the measurements at the different scattering angles have to be performed simultaneously. The correlator products currently available on the market are all Printed Circuit Board (PCB) implementations and are therefore large and power-consuming. Furthermore, all these systems are limited to a maximum of four correlators and therefore do not meet the actual demand for multi-angle measurements of highly dynamic samples. To perform measurements at 32 different angles, at least eight PCB correlators would be required. The novel system presented in this paper is the first implementation of 32 independent parallel correlators on a single FPGA chip. This newly designed multichannel system represents a true innovation in correlator design and will thus lead to new solutions to the problems of multi-angle dynamic and static light scattering measurements.
