Abstract-Current fusion devices usually implement distributed acquisition systems for the multiple diagnostics of their experiments. However, each diagnostic is composed of hundreds or even thousands of signals, including images from the vessel interior. These signals and images must be correctly timestamped, because all the information will be analyzed to identify plasma behavior using temporal correlations. For acquisition devices without synchronization mechanisms, the timestamp is given by another device with timing capabilities when signaled by the first device. Each datum must later be associated with its timestamp, usually via software. This critical step becomes unfeasible in software when sampling rates are high. To solve this problem, this paper presents the implementation of an image acquisition system with a real-time hardware timestamping mechanism, which is synchronized with a master clock using the IEEE 1588 v2 Precision Time Protocol (PTP). Synchronization, image acquisition and processing, and timestamping mechanisms are implemented using a Field Programmable Gate Array (FPGA) and a PTP v2 synchronized timing card. The system has been validated using a camera simulator streaming videos from fusion databases. The developed architecture is fully compatible with ITER Fast Controllers and has been integrated with EPICS to control and monitor the whole system.
I. INTRODUCTION
LARGE-SCALE physics experiments increasingly require distributed data acquisition (DAQ) systems. However, data analysis in these experiments requires complete synchronization among the different devices used in the acquisition process. An incorrect sample timestamp prevents the signals from being correctly correlated with each other, invalidating the analysis results. Among the available device synchronization mechanisms, the Precision Time Protocol (PTP) is likely one of the most accurate and flexible, and it adapts well to heterogeneous systems. The PTP is a standardized protocol (IEEE Std 1588-2008 [1]) that provides time synchronization among standard-compliant devices connected by a network. The standard defines the specifications for several networks, including Ethernet, DeviceNET, ControlNET and PROFINET. A clock hierarchy is composed of PTP devices connected to the network and can achieve a synchronization precision of tens of nanoseconds between the defined grand-master clock and the slave clocks. Several proposals are currently available that apply the PTP to nuclear fusion environments [2]-[4].
Synchronization problems become more challenging when high data volumes are collected in experiments, such as in nuclear fusion by magnetic confinement. In these experiments, cameras are typically used to monitor plasma behavior or perform a real-time diagnosis of the experimental results [5], [6]. The nature of these experiments requires medium-sized images (from ) to be acquired at high frame rates with real-time image processing. These high-throughput processes require specialized systems that are adapted for each specific experiment. Therefore, reconfigurable hardware platforms are well suited to these environments. In particular, Field Programmable Gate Array (FPGA) devices are attracting increasing interest in high-performance acquisition systems [3], [7].
FPGA-based DAQ devices usually lack the resources required for PTP synchronization; hence, the system designer must build the synchronization logic from the available resources if this protocol is to be used for synchronization and timestamping.
The ITER Fast Controller for the PXIe form factor includes a timing card that can be synchronized to a PTP network to serve as a reliable source of the system time [2] . However, the timestamp mechanism of the timing card relies on an external computer to package the data together with the timestamp. This process may induce a high CPU overload when multiple channels are distributed among different devices and are sampled at a high sample rate.
In this paper, a solution is developed for image timestamping, whereby a hardware clock in the FPGA is synchronized with an external reliable clock. This easily adaptable module can be used to timestamp images as well as any type of signal acquired or event detected by FPGA-based platforms. The FPGA time clock achieves a precision of tens of nanoseconds with respect to its reference clock (the timing card clock). Timestamping the samples or images in the FPGA effectively reduces the CPU load, allowing more complex distributed DAQ systems. The developed solution is based on a previous real-time system that can acquire and process images using PXIe and FlexRIO technology [7] from a CameraLink standard compliant camera [8].
0018-9499 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The rest of this paper is organized as follows. First, we discuss the need for hardware timestamping and the advantages it offers over conventional solutions, followed by a description of the hardware and software for the developed system. Finally, the results and conclusions obtained using the implemented system are presented.
II. BENEFITS OF HARDWARE TIMESTAMPING
Hundreds or even thousands of signals must be processed in experiments such as magnetic confinement fusion. Timestamping individual samples, or even groups of samples, of these signals can become challenging because of the high sampling rates (exceeding 1 MS/s) required in this type of experiment. Moreover, the multiple signals must be related in the sample analysis, requiring coherence among the timestamps of these signals.
The ITER Fast Plant System Controller prototype for the PXIe form factor includes a timing card that serves as the system time source for timestamping. This device can be configured to register timestamps for events in the PXI trigger lines in a hardware buffer that can be read later via software. The reading process can distinguish between event sources, enabling the timestamping of different acquisition channels working at different sampling rates. However, this process requires the software to read the samples and their timestamps and pack them together. According to the referenced prototype, this feature can be used at a maximum rate of 120 kS/s per acquisition channel because of the latency of the software access to the device's hardware buffer [2] .
For the aforementioned system, one of the options for sample timestamping is to trigger an event in the PXI trigger line for each individual sample collected. This method provides precise and coherent timing for different channels or devices. However, the aforementioned 120 kS/s limit and the CPU overload of the sample and timestamp packing make this option unfeasible for complex DAQ systems such as those used in nuclear fusion experiments.
The workload of the timing card hardware buffer can be decreased by raising only a single event at the start of the acquisition of a group of samples and using this information together with the sampling rate to timestamp the entire group. Hence, in this method, the frequency of the DAQ device oscillator is used as a time measure by counting the oscillator cycles between samples. Nevertheless, every oscillator has an inherent jitter, which is defined by the ITU G.810 recommendation [9] as the deviation in time around an ideal value. This jitter implies that every base clock cycle is slightly longer or shorter than expected, which generates a drift that increases as the acquisition progresses. If only the first sample in each sample group is timestamped using the timing card, every additional sample accumulates a timing error that is proportional to the oscillator's inherent jitter. Depending on the size of the sample group, the sampling rate and the jitter, one of two phenomena can occur at the boundary between consecutive groups: an overlap between the end of one group and the beginning of the following group, or a time gap between them. This method effectively reduces the CPU load because there are fewer reads of the timing card hardware buffer. However, if samples are not post-processed to fix the timestamping errors, this method does not provide time coherence even among samples acquired through the same channel.
Figs. 1, 2, and 3 show how the jitter produces sampling errors. In the example shown, a 2-kHz sine wave is sampled at 1 MS/s with a base clock of 100 MHz. The base clock has a jitter of 50 ppm, which produces a drift of up to 0.5 ps per clock cycle, or 50 ps per sample acquired. That is, the true frequency of a nominal 100-MHz oscillator lies between 99.995 and 100.005 MHz. Fig. 1 shows the beginning of the acquisition process.
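The per-cycle and per-sample drift in this example follow from the clock parameters alone. The following is a minimal sketch of that arithmetic, assuming the 50 ppm figure is read as a worst-case fractional deviation of the oscillator period; the function name `drift_budget` and its return layout are illustrative and not part of the described system.

```python
from fractions import Fraction


def drift_budget(base_clock_hz, sample_rate_hz, tolerance_ppm):
    """Worst-case timing drift of a free-running oscillator.

    Returns a tuple (drift per clock cycle in seconds,
    drift per sample in seconds,
    number of samples after which the error spans one clock cycle).
    Exact rational arithmetic avoids floating-point rounding.
    """
    period_s = Fraction(1, base_clock_hz)
    tolerance = Fraction(tolerance_ppm, 1_000_000)

    # Each cycle may be longer or shorter than nominal by this amount.
    drift_per_cycle_s = period_s * tolerance

    # Timestamps derived from the counter advance by this many cycles per sample.
    cycles_per_sample = Fraction(base_clock_hz, sample_rate_hz)
    drift_per_sample_s = drift_per_cycle_s * cycles_per_sample

    # Samples needed for the accumulated error to equal one base clock period.
    samples_to_one_cycle = period_s / drift_per_sample_s
    return drift_per_cycle_s, drift_per_sample_s, samples_to_one_cycle


per_cycle, per_sample, n_samples = drift_budget(100_000_000, 1_000_000, 50)
print(float(per_cycle) * 1e12, "ps per cycle")
print(float(per_sample) * 1e12, "ps per sample")
print(int(n_samples), "samples until the error reaches one clock cycle")
```

With the values of the example (100 MHz, 1 MS/s, 50 ppm), the exact results are 0.5 ps per cycle, 50 ps per sample, and 200 samples before the accumulated error spans one full 10-ns base clock cycle.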
If the timestamp assigned to the first sample is provided by a reliable timing source, such as the timing card, then no error is introduced at this time. As sampling continues, the accumulated drift produced by the clock jitter increases, reaching a maximum error of 10 ns for sample number 200. This result implies that the base clock could be one cycle ahead of or behind the ideal clock. The error introduced at this point is insignificant; however, it keeps growing and, after twenty seconds of acquisition, entails an error of 100 samples. The acquisition system may assign a timestamp of twenty seconds when the real time of the sample acquisition is in the 19.9999-20.0001 s interval, as shown in Fig. 2. The sampled signal will be correctly reconstructed but will progressively acquire a phase shift relative to the original signal. Fig. 3 shows an example of a 2-kHz signal reconstruction using the samples acquired at 1 MS/s by an acquisition system whose nominal 100-MHz base clock actually runs at 100.005 MHz. The figure compares the original signal with its reconstruction after twenty seconds of acquisition.
The implementation described here distributes the clock of the timing card to the FPGA module using two PXI trigger lines for synchronization purposes. A synchronized clock in the data acquisition module serves as a reliable high-precision timestamping source without increasing the CPU computational effort or affecting either the sampling rate or the image processing time, even inside a distributed acquisition system. The synchronized clock implementation is based on [10] and maintains a constant synchronization with the timing card clock with an error of less than 40 ns. This synchronization is achieved using a process running on the host computer that periodically programs the timing card to generate events in two PXI trigger lines and writes to several registers of the FPGA contained in the acquisition device.
These operations are only executed once per second and per acquisition device, regardless of the sample rate or the number of channels of acquisition for each device. The developed solution applies progressive corrections to compensate for the oscillator jitter, which is dynamically calculated using the synchronization events, resulting in a coherent and precise method of timestamping the acquired samples.
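The idea of correcting a free-running counter with periodic reference events can be illustrated with a small software model. This is only a sketch under simplifying assumptions: it re-estimates the tick rate from the most recent pair of events, whereas the described design applies progressive corrections in hardware as detailed in [10]. The class `SyncedTickClock`, its methods and `naive_timestamp_ns` are hypothetical names introduced here for illustration.

```python
NS_PER_S = 1_000_000_000


class SyncedTickClock:
    """Toy model of a tick counter disciplined by periodic reference events."""

    def __init__(self, nominal_hz):
        # Until two events have been seen, assume the nominal tick rate.
        self._ns_per_interval = NS_PER_S
        self._ticks_per_interval = nominal_hz
        self._last_ticks = None
        self._last_time_ns = None

    def on_sync_event(self, tick_count, reference_time_ns):
        """Record a reference event and re-estimate the real tick rate."""
        if self._last_ticks is not None and tick_count > self._last_ticks:
            self._ticks_per_interval = tick_count - self._last_ticks
            self._ns_per_interval = reference_time_ns - self._last_time_ns
        self._last_ticks = tick_count
        self._last_time_ns = reference_time_ns

    def timestamp_ns(self, tick_count):
        """Convert a tick count to reference time using the measured rate."""
        if self._last_ticks is None:
            raise RuntimeError("no synchronization event received yet")
        ticks = tick_count - self._last_ticks
        return self._last_time_ns + ticks * self._ns_per_interval // self._ticks_per_interval


def naive_timestamp_ns(start_time_ns, tick_count, nominal_hz):
    """Timestamp that trusts the nominal oscillator frequency."""
    return start_time_ns + tick_count * NS_PER_S // nominal_hz


# Oscillator that really runs at 100.005 MHz instead of the nominal 100 MHz.
clock = SyncedTickClock(nominal_hz=100_000_000)
clock.on_sync_event(tick_count=0, reference_time_ns=0)
clock.on_sync_event(tick_count=100_005_000, reference_time_ns=NS_PER_S)

ticks_at_1_5_s = 150_007_500
print(clock.timestamp_ns(ticks_at_1_5_s))
print(naive_timestamp_ns(0, ticks_at_1_5_s, 100_000_000))
```

In this model, the corrected clock reports 1.5 s for a sample taken 1.5 s into the acquisition, while the counter-only estimate is already 75 µs ahead. Host involvement stays at one event per second regardless of the sampling rate.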
We demonstrated the difference between using the oscillator frequency as a time measure and the developed implementation for a real acquisition process by sampling a 2-kHz sine wave using a NI PXIe-7965 FPGA module with a NI 5781 analog/digital input/output adapter module and a NI PXI-6682 timing card. The timing card was programmed to trigger an event every second. This event triggered the generation of one cycle of a sinusoidal signal in a function generator that was connected to a NI 5781 input channel. The event generated by the timing card also started the acquisition of 500 samples at 1 MS/s of the NI 5781 input signal by the FPGA. Each sample acquired by the FPGA was sent to the host computer together with the value of a hardware counter of the FPGA base clock and the timestamp obtained from the implemented FPGA synchronized clock. The acquired signal was reconstructed using the two timestamping methods. Fig. 4(a), (b) and (c) show a reconstructed signal where the first sample of the entire acquisition was timestamped using the time from the timing card, and the rest of the samples were timestamped using the value of the hardware counter together with the sampling period. Fig. 4(d), (e) and (f) show a reconstructed signal using only the timestamp provided by the FPGA synchronized clock. To show how the drift caused by the jitter evolved with time, Fig. 4 shows the reconstruction of the signal at the beginning of acquisition ((a) and (d)), after 10 seconds of acquisition ((b) and (e)), and finally, after 59 seconds of acquisition ((c) and (f)). The time axis for each of the aforementioned pairs of graphs starts at 0 seconds, 10 seconds and 59 seconds, respectively.
When the sample period was used, the signal appeared to shift on the time axis as the acquisition progressed because the base clock had between 300 and 400 more cycles per second than expected, resulting in an average timestamping error of 3177 ns per second of sampling. In contrast, when the timestamp from the FPGA synchronized clock was used, there was no perceptible difference between the signals acquired at different times except for the synchronization error, which was always less than 40 ns.
III. SYSTEM DESCRIPTION
The developed real-time image acquisition system is composed of two different subsystems. One subsystem is the image generator system, which captures or generates images and streams these images according to the CameraLink standard. These images are received by the second subsystem, which is the image acquisition system that performs image reception, processing and storage, if needed.
A. Image Generator
The developed implementation, while still compatible with a CameraLink camera, was tested with a PCIe8 DVA CameraLink simulator PCI Express card. This card, when connected to a computer with a Linux OS, emulates the behavior of a CameraLink camera using images stored on the host computer or generated by the card itself (Fig. 5(a)). This feature allows images to be streamed from previously recorded movies of fusion experiments, so that the system can be tested without a real fusion process running and the image processing algorithms can be validated. The card operates as defined in the standard for the base, medium and full modes, sending from 24 bits to 64 bits of pixel data per pixel clock event. The pixel clock of the card can be set in the 20-MHz to 85-MHz range, thus reaching the 680 MB/s transfer speed limit specified in the standard for the full configuration. The card can be configured to include the value of a hardware counter in the first 16-bit word, which is useful in the development stage and simplifies the verification of frame losses. The card is handled and configured using an API provided by the manufacturer [11], thereby enabling the configuration of the transfer parameters and the transferred image set.
B. Image Acquisition and Processing System
The developed image acquisition and processing system is based on FlexRIO and PXIe technologies and is fully compatible with the standardized ITER CODAC Fast Controller model for the PXIe form factor [2]. This subsystem (Fig. 5(b)) includes the following elements.
a) A NI PXIe-1065 chassis with slots for data acquisition modules, FlexRIO modules and other modules, such as timing cards. The chassis provides segment-shared PXI trigger lines that can be handled by any of the connected modules. These lines are used for the synchronization of the distributed PTP clock.
b) A host computer equipped with a PXIe-PCIe 8361 card. This card, when connected to its chassis counterpart, the PXIe-8370 module, enables the host computer to remotely handle the PXIe system through a PCI/PCIe connection. In our case, this host computer is a PICMG-1.3-compatible computer.
c) A NI FlexRIO PXIe-7965R FPGA module, which includes a Xilinx Virtex-5 SX95T FPGA in a PXIe card that is compatible with a variety of input/output adapter modules. This device is connected to the PXIe chassis.
d) A NI 1483 CameraLink adapter module for the FlexRIO, used to implement a CameraLink frame grabber solution. This adapter module operates in the base, medium, full, and extended modes with a pixel clock of 20 to 85 MHz, receiving from 24 to 80 bits of pixel data per pixel clock event. This module does not have any clock synchronization input.
e) A NI 6682 timing card that can be synchronized with a PTP network (versions 1 and 2), working as a master or slave. This timing module can timestamp events in the PXI trigger lines and be programmed to generate changes to these lines at a specified future time. The clock implemented in the FPGA is synchronized with the clock provided by this card.
f) A Hirschmann MACH1040 Gigabit Ethernet Switch, used to provide a master clock for the timing card, thereby emulating the real conditions in a distributed acquisition system with a centralized clock.
IV. IMPLEMENTATION DESCRIPTION
The developed implementation for an image acquisition and processing system based on the architecture defined above is divided into two parts. One part occurs in the FPGA module and involves the image acquisition from the CameraLink camera, the image processing, timestamping, and the dispatch of the raw image and the results to the host computer. The other part is the software process needed for the proper synchronization of the FPGA timing clock, the reception of the image in the host computer and image storage.
A. FPGA Side
The FlexRIO modules are FPGA-based, and the FPGA configuration bitfile can only be generated using LabVIEW and its FPGA module. This software tool is used to develop and debug the hardware. The solution presented here comprises a VHDL implementation that, as explained below, improves on the FPGA synchronized clock of [10], together with its application to a real-time image acquisition and processing system for high-precision image timestamping. The developed system acquires CameraLink images through the NI 1483 CameraLink adapter module and adds a timestamp to each frame with the precision of a PTP time clock (Fig. 6). CameraLink compliant cameras can usually be configured to add the value of a camera hardware counter in the first 16-bit word of each frame; thus, the timestamp is embedded into the following 80 bits, as shown in Fig. 7. A timestamp is defined as a structure of two fields in the IEEE Std 1588-2008 [1]. The first field is the secondsField, which is defined as the integer portion of the timestamp in units of seconds. The secondsField is located in the 48 bits that follow the hardware counter (bits 16 to 63). The second field is the nanosecondsField, which is defined as the fractional portion of the timestamp in units of nanoseconds. The nanosecondsField occupies the 32 bits after the secondsField (bits 64 to 95).
The timestamp that is assigned to each image is the time at which the first set of pixel data is received by the FPGA. This moment is identified using the CameraLink FVAL, LVAL and DVAL signals. This is the closest moment to the image acquisition that we can identify with a standard CameraLink camera. The timestamp is only added to the image before it is sent to the host computer. Therefore, the image processing is not affected. The pixel sets received in the FPGA are forwarded to the host computer in 64-bit packets via DMA.
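The frame header layout described above (16-bit counter, 48-bit secondsField, 32-bit nanosecondsField) can be sketched as bit-field packing. This is an illustrative model only: it assumes bit 0 is the least significant bit of a 96-bit value, which the text does not specify, and the function names `pack_frame_header` and `unpack_frame_header` are hypothetical. The nanosecondsField range check reflects the IEEE 1588 requirement that this field stay below 10^9.

```python
COUNTER_BITS = 16
SECONDS_BITS = 48
NANOSECONDS_BITS = 32

SECONDS_SHIFT = COUNTER_BITS                      # secondsField starts at bit 16
NANOSECONDS_SHIFT = COUNTER_BITS + SECONDS_BITS   # nanosecondsField starts at bit 64


def pack_frame_header(counter, seconds, nanoseconds):
    """Pack counter and PTP timestamp into one 96-bit integer.

    Assumed bit order: bit 0 is the least significant bit.
    """
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter does not fit in 16 bits")
    if not 0 <= seconds < (1 << SECONDS_BITS):
        raise ValueError("seconds does not fit in 48 bits")
    if not 0 <= nanoseconds < 1_000_000_000:
        raise ValueError("nanoseconds must be below 10**9")
    return counter | (seconds << SECONDS_SHIFT) | (nanoseconds << NANOSECONDS_SHIFT)


def unpack_frame_header(header):
    """Recover (counter, seconds, nanoseconds) from a packed 96-bit header."""
    counter = header & ((1 << COUNTER_BITS) - 1)
    seconds = (header >> SECONDS_SHIFT) & ((1 << SECONDS_BITS) - 1)
    nanoseconds = (header >> NANOSECONDS_SHIFT) & ((1 << NANOSECONDS_BITS) - 1)
    return counter, seconds, nanoseconds


header = pack_frame_header(counter=7, seconds=1_450_000_000, nanoseconds=123_456_780)
print(unpack_frame_header(header))
```

On the host side, a decoder of this kind would be applied to the first 96 bits of each received frame before the pixel data are interpreted; the actual byte order of the DMA packets would have to be checked against the FPGA design.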
The FPGA synchronized clock is implemented in VHDL instead of the LabVIEW graphical language to provide greater control over the synchronized clock latencies, thereby improving the precision of the synchronization and timestamping. LabVIEW is still needed to generate the configuration file; thus, the developed VHDL module must be included in the LabVIEW project as a component-level IP (CLIP). The instantiated synchronization module offers a clock with a 10-ns resolution that is updated by a 100-MHz oscillator. An input signal of the synchronized clock module can be used to load a register with the current timestamp. This register maintains its value until the next load. In this case, the register is loaded when the 1483 CLIP (an FPGA interface to the NI 1483 adapter module) signals the reception of the first set of pixel data of each image, as explained above. When properly signaled at the moment of acquisition, this module can be used to timestamp both images and any type of acquired signal at up to 100 MS/s. No timestamp buffering has been implemented yet; if needed, it must be implemented outside the module. The implemented clock is synchronized with the external timing card using the periodic events generated by the timing card in two PXI trigger lines. These events, together with the number of 100-MHz clock ticks between event occurrences and the value of the externally written registers, form the required inputs for the synchronization mechanism. This mechanism is explained in detail in [10].
B. Host Computer Side
For an effective synchronization between the implemented FPGA clock and the timing card clock, the mediation of a software process is mandatory. This software process is schematized in Fig. 8 . The timing card is configured using this process, which is executed in the host. The host computer must program the timing card to generate one pulse per second in the PXI trigger lines used for synchronization. These pulses, which are generated as future time events, can only be configured in the NI PXI 6682 timing card using an API called nisync.
As mentioned above, some FPGA registers must also be externally written for synchronization purposes. These registers provide the synchronized clock with information on when the PXI trigger line events occur in terms of the timing card clock. Therefore, these registers can only be written by the host computer, which is the only component with this information. For this purpose, and to load the configuration bitfile into the FPGA, National Instruments provides the niflexrio and NiFpga libraries. These libraries can also be used to check the FPGA timing clock status by reading, from several FPGA registers, the parameters for the current offset with respect to the timing card clock. The DMA data transfer from the FPGA to the host computer is also managed by these libraries, thus providing access to the images sent from the FPGA.
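One iteration of the host-side process described above can be outlined as follows. This is a hedged sketch, not the actual implementation: the two callables stand in for the nisync and NiFpga calls, whose real signatures are not reproduced here, and both the alignment of pulses to whole-second boundaries and the order of the two operations are assumptions made for illustration. All names (`next_pulse_time_ns`, `sync_step`, `lead_ns`) are hypothetical.

```python
NS_PER_S = 1_000_000_000


def next_pulse_time_ns(now_ns, lead_ns=100_000_000):
    """Return the first whole-second boundary at least lead_ns after now_ns.

    lead_ns is an assumed safety margin so that the future time event
    is programmed before it is due.
    """
    earliest = now_ns + lead_ns
    return -(-earliest // NS_PER_S) * NS_PER_S  # integer ceiling to a full second


def sync_step(now_ns, schedule_pulse, write_event_time):
    """Run one synchronization iteration.

    schedule_pulse(time_ns): hypothetical stand-in for programming a
        future time event on the timing card trigger lines.
    write_event_time(seconds, nanoseconds): hypothetical stand-in for
        writing the event time into the FPGA registers.
    """
    pulse_time_ns = next_pulse_time_ns(now_ns)
    schedule_pulse(pulse_time_ns)
    write_event_time(pulse_time_ns // NS_PER_S, pulse_time_ns % NS_PER_S)
    return pulse_time_ns


# Dry run with recording stubs instead of real hardware calls.
scheduled, written = [], []
sync_step(1_250_000_000, scheduled.append, lambda s, ns: written.append((s, ns)))
print(scheduled, written)
```

Passing the hardware accesses in as callables keeps the scheduling logic testable without a timing card or FPGA attached.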
The synchronization mechanism and the image acquisition process have been implemented using NIRIO EPICS Device Support [2] and therefore can be fully controlled and monitored using EPICS.
V. RESULTS AND CONCLUSIONS
The FPGA resource usage of the synchronized clock implementation was checked for an NI PXIe-7965 FPGA module by comparing the usage reports produced by the Xilinx tools when compiling an image acquisition system with and without the synchronized clock (see Table I). The low resource usage of the FPGA timing clock allows its inclusion in a sampling system without sacrificing hardware processing capabilities.
The accuracy and precision of the synchronized clock were tested with respect to the timing card using the following configuration. We plugged a NI 6682 timing card and a NI PXIe-7965 FlexRIO module into a NI PXIe-1065 chassis connected to a PICMG computer. The timing card was connected to a Hirschmann MACH1040 switch and configured as a PTP slave clock of the switch. A software process running on the host computer arbitrated the synchronization between the FPGA timing clock and the timing card clock. The same process read the offset between the FPGA timing clock and the timing card clock from the FPGA registers once per second. The data collected during the first 15 seconds were excluded from the results because the FPGA timing clock was not yet synchronized. This synchronization time resulted from the initialization of the average drift that was required to correct the jitter. The results in Fig. 9 show that the synchronization with the timing card was successful. The synchronization error over 2,000 seconds of measurement had a normal distribution with a standard deviation of 6.86 ns, a mean of 0.08 ns and a maximum offset of 33 ns.
The system described in Sections III and IV enabled us to acquire and timestamp images of 1 byte per pixel sent by the CameraLink simulator at up to 1,677 fps (97.02 MB/s) using a PCI link between the chassis and the host computer. In further tests using a PCIe link, the system reached 680 fps receiving images of 1 byte per pixel (680 MB/s). The maximum pixel clock frequency specified in the CameraLink standard is 85 MHz, and our timing clock can timestamp samples acquired at up to 100 MS/s; thus, the limitations on the image sampling rate can only come from the host computer. If the host computer cannot read the transferred images before the memory assigned to image reception is filled, then the image frame rate or resolution should be reduced to prevent data loss.
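The host-side limitation mentioned above can be estimated with a simple rate balance. The sketch below is a back-of-the-envelope model only, assuming constant frame size, frame rate and host read rate; the function name `seconds_until_overflow` and the buffer size and read rate used in the example are illustrative values, not measurements from the described system.

```python
def seconds_until_overflow(buffer_bytes, frame_bytes, fps, host_read_bytes_per_s):
    """Time until the reception buffer fills, or None if the host keeps up.

    Assumes a constant incoming data rate (frame_bytes * fps) and a
    constant rate at which the host drains the buffer.
    """
    incoming_bytes_per_s = frame_bytes * fps
    surplus = incoming_bytes_per_s - host_read_bytes_per_s
    if surplus <= 0:
        return None  # the host reads at least as fast as frames arrive
    return buffer_bytes / surplus


# Illustrative numbers: 1-MB frames at 680 fps into a 100-MB buffer,
# with a host that can only sustain 600 MB/s.
print(seconds_until_overflow(100_000_000, 1_000_000, 680, 600_000_000))
```

In this hypothetical case the buffer would fill after 1.25 s, so either the frame rate or the image size would have to be lowered until the incoming rate no longer exceeds what the host can read.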
The implemented FPGA timing clock can be used in any FPGA-based device, not only to timestamp images but also to timestamp hardware events, such as threshold detections, and analog or digital input signals. The timing clock is designed in VHDL to prevent unexpected latencies in the clock module caused by LabVIEW design constraints, while facilitating integration with any acquisition and processing system.
