Abstract-The main role of the ITER Radial Neutron Camera (RNC) diagnostic is to measure, in real time, the plasma neutron emissivity profile at high peak count rates for durations of up to 500 s. Owing to the unprecedented high-performance conditions, and after the identification of critical problems, a set of activities was selected that focuses on the development of high-priority prototypes and is capable of delivering answers to those problems before the final RNC design. This paper presents one of the selected activities: the design, development, and testing of a dedicated field-programmable gate array (FPGA) code for the RNC data acquisition prototype. The FPGA code aims to acquire, process, and store in real time the neutron and gamma pulses from the detectors located in collimated lines of sight (LOS) viewing a poloidal plasma section from the ITER Equatorial Port Plug 1. The hardware platform used was an evaluation board from Xilinx (KC705) carrying an IPFN FPGA Mezzanine Card (FMC-AD2-1600) with two digitizer channels of 12-bit resolution sampling up to 1.6 Gsamples/s. The code performs the input signal conditioning using a configuration down-sampled to 400 Msamples/s, applies dedicated algorithms for pulse detection, filtering, and pile-up (PU) detection, and includes two distinct datapaths operating simultaneously: 1) the event-based datapath for pulse storage and 2) the real-time processing datapath with dedicated algorithms for pulse shape discrimination (PSD) and pulse height spectra (PHS). For continuous data throughput, both datapaths are streamed to the host through two distinct PCIe ×8 direct memory access (DMA) channels.
I. INTRODUCTION
The main goal of the ITER Radial Neutron Camera (RNC) is to measure the plasma neutron emission profile for real-time plasma control purposes [1]. Spectrometers, expected to be placed at the end of each collimated line of sight (LOS), will provide the line-integrated neutron flux measurements for neutron emissivity calculations through inversion algorithms [2]. A set of high-priority activities within the framework contract focuses on the development of experimental setups of the neutron detector prototypes and their signal readout equipment. This includes the design, development, and testing of dedicated field-programmable gate array (FPGA) codes for the front-end electronics prototype [3], aiming to acquire, process, and store in real time the incoming neutron and gamma fluxes at an expected sustained event rate of 2 Mevents/s [4]. The hardware platform includes an evaluation board from Xilinx (KC705) carrying an IPFN FPGA Mezzanine Card (FMC-AD2-1600) with two digitizer channels of 12-bit resolution sampling up to 1.6 Gsamples/s [5], [6]. This paper presents the FPGA codes and algorithms of the front-end electronics prototype, followed by the results achieved so far.
II. FPGA CODE
The RNC code was developed in Verilog with the Xilinx VIVADO tool (versions 2015.4 and 2017.4). It was implemented in the Xilinx KC705 development kit carrying an IPFN FMC prototype and tested with synthetic pulses from a CAEN pulse generator (DT5800D). The flowchart in Fig. 1 depicts the main blocks of code developed under the RNC framework contract concerning the FPGA common environment and algorithm activities. The code is composed of four main blocks, detailed in the next sections: 1) data conditioning; 2) data processing; 3) data streaming (PCIe interface); and 4) system control.
III. DATA CONDITIONING
The RNC detector signals are digitized by the 12-bit, 1.6-Gsamples/s analog-to-digital converters (ADCs) of the FMC card. The ADCs are configured (e.g., sampling rate and operating mode) by the FPGA phase-locked loop (PLL) control module (Fig. 1) through the eleven 24-bit registers of the high-performance frequency synthesizer (LMX2531) with a PLL installed in the FMC module. The FPGA receives four samples simultaneously, at 1/4 of the ADC acquisition clock rate (1.6 GHz/4 = 400 MHz).
0018-9499 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
IV. DATA PROCESSING
Considering the ultrahigh acquisition rates allowed by the FMC card (1.6 Gsamples/s), dedicated data reduction and processing algorithms are a priority for feasible data streaming. Dedicated algorithms should be able to sustain the expected high data throughput (2 Mevents/s per ADC) without losses. Thus, two different operating modes were selected as follows.
1) Events: The detected events are streamed to the host for data archiving, together with the time stamp (TS) of their occurrence.
2) Real-Time Process: Delivers the gamma-ray/neutron pulse shape discrimination (PSD) and/or pulse height spectra (PHS) in real time.
Sections IV-A to IV-E describe the main real-time algorithms of the data processing block.
A. Filter
When signals coming from detectors are unstable (e.g., drift in the signal baseline), the pulse detection may fail, leading to poor system performance (e.g., degraded energy resolution of the detected events) [7]. Dedicated pulse-filtering algorithms, capable of digitally restoring the baseline and removing the undesired offset, may be applied to the raw data before the event detection stage. The optimal offset removal/baseline restoring algorithm usually depends on the signal observed on-site after the diagnostic installation. Thus, it is not possible to select, at this phase, the algorithm best able to minimize possible instabilities in the RNC signals. However, a generic filter module interface was added to the project (FILTER, Fig. 1), making it possible to allocate suitable stabilization circuits. The filter module can be bypassed if not needed. The event detector algorithm may trigger from filtered data, when the filter is on, or from raw data, when the filter bypass option is selected. As an example, a digital trapezoidal-based shaper (DTS) was implemented in the filter module. DTS is a well-known technique capable of suppressing the ballistic deficit of sharp-peaked pulses with exponential decay, making it a strong candidate for baseline restoring and offset removal of RNC signals [7], [8]. When the DTS-based filter is used, the exponential signal is transformed into an approximately trapezoidal pulse whose amplitude is proportional to the energy of the event [9]. Considering the expected pile up (PU), the DTS parameters were slightly modified to provide filtered events that resemble Gaussians instead of pure trapezoids [10].
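To make the shaping step concrete, the following is a minimal Python sketch of a recursive trapezoidal shaper. It follows the widely used recursive formulation (moving differences, pole-zero correction, accumulation); whether the RNC filter module uses exactly this recursion is an assumption, and the function name and the parameters `k`, `l`, and `tau` are illustrative, not taken from the FPGA code.

```python
import math


def trapezoidal_shaper(samples, k, l, tau):
    """Shape exponentially decaying pulses into trapezoids (illustrative sketch).

    k   : rise length of the trapezoid, in samples (assumed parameter)
    l   : delay of the second difference, in samples; l - k is the flat top
    tau : decay constant of the input pulse, in samples
    """
    # Pole-zero correction factor that cancels the exponential decay exactly.
    m = 1.0 / (math.exp(1.0 / tau) - 1.0)

    def v(i):
        # Samples before the start of the record are treated as zero.
        return samples[i] if i >= 0 else 0.0

    p = 0.0  # first accumulator
    s = 0.0  # second accumulator (unnormalized trapezoid)
    out = []
    for n in range(len(samples)):
        d = v(n) - v(n - k) - v(n - l) + v(n - k - l)
        p += d
        s += p + m * d
        # Normalize so the flat top equals the input pulse amplitude.
        out.append(s / (k * (m + 1.0)))
    return out
```

With this normalization the flat-top height equals the amplitude of the input exponential, and the output returns to zero `k + l` samples after the pulse onset, which is why such a shaper also removes a constant offset once the startup transient has passed.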
B. Event Detection
Considering the expected shape of the gamma-ray/neutron pulses from the RNC scintillators (exponential decay signals with fast rise time), two different trigger types were selected for event detection as follows.
1) Basic Trigger: Threshold by level. The event detector triggers when the data reach a predefined threshold.
2) Advanced Trigger: Threshold by derivative. The event detector triggers when the first derivative of the signal reaches a predefined threshold.
The advanced trigger is the option adopted by many spectroscopy diagnostics because of its ability to reject high-frequency noise, restore the baseline, and cancel low-frequency fluctuations [7]. The event detection algorithm can be applied to both raw data and filtered data.
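The difference between the two trigger types can be illustrated with a short Python sketch. The function names are hypothetical, the derivative is approximated by a plain first difference, and no smoothing is applied, so the sketch only illustrates the insensitivity of the derivative trigger to a baseline offset; it is not the trigger implemented in the FPGA.

```python
def level_trigger(samples, threshold):
    """Return indices where the signal crosses the threshold upwards."""
    return [n for n in range(1, len(samples))
            if samples[n] >= threshold > samples[n - 1]]


def derivative_trigger(samples, threshold):
    """Return indices where the first difference crosses the threshold upwards."""
    # The first difference stands in for the signal derivative (assumption).
    diff = [0] + [samples[n] - samples[n - 1] for n in range(1, len(samples))]
    return level_trigger(diff, threshold)
```

For a pulse riding on a constant offset larger than the level threshold, `level_trigger` never sees an upward crossing, whereas `derivative_trigger` still fires on the fast rising edge.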
C. Events Storage-Pulse Window
As depicted in the flowchart of Fig. 2, when an event is detected the FPGA starts storing the corresponding 64-bit TS, followed by the 16-bit event samples (delayed by 64 bits to make room for the TS), until the down-counter of the predefined pulse window (PWIDTH) expires (cnt = 1). If a new event is detected during the second half of the PWIDTH being stored, the event is considered piled up and another PWIDTH storage starts. The event detector state machine includes up-counters that provide the number of triggers that occurred in each event (PU counting) and the number of PWIDTH used. When no more piled-up events are detected during the last PWIDTH, the event storage ends with the total number (P) of pulses found (16 bit), followed by the number of extra PWIDTH (PWIDTH-1) used to store the event (8 bit) and, finally, by an end-of-event (W) tag (8 bit). The event data are then written to an event packet buffer, a dual-clock first-in first-out (FIFO) buffer that provides the clock domain crossing between the FPGA logic and the PCIe blocks.
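The window extension rule and the packet layout described above can be sketched in Python as follows. Several details are assumptions made only for illustration: triggers in the first half of a window are counted as pulses but do not extend the window, the byte order is little-endian, samples are signed 16-bit values, and the end-of-event tag value `END_TAG` is hypothetical.

```python
import struct

END_TAG = 0x57  # hypothetical end-of-event tag value (ASCII "W")


def segment_event(trigger_offsets, pwidth):
    """Count pulses and PWIDTH windows for one event (illustrative sketch).

    trigger_offsets: sample offsets of the triggers, relative to the first one.
    A trigger in the second half of the last window appends another window.
    """
    pulses = 0
    windows = 1
    end = pwidth
    for t in sorted(trigger_offsets):
        if t >= end:
            break  # this trigger belongs to the next event
        pulses += 1
        if t >= end - pwidth // 2:
            end += pwidth
            windows += 1
    return pulses, windows


def pack_event(timestamp, samples, pulses, windows):
    """Pack TS (64 bit), samples (16 bit each), and the 32-bit trailer."""
    header = struct.pack("<Q", timestamp)
    body = struct.pack("<%dh" % len(samples), *samples)
    # Trailer: P (16 bit), extra PWIDTH count (8 bit), end tag W (8 bit).
    trailer = struct.pack("<HBB", pulses, windows - 1, END_TAG)
    return header + body + trailer
```

With `pwidth = 64` and triggers at offsets 0, 40, 70, 100, and 120, this rule yields five pulses stored in three windows, so the packed event holds 3 × 64 samples between the TS and the trailer.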
D. Real-Time Process-PSD
Different algorithms that are feasible to implement in an FPGA can be used to perform neutron/gamma discrimination [11], [12]. As for the filter module (Section IV-A), a generic PSD interface was implemented, foreseeing user-defined inputs capable of meeting different algorithm needs (e.g., calibration slope and event type parameters). For testing, a PSD code based on DTS was implemented, receiving its input data from the filter module. The neutron/gamma discrimination is determined by the relation factor between the maximum of the trapezoid (peak value) and the trapezoid area, i.e., the charge integration (CI) [13]. From this relation, it is possible to determine whether the detected event is a neutron or a gamma, depending on whether the result is below or above the corresponding calibration slope value. The foreseen light-emitting diode (LED) detection, for calibration purposes, was not included in this implementation. However, because of its singular shape, LED detection is not a critical concern. The data packet returned by the PSD module (Fig. 3) consists of two Q-Words (2 × 64 bit) containing the PSD output of each detected event.
The PSD output data may be used to feed the PHS module (Section IV-E), when available, or may be streamed directly to the host. When the PHS module is present, the slope parameters must be adjusted beforehand for proper particle discrimination and correct PHS construction in the FPGA.
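As an illustration of the peak/CI relation factor, the following Python sketch classifies a pulse window against a single calibration slope. The function names are hypothetical, and the convention that neutrons fall below the slope (because a slower decay adds charge for the same peak) is an assumption; the source only states that the decision depends on the side of the slope.

```python
def psd_features(window):
    """Return (peak, charge integration) of a shaped or raw pulse window."""
    peak = max(window)
    ci = sum(window)
    return peak, ci


def classify(window, slope):
    """Classify an event from its peak/CI relation factor (illustrative)."""
    peak, ci = psd_features(window)
    if ci <= 0:
        return "invalid"
    # Assumed convention: slower-decaying (neutron) pulses have a lower
    # peak/CI factor than fast-decaying (gamma) pulses.
    return "neutron" if peak / ci < slope else "gamma"
```

In practice the slope would come from a calibration of the actual detector signals, which is why the interface described above keeps it as a user-defined input.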
E. Real-Time Process-PHS
The PHS module receives the two PSD Q-Words and is responsible for the real-time PHS construction for both neutrons and gammas. For each real-time cycle, established by the synchronous data network (SDN) periodicity [4], a data packet (Fig. 4) with both uncalibrated spectra and the corresponding counts (number of single and piled-up events; neutron, gamma, LED, and total counts; counts per bin window) is streamed to the host. The state machine of the PHS and counts module is responsible for: 1) deserializing the two PSD Q-Words; 2) selecting the predefined value for spectra construction (peak or CI); 3) finding the corresponding histogram address for each incoming neutron/gamma; 4) incrementing the counters; and 5) constructing the packet and storing it in the buffer. In each SDN cycle (e.g., 2 ms), a new PHS packet is streamed to the host and the buffer is reset.
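The per-cycle behavior of the PHS module can be sketched in Python as an accumulator that is flushed once per SDN cycle. The class and field names are hypothetical, LED counts are omitted, and the linear mapping from the selected value to a bin address, as well as the default full scale, are assumptions; only the 256-bin histogram size is taken from the results reported for the prototype.

```python
class PhsAccumulator:
    """Accumulate per-cycle pulse height spectra and counters (illustrative)."""

    def __init__(self, n_bins=256, full_scale=4096, use_ci=False):
        self.n_bins = n_bins
        self.full_scale = full_scale  # assumed range of the selected value
        self.use_ci = use_ci          # predefined choice: peak or CI
        self._reset()

    def _reset(self):
        self.spectra = {"neutron": [0] * self.n_bins,
                        "gamma": [0] * self.n_bins}
        self.counts = {"neutron": 0, "gamma": 0,
                       "single": 0, "piled_up": 0, "total": 0}

    def bin_address(self, value):
        """Map a value linearly onto a histogram address, clipped to range."""
        index = int(value) * self.n_bins // self.full_scale
        return min(max(index, 0), self.n_bins - 1)

    def add(self, kind, peak, ci, piled_up=False):
        """Register one discriminated event of kind 'neutron' or 'gamma'."""
        value = ci if self.use_ci else peak
        self.spectra[kind][self.bin_address(value)] += 1
        self.counts[kind] += 1
        self.counts["piled_up" if piled_up else "single"] += 1
        self.counts["total"] += 1

    def flush(self):
        """Return the packet for this cycle and reset the buffers."""
        packet = {"spectra": self.spectra, "counts": self.counts}
        self._reset()
        return packet
```

Calling `flush()` at the end of every cycle mirrors the packet streaming and buffer reset described above.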
V. DATA STREAMING
The eight-lane PCIe Gen2 was the selected communication protocol for data transfer between the RNC prototype and its host. Both RNC events and real-time processed data are streamed to host through dedicated PCIe direct memory access (DMA) packets, resulting in higher data throughput and better overall system performance through lower CPU utilization [5] . Two distinct DMA channels were implemented for data streaming: 1) DMA 0 to carry the events for data archiving/host processing and 2) DMA 1 for the real-time processed data (PSD or PHS packets).
Considering the demanding RNC data transfer at a variable rate (maximum throughput of 4 Mevents/s per FMC), a third DMA (DMA 2) was included to stream the status information (e.g., last sent DMA, DMA 0/1 counters, and last DMA address). Thus, each DMA 0/1 data set written to the host is followed by a DMA 2 transfer carrying a new status word. This allows the last DMA data transfer to be identified for proper data retrieval from the host memories. Usually, in less demanding applications, the status word is written in PCIe shared memory as a completion to a CPU read request (Section VI). That procedure may cause conflicts in the PCIe bus between the register reading and the DMA data transfer, which become more likely for demanding throughput at a variable rate.
The DMA engine is included in the PCIe receiving (RX)/transmission (TX) interface (RX-TX), where a state machine manages the datapath between the RX and TX engines (endpoint front-end interfaces) and the other FPGA modules, as depicted in the flowchart of Fig. 5.
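A short back-of-the-envelope check, sketched in Python, shows how the data rates compare with the link capacity. The link figure uses the standard PCIe Gen2 values (5 GT/s per lane with 8b/10b encoding) before packet overhead; the event size assumes a single 64-sample PWIDTH per event, which is an illustrative assumption, and all function names are hypothetical.

```python
def adc_stream_rate_mb_s(channels=2, sample_rate_msps=400, bytes_per_sample=2):
    """Continuous acquisition rate with 16-bit samples, in MB/s."""
    return channels * sample_rate_msps * bytes_per_sample


def pcie_gen2_raw_limit_mb_s(lanes=8):
    """PCIe Gen2 bandwidth per direction after 8b/10b encoding, in MB/s."""
    gigatransfers_per_lane = 5.0  # GT/s per lane for Gen2
    return lanes * gigatransfers_per_lane * 1000.0 * (8.0 / 10.0) / 8.0


def event_stream_rate_mb_s(mevents_per_s=4, pwidth=64):
    """Event-mode rate assuming one PWIDTH per event (assumption), in MB/s."""
    bytes_per_event = 8 + 2 * pwidth + 4  # TS + samples + trailer
    return mevents_per_s * bytes_per_event
```

Under these assumptions, continuous acquisition on two channels amounts to 1600 MB/s and event mode at 4 Mevents/s to 560 MB/s, both below the 4000 MB/s available on an eight-lane Gen2 link before protocol overhead.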
VI. SYSTEM CONTROL
The endpoint is configured through shared configuration registers located in the host shared memory, namely, the PCIe Base Address Register (BAR) space, usually set up by the host BIOS during PCIe configuration at power up. The registers must be properly defined by both the host (driver) and the endpoint (FPGA code) to guarantee correct operation. The system control module interfaces with the PCIe engine, receiving and delivering 32-bit register fields from/to the host. Moreover, it exchanges configuration registers with the other FPGA modules.
VII. RESULTS
The RNC FPGA code was tested using synthetic data from a CAEN DT5800D pulse emulator. As an example, Fig. 6 shows a sequence of events from the DMA 0 data stream. The zoomed-in view of the figure highlights the sliding event window and the PU detection capabilities.
In Fig. 7, it is possible to observe the well-separated relation factors in red [x: peak; y: Tot (CI)] from the PSD packets streamed directly to the host through DMA 1. To simulate neutron and gamma events, the CAEN emulator was used to provide synthetic gamma-shaped and neutron-shaped pulses from two combined channels. The RNC prototype receives data at an event rate of up to 1 Mevents/s from both channels simultaneously, without PU. The separation slopes (blue and black) are included for better data analysis using an offline MATLAB code.
To test whether the FPGA processing code is capable of identifying PU events, a Poisson distribution (10% of PU) was applied individually to each CAEN channel. As an example, Fig. 8 depicts the PU events found by the PSD in the FPGA (green spots) superimposed on the well-defined neutron/gamma relation factors.
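For Poisson-distributed arrivals, the fraction of pulses that arrive within a given time of the previous pulse follows directly from the exponential distribution of the intervals. The Python sketch below relates a target PU fraction to an equivalent time window and checks it with a seeded simulation. How the emulator defines its 10% PU setting is not stated in the text, so this definition, the rate used in the example, and the function names are assumptions.

```python
import math
import random


def pileup_probability(rate_hz, window_s):
    """Probability that the next Poisson arrival falls within window_s."""
    return 1.0 - math.exp(-rate_hz * window_s)


def window_for_pileup(rate_hz, fraction):
    """Time window that yields the requested pile-up fraction."""
    return -math.log(1.0 - fraction) / rate_hz


def simulate_pileup_fraction(rate_hz, window_s, n_events, seed=0):
    """Estimate the pile-up fraction from simulated exponential intervals."""
    rng = random.Random(seed)
    piled = sum(1 for _ in range(n_events)
                if rng.expovariate(rate_hz) < window_s)
    return piled / n_events
```

For example, at a hypothetical rate of 500 kevents/s, `window_for_pileup(500e3, 0.10)` gives the interval below which 10% of consecutive pulses overlap.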
From the experimental results, it was concluded that the FPGA algorithms are able to detect PU in both event and processed data. However, it was observed that the PU detection is slightly reduced for filtered data, which applies to the results from DMA 1 data, according to Section IV-D. This is explained by the signal smoothing effect imposed by the DTS filter. Improvements in the event detection algorithm (e.g., maximum pulse length), capable of reducing the undesired smoothing effect, were identified for future implementation.
[Figure caption: Sequence of events stored with RNC prototype using synthetic data from CAEN emulator. Zoomed-in view: event window composed of three PWIDTH (3 × 64 samples) and five piled-up pulses, identified by the corresponding P-value (y = 5).]
To check the performance of the FPGA PHS algorithm, the real-time PHS streamed through DMA 1 were compared with spectra obtained by postprocessing the event data from DMA 0 acquired simultaneously. As an example, Fig. 10 depicts the resulting PHS from the FPGA for a 100-ms acquisition using two combined CAEN emulator channels [Ch1: 500 kevents/s of gamma-shaped pulses with the input spectrum defined in Fig. 9(a); Ch2: 500 kevents/s of neutron-shaped pulses with the input spectrum defined in Fig. 9(b)]. Note that neither spectrum is physically meaningful.
It was concluded that the FPGA PHS output, depicted in Fig. 10 , is in agreement with input data (Fig. 9) , and with spectra from postprocessing methods using event data from DMA 0.
However, before operating the FPGA PHS module, it is necessary to carefully adjust the separation slope values needed for proper discrimination. The real-time PHS production in the FPGA might therefore be difficult to use in experiments where the separation slopes fluctuate strongly. In that case, instead of producing the PHS in real time in the FPGA, the best option is to stream the PSD data (peak and CI relation factors) through DMA 1 and postprocess the PHS at the host.
Since all the FPGA code is implemented in HDL, without embedded software processors, the overall latency introduced by the processing modules is expected to be in the range of a few nanoseconds, taking into account the 400-MHz sampling clock. Also, the DTS filter introduces very little latency (given the small values of the delay parameters) when compared with other digital filters [15]. Nevertheless, the latency introduced by data streaming to the host (event-based and PSD data) depends on the expected count rate. Data are streamed when the DMA packet is filled or after a predefined timeout. The PHS module, expected to be streamed periodically (e.g., 2-ms cycle time), requires more demanding calculations to fill the histogram buffers. A delay of 200 ns is currently introduced to process the 256-bin histograms, which is not relevant when compared with the millisecond-range cycle time.
VIII. CONCLUSION
This paper presents the FPGA code developed for the RNC front-end electronics prototype. The code is intended to acquire, process, and store in real time the neutron and gamma events from the detectors located in collimated LOS viewing a poloidal plasma section. It was implemented and tested in an evaluation board from Xilinx (KC705) carrying an IPFN FPGA Mezzanine Card (FMC-AD2-1600) with two digitizer channels of 12-bit resolution sampling up to 1.6 Gsamples/s. After signal conditioning, dedicated algorithms are used for event detection, filtering, PU detection, and real-time processing (event storage, PSD, and PHS). Three distinct ×8 Gen2 DMA channels were implemented. Two DMAs are responsible for real-time data streaming (event-based and PSD or PHS processed data), and the third DMA carries the status word, avoiding concurrent access by read requests. The code was successfully tested in the laboratory with synthetic data from a CAEN emulator, allowing a maximum throughput of 1600 MB/s (the maximum possible for two channels in 400-MHz continuous acquisition). Concerning the PSD algorithm, it is possible to conclude that the PSD relation factors from the FPGA provide successful neutron/gamma discrimination when compared with postprocessing methods. However, it was observed that the PU detection is slightly reduced for filtered data when compared with postprocessed event data from the same acquisition. This is due to the signal smoothing effect introduced by the DTS filter. New methods capable of overcoming this undesired effect in the presence of PU were identified. The PHS algorithm was successfully implemented and tested in the FPGA, and its results are in agreement with the postprocessed PHS obtained from event data. However, it was concluded that the PHS in the FPGA might be impractical in experiments where the separation slopes fluctuate strongly (e.g., signal gain changes). Thus, the best compromise is to stream the PSD packets through DMA 1 for PHS postprocessing at the host.
