Using FPGA technology for event building tasks in high-energy physics experiments reduces costs and increases the reliability of DAQ systems. In 2014, the COMPASS experiment at the Super Proton Synchrotron at CERN commissioned a novel, intelligent, FPGA-based DAQ in which event building is performed entirely by FPGA cards. The highly scalable system is designed to cope with an on-spill data rate of 1.5 GB/s and a sustained data rate of 500 MB/s. Its intelligent and highly reliable hardware event builder is able to detect and handle front-end errors and to take corrective action automatically. In September 2017, the system reached an uptime of 99.63%. The paper gives an overview of system details, performance, and running experience.
Introduction
Driven by the need for a highly scalable, high-performance computing architecture for data acquisition, the COMPASS experiment at CERN's Super Proton Synchrotron (SPS) developed a new Data Acquisition System (DAQ) from scratch, using a novel approach to the event building network. The new system and its event builder exploit the application-optimized computation technology of Field Programmable Gate Arrays (FPGAs). In contrast to traditional event builders, which are based on distributed online computers interconnected via a Gigabit Ethernet network, the event building task is executed entirely in hardware. Recent developments in FPGA technology, such as increased I/O bandwidth (> 3 Gbps) and support for high-performance SDRAM even on low-cost chips, have made FPGAs suitable for event building. The arguments for moving from software-based to FPGA-based event builders in the future are reduced costs, higher reliability, and increased compactness.
COMPASS commissioned its so-called intelligent, FPGA-based DAQ (iFDAQ) in 2014, when a reduced spectrometer required only a reduced event builder. During the following years, the system was debugged and extended, and new features were added. This paper describes the features added since 2015, in particular the intelligence elements, and the system performance during the 2017 run.
System Design and Setup of the iFDAQ for 2017
Since the COMPASS spectrometer setup varies with the physics program (see [1] and [2]), the requirements for the hardware Event Builder (EB) may change from year to year. The maximum and average event size, the trigger rate, and the number of incoming optical links depend on the spectrometer setup. Therefore, the design of the EB has to be highly flexible.
PoS(TWEPP-17)127
Intelligence Elements and Performance of the FPGA-based DAQ of the COMPASS Experiment
The hardware EB consists of custom-designed DAQ units, called data handling cards (DHCs), shown in figure 1 (left). As can be seen in figure 2, the DHCs operate in the EB in one of two modes, depending on the firmware loaded: as a 12:1 multiplexer (DHCmx) or as an 8x8 switch (DHCsw). At the heart of each DHC is a Xilinx XC6VLX130T FPGA from the Virtex-6 family. In addition, each card is equipped with 4 GB of DDR3 SDRAM, which allows data to be buffered on the module. In both operation modes, the DHCs check the incoming data streams for errors and consistency and can hence detect a malfunction in one of the attached nodes. Since all data have to pass through the DHCsw module, the DHCsw is the bottleneck of the system, and high data throughput is its main requirement. The right side of figure 1 shows the data throughput as a function of event size. The absolute limit is 3 GB/s, which is sufficient for use in the iFDAQ of COMPASS. For a detailed system reference and a description of the firmware designs, please refer to [3]. Figure 2 shows the layout of the iFDAQ EB during the run in 2017. In contrast to previous years, a second layer of DHCmx modules was installed to account for the increased number of incoming S-LINKs [4]. These modules partly replaced the four-input S-LINK multiplexers and allow more efficient use of the S-LINK bandwidth by merging up to twelve low-rate detectors.
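To illustrate the consistency checks mentioned above, the following Python sketch models how an N:1 multiplexing stage could merge one fragment per input link into a single event and report inconsistent sources. It is a software analogy only, assuming that each fragment carries a source index and an event number; the names Fragment, BuiltEvent, and merge_fragments and the drop-and-report policy are hypothetical and do not describe the DHC firmware, for which see [3].

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    source_id: int     # index of the input link (0..n_inputs-1), hypothetical
    event_number: int  # event number from the fragment header, assumed present
    payload: bytes


@dataclass
class BuiltEvent:
    event_number: int
    fragments: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def merge_fragments(expected_event: int, fragments: list, n_inputs: int = 12) -> BuiltEvent:
    """Merge one fragment per input link into a single event (N:1 multiplexing).

    Fragments that are inconsistent with the expected event are not merged but
    reported, so that one faulty source cannot corrupt the whole event.
    """
    event = BuiltEvent(expected_event)
    seen = set()
    for frag in fragments:
        if not 0 <= frag.source_id < n_inputs:
            event.errors.append(f"source {frag.source_id}: unknown input")
        elif frag.event_number != expected_event:
            event.errors.append(
                f"source {frag.source_id}: event {frag.event_number} != {expected_event}")
        elif frag.source_id in seen:
            event.errors.append(f"source {frag.source_id}: duplicate fragment")
        else:
            seen.add(frag.source_id)
            event.fragments.append(frag)
    # Every input that delivered nothing valid for this event is flagged as missing.
    for missing in sorted(set(range(n_inputs)) - seen):
        event.errors.append(f"source {missing}: missing fragment")
    return event
```

With the default n_inputs=12 the function mirrors the 12:1 ratio of a DHCmx; an FPGA implementation performs such checks on streaming data in hardware rather than on buffered lists.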
The support software for the iFDAQ is a multilayer system centered around a master process. The Run Control, Configuration, And Readout Software (RCCARS) contains only one real-time process, which is responsible for transferring the fully assembled events from SDRAM to hard disk. In doing so, it converts the data from the EB format to the data format of the previous DAQ in order to ensure full backward compatibility with all other software tools. Details about RCCARS, and in particular about the event processing in the real-time processes, can be found in [5]. Until 2016, communication between processes was carried out using the DIM library developed at CERN [6]. However, problems with the DIM library, such as truncated and lost messages, reduced the stability of the software system. For the run in 2017, a new communication library called DIALOG was developed [7]. It was successfully deployed to all processes at the end of the 2016 run and contributed to the improved system reliability in 2017 (see section 4).
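The two DIM problems named above, truncated and lost messages, are commonly addressed by explicit message framing. The sketch below shows one generic way to detect both, assuming a length-prefixed frame with a per-sender sequence number; the header layout and the names encode and FrameReceiver are hypothetical and are not the DIALOG protocol, which is described in [7].

```python
import struct

# Hypothetical frame header: payload length (uint32) and sender sequence number (uint64), big-endian.
HEADER = struct.Struct("!IQ")


def encode(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its length and sequence number."""
    return HEADER.pack(len(payload), seq) + payload


class FrameReceiver:
    """Reassembles frames from a byte stream and counts sequence gaps."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.next_seq = 0  # sequence number expected next
        self.lost = 0      # number of messages inferred lost from gaps

    def feed(self, data: bytes) -> list:
        """Append received bytes and return every complete payload."""
        self.buffer.extend(data)
        messages = []
        while len(self.buffer) >= HEADER.size:
            length, seq = HEADER.unpack_from(self.buffer)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # incomplete frame: wait for more bytes, never deliver it truncated
            messages.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
            if seq > self.next_seq:
                self.lost += seq - self.next_seq  # earlier messages never arrived
            self.next_seq = max(self.next_seq, seq + 1)
        return messages
```

Because a payload is only delivered once its full announced length has arrived, a partial message is held back instead of being passed on truncated, and any jump in the sequence numbers makes a lost message visible to the receiver.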
Intelligence Elements of the iFDAQ
The iFDAQ was designed to be a highly automated system. This makes it more user-friendly: it requires fewer interventions from shift crews and ultimately allows it to be operated by less experienced users. Intelligent, automated features in both hardware and software have turned out to be widely used and much appreciated by collaborators. The following intelligence elements are currently deployed:
• Self-synchronized event building data flow: the data flow through the event builder is initiated, and synchronous processing of the data by all involved hardware nodes is maintained, by distributing trigger and spill cycle information to all nodes via the Trigger Control System (TCS) and by applying reset commands and timeouts. In this way, a node that has lost synchronization usually recovers at the latest with the next spill. In the unlikely event of a non-recoverable desynchronization, the software automatically stops the data flow through the system. When data taking is restarted, synchronization is established from scratch.
• Automatic resynchronization of front-end (FE) modules: the TCS information is also distributed to the data concentrator modules, which allows a similar automatic resynchronization. Moreover, data from a concentrator module can be excluded or included at any spill during the run. This allows detector experts to work on their equipment without interrupting data taking, which is particularly useful at the beginning of a run during the commissioning phase.
• Continuous FE electronics error diagnostics, automatic error handling, and error recovery: during the development of the iFDAQ, great emphasis was put on the ability to handle and recover from errors in the data stream emerging from the FE electronics and on the capability to identify their origin. Many data consistency checks are performed on each level of the EB. Detailed information about the error diagnostics can be found in [3] and [5]. The data quality control offers a means of identifying malfunctioning FEs. Since the data flow through the EB is maintained even when no data are stored, the iFDAQ is also a powerful real-time monitoring tool for detector FEs.
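The interplay of spill-based resets and per-spill inclusion or exclusion of data sources described in the list above can be illustrated with a toy model of a single node. This is a simplified sketch under stated assumptions (event numbers restart at 0 with every spill, and a node idles until the next spill after a desynchronization); the class SpillSyncedNode and its methods are hypothetical, and the timeouts and the automatic software stop are not modeled.

```python
class SpillSyncedNode:
    """Toy model of one node kept in step by TCS trigger and spill information.

    Assumptions (hypothetical): event numbers restart at 0 with every spill,
    and a node that loses synchronization stays idle until the next spill.
    """

    def __init__(self, sources) -> None:
        self.enabled = set(sources)  # sources currently included in the readout
        self.pending = None          # include/exclude request, applied at next spill
        self.expected_event = 0
        self.in_sync = False         # data flow starts only with the first spill

    def request_sources(self, sources) -> None:
        """Include or exclude sources; takes effect at the next spill boundary."""
        self.pending = set(sources)

    def start_of_spill(self) -> None:
        # The spill boundary acts as a reset: counters restart and a
        # desynchronized node rejoins the data flow.
        self.expected_event = 0
        self.in_sync = True
        if self.pending is not None:
            self.enabled, self.pending = self.pending, None

    def on_event(self, event_number: int, sources_present) -> str:
        """Check one event; return 'ok', 'desync', or 'skipped'."""
        if not self.in_sync:
            return "skipped"
        if event_number != self.expected_event or not (self.enabled <= set(sources_present)):
            self.in_sync = False  # wait for the next spill instead of building bad events
            return "desync"
        self.expected_event += 1
        return "ok"
```

Applying source changes only at a spill boundary keeps every node working on the same set of sources within a spill, which is what allows equipment to be taken out or put back without stopping the run.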
In the future, the intelligence of the system will be extended further. Since all optical links in the hardware EB are fixed point-to-point connections, the failure of a node within the EB cannot be compensated for without human intervention, which always entails a loss of beam time. For next year, it is planned to route all connections between data concentrator modules, DHCmx and DHCsw modules, and readout computers through a fully configurable cross-point switch. Details of the implementation of the cross-point switch will be published soon.
• Automatic recognition of and compensation for hardware failures: connecting spare DHCs to the cross-point switch will allow fast failover. Software will automatically identify broken nodes, compute a new configuration for the cross-point switch, and apply it.
• Automatic load balancing: the cross-point switch support software will analyze the data rate on the incoming links and automatically reconnect them to input ports of the EB such that the load is distributed equally over all DHCmx modules.
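Both planned features reduce to computing a new mapping of links to modules. As one possible strategy, and not necessarily the one that will be implemented, the sketch below assigns links greedily in order of decreasing data rate to the currently least-loaded DHCmx with a free input port, and remaps the links of a broken module to a spare. The function names balance_links and failover, the rate units, and the limit of 12 ports per DHCmx (taken from the 12:1 multiplexing ratio) are assumptions.

```python
import heapq


def balance_links(link_rates: dict, n_mux: int, ports_per_mux: int = 12) -> dict:
    """Greedy largest-rate-first assignment of links to multiplexer modules.

    Returns {mux_index: [link_id, ...]}. Each link goes to the currently
    least-loaded module that still has a free input port.
    """
    if len(link_rates) > n_mux * ports_per_mux:
        raise ValueError("not enough multiplexer input ports")
    heap = [(0.0, m) for m in range(n_mux)]  # (accumulated rate, module index)
    heapq.heapify(heap)
    assignment = {m: [] for m in range(n_mux)}
    for link, rate in sorted(link_rates.items(), key=lambda kv: kv[1], reverse=True):
        load, mux = heapq.heappop(heap)
        while len(assignment[mux]) >= ports_per_mux:
            # This module has no free input port left: drop it from the heap for good.
            load, mux = heapq.heappop(heap)
        assignment[mux].append(link)
        heapq.heappush(heap, (load + rate, mux))
    return assignment


def failover(assignment: dict, broken: int, spare: int) -> dict:
    """Move all links of a broken module to a spare one (hypothetical remap)."""
    new = {m: list(links) for m, links in assignment.items() if m != broken}
    new[spare] = new.get(spare, []) + assignment[broken]
    return new
```

The largest-first greedy rule is a standard heuristic for balanced partitioning; it does not guarantee the optimal balance, but it is cheap enough to rerun whenever link rates or the set of working modules change.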
Performance of the iFDAQ in 2017
The setup of figure 2 was able to cope with this year's on-spill data rate of 1.5 GB/s. Averaged over the whole SPS duty cycle, the on-spill data rate corresponds to a sustained rate of 250 to 380 MB/s under stable beam conditions. Since beam time is highly valuable and the beam is usually provided 24 hours a day, high reliability of the DAQ system is of major importance to high-energy physics experiments. The iFDAQ proved to be very reliable in 2017, as indicated in figure 3. There are three main sources of DAQ downtime. The first is a memory access error (PCI/DMA) caused by scrambled data being buffered in the RAM of the readout engines. The second is an unrecoverable loss of synchronization in the hardware EB, which leads to a safe stop of the run. The contribution of these two sources to the overall system downtime decreased during the run thanks to better commissioning and calibration of the detectors and, consequently, higher data quality. The third source is safe stops of the data flow, which the system triggers automatically in case of errors or stuck processes in RCCARS. Software updates in June and July 2017 significantly improved system reliability by increasing the stability of the readout and master processes. In September 2017, the iFDAQ reached a system uptime of 99.63%.
Intelligence Elements and Performance of the FPGA-based DAQ of the COMPASS Experiment Dominik Steffen
Figure 3: Downtime in each month of the run in 2017 until September 10th and corresponding DAQ availability.
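The rate and reliability figures in this section follow from simple ratios. The sketch below assumes that data arrive only during the spill and that availability is the complement of the downtime fraction of the considered period; the function names are illustrative, and the 10 s spill in a 40 s cycle used in the example is a placeholder, not the 2017 SPS cycle.

```python
def sustained_rate(on_spill_rate: float, spill_s: float, cycle_s: float) -> float:
    """Average an on-spill rate over the full SPS cycle (assumes no data between spills)."""
    return on_spill_rate * spill_s / cycle_s


def availability(downtime_s: float, period_s: float) -> float:
    """Fraction of the period during which the DAQ was available."""
    return 1.0 - downtime_s / period_s


def downtime_hours(uptime: float, period_h: float) -> float:
    """Downtime implied by a given uptime fraction over a period."""
    return (1.0 - uptime) * period_h


# Placeholder cycle: 10 s of beam every 40 s (NOT the actual 2017 SPS parameters).
print(sustained_rate(1500.0, 10.0, 40.0))  # on-spill rate in MB/s -> sustained MB/s
print(downtime_hours(0.9963, 30 * 24))     # hours of downtime in a 30-day month
```

With these placeholder values, an on-spill rate of 1.5 GB/s averages to 375 MB/s. Likewise, an uptime of 99.63% over a 30-day month corresponds to roughly 2.7 h of downtime.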
