Modern experiments in high-energy physics place great demands on the reliability, efficiency, and data rate of data acquisition (DAQ) systems. To address these needs, we present a versatile and scalable DAQ that executes the event building task entirely in FPGA modules. In 2014, the intelligent FPGA-based DAQ (iFDAQ) was deployed at the COMPASS experiment located at the Super Proton Synchrotron (SPS) at CERN. The core of the iFDAQ is its hardware Event Builder (EB), which consists of up to nine custom-designed FPGA modules complying with the µTCA/AMC standard. The EB replaced 30 distributed online computers and around 100 PCI cards, increasing compactness, scalability, reliability, and bandwidth compared to the previous system. In the COMPASS configuration, the iFDAQ provides a sustained bandwidth of up to 500 MB/s. By buffering data on different levels, the system exploits the spill structure of the SPS beam and averages the maximum on-spill data rate of 1.5 GB/s over the whole SPS duty cycle. It can even handle peak data rates of 8 GB/s. Its Run Control Configuration and Readout (RCCAR) software offers native, user-friendly control and monitoring tools and, together with the firmware of the modules, provides built-in intelligence such as self-diagnostics, data consistency checks, and front-end error handling. From 2017, all point-to-point high-speed links between the front-end electronics, the hardware EB, and the readout computers will be wired via a passive programmable crosspoint switch. Thus, multiple event building topologies can be configured to adapt to different system sizes and communication patterns.
Introduction
With increasing beam energies, higher accelerator luminosities, and a growing number of front-end channels over the last decades, the task of data acquisition in high-energy physics (HEP) experiments has become more and more complex [1]. The potential need for higher-level triggers, which ideally already consider the full detector data, inevitably requires an online event building mechanism. Most current DAQ systems use an Ethernet-based network of distributed online computers for this purpose. Concurrent developments in FPGA technology, such as increased I/O bandwidth (> 3 Gbps) and support for high-performance SDRAM, also make FPGAs suitable for event building purposes [2]. Triggered by the need for a versatile DAQ for the COMPASS experiment with its variable spectrometer setup [3], the intelligent, FPGA-based DAQ (iFDAQ) has been developed. It exploits the superior properties of FPGAs, such as compactness and reliability, by executing the event building task entirely in FPGA modules. One goal during the development process was to make the system as universal as possible, both to address the various spectrometer setups of COMPASS and to simplify DAQ development and offer a solution for future HEP experiments. Due to its modularity and scalability, it has also been applied to the NA64 experiment, another HEP experiment at the SPS accelerator at CERN.
The intelligent, FPGA-based DAQ (iFDAQ) of COMPASS

Design of the Hardware Event Builder (EB)
The main requirement for any DAQ concerns the data rate. Depending on the spectrometer setup with around 300,000 detector channels and on the current physics program, the data rate in COMPASS during the on-spill period of 5-10 seconds is around 1.5 GB/s with peak rates of up to 8 GB/s. The spill is followed by an off-spill period of 10-30 seconds. By buffering data on all levels of the EB, the system makes use of this spill structure. Hence, the four subsequent readout computers work independently of the initial rate at a sustained rate of up to 500 MB/s.
Figure 1 shows a sketch of the iFDAQ for COMPASS. From the data concentrator modules, the data are transmitted to the hardware EB via optical links using the S-Link protocol developed at CERN. An optional intermediate multiplexing stage allows the data streams of concentrator modules with low rates to be merged, which makes efficient use of the S-Link bandwidth and therefore lowers the number of incoming links. The hardware EB of COMPASS consists of seven custom-designed modules with identical PCB layout. One module is composed of a 6U VME carrier card, which provides all necessary interfaces (16 optical high-speed links, Ethernet, and TCS interface), and a µTCA/AMC card called Data Handling Card (DHC) carrying a Xilinx Virtex6 FPGA and 4 GB of DDR3 SDRAM (see figure 2 left). Despite their identical layout, the DAQ units are used differently in the EB: six of them are programmed with a firmware that lets them operate as 15x1 multiplexers (DHCmx) and one is programmed as an 8x8 switch (DHCsw). The DHCsw distributes fully assembled events in a round-robin fashion to the spillbuffer cards plugged into the readout engines. The spillbuffer cards are off-the-shelf FPGA modules which provide a PCI-E interface to transmit the data into the buffer memory of the readout computers.
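The effect of buffering over the spill structure can be illustrated with a short calculation. The following is a minimal sketch, not part of the iFDAQ software: the function names are hypothetical, and the spill timings are picked from the ranges quoted above purely for illustration. It computes the rate averaged over one spill cycle, the backlog left in the buffers at the end of the spill when they are drained at a fixed rate, and whether this backlog is emptied before the next spill.

```python
def average_rate(on_spill_rate, t_on, t_off):
    """Mean data rate over one full spill cycle (same units as on_spill_rate)."""
    return on_spill_rate * t_on / (t_on + t_off)


def peak_backlog(on_spill_rate, drain_rate, t_on):
    """Data left in the buffers at the end of the spill when they fill at
    on_spill_rate and are drained at drain_rate at the same time."""
    return max(0.0, (on_spill_rate - drain_rate) * t_on)


def drains_before_next_spill(on_spill_rate, drain_rate, t_on, t_off):
    """True if the on-spill backlog is emptied during the off-spill period."""
    return peak_backlog(on_spill_rate, drain_rate, t_on) <= drain_rate * t_off


if __name__ == "__main__":
    # Rates in GB/s, times in seconds. The (t_on, t_off) pairs are
    # illustrative choices from the quoted ranges, not measured values.
    for t_on, t_off in [(5, 10), (10, 30)]:
        avg = average_rate(1.5, t_on, t_off)
        backlog = peak_backlog(1.5, 0.5, t_on)
        ok = drains_before_next_spill(1.5, 0.5, t_on, t_off)
        print(f"on={t_on}s off={t_off}s avg={avg:.3f} GB/s "
              f"backlog={backlog:.1f} GB drained={ok}")
```

Under these assumptions, a 5 s spill followed by a 10 s off-spill period averages to exactly 0.5 GB/s, which matches the sustained rate quoted for the readout computers. How the backlog is shared between the buffer levels of the real system is not modeled here.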
PoS(ICHEP2016)912
The system uses three independent network interfaces. For synchronization and distribution of event information, the already mentioned Trigger Control System (TCS) is used. The data flow, and thus the event building, follows the S-Link specifications. For configuration, control, and monitoring of the hardware EB, the IPbus protocol over an Ethernet network is used. This functional division simplifies diagnostics and allows efficient use of the data bandwidth. The iFDAQ also offers built-in intelligence. Malfunctioning front-end electronics are recognized by the DHCmx modules by means of a defective incoming data stream and are reported to the user, while a meaningful event assembly in the EB is maintained. Moreover, the system provides self-diagnostics and, for certain errors such as desynchronization, automatically takes action. The iFDAQ was deployed in COMPASS for the first time in 2014, and intensive development and debugging work has led to a stable and user-friendly system. In terms of reliability, it has already surpassed the former system.
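The idea of assembling events while tolerating a defective input stream, followed by round-robin distribution of the assembled events, can be sketched in software. This is only an illustration under stated assumptions: the real event building runs in FPGA firmware, and the fragment tuple layout, the dictionary fields, and both function names below are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def build_events(fragments, expected_sources):
    """Group (event_id, source_id, payload) fragments by event number.

    Returns a list of events ordered by event_id. Each event records the
    sources that did not deliver a fragment, so an incomplete event is
    still assembled and merely flagged.
    """
    by_event = defaultdict(dict)
    for event_id, source_id, payload in fragments:
        by_event[event_id][source_id] = payload

    events = []
    for event_id in sorted(by_event):
        parts = by_event[event_id]
        missing = sorted(set(expected_sources) - set(parts))
        events.append(
            {"event_id": event_id, "fragments": parts, "missing": missing}
        )
    return events


def distribute_round_robin(events, n_outputs):
    """Assign assembled events to output links in round-robin order."""
    outputs = [[] for _ in range(n_outputs)]
    for event, port in zip(events, cycle(range(n_outputs))):
        outputs[port].append(event["event_id"])
    return outputs


if __name__ == "__main__":
    # Source "B" fails to deliver its fragment for event 2.
    frags = [(1, "A", b"a1"), (1, "B", b"b1"), (2, "A", b"a2"),
             (3, "B", b"b3"), (3, "A", b"a3")]
    events = build_events(frags, expected_sources=["A", "B"])
    for ev in events:
        print(ev["event_id"], "missing:", ev["missing"])
    print(distribute_round_robin(events, n_outputs=2))
```

In this sketch an event with a missing fragment is kept and annotated rather than dropped, which mirrors the stated goal of maintaining a meaningful event assembly while reporting the faulty source.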
Future Development
For the run in 2017, a programmable crosspoint switch will be introduced, which will interconnect the front-end modules, all DHCmx modules as well as the DHCsw, and the spillbuffer cards. As a result, the formerly static point-to-point high-speed links of the EB will become configurable. This will enable the user to remotely customize the network topology via dedicated software in order to adapt to different system sizes, to compensate for broken modules by wiring to spare modules, or to achieve better load balance. Initially, changing the network topology will only be possible between two runs to avoid synchronous and time-critical reconfiguration of the crosspoint switch. In a second step, which includes an upgrade of the TCS to a bidirectional Passive Optical Network (PON), the ability of the TCS to distribute timing information to all hardware nodes of the EB simultaneously can be used to reconfigure the switch on-the-fly.
The intelligent, FPGA-based DAQ of COMPASS D. Steffen
To save cost-intensive optical connections, the implementation of the crosspoint switch will be accompanied by a change from the VME to the ATCA standard. One ATCA carrier card will be equipped with four DHC Mezzanine Cards (see figure 2 center). The interconnection between different carrier cards will be carried out via the backplane of the shelf, which significantly reduces the number of optical links. The optical links needed for the connection to the front-end electronics and to the spillbuffer cards are provided by the Rear Transition Module. To ensure free programmability of every single high-speed link, all the links on one carrier card are wired through a Vitesse VSC3144-02 switch, as can be seen on the right of figure 2. This switch combines 6.5 Gbps asynchronous data throughput with full programmability of any output to any input port.
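The configurable topology can be pictured as a routing table that maps every output port of the crosspoint switch to the input port driving it. The toy model below is a sketch under stated assumptions: the class, its methods, and the port numbers are hypothetical and do not represent the register interface of the actual switch chip or the planned configuration software.

```python
class CrosspointSwitch:
    """Toy model of a crosspoint switch: every output is driven by at most
    one input, while one input may drive several outputs."""

    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.routes = {}  # output port -> input port

    def _check(self, port):
        if not 0 <= port < self.n_ports:
            raise ValueError(f"port {port} out of range")

    def connect(self, input_port, output_port):
        """Drive output_port from input_port, replacing any previous route."""
        self._check(input_port)
        self._check(output_port)
        self.routes[output_port] = input_port

    def apply_topology(self, topology):
        """Replace the whole routing table with {output: input} in one step.

        The new table is validated completely before it becomes active, so
        an invalid topology leaves the current routes untouched.
        """
        staged = CrosspointSwitch(self.n_ports)
        for output_port, input_port in topology.items():
            staged.connect(input_port, output_port)
        self.routes = staged.routes

    def reroute_to_spare(self, broken_output, spare_output):
        """Move the link feeding a broken module to a spare module.

        Raises KeyError if broken_output currently has no route.
        """
        self._check(spare_output)
        self.routes[spare_output] = self.routes.pop(broken_output)


if __name__ == "__main__":
    switch = CrosspointSwitch(n_ports=8)
    switch.apply_topology({4: 0, 5: 1})  # outputs 4 and 5 fed by inputs 0 and 1
    switch.reroute_to_spare(broken_output=5, spare_output=6)
    print(switch.routes)
```

Staging a complete table before activating it reflects the between-runs reconfiguration described above; an on-the-fly reconfiguration would additionally need the synchronization provided by the TCS, which is not modeled here.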
Versatility and Scalability
Since the iFDAQ is highly modular, it can be adapted to any system size, saving costs by omitting unused resources. The planned upgrade to configurable connections will make it easier to adapt the system to specific processing requirements. The maximum data throughput of the hardware EB of the iFDAQ is limited by the throughput of the DHCsw, which is the bottleneck of the system. A scenario with an FPGA of the most recent generation (Artix7 by Xilinx), assuming ten incoming and ten outgoing links, could provide a data throughput of around 10 GB/s. In order to go beyond this limit, a geometric pattern of DHCsw modules would be necessary, as can be seen in figure 3. Estimated costs of around 88 k$ for a throughput of 100 GB/s and 1.3 M$ for 1 TB/s in the already mentioned scenario make hardware EBs a less expensive alternative to conventional systems as well.
Figure 3: Possible patterns to combine DHCsw modules to achieve higher data throughput.
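The two cost estimates quoted above can be compared on a per-throughput basis. The short calculation below is a sketch for illustration only: the function names are hypothetical, and the power-law interpolation through the two quoted points is an assumption made here, not a model taken from the cost estimate itself.

```python
import math

# (throughput in GB/s, estimated cost in k$) as quoted in the text
ESTIMATES = [(100, 88), (1000, 1300)]


def cost_per_unit_throughput(throughput_gb_s, cost_kusd):
    """Cost in k$ per GB/s of event building throughput."""
    return cost_kusd / throughput_gb_s


def scaling_exponent(point_a, point_b):
    """Exponent a of an assumed power law cost ~ throughput**a that passes
    through the two (throughput, cost) points."""
    (t_a, c_a), (t_b, c_b) = point_a, point_b
    return math.log(c_b / c_a) / math.log(t_b / t_a)


if __name__ == "__main__":
    for throughput, cost in ESTIMATES:
        unit = cost_per_unit_throughput(throughput, cost)
        print(f"{throughput} GB/s: {unit:.2f} k$ per GB/s")
    print(f"assumed power-law exponent: {scaling_exponent(*ESTIMATES):.2f}")
```

The cost per GB/s rises from 0.88 k$ to 1.3 k$ between the two quoted points, so under the power-law assumption the cost grows slightly faster than linearly with throughput. This is consistent with additional DHCsw stages being needed once a single switch module is no longer sufficient.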
