Abstract-The time-of-flight system in the Compressed Baryonic Matter experiment is composed of super modules that are based on multigap resistive plate chambers for high-density and high-resolution time measurement. To evaluate the quality of detectors during mass production, a distributed data readout system was developed. Furthermore, data acquisition (DAQ) software was developed under the Linux operating system to test and verify the feasibility of the distributed data readout method. The DAQ software presented in this paper focuses on data collection, event building, status monitoring, and system control. Laboratory tests confirmed the function of the DAQ software for super module quality control and showed that the overall data transfer rate of a single data transmission path can reach up to ∼550 Mb/s. Index Terms-Compressed Baryonic Matter (CBM) readout electronics, data acquisition (DAQ) software, event building, graphical user interface (GUI).
I. INTRODUCTION
The Compressed Baryonic Matter (CBM) experiment at the Facility for Antiproton and Ion Research (FAIR) aims to search for phase transitions in the phase diagram of strongly interacting matter and to study strange and charmed particle production [1], [2]. Charged hadrons are identified in CBM by a time-of-flight (TOF) super module system that will be placed 10 m downstream of the fixed target [3].
Each module contains five multigap resistive plate chamber (MRPC) detectors with 320 electronic channels. The case rate of a single channel can reach up to 300 kHz due to a high reaction rate of 10 MHz in a heavy-ion collision experiment. Since each case is saved in the form of 48-bit data in the time-to-digital converter (TDC), the raw data throughput of the super module can be estimated as 300 kHz/ch × 320 ch × 48 bit = 4.6 Gb/s. Furthermore, considering the overhead of the transmission protocol, the actual data rate in the heavy-ion collision experiment may reach up to about 6 Gb/s [4]. The traditional readout system is based on a crate, which transmits data through the backplane bus and the crate controller; however, this hardly meets the requirements for high-rate transmission and expansion. Consequently, to achieve data readout of the CBM-TOF super module, a 320-channel time digitizing and readout electronic system was designed (as shown in Fig. 1). To read out data in real time, the system has a distributed architecture, including front-end electronics (FEE), back-end electronics (BEE), and data acquisition (DAQ) software.
The FEE mainly contains one time-over-threshold feeding board (TFB) that leads the signal from the super module, ten TDCs for time digitization, and one TDC readout motherboard (TRM) for the readout of multiple TDCs. The BEE is mainly composed of 16 data readout modules (DRMs) for data forwarding and one specific DRM for status and command routing, housed in a PCI eXtensions for Instrumentation crate. Furthermore, a clock and trigger system (CTS) is integrated for clock and trigger distribution. Each DRM is based on system-on-chip (SoC) and Ethernet techniques, so that data can be transmitted in parallel to a back-end computer. The total data rate of the readout electronics system with 16 DRMs is about 6 Gb/s in the future heavy-ion collision experiment, which means that the network transmission rate of a single data transmission path should reach about 375 Mb/s. However, the super modules are mostly evaluated in cosmic ray tests, where the data rate can be dozens of megabits per second. In a beam test, on the other hand, the data rate of a super module is usually less than 1 Gb/s. Accordingly, the DAQ software was designed to meet the requirement of the quality control and will be upgraded to meet the high data rate requirement of the heavy-ion collision experiment.
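The rate figures quoted above follow from simple arithmetic. The short sketch below reproduces them; the constant and function names are illustrative only and do not come from the DAQ software, and an even spread of the load over all paths is assumed.

```python
# Rate estimate using the figures quoted in the text.
# All names here are illustrative; they are not taken from the DAQ software.
RATE_PER_CHANNEL_HZ = 300e3   # maximum rate of a single channel (300 kHz)
CHANNELS = 320                # electronic channels per super module
BITS_PER_RECORD = 48          # size of one TDC record in bits
N_DRM = 16                    # data readout modules, i.e. parallel paths
TOTAL_WITH_OVERHEAD_BPS = 6e9 # quoted rate including protocol overhead


def raw_throughput_bps(rate_hz=RATE_PER_CHANNEL_HZ,
                       channels=CHANNELS,
                       bits=BITS_PER_RECORD):
    """Raw TDC output of one super module in bits per second."""
    return rate_hz * channels * bits


def per_path_bps(total_bps=TOTAL_WITH_OVERHEAD_BPS, paths=N_DRM):
    """Rate each path must sustain, assuming an even spread of the load."""
    return total_bps / paths


if __name__ == "__main__":
    print(f"raw throughput: {raw_throughput_bps() / 1e9:.3f} Gb/s")
    print(f"per path:       {per_path_bps() / 1e6:.0f} Mb/s")
```

The raw figure evaluates to 4.608 Gb/s, which the text rounds to 4.6 Gb/s, and the per-path figure to 375 Mb/s.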
0018-9499 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The DAQ software can transmit the data in real time. Moreover, it also has a graphical user interface (GUI) and can provide functions such as event building, command distribution, and online status monitoring.
II. SOFTWARE ARCHITECTURE
A DAQ system in a particle physics experiment has two basic functions: first, to transmit the data obtained from the FEE to the back-end computer; second, to send commands and configuration to the FEE and BEE and to report their operating status back to the back-end computer. The DAQ software, which focuses on the first function, has a distributed and hierarchical architecture consisting of three parts connected via Ethernet, as shown in Fig. 2: data forwarding node (DFN), data aggregation node (DAN), and GUI.
DFN forwards data to DAN and exchanges status and commands with GUI. Both DAN and GUI run on the back-end computer. DAN is mainly responsible for data receiving and event building. GUI provides a user-friendly and interactive interface for the control and monitoring of the electronics system. DAN and GUI also offer online data analysis and hit-map display to evaluate the quality of the detector. The client-server model is typically used for network programming. Here, DFN and GUI act as clients, and DAN acts as a server.
The upgrade of such a layered design is straightforward, since the number of DFN can be configured according to the experimental requirement, which is suitable for a distributed readout system; furthermore, GUI can be customized without any code modification. In this way, the data readout channel is separated from the system status and the control command transmission channel, thus avoiding interference.
III. IMPLEMENTATION OF DFN
There are two types of DFN: the first is a distributed data readout node that is responsible for data transmission and runs on each DRM; the second is responsible for parameter configuration, command control, and status monitoring and runs on the specific DRM. The structure of DFN is shown in Fig. 3. DFN only forwards the received data, status, and commands and does not process them.
The FEE data interface is a logical interface that is uniformly defined. This is realized by the field-programmable gate array (FPGA). The DAQ interface uses a TCP/IP gigabit Ethernet interface, which is easier to implement via CPU. Therefore, DRM is based on Cyclone V SoC, which combines a hard processor system (HPS) and an FPGA in a single device. This provides a variety of benefits such as higher integration, lower power, smaller board size, and higher bandwidth communication between processor and FPGA.
The HPS consists of a dual-core ARM Cortex-A9 MPCore processor with a maximum frequency of 925 MHz, a shared multiport synchronous dynamic random access memory (SDRAM) controller, and a rich set of peripherals, including two Ethernet media access controllers (EMACs) and a direct memory access (DMA) controller. To decrease the CPU usage while leaving sufficient CPU resources for the DAQ software, DMA transfer is used for communication between HPS and FPGA. This requires a driver running on the embedded Linux operating system of the HPS.
A. Multithreading
DFN is designed as concurrent software based on multithreading. Using Portable Operating System Interface (POSIX) multithreading techniques, the program is divided into multiple independent tasks, which increases the response speed [5]. There are three threads in DFN: a main thread, a data transceiving thread, and a command transfer thread. The main thread connects the client socket to the server and creates the other threads with the detached attribute. The command transfer thread sends the commands for configuration and hardware control to the FPGA, using the write method defined by the driver of the HPS-to-FPGA interface. The task of the data transceiving thread is to receive data from the FPGA via the DMA driver of the FPGA-to-HPS interface and then to transmit them to the PC via Ethernet.
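The division of labor among the three DFN threads can be illustrated with the following minimal sketch. It is not the actual implementation: Python queues and lists stand in for the DMA driver, the driver's write method, and the TCP socket, all function names are hypothetical, and the worker threads are joined rather than detached so that the example terminates deterministically.

```python
import queue
import threading

STOP = object()  # sentinel used to shut the worker threads down


def data_transceiving_thread(dma_blocks, uplink):
    """Forward every block from the simulated DMA driver without processing it."""
    while True:
        block = dma_blocks.get()
        if block is STOP:
            break
        uplink.append(block)  # stand-in for sending the block over TCP


def command_transfer_thread(commands, fpga_writes):
    """Pass each command on to the simulated HPS-to-FPGA write interface."""
    while True:
        cmd = commands.get()
        if cmd is STOP:
            break
        fpga_writes.append(cmd)  # stand-in for the driver's write method


def run_dfn(blocks, cmds):
    """Main thread: start both workers, feed them, and wait for them to finish."""
    dma_blocks, commands = queue.Queue(), queue.Queue()
    uplink, fpga_writes = [], []
    workers = [
        threading.Thread(target=data_transceiving_thread,
                         args=(dma_blocks, uplink)),
        threading.Thread(target=command_transfer_thread,
                         args=(commands, fpga_writes)),
    ]
    for worker in workers:
        worker.start()
    for block in blocks:
        dma_blocks.put(block)
    for cmd in cmds:
        commands.put(cmd)
    dma_blocks.put(STOP)
    commands.put(STOP)
    for worker in workers:
        worker.join()
    return uplink, fpga_writes
```

Because each worker blocks only on its own queue, a slow command path does not stall data forwarding, which is the point of separating the two tasks.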
B. Direct Memory Access
The DMA transfer efficiency is critical. Initially, DMA interrupts were handled synchronously and the entire reading process was serial, which resulted in a low full-link transmission efficiency of 240 Mb/s. Since the FEE constantly generates a large volume of data, even a small response delay will lead to data overflow.
Therefore, several changes were made to the driver and the corresponding FPGA logic module. First, the 32-MB DMA buffer was divided into eight blocks, used as shown in Fig. 4. The DMA transfer and the kernel interrupt response are executed in parallel, which improves the transmission efficiency to 510 Mb/s. The use of the IRQ-FIFO avoids the loss of interrupts when they are generated too quickly.
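One way to picture the block scheme is the toy model below. It rests on two assumptions that the text does not spell out: that the eight blocks are filled cyclically, and that the IRQ-FIFO queues one completion notice per filled block. The class and method names are hypothetical.

```python
from collections import deque

BLOCK_COUNT = 8
BLOCK_SIZE = 32 * 1024 * 1024 // BLOCK_COUNT  # 4 MB per block


class DmaRing:
    """Toy model: hardware fills blocks in cyclic order, software drains them."""

    def __init__(self, block_count=BLOCK_COUNT):
        self.block_count = block_count
        self.write_index = 0     # next block the DMA engine will fill
        self.irq_fifo = deque()  # indices of filled blocks not yet read

    def hardware_block_done(self):
        """Record a completion notice; the DMA engine moves to the next block."""
        if len(self.irq_fifo) == self.block_count:
            raise OverflowError("all blocks are full: the reader is too slow")
        self.irq_fifo.append(self.write_index)
        self.write_index = (self.write_index + 1) % self.block_count

    def software_next_block(self):
        """Return the oldest filled block index, or None if nothing is pending."""
        return self.irq_fifo.popleft() if self.irq_fifo else None
```

Queuing the notices means that several completions arriving in quick succession are all retained until the reader catches up, instead of later ones overwriting earlier ones.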
Furthermore, the data transceiving thread in DFN uses the mmap method to decrease the time required for copying data from the kernel space to the user space and to improve the full-link transmission efficiency from 510 to 550 Mb/s.
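The difference between the read and mmap access paths can be sketched as follows. A regular file stands in for the device node of the DMA driver, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import mmap
import os
import tempfile


def read_with_copy(path, size):
    """read(): the kernel copies the bytes into a new user-space buffer."""
    with open(path, "rb") as f:
        return f.read(size)


def sum_bytes_via_mmap(path, size):
    """mmap(): map the pages and process them in place through a memoryview."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as mapped:
            # The memoryview must be released before the mapping is closed,
            # which the nested with-statement guarantees.
            with memoryview(mapped) as view:
                return sum(view)


if __name__ == "__main__":
    payload = bytes(range(256)) * 16
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.close(fd)
        print(len(read_with_copy(path, len(payload))))
        print(sum_bytes_via_mmap(path, len(payload)))
    finally:
        os.remove(path)
```

With the mapped variant, the consumer works directly on the mapped pages, so no intermediate user-space copy of the block is created.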
IV. IMPLEMENTATION OF DAN
DAN runs on the back-end PC and transmits various data with DFN and GUI through the network. In the DAQ software, DAN functions as a server. A structure diagram of DAN is shown in Fig. 5 .
A. Multithreading
DAN uses multithreading and comprises five kinds of threads: a main thread, data collection threads, a data saving thread, a command processing thread, and an online analysis thread. The main thread initializes the list of threads, the list of event building buffers, and the mutex that protects the list of event building buffers; creates TCP sockets; and accepts client connection requests from both DFN and GUI. It then creates the data collection threads and the data saving thread. There is one data collection thread per DFN; each receives data from its DFN and writes the data to the list of event building buffers according to the trigger ID number. Only one data saving thread exists; it is woken up by a condition variable and then saves the assembled data to the file. The online analysis thread reads the assembled data saved in the file, counts the hits in each position, and then sends the hit-map to GUI.
B. Event Building
The list data structure shown in Fig. 6 is utilized for event building. The number of nodes in the list depends on the data rate and the processing capacity of the data saving thread.
The data collection thread inserts the data into the corresponding node when it receives a data frame, as defined by the system communication protocol, and increments end_count of the corresponding node when it receives an end frame. When the end_count in any node of the event building list equals the number of connected DFN nodes, the data collection thread that performed the last increment of end_count increments the global variable packet_assembled_counter to notify the data saving thread. After waking up, the data saving thread checks the value of packet_assembled_counter; as long as it is not 0, the data saving thread reads the data in the first node of the event building list and saves them to the file. Then, the node is deleted, and packet_assembled_counter is decremented by 1. After receiving the command to stop working, the main thread closes the file and ends the event building.
From the perspective of the producer-consumer model, the data collection threads are the producers, while the data saving thread is the consumer.
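The event building scheme described in this section can be sketched as follows. The names end_count and packet_assembled_counter are taken from the text; everything else (the class, the method names, the use of an ordered dictionary in place of the linked list, and a Python list in place of the output file) is hypothetical. The sketch also assumes that events complete in trigger order, so that the first node is always the assembled one.

```python
import threading
from collections import OrderedDict


class EventBuilder:
    """Event building with one producer per DFN and a single consumer."""

    def __init__(self, n_dfn):
        self.n_dfn = n_dfn
        self.events = OrderedDict()        # trigger id -> node, oldest first
        self.packet_assembled_counter = 0  # number of fully assembled events
        self.cond = threading.Condition()  # mutex plus condition variable

    def _node(self, trigger_id):
        return self.events.setdefault(trigger_id, {"data": [], "end_count": 0})

    def on_data_frame(self, trigger_id, payload):
        """Producer side: append a data frame to the node of its trigger."""
        with self.cond:
            self._node(trigger_id)["data"].append(payload)

    def on_end_frame(self, trigger_id):
        """Producer side: count end frames; the last one notifies the saver."""
        with self.cond:
            node = self._node(trigger_id)
            node["end_count"] += 1
            if node["end_count"] == self.n_dfn:
                self.packet_assembled_counter += 1
                self.cond.notify()

    def save_next(self, sink, timeout=None):
        """Consumer side: wait for an assembled event, then save and delete it."""
        with self.cond:
            ready = self.cond.wait_for(
                lambda: self.packet_assembled_counter > 0, timeout)
            if not ready:
                return False
            trigger_id, node = self.events.popitem(last=False)  # first node
            self.packet_assembled_counter -= 1
        sink.append((trigger_id, node["data"]))  # stand-in for the file write
        return True
```

In a threaded setup, each data collection thread would call on_data_frame and on_end_frame, while the data saving thread would loop over save_next.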
V. GUI
The GUI was programmed with Qt [6] and also uses multithreading to improve the responsiveness of the interface application. It consists of a GUI main thread, a transfer thread, and a hit-map display thread.
The GUI main thread displays the status and interacts with users. When a user pushes a command button, the GUI main thread creates the corresponding string and sends it to the transfer thread. The transfer thread receives status data, converts them to a text message, and then sends this to the GUI main thread. In addition, it responds to the command signal of the GUI main thread and sends a command string to DFN, DAN, and CTS. The hit-map display thread reads the hit-map information from DAN and displays it in a histogram so that the performance of the detectors can be directly assessed [7]. The structure diagram of GUI is shown in Fig. 7.
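The hit-map itself is a per-channel count. The sketch below shows one way such a map could be accumulated and screened for silent channels. The record format is not given in the text, so the sketch assumes that each hit has already been decoded into a channel index; the function names are hypothetical.

```python
from collections import Counter

N_CHANNELS = 320  # electronic channels per super module, from the text


def build_hit_map(channel_ids, n_channels=N_CHANNELS):
    """Count hits per channel; indices outside the valid range are ignored."""
    counts = Counter(ch for ch in channel_ids if 0 <= ch < n_channels)
    return [counts.get(ch, 0) for ch in range(n_channels)]


def silent_channels(hit_map):
    """Return the channels without any hit, a simple quality-control check."""
    return [ch for ch, hits in enumerate(hit_map) if hits == 0]
```

A channel that stays silent over a long cosmic ray run is a natural candidate for closer inspection.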
VI. TRANSMISSION TEST
As mentioned above, the total data rate of the readout electronics system will reach up to 6 Gb/s in the future heavy-ion collision experiment. Therefore, the transmission performance of the system is particularly important. To detect the bottleneck of the transmission rate and to improve the performance, tests have been conducted and are described in the following.
A. FPGA-HPS Interface Test
The FPGA in the SoC generates data, and the test software reads the data without processing them. In this test, DMA worked at 125 MHz, and the width of the interface was 64 bits, which gives a theoretical transmission speed of 1000 MB/s. As the results in Table I show, the transmission rate of the FPGA-HPS interface increases with the size of the DMA buffer block. However, this increase becomes insignificant when the block size exceeds 1 MB, which indicates that the interrupt response speed exceeds the DMA transfer speed. The maximum rate is 755 MB/s with a block size of 4 MB, which is used in the subsequent tests. This rate is limited by the 80% transmission efficiency of the hardware and the refresh rate of the SDRAM.
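The theoretical figure and the achieved fraction can be checked with a few lines of arithmetic; the function names below are illustrative only.

```python
def theoretical_rate_mb_s(clock_hz, width_bits):
    """Peak rate of an interface that moves one word per clock cycle (MB/s)."""
    return clock_hz * width_bits / 8 / 1e6


def achieved_fraction(measured_mb_s, theoretical_mb_s):
    """Fraction of the theoretical peak that was actually measured."""
    return measured_mb_s / theoretical_mb_s


if __name__ == "__main__":
    peak = theoretical_rate_mb_s(125e6, 64)
    print(f"peak: {peak:.0f} MB/s")
    print(f"achieved: {achieved_fraction(755, peak):.1%}")
```

The measured 755 MB/s corresponds to about 75.5% of the 1000 MB/s peak, which is close to the hardware efficiency quoted above.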
When the test software reads data from the driver, the result shown in Table II becomes completely different. Apparently, it is better to use the mmap method than the read method to reduce the time of copying data from kernel space to user space. The maximum transmission rate decreases to 240 MB/s because it requires more CPU to copy data to the user space.
TABLE II RESULT OF FPGA-HPS INTERFACE TRANSMISSION TEST
TABLE III RESULT OF DAQ SERVER INTERFACE TEST
B. DAQ Server Interface Test
Iperf, a TCP performance testing tool, was used for the DAQ interface transmission test. Since HPS has a dual-core processor, the multithread case was also tested. To evaluate the performance of HPS, DRM and a PC (T9600 CPU at 2.8 GHz, dual-core, 4-GB dual-channel memory) were tested separately as clients, while the back-end computer (i5-3470 CPU at 3.2 GHz, quad-core, 8-GB dual-channel memory) was used as the server.
The result in Table III shows that the rate of DRM can reach 607 Mb/s with a single thread and 714 Mb/s with dual threads. A single thread has a lower and more volatile rate than dual threads. While DRM was conducting this test, both the CPU load and the EMAC interrupt frequency were observed. With dual threads, the interrupt frequency was lower than with a single thread (2722 Hz vs. 4940 Hz), and the CPU load was higher (85% vs. 69%). This indicates that the single-core processing capacity of HPS also has a specific impact on the TCP/IP transmission.
C. Single Full-Link Test
Full-link tests were conducted to verify the feasibility and performance of the DAQ software. The FPGA in TRM generates incremental code as the data source, and the server is identical to that in the DAQ server interface test. The test software basically uses the architecture of the DAQ software in this paper, except that the test version of DAN adds a rate calculation section to the data collection module. A timestamp is recorded for each transmission, and the amount of data transmitted is accumulated at the same time; consequently, the transmission rate can be calculated once every 4 s. The TCP/IP buffer size is 4 MB, which is identical to the size of the DMA buffer block. Moreover, bit error rate (BER) verification was added, in which the data saved in the file are verified as incremental code. Fig. 8 shows the result of the transmission rate for a single data transmission path. The average rate for a single data transmission path is 550 Mb/s, and the standard deviation is 29 Mb/s. This is below the SoC upper limit of about 700 Mb/s obtained in the iperf dual-thread test (see Table III). The main reason is that the memory
1) The accelerator coherency port interface provided by HPS could be used to solve the cache coherence problem.
2) The DMA driver could be modified to simulate it as a network card device. This would utilize the forwarding function of the kernel TCP/IP protocol stack for data transmission. In this way, a zero-copy operation can be implemented.
3) The data transceiving thread in DFN could be changed to dual threads.
Moreover, no error occurred during 1 day of testing, and the functions of command sending and status monitoring were confirmed.
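The rate calculation and the incremental-code check used in this test can be sketched as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, timestamps are passed in explicitly (for example, from a monotonic clock), and a 32-bit wrap-around of the incremental code is assumed because the word width is not given in the text.

```python
class RateMeter:
    """Accumulate transferred bytes and report the rate once per interval."""

    def __init__(self, interval_s=4.0):
        self.interval_s = interval_s
        self.window_start = None
        self.bytes_in_window = 0
        self.rates_mbps = []  # one entry per completed interval, in Mb/s

    def add(self, n_bytes, now):
        """Register n_bytes received at time `now` (seconds)."""
        if self.window_start is None:
            self.window_start = now  # the first call only starts the clock
            return
        self.bytes_in_window += n_bytes
        elapsed = now - self.window_start
        if elapsed >= self.interval_s:
            self.rates_mbps.append(self.bytes_in_window * 8 / elapsed / 1e6)
            self.window_start = now
            self.bytes_in_window = 0


def first_break_in_incremental_code(words, modulus=2**32):
    """Return the index of the first word that is not previous + 1, else None."""
    for i in range(1, len(words)):
        if words[i] != (words[i - 1] + 1) % modulus:
            return i
    return None
```

For example, 275 MB received within one 4-s window corresponds to a rate of 550 Mb/s.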
D. Multiple Full-Link Test
Multiple full-link tests were also conducted for 1 h under the same experimental conditions as the single full-link test. The result in Table IV shows that the average rate for two transmission paths was almost double the rate of a single path, while the average CPU load of the back-end PC remained within the capability of the quad-core processor. Furthermore, the rate for four transmission paths together was 1.6 Gb/s, which already meets the requirement of 1.5 Gb/s for four transmission paths in the heavy-ion collision experiment. The results imply that the front-end multichannel electronic system could meet the requirement.
More upgrades and tests will be conducted in the future. However, in the present production stage of the super module, the transmission rate is sufficient for quality control, since only cosmic ray tests and beam tests can be conducted, and their data rate is less than 1 Gb/s due to the low case rate.
E. Cosmic Ray Test
Cosmic ray tests with MRPC detectors were also conducted. At the first stage, the signals were digitized by eight TDC boards. The event building and the online analysis in DAQ software worked well as shown by the hit-map in Fig. 9 .
VII. CONCLUSION
The data acquisition software for the quality control of the CBM-TOF super module detector can run stably, and it has a hierarchical structure that extends easily. The functions of event building, command sending, status monitoring, and online analysis in the transmission scheme all worked well. Furthermore, it meets the readout demand for quality control at present as shown via laboratory tests, where a single data transmission path achieved approximately 550 Mb/s data transfer rate in case of full link from FEE to back-end computer storage, and the data transfer rate for four transmission paths is 1.6 Gb/s.
Moreover, several transmission constraints were identified that can be addressed to meet the high rate of 6 Gb/s in the future heavy-ion collision experiment.
