ABSTRACT: The Data Acquisition System of the LZ experiment, the 10-tonne dark matter detector to be installed at the Sanford Underground Research Facility (SURF), will collect signals from 788 photomultiplier tubes (PMTs). Because the signals from the time projection chamber PMTs will be passed through dual-gain amplifiers, the DAQ system will collect waveforms from a total of 1276 channels, using custom-built, 32-channel, FPGA-based digital signal processors. The appropriately conditioned signals will be digitized at 100 MHz with 14-bit resolution. Based on actual measurements with a small-scale prototype system, the LZ DAQ is expected to be able to handle a maximum sparsified data rate of 1500 MB/s. During calibrations, it is estimated that only 33% of the system resources will be utilized. The digital filters that are used for data selection operate with an aggregate throughput in excess of 595,000 MB/s. Data selection decisions are based on, for example, the amount of scintillation (S1) and electroluminescence (S2) light, S1 and S2 hit-patterns, and total energy deposition.
Introduction
LZ is the next-generation dark matter search experiment that will be deployed at the Sanford Underground Research Facility (SURF) as a successor to the Large Underground Xenon (LUX) detector [1]. LZ is a dual-phase xenon detector that will have a total xenon mass of 10 tonnes. The time projection chamber (TPC) region accounts for 7 tonnes, with a fiducial volume of 5.6 tonnes [2]. Figure 1a depicts the main parts of the detector. The detector will be instrumented with 488 TPC PMTs, 180 skin PMTs (looking at the outer TPC Xe volume), and 120 outer detector PMTs.
The principle of event detection is shown in Fig. 1b. When an incoming particle interacts with a xenon atom, scintillation photons and ionization electrons are created. The scintillation photons result in a prompt S1 signal. The electrons drift towards the liquid surface and are extracted into the gas phase, where a secondary, wider S2 signal is generated. From the relation between the S1 and S2 signals, the type of interaction can be inferred. The time separation of the S1 and S2 signals provides information about the depth of the initial interaction, while the S2 illumination pattern on the top PMT array can be used to determine the planar position of the initial interaction [1].
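As a rough illustration of the reconstruction principle described above, the sketch below converts an S1-S2 time separation into a depth and estimates the planar position as an area-weighted centroid of the S2 light on the top PMT array. The drift velocity, the PMT coordinates, and the function names are placeholders introduced here for illustration, not LZ parameters, and a centroid is only the simplest possible position estimator.

```python
def depth_from_drift(t_s1_us, t_s2_us, drift_velocity_mm_per_us):
    """Depth below the liquid surface from the S1-S2 time separation.

    drift_velocity_mm_per_us is a caller-supplied assumption; no value
    is given in the text.
    """
    drift_time_us = t_s2_us - t_s1_us
    if drift_time_us < 0:
        raise ValueError("S2 must follow S1")
    return drift_time_us * drift_velocity_mm_per_us


def s2_centroid(pmt_xy_mm, s2_areas):
    """Area-weighted centroid of the S2 light seen by the top PMT array."""
    total = sum(s2_areas)
    if total <= 0:
        raise ValueError("no S2 light recorded")
    x = sum(a * xy[0] for a, xy in zip(s2_areas, pmt_xy_mm)) / total
    y = sum(a * xy[1] for a, xy in zip(s2_areas, pmt_xy_mm)) / total
    return x, y
```

With a placeholder drift velocity of 1.5 mm/µs, a 100 µs separation would map to a depth of 150 mm; the real conversion depends on the drift field and is not specified here.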
Data Acquisition System
A simplified schematic of the LZ DAQ system operation is shown in Fig. 2a. At the xenon-to-air interface, the signals are shaped and amplified with an array of custom-built amplifiers. The signals are then digitized and sent to the internal circular waveform buffers, where baseline-suppressed pulse waveforms are stored [3]. Reduced quantities, such as hit-vectors, multiplicity counts, and pulse area, are extracted and passed to the data sparsification (DS) system, where they are analyzed in real time in search of events of interest. Once a potentially good event is identified, the data are off-loaded from the circular waveform buffers for further off-line analysis.
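To make the idea of baseline-suppressed storage concrete, here is a minimal software sketch that keeps only the stretches of a digitized trace that rise above the baseline, together with a few samples before and after each excursion. The fixed baseline, the threshold, and the `pre`/`post` padding are assumptions made for illustration; the actual firmware logic is described in [3] and may differ.

```python
def suppress_baseline(samples, baseline, threshold, pre=2, post=2):
    """Return a list of (start_index, pulse_samples) for above-threshold regions.

    A sample is kept if it lies within `pre` samples before or `post`
    samples after any sample exceeding baseline + threshold.
    """
    n = len(samples)
    keep = [False] * n
    for i, s in enumerate(samples):
        if s - baseline > threshold:
            for j in range(max(0, i - pre), min(n, i + post + 1)):
                keep[j] = True

    pulses = []
    start = None
    for i, k in enumerate(keep):
        if k and start is None:
            start = i                      # a kept region begins
        elif not k and start is not None:
            pulses.append((start, samples[start:i]))
            start = None                   # the kept region ended at i - 1
    if start is not None:                  # region runs to the end of the trace
        pulses.append((start, samples[start:]))
    return pulses
```

Storing the start index with each pulse preserves the timing information that the later event-building stages rely on.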
The breakout amplifiers provide dual-gain outputs for the TPC PMT signals and single-gain outputs for the skin and outer detector PMT signals. The dual gain maximizes the available dynamic range and thus extends the spectrum of interactions that can be probed with the LZ detector [4]. The PMT channel count is summarized in Fig. 2b. With two channels for each of the 488 TPC PMTs and one channel for each of the 180 skin and 120 outer detector PMTs, the LZ DAQ system must be able to handle at least 1276 digitization channels.
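The channel count follows directly from the PMT numbers given in the Introduction and the gain scheme above; the short sketch below reproduces it. The digitizer estimate is only an arithmetic lower bound for 32-channel boards and is an inference made here, not a figure quoted in the text; grouping the boards by detector section could require more.

```python
import math

# PMT counts from the Introduction; gains per PMT from the amplifier description.
PMT_COUNTS = {"TPC": 488, "skin": 180, "outer detector": 120}
GAINS_PER_PMT = {"TPC": 2, "skin": 1, "outer detector": 1}
CHANNELS_PER_DIGITIZER = 32


def digitization_channels():
    """Total number of digitization channels across all detector sections."""
    return sum(PMT_COUNTS[k] * GAINS_PER_PMT[k] for k in PMT_COUNTS)


def minimum_digitizers():
    """Lower bound on the number of 32-channel boards (ignores grouping)."""
    return math.ceil(digitization_channels() / CHANNELS_PER_DIGITIZER)
```

Here `digitization_channels()` evaluates to 1276 and `minimum_digitizers()` to 40.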
The digitization of the analog signals is performed at 100 MHz with 14-bit resolution, using custom-built, 32-channel, FPGA-based DDC-32 digital signal processors. Based on the LUX event sizes, it has been determined that the amount of internal FPGA memory is sufficient to hold the events expected in LZ. For example, a typical 83mKr event is predicted to occupy about 5% of the available waveform memory. The hardware is developed in collaboration with Skutek Instrumentation [5], while all of the firmware and software are developed by the LZ collaboration. The decision to use custom hardware was based on cost savings and on previous experience from the LUX experiment, which showed that full access to hardware design details, as well as control over the software and firmware, is crucial to the success of experiments of this size and complexity.
Figure 3 schematically shows the top-level representation of the entire LZ DAQ system. The data collection (Fig. 3c) and data sparsification sides of the system share a common set of digitizers (Fig. 3b). This approach is beneficial because both sides of the system work with a single common digital representation of the incoming PMT signals. This greatly simplifies off-line cross-checking and system performance evaluation.
Most of the data is sent between system boards using custom serial links, built on FPGA SerDes blocks [6], using HDMI cables as the physical transport medium (indicated with red arrows in Fig. 3). In our prototype system [7], the links have been made to work reliably at 225 MB/s. The HDMI cables will also be used to distribute the global clock and synchronization signals from the DAQ and Sparsification Master units to the lower-level units.
When an event of interest is off-loaded, it is first extracted by the Data Extractors (DE), where the serial waveform data is packaged into a series of FPGA-generated Ethernet packets. These packets are sent to the Data Collectors (DC). The dedicated Ethernet link between each DE and DC pair is a 1 gigabit User Datagram Protocol (UDP) link. Because we implement consistency checking at the application level and this UDP link is set up in a point-to-point configuration, it can be used reliably, with minimal overhead and close to full gigabit bandwidth. In our prototype system, with some tuning of the DC network stack, we have been able to send data continuously at 109 MB/s without corruption or data loss. This will be the most stressed data off-loading link in the DAQ system, and yet we do not expect to utilize more than 33% of its bandwidth. Considering that there are fourteen DE/DC pairs in the system, we expect it to be able to collect data at a peak rate of 1500 MB/s. The data stored on the DCs is organized by channel and made available to the Event Builder (EB) for full event assembly.
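The peak rate quoted above can be checked with simple arithmetic, sketched below. It assumes that all fourteen links sustain the 109 MB/s measured on the prototype at the same time, and it interprets the 33% figure as a fraction of that measured rate rather than of the nominal gigabit line rate; both are assumptions made for this illustration.

```python
LINK_RATE_MB_S = 109.0   # sustained DE-to-DC rate measured on the prototype
N_DE_DC_PAIRS = 14       # number of DE/DC pairs in the system
MAX_UTILIZATION = 0.33   # expected upper bound on link utilization


def aggregate_rate_mb_s():
    """Peak off-loading rate if every link runs at the measured rate."""
    return N_DE_DC_PAIRS * LINK_RATE_MB_S


def expected_link_load_mb_s():
    """Per-link load at the expected maximum utilization (assumed basis)."""
    return MAX_UTILIZATION * LINK_RATE_MB_S
```

Under these assumptions the aggregate comes to 1526 MB/s, consistent with the quoted ~1500 MB/s, and the busiest link would carry roughly 36 MB/s.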
The DCs are server-grade, rack-mountable (2U) workstations, which are custom built from off-the-shelf parts. We have assembled a few units and verified reliable data transfer from the DE to the solid-state drive (SSD) in the DC. The DCs will have adequate storage resources (~1 week) to buffer the collected data in case of network outages, which might temporarily prevent the data from being transferred to off-site storage facilities.
Each board in this system has an on-board Blackfin processor [8], which runs a µClinux operating system [9]. This allows for convenient system setup and operation over a dedicated Ethernet control network (indicated with brown arrows in Fig. 3). Because of the sheer volume of data that comes from the PMTs, where just the volume associated with PMT dark counts amounts to 8.5 PB over the course of the experiment, real-time event selection is needed and is done by the data sparsification side of the system (Fig. 3a).
All digitization channels have a set of two digital filters that provide an output that is proportional to the area of the input pulse and perform real-time baseline subtraction. One filter has a width that is matched to a typical S1 pulse contribution seen by a single PMT (~60 ns FWTM), while the other is matched to a typical S2 pulse with a FWTM width of a few microseconds. The output of these filters is used to create hit and multiplicity vectors. Based on the experience of signal processing for the LUX experiment [10], we plan to use the filters shown in Fig. 4 as a starting point. Since these are essentially integrating filters, a large S1 pulse may trigger the S2 filter if its area exceeds the S2 filter threshold. We are working on techniques based on the relative outputs of the S1 and S2 filters that allow for more robust S1/S2 pulse type discrimination. The total expected throughput of the data processed by these filters is estimated to exceed 595,000 MB/s [11].
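One common way to obtain an output that is proportional to pulse area while removing the baseline is a zero-sum boxcar filter: a positive central window of n samples flanked by two negative windows of n/2 samples each. The sketch below implements that shape as an illustration only. The lobe layout is an assumption made here, not a description of the filters in Fig. 4, and the function name is hypothetical.

```python
def area_filter(samples, n):
    """Zero-sum boxcar filter over a sliding window of 2n samples.

    Coefficients are -1 on the first n/2 samples, +1 on the central n
    samples, and -1 on the last n/2 samples, so a flat baseline gives 0
    and a pulse contained in the central lobe gives its area.
    """
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even number")
    half = n // 2
    width = 2 * n
    out = []
    for start in range(len(samples) - width + 1):
        left = sum(samples[start:start + half])
        centre = sum(samples[start + half:start + half + n])
        right = sum(samples[start + half + n:start + width])
        out.append(centre - left - right)
    return out
```

At 100 MHz each sample spans 10 ns, so an S1-matched width of ~60 ns corresponds to about six samples. Following the caption of Fig. 4, an S2-like filter in this sketch would use a width of 4n.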
Each DDC-32 digitizer provides a digital sum of all or selected channels, allowing for construction of a total sum at the Data Sparsification Master (DSM) (Fig. 3a). This sum is processed by its own set of digital filters and allows the total pulse area to be part of the decision-making process.
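The reduced quantities used for data selection can be pictured with a small software sketch: a hit vector formed by thresholding per-channel filter outputs, a multiplicity obtained by counting the hits, and a digital sum of all or selected channel waveforms. The function names, the single common threshold, and the list-based data layout are assumptions made for illustration; the firmware implementation is not described at this level in the text.

```python
def hit_vector(filter_outputs, threshold):
    """One flag per channel: True where the filter output exceeds threshold."""
    return [v > threshold for v in filter_outputs]


def multiplicity(hits):
    """Number of channels flagged in a hit vector."""
    return sum(hits)


def digital_sum(waveforms, selected=None):
    """Sample-by-sample sum of the selected channel waveforms.

    `waveforms` is a list of equal-length sample lists, one per channel;
    `selected` is an optional list of channel indices (default: all).
    """
    if selected is None:
        selected = range(len(waveforms))
    chosen = [waveforms[i] for i in selected]
    return [sum(column) for column in zip(*chosen)]
```

Summing per board and then again at a master unit, as the text describes, gives the same result as a single global sum because addition is associative.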
DS information, such as hit/multiplicity vectors and the sum waveform, is off-loaded with each event and merged with the DAQ data stream. This enables off-line verification and optimization of the data sparsification performance.
The entire system is intentionally grouped by main detector sections (TPC, skin, and outer detector). Such grouping allows the data processing to be tailored to each group, because each of the groups has different event characteristics (e.g. no S2-like signals in the outer detector). This simplifies event selection for each group. The system will accommodate external triggers, such as a system-wide heartbeat or an LED calibration synchronization signal.
The LZ DAQ system is designed to handle event rates up to ~250 kHz. This is much higher than the highest expected event rate in the LZ detector, which will occur during LED calibrations (~4 kHz).
Figure 4. a) S1 filter whose width (n) is matched to a typical S1-like pulse. b) S2 filter whose width (4n) is matched to a typical S2-like pulse. c) Example of the S1 filter in operation.
Conclusions
The LZ data acquisition system must be capable of digitizing and processing over 1200 PMT channels. The system under development provides real-time event selection and a total data off-loading rate capability of ~1500 MB/s. The system utilizes the computational parallelism offered by FPGA technology and processes waveforms at rates in excess of 595,000 MB/s. A test system, representative of two full signal chains, is currently being put together. This test system will allow for verification of signal propagation from actual PMTs all the way to event files on disk. Additionally, elements such as data throughput, system control, and monitoring will be developed and optimized. This first test stage is expected to be completed by spring of 2016 [7].
