Abstract: A data acquisition system for the Angra antineutrino detector is under development. The system is able to digitize, to process and to store the analog signals coming from photomultiplier tubes (PMT), after the front-end electronics. We present here the design of a VME-based data acquisition module which is part of the Angra DAQ as well as the Muon Electronics in the Double Chooz experiment. This module features eight analog-to-digital channels, running at 125 MHz sampling frequency. In order to measure time between PMT pulses, an 82 ps time-to-digital converter is included. A Field Programmable Gate Array is used to implement digital signal processing algorithms on the digitized data, where a FIR filter for optimal pulse amplitude estimation has been implemented. The design and preliminary results with single-photoelectron measurements are presented.
INTRODUCTION
Nuclear reactors play a very important role in neutrino physics. Indeed, neutrinos were first experimentally detected fifty-eight years ago by Reines and Cowan [1] using the outgoing $\bar{\nu}_e$ flux from a nuclear reactor, and observing neutrino interactions through the inverse-β decay,

$$ \bar{\nu}_e + p \rightarrow e^+ + n \tag{1} $$
Besides the fact that nuclear reactors are intense sources of antineutrinos, the thermal power released in the fission process is directly related to the emitted antineutrino flux. As antineutrinos interact very weakly with matter and escape the reactor containment without any significant change in their number, measuring the antineutrino flux can provide quasi real-time information on the reactor status (on/off) and thermal power. It has been shown that antineutrino detectors have the potential to monitor the operational status and power level of nuclear reactors from outside the reactor containment [2].

* Electronic address: hlima@cbpf.br
The Angra Neutrino [3] is an experiment to observe antineutrinos at the Angra dos Reis nuclear complex in Brazil. The experimental approach is to measure the antineutrino flux with a Cerenkov based detector placed at a very short distance (≤ 30 m) from the reactor core. The target detector is a 1 ton volume of water doped with Gadolinium in order to increase the cross section for neutron capture. Eight-inch photomultiplier tubes (R5912, Hamamatsu) will be used for the target detector (32 units) and for two active shielding volumes (4 units on the top and 4 units on the bottom). The target detector will be placed inside a larger volume filled with water, having one photomultiplier tube on each side, to work also as an active shielding. Figure 1 illustrates the target detector design. The two shield volumes fully cover the bottom and the top faces of the target.

A data acquisition system (DAQ) is being designed for the Angra Neutrino Project. This DAQ will be responsible for the following tasks:
• analog-to-digital conversion of all signals from the target detector and the shield volumes;
• time-to-digital conversion between discriminated PMT pulses;
• trigger decision based on the application of conditions on the PMT signals (energy, multiplicity).
The DAQ must also communicate and interact with two other sub-systems: Outer Veto and Slow Control. The first one is an outer volume of water covering the target detector, also equipped with a few photomultiplier tubes that should detect muons crossing the inner detector. The Slow Control system will monitor and control parameters like the high voltage of the PMTs, current consumption per PMT, detector and local environment temperature, and water stability. Figure 2 shows an overview of the Angra Neutrino Project, including the main sub-systems.
In this contribution we present the development and first tests of the main data acquisition module in the Neutrino Detection DAQ, the NDAQ module. Results of DSP algorithms applied to real signals are presented. These signals have been generated in the laboratory with one of the photomultiplier tubes that will be used in the detector, so they are assumed to be a good approximation to the detector signals. It is worth mentioning that the module will also be used in a system dedicated to acquiring high-energy muons that cross the Inner Detector of the Double Chooz experiment [4]. This reactor based experiment, located at the Chooz reactor in France, aims at measuring the neutrino mixing angle $\theta_{13}$, the most important step towards further progress in the field of neutrino oscillations [5].
NDAQ DESIGN - THE HARDWARE
The NDAQ module is able to digitize, process and store the analog signals coming from the photomultiplier tubes, after they have been amplified and shaped by the front-end electronics. The pulses coming out of the photomultiplier tubes present spectral content (bandwidth) up to 160 MHz, where the spectrum has fallen by 20 dB. This result was obtained by applying the FFT to the average of thousands of pulses. After the front-end electronics, the pulses present a bandwidth of less than 50 MHz, as can be seen in Figure 8, where the samples are 8 ns apart from each other.
The NDAQ design includes eight analog-to-digital channels (AD9627, Analog Devices), each one featuring 12-bit vertical resolution and 125 MHz sampling frequency. Due to SNR (Signal-to-Noise Ratio) and layout concerns, only the 10 most significant bits have been connected in the module.
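The numbers quoted above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the helper `keep_msbs` is a hypothetical name, and it assumes unsigned ADC codes, which the text does not specify.

```python
SAMPLING_HZ = 125e6          # ADC sampling frequency from the text
SHAPED_BANDWIDTH_MHZ = 50.0  # upper bound on the shaped-pulse bandwidth from the text

period_ns = 1e9 / SAMPLING_HZ            # spacing between consecutive samples
nyquist_mhz = SAMPLING_HZ / 2 / 1e6      # highest frequency representable without aliasing


def keep_msbs(code12, kept_bits=10, total_bits=12):
    """Drop the least significant bits of an unsigned ADC code (assumed format)."""
    return code12 >> (total_bits - kept_bits)


print(period_ns, nyquist_mhz, keep_msbs(0xFFF))
```

The 8 ns sample spacing follows directly from the 125 MHz clock, and the shaped-pulse bandwidth stays below the Nyquist frequency of 62.5 MHz.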
In order to measure time intervals with high precision, for example between PMT hits for tracking purposes in the detector, an 82 ps resolution time-to-digital converter (TDC GPX, ACAM) is included in the design. The TDC works in a Single START - Multiple STOP mode, where time is measured between a START pulse and further STOP pulses. The START input is directly connected to a connector on the module front panel, the Trigger input. The eight STOP inputs of the TDC are connected to the outputs of eight voltage comparators in the module. These comparators receive as inputs the analog signal coming from the front-end electronics and a fixed voltage defined by an on-board DAC (Digital-to-Analog Converter). Besides the 8-channel DAC used for signal discrimination, a second DAC is available for shifting the analog signal baseline, which can be useful for optimizing the use of the ADC dynamic range.
A single field programmable gate array (EP3C40F484C6, Altera) is used as the core processor, receiving all converted data coming from the ADC and the TDC, computing dynamic variables and implementing digital signal processing algorithms. A second FPGA (EP3C25F324C8, Altera) is used to decode and control bus transactions through the VME bus or the USB port. Figure 3 shows a block diagram of the NDAQ design architecture.
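As a rough illustration of the Single START - Multiple STOP scheme, the sketch below converts TDC readings into time intervals. It assumes that each STOP reading is an integer number of 82 ps bins counted from the START pulse; the actual readout format of the TDC is not described here, and the function names are hypothetical.

```python
TDC_BIN_PS = 82  # TDC resolution quoted in the text, in picoseconds


def stop_time_ps(stop_count):
    """Time between the START pulse and one STOP pulse (assumed integer bin count)."""
    return stop_count * TDC_BIN_PS


def hit_time_difference_ps(stop_count_a, stop_count_b):
    """Time between hits on two channels that share the same START pulse."""
    return stop_time_ps(stop_count_b) - stop_time_ps(stop_count_a)


# Example: channel A fires 1000 bins after START, channel B fires 1500 bins after START.
print(hit_time_difference_ps(1000, 1500))
```

Because all STOP times refer to the same START, the interval between two PMT hits is simply the difference of their STOP times.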
Other useful on-board resources included are:
• one 512K x 8 bits SRAM memory for on-board data storage, useful for applications like fast image acquisition systems where the image should be stored on hardware for fast processing (not used in the present application)
• 512K x 32 bits output buffer (FIFO), used to reduce dead time in the data flow
• clock distribution circuit that allows phase adjustments between the ADCs sampling clocks
• CAN (Controller Area Network) port for slow monitoring or control of on-board registers
• USB-FIFO interface device used for standalone operation (outside the VME mainframe)
The configuration firmware in the VME FPGA allows block read transfers from the on-board FIFOs to external VME processor boards, reaching a bandwidth of 14 MB/s (tested in the laboratory). In the Angra Neutrino Project, considering the worst-case estimated event rate of 1 kHz, including neutrino and cosmic events, the DAQ must be able to deal with a data rate of 10 MB/s, which is safely below the measured bandwidth. The NDAQ module is built as an eight-layer FR-4 printed circuit board in VME 6U size (16 cm x 23 cm). A picture of the first prototype is shown in Figure 4.
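The bandwidth comparison above can be reproduced as a plausibility check. The sketch below is not the authors' calculation: it assumes 40 PMT channels (32 target plus 4 + 4 shield units, taken from the PMT counts given earlier), frames of 128 samples, and 2 bytes per stored sample, and it ignores headers and TDC words. The byte count per sample in particular is an assumption.

```python
def data_rate_mb_per_s(event_rate_hz, channels, samples_per_frame, bytes_per_sample):
    """Raw payload rate in MB/s, ignoring headers and TDC words (assumption)."""
    return event_rate_hz * channels * samples_per_frame * bytes_per_sample / 1e6


VME_BANDWIDTH_MB_S = 14.0   # block-transfer bandwidth measured in the laboratory (from the text)

required = data_rate_mb_per_s(
    event_rate_hz=1000,      # worst-case event rate from the text
    channels=40,             # assumed: 32 target + 8 shield PMTs
    samples_per_frame=128,   # assumed frame length
    bytes_per_sample=2,      # assumed storage width per sample
)
print(required, VME_BANDWIDTH_MB_S - required)
```

Under these assumptions the payload is about 10 MB/s, in line with the figure quoted above and below the 14 MB/s measured for block transfers.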
NDAQ DESIGN - THE FIRMWARE
As described in the previous section, the NDAQ module makes use of two field programmable gate arrays for core processing (Core FPGA) and bus control/decoding (VME FPGA), as shown in Figure 3. For each FPGA, a highly modular firmware has been designed based on state machines, parameterized components, dual-port FIFOs and register chains for clock synchronization. All the firmware blocks have been designed in VHDL (VHSIC Hardware Description Language) [6]. The design entry, synthesis, debug and fitting steps have been carried out in the Quartus II Web Edition [7] tool. Figure 5 shows the basic building blocks of the firmware designed for both FPGAs (Core and VME). Converted data coming from the eight ADC channels and the TDC feed the Core FPGA directly, which also receives a dedicated clock (Core CLK), the ADC data sampling clock (ADC DCO) and an external trigger pulse.
Data acquisition triggering may be defined as one of the following modes: External trigger, Internal trigger and External+Internal. In the External trigger mode, a signal frame (128 samples, for example) is captured in the eight channels immediately after a pulse arrives at the Trigger input connector. This External trigger works like a global trigger for the whole module, since the trigger pulse is common to the eight channels. In the Internal trigger mode, a digital comparator inside the FPGA (Digital Trigger component in Figure 5) is used to start the signal capture. There is one digital comparator for each channel in the FPGA. Therefore, if one of the eight comparators activates, all the channels will be captured for further processing. In the last trigger mode, External+Internal, one can use both sources of information to set a higher level trigger by combining the global trigger (External) and the eight internal comparator outputs. The last mode is currently chosen for normal data taking with the detector, but deeper investigations are still going on to define the best trigger scheme.
A 32-bit output bus connects the Core FPGA to four external FIFO memories, used for buffering data before the final transfer to the VME data bus. The VME FPGA receives the output data from the four FIFOs, a dedicated clock and the control signals from the VME bus. The VME FPGA also interfaces the USB port through a USB Transceiver circuit in the module. Finally, in the data readout chain, the VME Data Bus communicates with the FIFO output buses and the VME FPGA, as illustrated in Figure 5. In the following sections the most relevant blocks of each FPGA firmware are described.
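The three trigger modes can be summarised in a small behavioural model. This is a sketch, not the firmware logic: the text only says that the External+Internal mode uses "a combination" of the two sources, so the logical AND and the `min_multiplicity` parameter used here are assumptions, as are all names.

```python
from enum import Enum


class TriggerMode(Enum):
    EXTERNAL = "external"
    INTERNAL = "internal"
    EXTERNAL_AND_INTERNAL = "external+internal"


def internal_hits(samples, thresholds):
    """One digital comparator per channel: True where the sample exceeds its threshold."""
    return [sample > threshold for sample, threshold in zip(samples, thresholds)]


def trigger_decision(mode, external_pulse, samples, thresholds, min_multiplicity=1):
    """Decide whether all eight channels should be captured."""
    internal = sum(internal_hits(samples, thresholds)) >= min_multiplicity
    if mode is TriggerMode.EXTERNAL:
        return external_pulse
    if mode is TriggerMode.INTERNAL:
        return internal
    # Assumed combination for External+Internal: both conditions must hold.
    return external_pulse and internal


thresholds = [100] * 8
samples = [20, 20, 20, 150, 20, 20, 20, 20]   # only channel 3 is above threshold
print(trigger_decision(TriggerMode.INTERNAL, False, samples, thresholds))
```

With `min_multiplicity=1` the Internal mode fires as soon as any single comparator activates, as described above; a larger value would model a multiplicity condition.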
The Core FPGA
The Core FPGA is the device responsible for receiving and processing data from the ADC and the TDC. In the next paragraphs, a brief description of the main parts synthesized in the Core FPGA is provided.
The block receiving the ADC data is a Finite Impulse Response (FIR) filter [8]. This filter is designed to implement an optimal pulse amplitude estimation in real time for triggering and event energy reconstruction. Two trigger mechanisms are provided in order to acquire and synchronize a signal frame inside the FPGA. These two mechanisms (external and internal trigger) are represented by the blocks Trigger Cond and Digital Trigger in Figure 5, respectively.
In order to buffer data coming from the FIR block, two FIFO (First-In First-Out) memories are implemented in series inside the Core FPGA. The first one, Pre Fifo in Figure 5, is continuously filled with the FIR output data as long as data acquisition is running. In the event of a trigger (external or internal), all the content of the Pre Fifo is flushed into the second FIFO, the Post Fifo. By means of this mechanism, it is possible to acquire signal samples before the trigger instant. Assuming that the firmware is designed to capture a signal frame with N samples for each trigger event, and that the Pre Fifo is M samples deep, the FIFOs are designed so that N > M, which means that all the Pre Fifo content will fit inside the Post Fifo, plus the samples after the trigger instant. For instance, in the current configuration, N = 128 and M = 32, so that a signal frame with 128 samples (1024 ns) is captured for each trigger, with 32 samples before the trigger instant and 96 after.
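The Pre Fifo / Post Fifo mechanism described above can be modelled in software as follows. This is a behavioural sketch only (the firmware itself is written in VHDL); the function name and the use of a `deque` as a stand-in for the Pre Fifo are illustrative choices.

```python
from collections import deque

N_FRAME = 128   # samples per captured frame (N in the text)
M_PRE = 32      # Pre Fifo depth (M in the text)
SAMPLE_NS = 8   # sampling period at 125 MHz


def capture_frame(samples, trigger_index, n_frame=N_FRAME, m_pre=M_PRE):
    """Return one frame: up to m_pre samples before the trigger, the rest from it onward."""
    if n_frame <= m_pre:
        raise ValueError("frame length N must exceed Pre Fifo depth M")
    pre_fifo = deque(maxlen=m_pre)   # keeps only the most recent m_pre samples
    post_fifo = []
    for index, sample in enumerate(samples):
        if index < trigger_index:
            pre_fifo.append(sample)          # oldest sample drops out when full
        else:
            if not post_fifo:
                post_fifo.extend(pre_fifo)   # flush the Pre Fifo when the trigger arrives
            post_fifo.append(sample)
            if len(post_fifo) == n_frame:
                break
    return post_fifo


stream = list(range(1000))                    # stand-in for FIR output samples
frame = capture_frame(stream, trigger_index=500)
print(len(frame), frame[0], frame[M_PRE], frame[-1])
```

In this example the frame starts 32 samples before the trigger sample (index 500) and contains 96 samples from the trigger onward, for a total window of 128 × 8 ns = 1024 ns.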
Other useful resources currently implemented in the Core FPGA include: (i) a trigger rate meter (Freq Meter block), (ii) configuration registers, (iii) a slave SPI block, used to communicate with the VME FPGA and (iv) a data builder, which is a collection of components responsible for receiving data from the FIR block, the TDC and the trigger rate meter, assembling them together in order to output them to the external memories.
From the external FIFOs, which represent the last buffer in the module, data can be transferred directly to the VME data bus or, through the VME FPGA, to the USB port.
The VME FPGA
The synthesized logic in the VME FPGA is responsible for bus protocol decoding and control for the VME and the USB communication paths. It means that the FPGA controls, with the exception of the CAN port, all communication available between the NDAQ module and external devices.
In order to establish a communication path between both FPGAs, a component called Master SPI has been designed in the VME FPGA. This component is directly controlled by commands sent through the VME bus. The Master SPI and the Slave SPI component, the latter in the Core FPGA, communicate with each other so that the VME bus can access configuration registers in the Core FPGA. The Master SPI component is also used to access configuration registers in the VME FPGA.
Concerning the USB communication path, the FT245BM Interface block, shown in Figure 5 , has been designed in order to receive/transmit data from/to the external USB interface chip (FT245BM, FTDI Chip). All the USB protocol is handled on-chip, simplifying the module design. 

The slow-control circuit
In order to access (read and write) some critical devices in the module without disturbing the main data flow, a third communication port has been implemented. This dedicated port makes use of the standard CAN communication protocol [9] and allows configuration, calibration and debugging of the module. Figure 6 shows the devices used to implement the CAN communication, a physical-layer transceiver (MCP2551, Microchip) and a microcontroller (PIC18F2680, Microchip), and the devices controlled: the clock distributor for the ADCs, the four ADCs and the two DACs.
OPTIMAL FILTERING FOR PULSE AMPLITUDE ESTIMATION
The challenge for the Angra detector comes from the fact that the detector will be installed on the surface, resulting in a huge background contamination of the measurements, mainly due to cosmic ray interactions. Considering the detector active volume, our studies indicate a background rate of 898 Hz for cosmic muons and 291 Hz for neutrons, while 0.06 Hz is the expected neutrino rate. This scenario poses a difficult problem for the online event selection (triggering) procedure. The Angra trigger system requires good energy resolution in order to reconstruct SPE (Single Photoelectron) interaction signatures coming from neutrino event candidates. SPE pulses typically present low SNR, since they correspond to low integrated charge signals. Due to specific front-end electronic characteristics, the digitized pulse can be modeled as having a fixed shaping function and an amplitude proportional to the PMT integrated charge. These features allow the use of an Optimal Filter (OF) to design a minimal variance amplitude estimator, based on the knowledge of the pulse shape and the second order statistics of the digitized electronic noise [10] - [11]. The algorithm and the hardware implementation of the proposed filter are presented in this section.
The Optimal Filter Algorithm
Equation 2 defines a model for the received pulse r as it is delivered by the ADC:

$$ r_i = A s_i + P + n_i, \qquad i = 1, \dots, N \tag{2} $$

Each sample $r_i$ is composed of three components:
1. The known pulse shape sample $s_i$, weighted by the signal amplitude A to be recovered.
2. An unknown pedestal P, modeling the PMT baseline fluctuation, which can be considered constant in a short time window.
3. The electronic noise sample $n_i$, modeled as a zero-mean Gaussian variable with known covariance matrix C.
The model given in Equation 2 does not use the pulse phase information commonly used in colliders and medical diagnostic detectors [12] - [13]. In those detectors, where fast signals need to be used to avoid pile-up effects, two OFs are implemented: one for phase and another for amplitude estimation. Combining information of phase and amplitude, one can perform an iteration procedure in order to match the correct pulse shape for amplitude estimation. The iteration procedure becomes prohibitive for real-time environments like triggering systems, since it would require several steps to converge. Besides, in order to enable the use of phase information in a linear OF model, it is required to perform approximations on the model. One common procedure is to use the first term of a Taylor expansion around the signal peak. That leads to sub-optimal estimators for out-of-time signals and demands knowledge of pulse-shape derivatives, compromising the efficiency of the energy reconstruction procedure.
The proposed model benefits from the low event rate in the neutrino detector to implement a large signal shaper (integrator) on the front-end electronics. With a large number of digitized samples, it is possible to achieve good peak resolution, avoiding the use of the phase information in the model. Unfortunately, that approach amplifies the low frequency components of the noise, imposing a huge baseline fluctuation.
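According to this model, the optimal filtering problem consists of finding the estimator $\hat{A}$ by performing a weighted sum of the $r_i$ samples

$$ \hat{A} = \sum_{i=1}^{N} a_i r_i \tag{3} $$

where the constants $a_i$ need to be determined for minimal variance results. Merging Equations 2 and 3, we have

$$ \hat{A} = \sum_{i=1}^{N} a_i \left( A s_i + P + n_i \right) \tag{4} $$

To achieve the desired amplitude A, we apply the following constraint on the weights $a_i$

$$ \sum_{i=1}^{N} a_i s_i = 1 \tag{5} $$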
The front-end shaper contains a high integration factor in order to obtain a long signal for good peak resolution. Unfortunately, that approach imposes high baseline fluctuation on the front-end output, modeled by the variable P in Equation 2. The baseline P needs to be estimated, event by event, and then subtracted before amplitude estimation. Although this can be easily accomplished in offline analysis, for real-time triggering it is not a trivial task. The solution proposed here consists of adding one more constraint to the model in order to eliminate the baseline contribution on the output of the optimal filter:

$$ \sum_{i=1}^{N} a_i = 0 \tag{6} $$
The variance of the estimator is given by

$$ \mathrm{Var}(\hat{A}) = \sum_{i=1}^{N} \sum_{j=1}^{N} a_i a_j C_{ij} \tag{7} $$

where C is the known noise covariance matrix. The procedure of finding the constants $a_i$ for optimal amplitude estimation consists of minimizing Equation 7 with respect to the constants $a_i$, with the constraints given by Equations 5 and 6. We use the method of Lagrange Multipliers [14]. This approach results in a system with N + 2 equations, where N equations are obtained by setting to zero the derivative of Equation 7, plus the Lagrange multiplier terms, with respect to each weight $a_k$

$$ 2 \sum_{i=1}^{N} a_i C_{ik} - \lambda s_k - \kappa = 0, \qquad k = 1, \dots, N \tag{8} $$

where $\lambda$ and $\kappa$ are the Lagrange multipliers associated with Equations 5 and 6, respectively, and the other two equations are the constraints given by Equations 5 and 6. In Section 5 we perform this computation on experimental results given by SPE measurements.
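The constrained minimization described above amounts to solving one linear system of N + 2 equations. The following sketch builds and solves that system in plain Python for a toy case; the pulse shape, the covariance matrix and all function names are illustrative assumptions, not the experimental inputs. In practice a library routine such as NumPy's `numpy.linalg.solve` would replace the small Gauss-Jordan solver used here to keep the example self-contained.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def optimal_filter_weights(shape, cov):
    """Weights minimizing a^T C a subject to sum(a*s) = 1 and sum(a) = 0."""
    n = len(shape)
    size = n + 2                       # N weights plus two Lagrange multipliers
    matrix = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for k in range(n):
        for i in range(n):
            matrix[k][i] = 2.0 * cov[k][i]   # derivative of the variance term
        matrix[k][n] = -shape[k]             # multiplier of the amplitude constraint
        matrix[k][n + 1] = -1.0              # multiplier of the pedestal constraint
        matrix[n][k] = shape[k]              # constraint row: sum(a_i * s_i) = 1
        matrix[n + 1][k] = 1.0               # constraint row: sum(a_i) = 0
    rhs[n] = 1.0
    return solve_linear(matrix, rhs)[:n]


def estimate_amplitude(weights, frame):
    """Weighted sum of the received samples."""
    return sum(w * r for w, r in zip(weights, frame))


# Toy inputs (assumed): a 5-sample triangular pulse and exponentially correlated noise.
shape = [0.0, 0.5, 1.0, 0.5, 0.0]
cov = [[0.8 ** abs(i - j) for j in range(5)] for i in range(5)]
weights = optimal_filter_weights(shape, cov)

# Noise-free frame with amplitude 180 and pedestal 37.5: the pedestal should cancel.
frame = [180.0 * s + 37.5 for s in shape]
print(estimate_amplitude(weights, frame))
```

Because of the two constraints, the estimate of a noise-free frame equals the amplitude regardless of the pedestal value; the noise covariance only influences how the remaining freedom in the weights is used to reduce the variance.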
Hardware Implementation
The FIR filters for optimum amplitude estimation are implemented on the Core FPGA (see Figure 5). A total of eight real-time filters, one per ADC channel, are synthesized in parallel, running at 125 MHz. The high density of the Core FPGA allows the implementation of FIR filters with a large number of coefficients. The width of the pulses coming from the front-end shaper in the Neutrinos Angra experiment is about 800 ns. Therefore, filters with approximately 100 coefficients are used, achieving the high peak resolution (1 % of the pulse width) needed for real-time optimal amplitude estimation without amplitude-phase iterations, as explained in the previous section.
In a first approach, implementing eight FIR filters, each one with 100 coefficients, would require at least 800 dedicated multiplier blocks in the FPGA. Since the number of those blocks in the Core FPGA is 126, we propose an alternative implementation using only logic cells. Combining the LUT (Look-Up Table) + Register architecture of FPGA logic elements with the transposed structure implementation of FIR filters [8], it is possible to run filters at the maximal frequency allowed for the Cyclone III FPGA. In the direct FIR filter structure, the whole computation (filter tap weighting followed by the sum of all terms) must be performed in a single iteration (one clock). This would lead to a very large combinational logic block, and the maximal allowed frequency of operation would be around 20 MHz. Using the transposed structure, this large combinational logic is pipelined into small multiplier-adder sequences, allowing the filter to run at 125 MHz. Figure 7 shows the transposed structure for FIR filter implementation. The pipelined multiplication-summation structures are smoothly accommodated on a Logic Array block of the Cyclone III FPGA, enabling maximal throughput of 250 MHz. The triangles represent the gain of each tap (filter coefficients).
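The difference between the direct and the transposed structure can be illustrated with a behavioural model. The sketch below is not the VHDL implementation and says nothing about timing; it only shows that the transposed form, where each tap performs one multiply-add into its own register per clock, produces the same output sequence as the direct form. Function names and test data are illustrative.

```python
def fir_direct(coeffs, samples):
    """Direct form: each output is one long sum over all taps."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def fir_transposed(coeffs, samples):
    """Transposed form: a chain of registers, one multiply-add per tap per clock."""
    taps = len(coeffs)
    state = [0] * (taps - 1)      # state[j] models the register feeding tap j
    out = []
    for x in samples:
        products = [c * x for c in coeffs]   # every tap sees the same input sample
        y = products[0] + (state[0] if state else 0)
        new_state = []
        for k in range(1, taps):
            upstream = state[k] if k < taps - 1 else 0   # last tap has no upstream register
            new_state.append(products[k] + upstream)     # short multiply-add stage
        state = new_state
        out.append(y)
    return out


coeffs = [3, -1, 4, 1, -5]
samples = [2, 7, 1, 8, 2, 8, 1, 8]
print(fir_transposed(coeffs, samples) == fir_direct(coeffs, samples))
```

In the transposed form the longest arithmetic path between two registers is a single multiplication followed by a single addition, independent of the number of taps, which is what makes the structure attractive for pipelining in logic cells.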
EXPERIMENTAL RESULTS
An experimental apparatus composed of a single PMT within a dark chamber and a front-end channel from the Neutrinos Angra detector has been built. A Light Emitting Diode (LED) is used to inject synchronized photons inside the chamber. The front-end electronics is connected to the NDAQ module for data acquisition. Figure 8 shows three typical synchronized single-photoelectron pulses, measured by channel 1 of the NDAQ without the FIR block. One can note that the low frequency noise and the signals have similar spectral components, which reduces the signal detection and amplitude estimation efficiencies when no pre-processing step is applied.
An offline processing has been performed in order to synchronize those pulses and obtain an average pulse $s_i$, which is used as the reference pulse shape for the optimal filtering coefficients computation. Figure 9 shows the reference pulse for N = 100, after amplitude normalization. Also, asynchronous acquisition has been performed in order to store raw data containing only noise. This information was used to estimate the 100 × 100 noise covariance matrix $C_{100 \times 100}$, as shown in Figure 10. It should be noted that the noise samples are highly correlated due to the high integration factor of the front-end shaper circuitry.
By using the information of $s_i$ and $C_{100 \times 100}$ computed from experimental data, one can use Equations 5, 6 and 8 to build the 102 × 102 system of equations. Solving this system for $a_k$, with k varying from 1 to 100, we find the 100 constants to be used in the Optimal Filtering. Those constants are shown in Figure 11. Figures 12 and 13 show histograms of the peak value from synchronized single-photoelectron measurements without and with the FIR block. A better separation between the noise peak (on the left) and the single-photoelectron peak (on the right) is clearly visible. The peak-to-valley ratio increases from 2.6 to 4.2 when the Optimal Filtering is applied.
The noise distribution before and after the FIR block can be seen in Figures 14 and 15, respectively. The noise standard deviation decreases from 24 mV to 12 mV. Defining the signal-to-noise ratio as the peak of the SPE spectrum (180 mV) over the noise standard deviation, the SNR values without and with the FIR processing are 7.5 and 15, respectively. The Optimal Filtering doubles the SNR for amplitude estimation.
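The two offline inputs described above, the normalized reference pulse and the noise covariance matrix, can be estimated from stored frames along the following lines. This is a sketch under assumptions: the frames are taken to be already time-aligned, the normalization is to unit peak, and the unbiased sample covariance is used; none of these details are specified in the text, and the tiny data sets are made up for illustration.

```python
def average_pulse(frames):
    """Sample-by-sample mean of aligned pulse frames, normalized to unit peak."""
    count = len(frames)
    n = len(frames[0])
    mean = [sum(f[i] for f in frames) / count for i in range(n)]
    peak = max(mean)
    return [m / peak for m in mean]


def noise_covariance(noise_frames):
    """Unbiased sample covariance matrix of noise-only frames."""
    count = len(noise_frames)
    n = len(noise_frames[0])
    mean = [sum(f[i] for f in noise_frames) / count for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            acc = sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in noise_frames)
            cov[i][j] = cov[j][i] = acc / (count - 1)   # matrix is symmetric
    return cov


# Tiny made-up data sets, only to show the calling convention.
reference = average_pulse([[0, 2, 4, 2], [0, 4, 8, 4]])
cov = noise_covariance([[1, 2], [3, 6], [5, 4]])
print(reference, cov)
```

For the real measurement each frame would hold N = 100 samples, giving a 100-element reference pulse and a 100 × 100 covariance matrix.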
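The figures of merit quoted above follow from simple ratios, reproduced here as a quick check. The variable names are illustrative; the numerical inputs are the values given in the text.

```python
spe_peak_mv = 180.0          # peak of the SPE spectrum
noise_sigma_raw_mv = 24.0    # noise standard deviation without the FIR block
noise_sigma_fir_mv = 12.0    # noise standard deviation with the FIR block

snr_raw = spe_peak_mv / noise_sigma_raw_mv
snr_fir = spe_peak_mv / noise_sigma_fir_mv
snr_gain = snr_fir / snr_raw

peak_to_valley_raw = 2.6
peak_to_valley_fir = 4.2
peak_to_valley_gain = peak_to_valley_fir / peak_to_valley_raw - 1.0   # relative increase

print(snr_raw, snr_fir, snr_gain, round(100 * peak_to_valley_gain))
```

The relative increase of the peak-to-valley ratio comes out slightly above 60 %.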
CONCLUSION
The development of a new data acquisition module, as well as the preliminary results of a high-speed optimal pulse amplitude estimation using FIR filtering, have been presented. This new module, the NDAQ, is part of a data acquisition system currently being designed for the Angra and Double Chooz detectors. The designed card is able to digitize and the ADC and TDC conversion circuits. By means of a real-time optimal FIR filter, the peak-to-valley ratio in the single-photoelectron spectrum is increased by 60 % and the peak amplitude estimation resolution is doubled. These first results demonstrate that the proposed techniques are important and promising for an experiment with a huge background like Angra.
