The main goal of the NA62 experiment is to measure the branching ratio of the $K^+ \to \pi^+ \nu\bar{\nu}$ decay by collecting O(100) events in two years of data taking. Efficient online selection of interesting events and lossless readout at high rate are key issues for such an experiment. An integrated trigger and data acquisition system has been designed. Only the first trigger stage will be implemented in hardware, in order to reduce the total rate for the software levels running on PC farms. Readout uniformity among the different subdetectors and scalability were taken into account in the architecture design.
Introduction
The NA62 experiment at the CERN SPS aims at measuring O(100) $K^+ \to \pi^+ \nu\bar{\nu}$ events in two years of data taking. The theoretical cleanness of the Standard Model (SM) prediction for the branching ratio (BR) of this decay mode makes it very attractive both as a powerful test of the CKM paradigm and as a probe for new physics beyond the SM. Experimentally, the detection of this process is very difficult because of the smallness of the signal (in the SM the expected BR is at the level of $\sim 7 \times 10^{-11}$) and the presence of a very sizeable competing background, mainly from $K^+ \to \pi^+ \pi^0$ decays. The present measurement of this decay channel is based on 7 candidates collected by the E787 and E949 experiments at Brookhaven [1], leading to a value of $\mathrm{BR} = (1.47^{+1.30}_{-0.89}) \times 10^{-10}$.
NA62 is a fixed-target experiment in which a charged hadron beam, containing ∼ 6% kaons, will be produced by 400 GeV/c protons from the SPS accelerator. Kaon decays in flight will be studied in a fiducial region ∼ 100 m long, kept in vacuum in order to reduce secondary interactions. The momenta of the decay products and of the primary particles will be measured with high resolution by the STRAWS and GIGATRACKER spectrometers, respectively, in order to achieve good signal reconstruction and kinematic rejection. An efficient veto system for photons and charged particles (LAV, LKr and SAC) and a particle identification (PID) system for primary particles and decay products (CEDAR, RICH and MUV) will guarantee the identification of decay modes that are not kinematically constrained. A schematic view of the experiment is shown in fig. 1.
In order to collect the required number of events in a reasonable amount of time, a very intense hadron beam will be employed ($3 \times 10^{12}$ protons per SPS pulse will produce $\sim 5 \times 10^{12}$ $K^+$ per year). Efficient online selection of candidates is therefore essential, because of the large reduction that must be applied to the data before recording on tape. On the other hand, a lossless data acquisition system is mandatory to avoid introducing artificial detector inefficiencies, e.g. when vetoing background particles. This paper focuses on the general architecture of the integrated DAQ and trigger system for the NA62 experiment.
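As a rough cross-check of the numbers quoted in this section, the short Python sketch below combines the kaon flux, the SM branching ratio and the two-year running time. The overall acceptance is not given in the text, so it appears here as a hypothetical parameter; the sketch also assumes that the quoted $K^+$ per year can be multiplied directly by the BR. It only illustrates the order of magnitude.

```python
# Rough signal-yield arithmetic using the figures quoted in the text.
KAONS_PER_YEAR = 5e12   # K+ per year (from the text)
BR_SM = 7e-11           # SM branching ratio, order of magnitude (from the text)
YEARS = 2               # planned data taking (from the text)
TARGET_EVENTS = 100     # O(100) events goal (from the text)


def expected_events(acceptance: float) -> float:
    """Signal events for a given overall acceptance.

    `acceptance` is a hypothetical parameter: the text does not quote it.
    """
    return KAONS_PER_YEAR * YEARS * BR_SM * acceptance


def implied_acceptance(target: float = TARGET_EVENTS) -> float:
    """Acceptance needed to reach `target` events (derived, not quoted)."""
    return target / (KAONS_PER_YEAR * YEARS * BR_SM)


if __name__ == "__main__":
    print(f"events at 100% acceptance: {expected_events(1.0):.0f}")
    print(f"acceptance implied by {TARGET_EVENTS} events: {implied_acceptance():.2f}")
```

Under these assumptions the O(100) goal corresponds to an overall acceptance of the order of 10%, which is an inferred figure and not a design value stated here.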
Requirements for the DAQ and trigger systems
The rate of events in the decay region is strongly dominated by background. According to preliminary simulations the rate on the main detectors is around 10 MHz (table 1). An additional rate of at least ∼ 1 MHz of muons coming from the beam production target must be taken into account. In this environment the requirements for the DAQ and trigger systems are:
• Very low DAQ inefficiency ($< 10^{-8}$);
• High trigger efficiency (> 95%);
• Fully monitored systems;
• Readout without zero suppression for candidates;
• Low random veto probability at trigger level;
• Scalability in terms of bandwidth.
The first requirement is unusual for a DAQ system, but it is crucial for the NA62 experiment, where the full reconstruction of the background is an important issue. For the same reason zero suppression, mainly in the veto detectors, must be avoided as much as possible during the acquisition process. A good trigger efficiency can be obtained by using information from several detectors with excellent time resolution, which keeps the random veto probability low. Thanks to the trigger system the final acquisition rate will be of the order of tens of kHz.
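The link between time resolution and random veto probability can be illustrated with a simple Poisson estimate: for uncorrelated hits at rate R, the probability that at least one falls inside a veto window of width w is 1 − exp(−R·w). The Python sketch below is only an illustration. The 10 MHz rate is the one quoted for the main detectors, while the window widths are assumed values chosen to show the trend.

```python
import math


def random_veto_probability(rate_hz: float, window_s: float) -> float:
    """Probability that at least one uncorrelated hit falls in the veto window.

    Assumes Poisson-distributed hits at a constant rate.
    """
    return 1.0 - math.exp(-rate_hz * window_s)


if __name__ == "__main__":
    rate_hz = 10e6  # ~10 MHz, as quoted for the main detectors
    # Window widths below are illustrative assumptions, not design values.
    for window_ns in (1, 5, 25):
        p = random_veto_probability(rate_hz, window_ns * 1e-9)
        print(f"window = {window_ns:>2} ns -> random veto probability = {p:.3f}")
```

A narrower window, which requires better online time resolution, directly lowers the fraction of good events lost to accidental vetoes.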
NA62 trigger and DAQ architecture
A fully digital and integrated DAQ and trigger system has been designed to fulfil the requirements presented in the previous section. Digitization at an early stage of the readout allows efficient monitoring of each stage of the chain, in order to detect any possible source of losses. The trigger system will be split into a hardware part and a software part: the first stage (L0), implemented in hardware using FPGAs, will reduce the total rate to ∼ 1 MHz, while the second and third stages (L1 and L2) will be entirely software based, exploiting powerful PC farms with large input bandwidth. The data accepted by L2 will be transmitted directly to the event builder (EB) farm to be permanently recorded. The factor ∼ 10 in rate reduction at the first stage will be obtained by an L0 trigger processor (L0TS) using information from the RICH, LAV, LKr calorimeter and MUV detectors. The trigger primitives from each detector involved in the L0 decision will be built directly on the same data acquisition board that performs digitization and monitoring.
For all the detectors (apart from the GIGATRACKER) the building block of this system will be the TELL1 motherboard developed for the LHCb experiment [2]. The TELL1 board (9U format) houses 5 Altera Stratix FPGAs, allowing a fully customizable configuration. A total of 384 MB of RAM makes it possible to store the data in a first buffer stage while waiting for the trigger decision, which is delivered to the board through the TTC [3] interface. A credit-card PC (CCPC) controls all the functions of the board. The output stage uses a quad Gigabit Ethernet card (total output bandwidth of 4 Gb/s). The input stage can be adapted to different purposes using 4 custom daughter boards, on which, for instance, the analog data coming from the detector front end can be digitized.
The use of a uniform system for all the subdetectors allows a common, fully integrated trigger and readout architecture, in which the same data chain is used to monitor the whole system, and avoids the complications of independent trigger and readout chains.
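The buffering scheme described in this section, where digitized data wait in memory until the L0 decision arrives, can be pictured with the toy Python model below. It is a software illustration only, not the TELL1 firmware: the class name, the method names and the time-window readout are all hypothetical, and hits are assumed to arrive in time order.

```python
from collections import deque
from typing import Deque, List, Tuple

Hit = Tuple[float, int]  # (time in ns, channel number)


class ReadoutBuffer:
    """Toy model of a buffer holding hits until the trigger decision arrives."""

    def __init__(self, depth_ns: float) -> None:
        self.depth_ns = depth_ns          # how far back in time hits are kept
        self._hits: Deque[Hit] = deque()

    def push(self, time_ns: float, channel: int) -> None:
        """Store a hit and discard hits older than the buffer depth."""
        self._hits.append((time_ns, channel))
        while self._hits and self._hits[0][0] < time_ns - self.depth_ns:
            self._hits.popleft()

    def read_window(self, trigger_ns: float, half_width_ns: float) -> List[Hit]:
        """Return the hits within +/- half_width_ns of the trigger time."""
        return [h for h in self._hits if abs(h[0] - trigger_ns) <= half_width_ns]


if __name__ == "__main__":
    buf = ReadoutBuffer(depth_ns=1_000_000.0)  # 1 ms of history (assumed depth)
    for t, ch in [(0.0, 1), (100.0, 2), (105.0, 3), (900.0, 4)]:
        buf.push(t, ch)
    print(buf.read_window(trigger_ns=102.0, half_width_ns=5.0))
```

Because the same buffered data serve both the trigger primitives and the readout, a single chain can be monitored end to end, which is the design point made above.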
The TDC board
For the definition of trigger primitives and for offline data analysis, several subdetectors will provide the time of arrival of their hits. A time resolution of O(100 ps) has to be guaranteed at event rates of O(10 MHz), and a good online time resolution is also important for the trigger. For this reason we have developed a daughter board (10-layer PCB) for the TELL1 motherboard, providing 128 TDC channels with 100 ps time resolution per channel (fig. 2). Each mezzanine houses 4 HPTDC chips (developed at CERN [4]) controlled by an Altera Stratix II FPGA used for preprocessing (an on-board SRAM is also provided for this purpose) and monitoring. Miniaturized connectors are present on both sides of the board, allowing the connection of 128 channels from the subdetector front end. Particular care has been taken to ensure very good clock stability. The 40 MHz clock coming from the TELL1 is stabilized by the Stratix II PLL and by an external quartz-controlled QPLL [5]. After filtering the residual noise from the DC-DC converters, detailed tests showed that the clock jitter is below 20 ps. The time resolution has been measured in a test beam with a RICH prototype (with 400 photomultipliers) and found to be in agreement with expectations. A very compact readout system of 512 TDC channels is obtained by mounting four TDC boards on a TELL1. The fine-time multiplicity, which is crucial to define the trigger primitives, is computed in the TELL1 FPGAs by exploiting the high time resolution of the TDCs. If a subdetector needs more than one TELL1 for readout, two or more TELL1s can be connected in a daisy chain using two Gigabit links dedicated to sending trigger information.
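The fine-time multiplicity mentioned above amounts to counting hits in narrow time slots and flagging the slots where the count is high enough. The Python sketch below illustrates the idea in software. The real computation runs in the TELL1 FPGAs, and the slot width and multiplicity threshold used here are assumed values, not parameters taken from the design. The channel counts are the ones quoted in the text.

```python
from collections import Counter
from typing import Dict, Iterable, List

CHANNELS_PER_BOARD = 128   # TDC channels per daughter board (from the text)
BOARDS_PER_TELL1 = 4       # daughter boards per TELL1 (from the text)


def channels_per_tell1() -> int:
    """Total TDC channels read out by one fully equipped TELL1."""
    return CHANNELS_PER_BOARD * BOARDS_PER_TELL1


def fine_time_multiplicity(hit_times_ps: Iterable[int], slot_ps: int) -> Dict[int, int]:
    """Count hits per time slot; the slot index is hit time // slot width."""
    return dict(Counter(t // slot_ps for t in hit_times_ps))


def primitive_slots(multiplicity: Dict[int, int], threshold: int) -> List[int]:
    """Slots whose multiplicity reaches the (hypothetical) trigger threshold."""
    return sorted(slot for slot, n in multiplicity.items() if n >= threshold)


if __name__ == "__main__":
    hits_ps = [1000, 1100, 1250, 5000, 9000, 9050]   # illustrative hit times
    mult = fine_time_multiplicity(hits_ps, slot_ps=1000)
    print(channels_per_tell1(), mult, primitive_slots(mult, threshold=2))
```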
LKr calorimeter readout and trigger
The LKr calorimeter was built for the NA48 experiment [6] to provide excellent energy, time and space resolution. In the NA62 experiment it will be used mainly as a veto counter for forward photons from the decay region. Nevertheless, we want to profit from the good performance of the calorimeter both for background studies and to add other interesting physics cases to the NA62 main program. To do so, the LKr calorimeter electronics will provide both time and pulse-height information. An effective approach, already used in NA48, is to perform continuous sampling with flash ADCs instead of using two separate time and charge measurements. The LKr is composed of ∼ 13500 channels sampled at 40 MHz with a resolution of 14 bits. Without zero suppression, the L0 trigger rate of 1 MHz would result in a data volume of 1 TB/s: this bandwidth cannot be sustained by the existing NA48 LKr readout, which was designed to work at much lower intensities. The system has therefore been modified to have larger buffers (1 GB DDR2 per channel) and faster connections. The old CPD boards, used to digitize and to compute analog sums of groups of cells for trigger purposes, will be reused in 220 CARE modules connected by ∼ 900 Gigabit links to a readout farm (∼ 360 processor nodes). The 892 analog sums for the trigger (groups of 8×2 cells) will be sent to a system of 28 TELL1 boards, each housing 32 ADC channels (developed by the LHCb collaboration), to provide the first layer of the calorimetric trigger. A second layer of 3 TELL1 boards equipped with Gigabit mezzanine receivers (under design) will produce the LKr trigger primitives for the L0 central processor.
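The order of magnitude of the unsuppressed LKr data volume can be checked from the channel count, sampling rate and resolution given above. In the Python sketch below, the continuous stream of all channels comes out close to the 1 TB/s quoted. The per-trigger volume depends on how many samples are read out per event, which the text does not state, so it is left as a free parameter with 2 bytes per 14-bit sample as a further assumption.

```python
N_CHANNELS = 13_500      # LKr channels (from the text)
SAMPLING_HZ = 40e6       # flash ADC sampling rate (from the text)
BITS_PER_SAMPLE = 14     # ADC resolution (from the text)
L0_RATE_HZ = 1e6         # L0 output rate (from the text)


def continuous_rate_bytes_per_s() -> float:
    """Data rate if every sample of every channel were shipped out."""
    return N_CHANNELS * SAMPLING_HZ * BITS_PER_SAMPLE / 8


def triggered_rate_bytes_per_s(samples_per_event: int, bytes_per_sample: int = 2) -> float:
    """Data rate at the L0 rate for an assumed readout window.

    `samples_per_event` and `bytes_per_sample` are assumptions, not design values.
    """
    return N_CHANNELS * samples_per_event * bytes_per_sample * L0_RATE_HZ


if __name__ == "__main__":
    print(f"continuous stream: {continuous_rate_bytes_per_s() / 1e12:.3f} TB/s")
    for n in (8, 16, 32):
        rate = triggered_rate_bytes_per_s(n)
        print(f"{n:>2} samples/event at 1 MHz: {rate / 1e12:.3f} TB/s")
```

Either way the volume is far beyond what a readout designed for much lower intensities can move, which motivates the larger buffers and faster links.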
L0 central processor
The L0 central processor, or L0 trigger supervisor (L0TS), will collect the information from all the detectors participating in the L0 trigger and take the final decision. Monte Carlo simulations showed that a factor 10 in rate reduction can be obtained using RICH, LAV, LKr and MUV information. The trigger decision will be dispatched to the TELL1 boards and to the other readout systems through the TTC. Three solutions are under investigation to realize the L0TS:
• PC with fast I/O connections;
• TELL1 equipped with gigabit receivers;
• Custom dedicated card.
The first solution is constrained by the requirement to take decisions with a stable latency of 1 ms, a value set by the front-end buffer size in some critical detectors. Such a long latency, made possible by the large buffers in the TELL1, will be exploited to absorb the intrinsic Ethernet latency and the computing time in the PC-based solution.
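To give a feeling for what a 1 ms latency means for the buffers, the Python sketch below counts the events that are in flight while the decision is pending and compares this with the 384 MB of RAM on a TELL1. It is deliberately simplified: it assumes that a single board sees the full ∼ 10 MHz rate and that all of its RAM is available for buffering, and it takes 1 MB as 2^20 bytes. The result is therefore an upper bound on the space per event, not a design figure.

```python
L0_LATENCY_S = 1e-3                # L0 decision latency (from the text)
INPUT_RATE_HZ = 10e6               # rate on the main detectors (from the text)
TELL1_RAM_BYTES = 384 * 1024**2    # 384 MB, taking 1 MB = 2**20 bytes (assumption)


def events_in_flight(rate_hz: float = INPUT_RATE_HZ,
                     latency_s: float = L0_LATENCY_S) -> int:
    """Number of events that must be buffered while the decision is pending."""
    return round(rate_hz * latency_s)


def max_bytes_per_event(ram_bytes: int = TELL1_RAM_BYTES,
                        rate_hz: float = INPUT_RATE_HZ,
                        latency_s: float = L0_LATENCY_S) -> float:
    """Upper bound on buffer space per event if all RAM is used for buffering."""
    return ram_bytes / events_in_flight(rate_hz, latency_s)


if __name__ == "__main__":
    print(f"events in flight: {events_in_flight()}")
    print(f"buffer space per event: {max_bytes_per_event() / 1024:.1f} KiB")
```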
L1 and L2 levels
The L1 trigger will be entirely software based. For each subdetector a dedicated PC (or a small cluster of PCs) will run a fast reconstruction and apply standalone single-subdetector algorithms (presence of clusters in the LKr, track direction and momentum in the STRAWS, etc.). The input event rate for these PCs will be ∼ 1 MHz. The data will reach the L2 PC farm through a commercial Gigabit Ethernet switch. At this level the full event will be completely reconstructed and more sophisticated high-level trigger algorithms will be applied, with the requirement of reducing the total rate to tens of kHz for permanent recording on tape. Assuming a single event size of 10 kB (heavily dominated by the LKr and the GIGATRACKER), the total bandwidth at the end of the chain will be of the order of 100 MB/s.
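The final bandwidth estimate follows from the product of the output rate and the event size. The Python sketch below reproduces it, taking "tens of kHz" as 10 kHz at the lower end and 10 kB as 10^4 bytes, both of which are interpretations of the rounded figures in the text. It also shows the combined rejection that the software levels must provide starting from the ∼ 1 MHz L0 output.

```python
L0_OUTPUT_RATE_HZ = 1e6     # input rate to the software levels (from the text)
EVENT_SIZE_BYTES = 10_000   # 10 kB per event, taken as 10^4 bytes (assumption)


def output_bandwidth_mb_per_s(l2_rate_hz: float,
                              event_size_bytes: int = EVENT_SIZE_BYTES) -> float:
    """Bandwidth to storage in MB/s (1 MB = 10^6 bytes)."""
    return l2_rate_hz * event_size_bytes / 1e6


def software_rejection(l2_rate_hz: float,
                       l0_rate_hz: float = L0_OUTPUT_RATE_HZ) -> float:
    """Combined L1 + L2 rate reduction factor."""
    return l0_rate_hz / l2_rate_hz


if __name__ == "__main__":
    for rate_khz in (10, 30):   # illustrative readings of "tens of kHz"
        rate_hz = rate_khz * 1e3
        print(f"{rate_khz} kHz -> {output_bandwidth_mb_per_s(rate_hz):.0f} MB/s, "
              f"software rejection ~{software_rejection(rate_hz):.0f}")
```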
