Abstract: The ATLAS detector is designed to observe proton-proton collisions delivered by the LHC accelerator. The ATLAS Trigger and Data Acquisition (TDAQ) system is responsible for the selection and the conveyance of physics data, reducing the rate of stored events from the initial 40 MHz LHC bunch-crossing frequency to several hundred Hz. The TDAQ system is organized in a three-level selection scheme, including a hardware-based first-level trigger and second- and third-level triggers implemented as software systems distributed on commodity hardware nodes.
While this architecture was successfully operated well beyond the original design goals, the accumulated experience stimulated interest in exploring possible evolutions. In this paper, we report on the performance of the current system and on the design of the new system, with particular attention to the prototyping effort that made it possible to identify possible limitations and to demonstrate benefits.
[...] $10^{34}\,\mathrm{cm^{-2}s^{-1}}$. In 2011 the LHC delivered a peak luminosity of $3.42 \times 10^{33}\,\mathrm{cm^{-2}s^{-1}}$ in 31 weeks of proton-proton collision runs at $\sqrt{s} = 7$ TeV with a 50 ns bunch-crossing interval. In 2012, the center-of-mass energy was increased to 8 TeV and the peak instantaneous luminosity exceeded $7.7 \times 10^{33}\,\mathrm{cm^{-2}s^{-1}}$.
II. ATLAS TRIGGER AND DATA ACQUISITION
The ATLAS Trigger and Data Acquisition (TDAQ) system is designed to reduce the event rate from the 40 MHz nominal bunch-crossing rate to several hundred Hz for permanent storage; it is outlined in figure 1. The left-hand and right-hand sides of the figure illustrate the trigger part and the data-flow part, respectively. TDAQ is composed of three levels. Level 1 (L1) is implemented in hardware. Level 2 (L2) and the Event Filter (EF) are based on custom, distributed software running on commodity PCs; together, L2 and EF form the High Level Trigger (HLT).
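As a rough illustration of how the three levels share the rate reduction, the sketch below computes per-level rejection factors. Only the 40 MHz input rate and the 75 kHz L1 output rate come from this paper; the L2 output rate (3 kHz) and the stored rate (400 Hz, standing in for "several hundred Hz") are illustrative assumptions, and the function name is hypothetical.

```python
"""Per-level rejection factors for a three-level trigger cascade (illustrative)."""
from typing import List, Sequence, Tuple

BUNCH_CROSSING_RATE_HZ = 40e6  # nominal LHC bunch-crossing rate (from the text)

# (level, output rate in Hz). Only the L1 value is taken from the text.
LEVELS: List[Tuple[str, float]] = [
    ("L1", 75e3),   # L1 design output rate quoted in the text
    ("L2", 3e3),    # ASSUMPTION: illustrative L2 output rate
    ("EF", 400.0),  # ASSUMPTION: stands in for "several hundred Hz" to storage
]


def rejection_factors(input_rate_hz: float,
                      levels: Sequence[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (level, output rate, rejection = input rate / output rate) per level."""
    result = []
    rate_in = input_rate_hz
    for name, rate_out in levels:
        if not 0 < rate_out <= rate_in:
            raise ValueError(f"{name}: output rate must be in (0, input rate]")
        result.append((name, rate_out, rate_in / rate_out))
        rate_in = rate_out  # each level's output is the next level's input
    return result


if __name__ == "__main__":
    overall = 1.0
    for name, rate_out, factor in rejection_factors(BUNCH_CROSSING_RATE_HZ, LEVELS):
        overall *= factor
        print(f"{name}: output {rate_out:9.0f} Hz, rejection x{factor:,.1f}")
    print(f"overall rejection x{overall:,.0f}")
```

Whatever the intermediate rates, the product of the per-level factors always equals the input rate divided by the final stored rate.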
The L1 trigger uses coarse information from the calorimeters and muon systems and has a latency of less than 2.5 μs. It is designed to reduce the 40 MHz input rate to 75 kHz. It identifies Regions of Interest (RoIs) where the detector exhibits interesting activity. For events accepted by L1, these RoIs are passed to L2. Meanwhile, data from the detector front-end electronics are sent to detector-specific ReadOut Drivers (RODs) to be assembled and pushed to dedicated memories on the ReadOut System (ROS) PCs through approximately 1600 optical links [2]. The RoI data amount to only a few percent of the full event; the RoI information is prepared by a dedicated hardware component called the RoI Builder [3]. The L2 trigger applies a set of fast, coarse selection algorithms to the RoI data to reduce the event rate from 75 kHz [...]
Additionally, HLT computing resources were increased by approximately 50% by introducing new nodes through a rolling replacement policy, although this increased the heterogeneity of both the network and the CPU resources of the HLT farms.
During the 2012 winter shutdown, another sixteen of the XPU racks were replaced and an additional BE network core router was installed to provide redundancy. Meanwhile, the ROS software was modified to collect calorimeter summary information from all calorimeter front-end electronics and make it available to L2. This made it possible to select events based on missing transverse energy at L2. Substantial effort was also dedicated to improving monitoring tools, bottleneck prediction and automatic recovery procedures, which are described in [5].
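The L2 missing-transverse-energy selection enabled by the calorimeter summary information can be sketched as follows. This is a minimal illustration, assuming each summary provides the transverse-energy components $(E_x, E_y)$ of one front-end module; the actual summary format, the `CaloSummary` type and the threshold are hypothetical. Missing transverse energy is taken as the magnitude of the negative vector sum of these components.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class CaloSummary:
    """HYPOTHETICAL per-module summary: transverse-energy components in GeV."""
    ex: float
    ey: float


def missing_et(summaries: Iterable[CaloSummary]) -> Tuple[float, float]:
    """Return (MET, phi) from the negative vector sum of the module (Ex, Ey)."""
    sum_ex = 0.0
    sum_ey = 0.0
    for s in summaries:  # single pass, so generators work too
        sum_ex += s.ex
        sum_ey += s.ey
    met_x, met_y = -sum_ex, -sum_ey  # imbalance points opposite the visible ET
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)


def passes_met_trigger(summaries: Iterable[CaloSummary], threshold_gev: float) -> bool:
    """HYPOTHETICAL L2 selection: accept the event if MET exceeds the threshold."""
    met, _phi = missing_et(summaries)
    return met > threshold_gev


if __name__ == "__main__":
    event = [CaloSummary(120.0, 15.0), CaloSummary(-40.0, -5.0), CaloSummary(-10.0, 30.0)]
    met, phi = missing_et(event)
    print(f"MET = {met:.1f} GeV, phi = {phi:.2f} rad, "
          f"accept = {passes_met_trigger(event, 50.0)}")
```

The point of collecting summaries from all front-end electronics is visible here: the vector sum is only meaningful if every module contributes, which RoI-based data access alone would not provide.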
B. Pileup dependency
The cost of achieving nearly the design luminosity at half the design bunch-crossing rate was a higher-than-expected average number of interactions per bunch crossing ($\langle\mu\rangle$). [...]
The current TDAQ system has performed beyond expectations; however, it has also been observed that it could be improved further, providing more scalability, flexibility and homogeneity. To achieve this, L2, the Event Builder (EB) and EF are merged into a single system. This merger greatly simplifies the farm and network structure as well as workload balancing. A schematic representation of the evolution design is given in figure 5. In this design, the HLT farm will be composed of HLT Nodes, each of which contains one Data Collection Manager (DCM) and one or more HLT Processing Units (HLTPUs).
The DCM will be in charge of data collection, caching and integrity. It will handle the assignment of events to HLTPUs and the entire data flow of an event: fetching from the ROSes, assembly, transfer to the HLTPU and, for accepted events, transfer to the Data Loggers.
This provides an opportunity for more flexible approaches, such as incremental event building, pre-fetching and caching.
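The caching and incremental event building that the DCM makes possible can be sketched as below: fragments requested for the RoIs are kept in a per-event cache, so later requests, and finally full event building, only fetch what is still missing. This is a minimal sketch under stated assumptions: `fetch` stands in for the ROS network request, and the class and method names are hypothetical, not the ATLAS TDAQ API.

```python
from typing import Callable, Dict, Iterable, List


class DataCollectionManager:
    """HYPOTHETICAL DCM sketch: per-event cache of ROS fragments.

    `fetch(event_id, ros_id)` stands in for a network request to one ROS
    and returns that ROS's fragment of the event.
    """

    def __init__(self, fetch: Callable[[int, int], bytes], all_ros_ids: Iterable[int]):
        self._fetch = fetch
        self._all_ros_ids = list(all_ros_ids)
        self._cache: Dict[int, Dict[int, bytes]] = {}  # event_id -> {ros_id: fragment}
        self.requests = 0  # number of ROS requests actually sent

    def get_fragments(self, event_id: int, ros_ids: Iterable[int]) -> Dict[int, bytes]:
        """Return the requested fragments, fetching only those not yet cached."""
        wanted = list(ros_ids)
        cached = self._cache.setdefault(event_id, {})
        for ros_id in wanted:
            if ros_id not in cached:
                cached[ros_id] = self._fetch(event_id, ros_id)
                self.requests += 1
        return {r: cached[r] for r in wanted}

    def build_full_event(self, event_id: int) -> List[bytes]:
        """Incremental event building: complete the event from what is cached."""
        fragments = self.get_fragments(event_id, self._all_ros_ids)
        return [fragments[r] for r in self._all_ros_ids]

    def release(self, event_id: int) -> None:
        """Drop the event's cached fragments once it has been accepted or rejected."""
        self._cache.pop(event_id, None)


if __name__ == "__main__":
    dcm = DataCollectionManager(lambda e, r: f"{e}:{r}".encode(), range(4))
    dcm.get_fragments(7, [1, 2])   # RoI data for the first L2 step
    dcm.build_full_event(7)        # after a positive L2 decision
    print("ROS requests sent:", dcm.requests)
```

Because the cache is keyed by event, data already fetched for the RoIs are reused when the event is fully built rather than requested from the ROSes a second time.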
The HLTPU process will execute both L2 and EF algorithms. It will start processing with the RoI data assigned by the DCM.
As the algorithms progress, the HLTPU will request more data and eventually the full event. This will have several advantages compared to the current system. Since both L2 and EF processing are done in the same application, the data used at L2 are already fetched and unpacked, and the L2 results are in memory and do not need to be serialized or transported over the network. Event building will also be initiated from within the L2 algorithms, after the first positive trigger selection. This makes it possible to tune the execution sequence of the trigger selection to minimize L2 latency, reducing the load on the ROS buffers. The HLTPU process will fork child processes to reduce per-process memory utilization through the copy-on-write feature of the Linux kernel [6]. The main HLTPU process will be used as a template and will not process any events. Instead, it will monitor the child processes, kill stuck processes and fork new processes if necessary.
Fig. 7. Scaling behavior of the HLTSV prototype as a function of the number of HLTPUs. HLTPU processes executing CPU-burning algorithms that simulate realistic data access patterns are used to emulate the expected nominal parameters. The line is the rate calculated from the emulated parameters [7].
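The mother/child scheme of the HLTPU can be sketched with the POSIX `fork()` call, here through Python's `os.fork`. This is a hypothetical, POSIX-only illustration: the `work` callback stands in for the event-processing loop, a child running longer than `timeout_s` is assumed to be stuck, and the restart policy (`max_restarts`) is an assumption rather than the behavior of the actual HLTPU. Memory initialized in the mother before forking is shared with the children copy-on-write by the kernel.

```python
import os
import signal
import time
from typing import Callable, Dict, Tuple


def run_hltpu_children(n_children: int,
                       work: Callable[[int, int], None],
                       timeout_s: float,
                       max_restarts: int = 3) -> Dict[int, int]:
    """HYPOTHETICAL sketch: fork workers from this mother process and supervise them.

    The mother never calls `work` itself. `work(slot, attempt)` is the child's
    event loop. Children running longer than `timeout_s` are treated as stuck
    and killed; killed or crashed children are re-forked up to `max_restarts`
    times per slot. Returns the number of restarts per slot.
    """
    restarts: Dict[int, int] = {slot: 0 for slot in range(n_children)}
    running: Dict[int, Tuple[int, float]] = {}  # pid -> (slot, start time)

    def spawn(slot: int) -> None:
        pid = os.fork()
        if pid == 0:
            # Child: pages inherited from the mother are shared copy-on-write.
            code = 0
            try:
                work(slot, restarts[slot])
            except BaseException:
                code = 1
            finally:
                os._exit(code)  # never fall back into the mother's code path
        running[pid] = (slot, time.monotonic())

    def maybe_restart(slot: int) -> None:
        if restarts[slot] < max_restarts:
            restarts[slot] += 1
            spawn(slot)

    for slot in range(n_children):
        spawn(slot)

    while running:
        pid, status = os.waitpid(-1, os.WNOHANG)  # reap a finished child, if any
        if pid != 0:
            entry = running.pop(pid, None)
            if entry is not None:
                ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
                if not ok:
                    maybe_restart(entry[0])  # crashed child: fork a replacement
            continue
        now = time.monotonic()
        for child, (slot, started) in list(running.items()):
            if now - started > timeout_s:
                os.kill(child, signal.SIGKILL)  # stuck child
                os.waitpid(child, 0)            # reap it before replacing it
                del running[child]
                maybe_restart(slot)
        time.sleep(0.01)
    return restarts
```

In Python, reference counting writes to shared objects and so erodes some copy-on-write sharing; the sketch only illustrates the process-management logic, which is where the memory saving in a compiled application would come from.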
In the evolution design, the HLT SuperVisor (HLTSV) will receive the L1 results from the RoI Builder and assign each event to one of the available DCMs in the HLT farm. It will provide load balancing, hiding the heterogeneity of the hardware resources.
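One way a supervisor can balance load while hiding hardware heterogeneity is a credit scheme, sketched below under the assumption that each DCM announces one credit per idle HLTPU. The paper does not specify the HLTSV algorithm; this scheme and all names are illustrative.

```python
from collections import deque
from typing import Deque, Dict, Optional


class HLTSupervisor:
    """HYPOTHETICAL credit-based HLTSV sketch.

    Each DCM holds one credit per idle HLTPU. An L1-accepted event is assigned
    only to a DCM that holds a credit, so nodes that finish events faster
    return credits sooner and therefore receive more events.
    """

    def __init__(self) -> None:
        self._credits: Deque[str] = deque()  # one entry per idle HLTPU slot
        self.assigned: Dict[int, str] = {}   # L1 ID -> DCM name

    def add_credits(self, dcm: str, n: int = 1) -> None:
        """Called when a DCM reports n newly idle HLTPUs."""
        self._credits.extend([dcm] * n)

    def assign(self, l1_id: int) -> Optional[str]:
        """Assign an event to a DCM; None means no idle HLTPU (back-pressure)."""
        if not self._credits:
            return None
        dcm = self._credits.popleft()
        self.assigned[l1_id] = dcm
        return dcm

    def complete(self, l1_id: int) -> None:
        """The event is finished: its HLTPU is idle again and returns a credit."""
        self.add_credits(self.assigned.pop(l1_id))


if __name__ == "__main__":
    sv = HLTSupervisor()
    sv.add_credits("old-node", 2)  # hypothetical older node
    sv.add_credits("new-node", 4)  # hypothetical newer node with more cores
    print([sv.assign(l1_id) for l1_id in range(7)])
```

The supervisor never needs to know how fast each node is: the rate at which a node returns credits encodes that information implicitly.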
The new design is being studied with prototype applications.
A basic prototype was first implemented to validate the high-level design. Dedicated studies were then performed, via prototypes, to evaluate different network libraries. Figure 7 shows the scaling behavior of one of these prototypes as a function of the number of served HLTPUs. Since the HLT steering code was not yet ready, the plot was prepared during one of the LHC technical stops by emulating the expected nominal request rates with dummy CPU-burning algorithms. These algorithms simulated the realistic data access patterns of the algorithms running on the HLT farms. The line on the plot is calculated from the emulated parameters. The plot shows that the HLTSV scales in very good agreement with the calculated rates. Since the HLTSV is a mission-critical component, these prototypes focused mainly on it, and all developed prototypes sustained the required 100 kHz rate. Therefore, a single HLTSV application appears sufficient to supervise the whole HLT farm.
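A simple way to reason about a calculated rate of this kind, assuming every HLTPU is kept busy and spends a mean time $\langle t\rangle$ per event, is the ideal linear scaling $R(N) = N/\langle t\rangle$. The sketch below evaluates this model; it is not the calculation of [7], and the 40 ms per-event time in the example is an illustrative assumption.

```python
import math


def expected_rate_hz(n_hltpu: int, mean_time_per_event_s: float) -> float:
    """Ideal throughput if all N HLTPUs stay busy: R(N) = N / <t>."""
    if n_hltpu < 0 or mean_time_per_event_s <= 0:
        raise ValueError("need n_hltpu >= 0 and a positive per-event time")
    return n_hltpu / mean_time_per_event_s


def hltpus_needed(target_rate_hz: float, mean_time_per_event_s: float) -> int:
    """Smallest N with R(N) >= target; rounding guards against float noise."""
    return math.ceil(round(target_rate_hz * mean_time_per_event_s, 9))


if __name__ == "__main__":
    mean_t = 0.040  # ASSUMPTION: illustrative mean time per event (40 ms)
    for n in (500, 1000, 2000, 4000):
        print(f"{n:5d} HLTPUs -> {expected_rate_hz(n, mean_t) / 1e3:6.1f} kHz")
    print("HLTPUs needed for 100 kHz:", hltpus_needed(100e3, mean_t))
```

Under this model, a supervisor that keeps up with the line as $N$ grows is not the bottleneck, which is what sustaining the required 100 kHz indicates.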
Prototypes for the other components are also being developed to study the evolution design. A demonstration implementation of the DCM has been successfully used in the HLTSV prototype tests, and a partially functional implementation of the HLTPU is ready for testing.
IV. CONCLUSION
The ATLAS TDAQ system showed great success between 2010 and 2012, with an overall data-taking efficiency of approximately 93.8% while operating beyond its design specifications. This was achieved by exploiting the flexibility of the design.
During the winter shutdown periods, some of the hardware was upgraded through a rolling replacement scheme. The effects of high pileup were successfully kept under control, and TDAQ operated smoothly throughout most of the year.
During the data-taking period, some possible improvements to the current design were identified and a new evolution design was prepared. This design, based on merging the L2, EF and EB components into a single system, has been studied with several prototype applications. No important limitations have been observed so far. The first implementation of the new design is scheduled for the beginning of next year.
