We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the Muon Spectrometer. The processor will fit candidate muon tracks in the drift tubes in real time, significantly improving the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an alternative implementation based on associative memory, an ARM (Zynq) processor-based track fitter, and a compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.
The Large Hadron Collider (LHC) will enter its High Luminosity era (HL-LHC) around 2025, with a nominal levelled instantaneous luminosity of 7.5 × 10^34 cm^-2 s^-1. The goal of the HL-LHC upgrade is to maintain the performance needed for precision measurements [1]. To improve the muon trigger rate under these challenging conditions, the ATLAS Experiment [2] will add Monitored Drift Tube (MDT) chamber information to the Level-0 trigger, making use of Regions of Interest (ROIs) constructed by the surrounding detectors (Figure 1).
Hardware Description
As illustrated in Figure 2, raw hits are received on three groups of Multi-Gigabit Transceiver (MGT) links from the inner, middle and outer MDT stations, and ROI data are received on a single fiber per sector from the muon Sector Logic (SL). Track segments found by the station processors are passed to the track fitter, which transmits the fitted track parameters to the global muon trigger. As shown in Figure 3, all hits are also buffered independently in the DAQ buffer and matched to time windows surrounding the L0/L1 trigger accept signals; the matching hits are sent to the FELIX system. There are 192 copies of this logic in total (16 sectors; sides A and C, barrel and end-cap; inner, middle and outer stations), implemented in a set of 64 Advanced Telecommunications Computing Architecture (ATCA) carrier boards.
Hit Extraction
In the barrel, an ROI is determined in the SL from a coincidence of hits in the Resistive Plate Chamber (RPC) trigger system that are likely to originate from a single track. These hits are used to reconstruct an ROI segment per MDT station, Figure 4(a). The same applies to the end-cap with the Thin Gap Chamber (TGC) trigger system. Matching of MDT hits to the ROI, Figure 4(b), will be performed in a Field-Programmable Gate Array (FPGA) on the carrier board. Tube coordinates are transformed to convenient station-local coordinates. All MDT hits falling within a window centered on the ROI are matched by identifying the unique pair of MDT tube identifiers in the innermost and outermost MDT layers with respect to the interaction point. These windows are different for each station. The matched MDT hits are then calibrated (drift time converted into drift distance) and sent to the sector processors on the mezzanine boards.
(a) ROI segment reconstruction.
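The ROI matching and calibration steps described in this section can be sketched in software as follows. This is only an illustrative model, not the FPGA firmware: the `Hit` record, the linear interpolation of the window center between the innermost and outermost layer, the window half-width, and the linear drift-time calibration with its numerical constants are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    layer: int            # tube layer within the station, 0 = innermost
    tube: int             # tube index within the layer
    drift_time_ns: float  # measured drift time


def window_center(layer, n_layers, roi_inner_tube, roi_outer_tube):
    """Interpolate the ROI tube position linearly between the two end layers (assumed)."""
    if n_layers < 2:
        return float(roi_inner_tube)
    frac = layer / (n_layers - 1)
    return roi_inner_tube + frac * (roi_outer_tube - roi_inner_tube)


def match_hits_to_roi(hits, n_layers, roi_inner_tube, roi_outer_tube, half_width):
    """Keep hits whose tube lies within half_width tubes of the ROI in their layer."""
    matched = []
    for h in hits:
        center = window_center(h.layer, n_layers, roi_inner_tube, roi_outer_tube)
        if abs(h.tube - center) <= half_width:
            matched.append(h)
    return matched


def drift_radius_mm(drift_time_ns, v_drift_mm_per_ns=0.02, r_max_mm=15.0):
    """Placeholder linear calibration with assumed constants; a real r-t relation is nonlinear."""
    return min(max(drift_time_ns, 0.0) * v_drift_mm_per_ns, r_max_mm)
```

In this model, the per-station difference in window size would enter only through `half_width`, and the calibration function would be replaced by a lookup of the measured r-t relation.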
Segment Finding
After the hits have been selected according to an active ROI, a segment is reconstructed for each MDT station. One of the designs under consideration for this stage uses content-addressable memories, also known as Associative Memories (AMs). The AM devices store a library of all possible track patterns and compare the actual hits against these patterns, producing a low-resolution segment candidate, Figure 5(a).
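The pattern comparison performed by an AM device can be emulated in a few lines of software. The sketch below is a hypothetical model: the representation of a pattern as one tube index per layer and the `min_layers` threshold are assumptions made for illustration, and the sequential loop stands in for a comparison that the hardware carries out on all stored patterns at once.

```python
def match_patterns(bank, hits_by_layer, min_layers):
    """Return the indices of stored patterns matched in at least min_layers layers.

    bank: list of patterns, each a tuple with one tube index per layer.
    hits_by_layer: dict mapping layer number to the set of hit tube indices.
    """
    matched = []
    for pattern_id, pattern in enumerate(bank):
        n_matched = sum(
            1
            for layer, tube in enumerate(pattern)
            if tube in hits_by_layer.get(layer, set())
        )
        if n_matched >= min_layers:
            matched.append(pattern_id)
    return matched
```

Allowing fewer matched layers than the pattern contains is one way such a scheme could tolerate missing hits, at the cost of more candidate patterns.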
The second approach, Figure 5(b), uses FPGA logic to implement a segment finder based on the Legendre transform. For each MDT hit, this logic evaluates 128 possible track segment angles in parallel, calculating in a fast FPGA pipeline the offset of each track candidate from an arbitrary origin for each angle. The (angle, offset) pairs are used to fill a 2D histogram, in which the highest peak represents a likely track where a number of drift circles agree on position and angle [3]. As part of the filling process, the 128 highest-occupancy bin locations are maintained, so finding the overall histogram maximum requires only a few clock cycles. Preliminary results indicate that the total latency to process 100 MDT hits in an ROI is less than 1 µs.
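A minimal software sketch of the Legendre-transform histogramming is given below. It assumes the usual normal-form line parameterization, in which a line at angle θ with offset ρ is tangent to a drift circle (x, y, r) when ρ = x cos θ + y sin θ ± r. The offset binning (`rho_min`, `rho_max`, `n_rho`), the angle range [0, π), and the reading of the 128 maintained bin locations as one running maximum per angle are assumptions of this example; the loop over angles is sequential here, whereas the FPGA evaluates the angles in parallel.

```python
import math

N_ANGLES = 128  # number of trial angles per hit, as quoted in the text


def legendre_segment(hits, rho_min, rho_max, n_rho):
    """Return (theta, rho, count) for the fullest bin of the (angle, offset) histogram.

    hits: iterable of (x, y, r) drift circles in station-local coordinates.
    """
    d_theta = math.pi / N_ANGLES
    d_rho = (rho_max - rho_min) / n_rho
    hist = [[0] * n_rho for _ in range(N_ANGLES)]
    best_count = [0] * N_ANGLES  # running maximum per angle (assumed scheme)
    best_bin = [0] * N_ANGLES
    for x, y, r in hits:
        for i in range(N_ANGLES):
            theta = (i + 0.5) * d_theta
            base = x * math.cos(theta) + y * math.sin(theta)
            # Two tangent lines per circle; a set avoids double-counting one bin.
            bins = {int((rho - rho_min) // d_rho) for rho in (base - r, base + r)}
            for j in bins:
                if 0 <= j < n_rho:
                    hist[i][j] += 1
                    if hist[i][j] > best_count[i]:
                        best_count[i], best_bin[i] = hist[i][j], j
    # Only N_ANGLES candidates remain to be compared at the end.
    i_best = max(range(N_ANGLES), key=lambda i: best_count[i])
    theta = (i_best + 0.5) * d_theta
    rho = rho_min + (best_bin[i_best] + 0.5) * d_rho
    return theta, rho, best_count[i_best]
```

Tracking the per-angle maxima during filling means the final search compares 128 values instead of scanning the full histogram, which is the property the text relies on for the short latency.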
Track Fitting
Each station (inner, middle, outer) will process hits and identify track segments independently. The segments are then transferred back to the carrier board, where a Xilinx Zynq device (an FPGA with embedded ARM processor cores) will be used to evaluate a final parameterized track fit. Depending on how many segments can be reconstructed per muon candidate in the different MDT stations, the muon's transverse momentum (p_T) can be determined using two different methods [4].
If three segments are found, each in a different MDT station, Figure 6(a), their positions can be combined to measure the track curvature by calculating the sagitta from the three points (3-station method). Otherwise, two segments in different MDT stations can still be combined to extract the p_T by measuring their deflection angle (2-station method), Figure 6(b).
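The geometry behind the two methods can be illustrated with a short sketch. It is an idealized example and not the parameterized fit used in the processor: the function names are hypothetical, the middle point is assumed to lie near the midpoint of the arc, and the conversion from bending radius to momentum assumes a uniform magnetic field perpendicular to the bending plane, which the real toroidal field is not.

```python
import math


def sagitta(p_in, p_mid, p_out):
    """Return (sagitta, chord): distance of the middle point from the inner-outer chord."""
    (x1, y1), (x2, y2), (x3, y3) = p_in, p_mid, p_out
    chord = math.hypot(x3 - x1, y3 - y1)
    cross = (x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)
    return abs(cross) / chord, chord


def radius_from_sagitta(s, chord):
    """Bending radius of a circular arc, exact when s is measured at the chord midpoint."""
    return chord ** 2 / (8.0 * s) + s / 2.0


def momentum_gev(radius_m, b_tesla):
    """Momentum in the bending plane for unit charge, uniform field assumed."""
    return 0.3 * b_tesla * radius_m


def deflection_angle(slope_a, slope_b):
    """Angle between two segments given their slopes (2-station input quantity)."""
    return abs(math.atan(slope_a) - math.atan(slope_b))
```

In a realistic setting the mapping from sagitta or deflection angle to p_T would be taken from a parameterization or lookup derived from simulation rather than from the uniform-field formula.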
Monte Carlo framework and latency studies
Monte Carlo (MC) simulated event samples are used to aid the development of the algorithms and to measure the expected trigger rates and efficiencies under HL-LHC luminosity and pileup conditions, based on the Run 2 Muon Spectrometer geometry [5]. In this project, MC events are also used as input to a cycle-accurate hardware simulation of the digital processes up to the segment-finding mezzanines. Since real events occur randomly and the processing in the front-end electronics adds uncertainty to the data delivery, this simulation proved necessary to better understand the behavior of the MDT readout/trigger chain.
Hardware implementation
The ATCA carrier board provides basic services including module management, power conditioning, base Ethernet and firmware management. The carrier board also contains 72 optical receivers and 72 optical transmitters, capable of operating at up to 10 Gbps.
The carrier board contains one Xilinx UltraScale-class FPGA, which will handle the reception of the MDT data via MGT links, the reception of ROI information, hit extraction and calibration, and the transmission of hits to the mezzanine board(s). Segment data are transferred back from the mezzanine board to the carrier board for track fitting. In addition, the carrier board FPGA will transfer the MDT hits to the ATLAS Data Acquisition (DAQ) via the Front-End LInk eXchange (FELIX) system. An external Double Data Rate (DDR) memory device may be required for buffering of DAQ data.
The combination of an FPGA with a Central Processing Unit (CPU) in the Xilinx Zynq chip provides an Ethernet interface and might also be used to implement certain track fitting algorithms. The Zynq device requires Random Access Memory (RAM) for its operating system as well as an interface to a µSD card or other flash file system storage.
Conclusions and future work
Studies show that the hardware trigger processor will be able to improve the purity of the triggers and sharpen the p_T turn-on, meeting the rate limit for single-muon triggers at the HL-LHC.
Our best estimate of the total latency of the L0-MDT trigger processor (once all the data have arrived at the board) is less than 1 µs to process up to 100 MDT hits within an ROI and provide a high-resolution p_T measurement as output.
A detailed conceptual design with extensive simulation studies is now being prepared for publication in the ATLAS TDAQ Technical Design Report. A first generation of hardware prototypes is planned for 2018-2019, with a full system ready for installation in approximately 2024.
