Abstract-The ATLAS experiment is located at the European Organization for Nuclear Research (CERN) in Switzerland. It is designed to measure decay properties of highly energetic particles produced in the proton collisions at the Large Hadron Collider (LHC).
I. INTRODUCTION
ATLAS$^{1}$ [1] is a general purpose detector at the Large Hadron Collider (LHC) at CERN. A three-level trigger system is used to reduce the event rate from 40 MHz to a more manageable 400 Hz, which can be stored and used for physics analyses. The Level-1 Trigger in use during the 2010-2013 (Run 1) running periods was a fixed-latency, hardware-based system built to operate at the LHC design instantaneous luminosity of $10^{34}$ cm$^{-2}$ s$^{-1}$ [2], [3], [4]. The Level-1 system comprises three subsystems: the Level-1 Calorimeter Trigger (L1Calo), the Level-1 Muon Trigger (L1Muon), and the Central Trigger Processor (CTP). It has an output rate of 75 kHz and a latency of 2.5 µs. The L1Calo trigger chain is shown in Figure 1.

$^{1}$ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the beam pipe. The pseudorapidity is defined in terms of the polar angle θ as $\eta = -\ln\tan(\theta/2)$.

Fig. 1. The L1Calo trigger chain [4]. The trigger is part of the hardware of the detectors. Trigger information goes directly from the detector into the electronics cavern located in the ATLAS underground area. The signals go into pre-processors and the digitized information is then fed into the Jet/Energy and Cluster processors. The results are then merged and sent to the Central Trigger Processor (CTP), which sends the Level-1 Accept.
The L1Calo input data come from 7200 analog "trigger towers" built with a granularity of 0.1 × 0.1 in η × φ over the range covered by the ATLAS electromagnetic and hadronic calorimeters. The signal is digitized on a signal Pre-Processor (PPr), and then processed on the Cluster Processor (CP) and the Jet/Energy-sum Processor (JEP). The goal of the CP is to identify physics objects, namely electron/photon and τ/single-hadron candidates, which have a transverse energy ($E_T$) above a set of programmable thresholds [5]. The JEP identifies jet candidates and produces sums of total, missing and jet-sum $E_T$. Common Merger Modules (CMMs) on the CP and JEP backplanes count multiplicities of trigger objects and send the result to the CTP. The CTP makes the actual trigger decision. When a "Level-1 Accept" is issued, the coordinates in the η-φ plane of each trigger object (Regions of Interest, or RoIs) are sent to the Level-2 Trigger.
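The multiplicity counting performed by the merger modules can be illustrated with a minimal Python sketch. It is not the CMM firmware: the function name, the threshold names, the example energies, and the saturation value of 7 are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def count_multiplicities(
    et_gev: Iterable[float],
    thresholds_gev: Dict[str, float],
    max_count: int = 7,  # assumed saturation value, not taken from the text
) -> Dict[str, int]:
    """Count objects with E_T strictly above each threshold, saturating at max_count."""
    values = list(et_gev)
    return {
        name: min(sum(1 for et in values if et > threshold), max_count)
        for name, threshold in thresholds_gev.items()
    }


# Made-up candidate energies (GeV) and hypothetical threshold names.
candidates = [12.0, 25.0, 41.0]
thresholds = {"EM10": 10.0, "EM20": 20.0, "EM40": 40.0}
multiplicities = count_multiplicities(candidates, thresholds)
```

Only these per-threshold counts, not the objects themselves, reach the CTP in this scheme, which is why position-dependent (topological) selections are not possible with multiplicities alone.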
The L1Muon input data comes from 800k Resistive Plate Chamber (RPC) strips in the barrel region and Thin Gap Chambers (TGCs) in the end cap regions. Multiplicities for different thresholds are measured by coincident hits in the RPC and TGC planes. The logic for multiplicity counting of the different thresholds is provided by the Muon Central Trigger Processor Interface (MUCTPI) [6] .
II. MOTIVATION FOR TOPOLOGICAL TRIGGER
The increase of the LHC luminosity to 2-3 × $10^{34}$ cm$^{-2}$ s$^{-1}$ for Run 2 (i.e., after the Phase-0 upgrade) and eventually 5 × $10^{34}$ cm$^{-2}$ s$^{-1}$ for Run 3 (i.e., after the Phase-1 upgrade) will require the Level-1 Trigger to be upgraded to cope not only with the higher backgrounds, but also with the higher L1 rates, without pre-scaling important physics trigger streams. The upgrade to L1Calo includes upgraded Cluster Processor Module (CPM) and Jet/Energy Module (JEM) firmware to send trigger object data across the backplane at increased data rates. The current CMMs will be replaced by new "CMX" modules capable of receiving and processing the high-speed backplane data. A key role of these CMX modules will be transmitting the real-time data to a completely new element in the Level-1 chain, to be completed during the Phase-0 upgrade: the Topological Processor (L1Topo).
The capabilities offered by L1Topo will allow a trigger decision to be made using more than just $p_T$ or $E_T$, whose current thresholds will be impossible to maintain in Run 2. This is essential because interesting physics events have different topologies from minimum bias events, which make up the majority of events produced by the LHC.
A. Introduction to Topological Decisions
L1Topo will form trigger decisions based on topological information provided by the L1Calo and L1Muon systems. It is allotted a latency budget of 300 ns, or 12 bunch crossings (BCs). Figure 2 illustrates three potential algorithms that can be implemented to make the topological decision. These decisions can be loosely assigned to the three categories defined in Table I: angular separation, invariant mass, and hardness of interaction. Figure 3 shows the Level-1 Trigger chain with the inclusion of L1Topo. The existing hardware (green) will be upgraded and new hardware (light brown) will be added to the current system during the Phase-0 upgrade.
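To make the three categories concrete, the following Python sketch computes one representative quantity for each: an angular separation, an invariant mass squared, and a scalar energy sum. The `Tob` class and all function names are hypothetical, the objects are treated as massless, and floating-point arithmetic is used for readability, whereas the firmware works on fixed-width integers.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Tob:
    """Hypothetical trigger object: transverse energy (GeV), eta, phi (rad)."""
    et: float
    eta: float
    phi: float


def delta_phi(a: Tob, b: Tob) -> float:
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (a.phi - b.phi) % (2.0 * math.pi)  # in [0, 2*pi)
    return d - 2.0 * math.pi if d > math.pi else d


def delta_r(a: Tob, b: Tob) -> float:
    """Angular separation in the eta-phi plane."""
    return math.hypot(a.eta - b.eta, delta_phi(a, b))


def invariant_mass_sq(a: Tob, b: Tob) -> float:
    """Invariant mass squared of two massless objects (GeV^2)."""
    return 2.0 * a.et * b.et * (math.cosh(a.eta - b.eta) - math.cos(delta_phi(a, b)))


def scalar_sum_et(tobs: Iterable[Tob]) -> float:
    """Scalar sum of transverse energies, a simple measure of hardness."""
    return sum(t.et for t in tobs)
```

A topological trigger bit would then be a comparison of one of these quantities against a programmable cut value.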
B. An Example: Z and Higgs Bosons Decaying into τ's
Fig. 2. Three different examples of topologies that can be used to make trigger decisions. On the left, differences in angular distributions are illustrated. In the middle, the hardness of the interaction can be tested with sums of energy or momenta. Finally, on the right, invariant-mass-like calculations can also be used to make topological decisions.

Table I. Columns: Type, Name, Details.

Fig. 3. Outline of the L1Calo trigger chain with the inclusion of the Topological Processor. The existing hardware (green) will be upgraded and new hardware (light brown) will be implemented into the current system following Run 1. New CPM and JEM firmware will send Trigger OBject (TOB) data, containing RoI-based information, across the backplane, instead of partial sums. The upgraded CMMs (CMXs) provide the bandwidth and connectivity required to transmit this information to the Topological Processor.

Fig. 4. Layout of the L1Topo prototype board. A and B are two Xilinx Virtex-7 FPGAs: XC7VX690T, or XC7VX485T for the first prototype. C is a non-real-time module control FPGA. X is the FMC extension module for Read Out Drivers (RODs). RX and TX are the receiver inputs and transmitter outputs. The receivers interface to 160 optical fibers from the CMXs and L1Muon system.

The projected $p_T$ Level-1 thresholds for semileptonic and hadronic decays of τ's will be 40 GeV and 80 GeV, respectively. This puts severe limitations on the Standard Model and Higgs physics that will be accessible during Run 2 given $p_T$ thresholding only [7]. There will be a signal efficiency loss of 30% for semileptonic events and almost 100% for fully hadronic events, making this very important Higgs decay channel impossible to access. However, this loss is based on raising $p_T$ thresholds only. Additionally, the processes H → ττ (semileptonic) and Z → ττ (hadronic) have been compared with minimum bias simulation. The difference in η (∆η) between the two decay products (τ), where the τ decays either semileptonically or hadronically, has a different distribution from that of the minimum bias sample. A ∆η < 2 requirement can reduce the Level-1 rate by up to a factor of 15.
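The ∆η < 2 requirement on τ pairs can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the assumption that an event passes if any pair of candidates satisfies the cut is ours, not a statement about the final trigger menu.

```python
from itertools import combinations
from typing import Sequence


def passes_delta_eta(etas: Sequence[float], max_delta_eta: float = 2.0) -> bool:
    """Return True if any pair of candidates has |delta eta| below the cut."""
    return any(abs(a - b) < max_delta_eta for a, b in combinations(etas, 2))
```

For example, two candidates at η = 0.3 and η = 1.1 pass, while two candidates at η = -1.8 and η = 1.5 do not.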
III. THE TOPOLOGICAL PROCESSOR
L1Topo is a single-shelf (crate) system equipped with two identical ATCA-compliant blades (modules) of the type shown in Figure 4 and Figure 5. It receives Trigger OBject (TOB) data from L1Calo and L1Muon over parallel-optical ribbon fibers and processes them in two large Virtex-7 FPGAs per module, each with up to 80 Multi-Gigabit Transceivers (MGTs). L1Topo then sends the results to the CTP via electrical and/or optical links.
A. Real-time data path
The input data are received optically through four blind-mate MTP/MPO connectors in the ATCA backplane of L1Topo. In the baseline design, 48-way connectors will be used. The optical signals are distributed over octopus cables to 12-fiber optical-electrical (o/e) receivers on the main module. MiniPOD receivers, which are rated up to 14 Gb/s, are placed close to the FPGAs to reduce the high-speed link track length and optimize signal quality. The MGTs on the FPGAs will run at an initial rate of 6.4 Gb/s per channel and have been tested up to 10 Gb/s. Each FPGA will be capable of receiving the full event information from both L1Calo and L1Muon optically, and the two processor FPGAs will process data independently and in parallel. The L1Calo input currently consists of: 6

Output to the CTP will consist of individual result bits, each indicating whether a specific topology algorithm has been passed. The resulting trigger data will be transmitted to the CTP optically or electrically via an extension mezzanine module. Two 12-way fibers are routed to the CTP from the processor FPGAs via Avago MiniPOD transmitters. The optical signal from each Avago MiniPOD is driven to the front panel via a short pigtail with a male MTP Molex connector.
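To give a feeling for what 6.4 Gb/s per channel means per bunch crossing, the sketch below converts a line rate into bits per bunch crossing. The nominal 25 ns bunch spacing and the 8b/10b payload fraction of 0.8 are assumptions made here; the text does not state the line encoding, and the function name is hypothetical.

```python
def bits_per_bunch_crossing(
    line_rate_gbps: float,
    bunch_spacing_ns: float = 25.0,   # nominal spacing, assumed
    payload_fraction: float = 1.0,    # 1.0 = raw line bits
) -> float:
    """Bits carried by one link in one bunch crossing.

    Gb/s multiplied by ns gives bits, since 1e9 * 1e-9 = 1.
    """
    return line_rate_gbps * bunch_spacing_ns * payload_fraction


raw_bits = bits_per_bunch_crossing(6.4)
# Payload if an 8b/10b encoding were used (an assumption, not from the text).
payload_bits_8b10b = bits_per_bunch_crossing(6.4, payload_fraction=0.8)
# Raw capacity of one FPGA with all 80 MGT inputs in use.
raw_bits_per_fpga = 80 * raw_bits
```

Under these assumptions each link delivers about 160 raw bits per bunch crossing, or roughly 12800 raw bits per crossing for an FPGA with 80 input links.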
B. Implementation
L1Topo is designed to work with a choice of several compatible devices. Due to component availability, the prototype module has been outfitted with two devices of lower logic capacity (XC7VX485T) with only 56 input links. The prototype board is shown in Figure 5. More powerful devices (XC7VX690T) with the full 80 input MGT links will be mounted on the production modules. The backplane optical connectors of the production device will provide a capacity of up to 288 fibers, with four 48-way fiber bundles in the baseline design.
For synchronous operation, data transmitters will have to be operated at multiples of the LHC bunch clock. Receiver reference clocks are segmented and derived from local crystal oscillators. The L1Topo module is designed for 40.0789 MHz operation of the real-time data path only. The fabric clocks run at the LHC bunch clock frequency. The jitter on the MGT bunch clock path is tightly controlled with the help of a PLL device. The control FPGA receives additional local clocks since it handles DAQ, ROI, and control links as well. A single fibre-optical ribbon connection per processor FPGA, running through the front panel of the module, is provided for optical result transmission to the CTP.

Figure 6 shows the eye diagram measurement of the L1Topo prototype at 10.26 Gb/s. Results for the Avago MiniPOD receiver (RX) are shown on the left, and results for the transmitter (TX) are shown on the right. Note that the mask is not shown with these diagrams because these initial tests are meant to be illustrative of performance. Further detailed tests are upcoming.
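The requirement that link rates be multiples of the bunch clock can be checked numerically. The sketch below is illustrative only: the integer multipliers it finds (160 and 256) are inferred here from the quoted rates and are not stated in the text, and the function names are hypothetical.

```python
BUNCH_CLOCK_MHZ = 40.0789  # LHC bunch clock frequency quoted in the text


def line_rate_gbps(multiplier: int, bunch_clock_mhz: float = BUNCH_CLOCK_MHZ) -> float:
    """Line rate obtained when running at an integer multiple of the bunch clock."""
    return multiplier * bunch_clock_mhz / 1000.0


def nearest_multiplier(target_gbps: float, bunch_clock_mhz: float = BUNCH_CLOCK_MHZ) -> int:
    """Integer multiple of the bunch clock closest to a target line rate."""
    return round(target_gbps * 1000.0 / bunch_clock_mhz)


# Nominal rates quoted in the text: 6.4 Gb/s (initial) and 10.26 Gb/s (eye diagram).
initial_multiplier = nearest_multiplier(6.4)
eye_test_multiplier = nearest_multiplier(10.26)
```

With these numbers, 256 times the bunch clock reproduces the 10.26 Gb/s of the eye diagram measurement, and 160 times the bunch clock gives about 6.41 Gb/s, consistent with the nominal 6.4 Gb/s.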
IV. HIGH SPEED LINK SIMULATION AND TESTS
Approximately 136 Bit Error Rate (BER) scans have been performed on the L1Topo prototype. For these tests, 14 Avago MiniPOD RX and 2 Avago MiniPOD TX were connected. Figure 7 shows the BER scan test results. The overall BER was found to be $\leq 6 \times 10^{-17}$. Figure 8 shows the resulting Error Free Region (EFR) as a percentage of the Unit Interval (UI), where the UI is the width of a single data bit on the serial stream, i.e., the nominal eye width. The left tail of this distribution is understood and is due to a reworked FPGA. These initial tests show very promising results for the prototype.
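An upper bound on the BER is commonly derived from an error-free transfer of many bits. The sketch below shows that standard estimate; it assumes zero observed errors and a 95% confidence level, neither of which is stated in the text, so it should be read as one possible interpretation of the quoted limit rather than the method actually used. It also converts a line rate into the Unit Interval.

```python
import math


def ber_upper_limit(bits_transferred: float, confidence_level: float = 0.95) -> float:
    """Upper limit on the BER when no errors are seen in bits_transferred bits."""
    return -math.log(1.0 - confidence_level) / bits_transferred


def bits_needed(ber_limit: float, confidence_level: float = 0.95) -> float:
    """Error-free bits required to claim a given BER upper limit."""
    return -math.log(1.0 - confidence_level) / ber_limit


def unit_interval_ps(line_rate_gbps: float) -> float:
    """Width of one bit on the serial stream, in picoseconds."""
    return 1000.0 / line_rate_gbps


# Error-free bits that would correspond to the quoted limit under these assumptions.
bits_for_quoted_limit = bits_needed(6e-17)
```

Under these assumptions, a limit of $6 \times 10^{-17}$ corresponds to roughly $5 \times 10^{16}$ error-free bits summed over all links and scans, and one UI at 6.4 Gb/s is about 156 ps.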
V. ALGORITHMS ON THE FPGA AND RESOURCE USAGE
Feedback from the physics analysis groups (an example of which is detailed in Section II) has led to the development of several firmware algorithms for the FPGAs. The challenge is threefold: high input bandwidth, short latency, and high processing power are all required. Since topological decisions can be made with any of the three different TOB types, topological algorithms require a high concentration of data in a single processor. The entire Level-1 system has a latency budget of 2.5 µs, of which only 12 bunch crossings (BCs), or 300 ns, are allocated to the topological processor.
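As a quick check of these numbers, assuming the nominal 25 ns bunch spacing (implied by 300 ns for 12 BCs), the L1Topo share of the Level-1 latency works out as follows. The symbols $t_{\mathrm{BC}}$, $t_{\mathrm{L1Topo}}$ and $t_{\mathrm{L1}}$ are introduced here for illustration only.

```latex
\begin{align*}
t_{\mathrm{L1Topo}} &= 12 \times t_{\mathrm{BC}} = 12 \times 25\,\mathrm{ns} = 300\,\mathrm{ns}, \\
\frac{t_{\mathrm{L1Topo}}}{t_{\mathrm{L1}}} &= \frac{300\,\mathrm{ns}}{2.5\,\mu\mathrm{s}} = 0.12 .
\end{align*}
```

Under this assumption, L1Topo may use about 12% of the total Level-1 latency, that is, 12 of the 100 bunch crossings that fit into 2.5 µs.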
Algorithms can be implemented in logic blocks, Digital Signal Processor (DSP) slices, and block RAM. Each FPGA has over 90k logic blocks and over 3k DSP slices. Resource usage and latency depend on several aspects of the decision, including the precision of the decision (bit width), the number of calculations in the algorithm, the type of TOBs used, and the speed of the FPGA. Each TOB can contain up to 30 bits of information to process, and the topological trigger will receive an estimated 120 TOBs from the cluster processor, 64 TOBs from the jet processor, and 32 TOBs from the muon system. To cope with the high concentration of data, the number of TOBs will first be reduced via a sorting or selection algorithm. The sorting algorithm offers a fully parallel sort in two stages. The resource usage as a function of the number of input channels is shown in Figure 9. The tests were done assuming 20-bit inputs, with 10 bits used for the comparison. The sorting algorithm takes up more resources than any other individual algorithm. If 50 ns of latency is allowed (∼2 BCs), up to 6 leading TOBs can be selected from up to 120 inputs.
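A software model can help illustrate how a two-stage parallel selection of the leading TOBs might work. The sketch below is one plausible reading of "a fully parallel sort in two stages", not the actual firmware: each stage ranks every word by counting how many others beat it (all comparisons are independent, so they could run in parallel in hardware). The group size of 20, the placement of the 10-bit comparison key in the low bits of the word, and all names are assumptions.

```python
from typing import List, Sequence

KEY_BITS = 10  # bits used for the comparison; their position in the word is assumed


def sort_key(word: int) -> int:
    """Extract the comparison key, assumed to sit in the low KEY_BITS bits."""
    return word & ((1 << KEY_BITS) - 1)


def rank_select(words: Sequence[int], n_select: int) -> List[int]:
    """Keep the n_select words with the largest keys, in descending key order.

    The rank of a word is the number of words that beat it; ties are broken
    by input position so that every rank is unique.
    """
    ranked = []
    for i, word in enumerate(words):
        rank = sum(
            1
            for j, other in enumerate(words)
            if sort_key(other) > sort_key(word)
            or (sort_key(other) == sort_key(word) and j < i)
        )
        if rank < n_select:
            ranked.append((rank, word))
    return [word for _, word in sorted(ranked)]


def two_stage_select(words: Sequence[int], n_select: int = 6, group_size: int = 20) -> List[int]:
    """Stage 1: select leaders within each group. Stage 2: select among the leaders."""
    survivors: List[int] = []
    for start in range(0, len(words), group_size):
        survivors.extend(rank_select(words[start:start + group_size], n_select))
    return rank_select(survivors, n_select)
```

Because every one of the overall leading words is also among the leaders of its own group, the second stage sees all of them, so the result matches a single full sort while each stage compares far fewer pairs.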
Additional algorithms have been developed for L1Topo; they are detailed in Table I and shown in Figure 2. Each of these algorithms individually uses relatively few of the resources available in the FPGA. These algorithms also consume little latency, on the order of fractions of a BC. The most resource-heavy are those which require a square root to be calculated. In most instances, the squared quantity can be used to make the trigger decision. When a square root is required, the COordinate Rotation DIgital Computer (CORDIC) algorithm can also be utilized. Transcendental functions such as cosh and cos are calculated using Look Up Tables (LUTs) with 5-bit inputs and 11-bit and 8-bit outputs, respectively.
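Both strategies for avoiding an explicit square root can be illustrated in Python. The first function compares squared quantities, and the second is a textbook CORDIC in vectoring mode that returns the magnitude of a two-component vector using only additions and scalings by powers of two. This is a floating-point sketch with hypothetical names, not the L1Topo implementation; in hardware the scalings would be bit shifts on integers and the gain would be a precomputed constant.

```python
import math


def exceeds_threshold_squared(ex: int, ey: int, threshold: int) -> bool:
    """Test sqrt(ex^2 + ey^2) > threshold without taking a square root."""
    return ex * ex + ey * ey > threshold * threshold


def cordic_magnitude(x: float, y: float, iterations: int = 16) -> float:
    """Magnitude of (x, y) from CORDIC vectoring-mode micro-rotations."""
    # The magnitude does not depend on signs; x >= 0 keeps the start angle in range.
    x, y = abs(x), abs(y)
    gain = 1.0
    for i in range(iterations):
        shift = 2.0 ** -i
        if y > 0:
            # Rotate clockwise to drive y towards zero.
            x, y = x + y * shift, y - x * shift
        else:
            # Rotate counter-clockwise.
            x, y = x - y * shift, y + x * shift
        # Each micro-rotation stretches the vector by sqrt(1 + 2^(-2i)).
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain
```

For instance, `cordic_magnitude(3.0, 4.0)` converges to a value very close to 5, while `exceeds_threshold_squared(30, 40, 49)` answers the threshold question directly from the squared sum.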
Fig. 9. Resource usage of the sort algorithm. The resource usage was tested using the Virtex-7 XC7VX690T FPGA. For five different numbers of input channels, ranging from 40 to 250, selections of the 2 to 6 leading TOBs were measured to test the capabilities of the FPGA. Tests were done assuming 20 input bits with 10 bits used for the comparison. The green lines labeled as muons, jets, and electron/photon/tau represent the number of TOBs which will be provided by L1Calo and L1Muon as inputs to the sorting algorithm.

Fig. 10. Algorithm location on the FPGA. This includes sorting each input from the L1Calo and L1Muon subsystems as well as implementing the data path of the output of the sorting algorithms into different algorithms. The output of these algorithms will provide the trigger decision which will then be sent to the CTP.

An example of the total algorithm layout on a single FPGA is shown in Figure 10. This includes sorting each input from the L1Calo and L1Muon subsystems as well as implementing the data path of the output of the sorting algorithms into different algorithms. The output of these algorithms will provide the trigger decision, which will then be sent to the CTP. Currently the algorithms use approximately 40% of the resources of the FPGA, which means that there is room for further algorithm implementation. Discussions with physics groups to determine the most useful algorithms for trigger decisions are ongoing.
VI. CONCLUSIONS
In addition to firmware and hardware upgrades to the existing Level-1 Calorimeter Trigger (L1Calo) during the Phase-0 upgrade, a new topological processor (L1Topo) will be added to the Level-1 system. With L1Topo it will be possible for the first time to apply topology cuts at Level-1 using TOBs from the calorimeter and muon sub-detectors. L1Topo will receive input from new CMX modules, which are part of the L1Calo upgrade. It will then use these inputs to make topology-based decisions, preserving important physics that would otherwise be lost by raising the $p_T$ and $E_T$ thresholds. Real-time L1Topo output will be sent to the Central Trigger Processor (CTP), where the final Level-1 accept decision is taken. The crucial aspect here is the design of the topology algorithms, including optimization of bandwidth, FPGA resources and latency.
The L1Topo prototype has undergone initial testing and is producing very promising results. A summary of the results from the initial high-speed link simulation and tests has been presented. The L1Topo production modules are scheduled for installation on site in 2014.
In addition to testing of the prototype, several algorithms have been developed for the main L1Topo processor. These algorithms have provided a measure of the resource demands and latency compliance, and fall within the available budgets. This demonstrates that these algorithms will be suitable for inclusion into the final L1Topo system for use in Run 2.
