ABSTRACT: Starting in 2014, the LHC will collide bunches of protons at up to 14 TeV with an instantaneous luminosity increasing above the design value of 1 × 10³⁴ cm⁻² s⁻¹. Even though the resulting higher event rate will challenge the existing ATLAS data acquisition system, the trigger rate can be reduced by selecting channels based on their expected decay topology and thus reducing background. This will be achieved by introducing a new FPGA based module in the Level-1 Trigger: the Topological Processor (L1Topo). With L1Topo it will be possible to concentrate detailed information from the entire calorimeters and the muon detector into a single module. L1Topo will receive a total aggregate bandwidth of ≈ 1 Tb/s. High density optical I/O and state of the art FPGAs with embedded multi-Gb/s transceivers will be required. For a typical algorithm, the topology data will be processed in less than 100 ns. This paper focuses on the design of the first L1Topo prototype and results from a full-size, full-function demonstrator module. Implementation details of a topological algorithm and latency figures are presented.
Introduction
ATLAS [1] is one of the multi-purpose experiments at the Large Hadron Collider (LHC) at CERN. Bunches of protons travelling in opposite directions collide at a frequency of ≈ 40 MHz. A Trigger system is used to filter out collision events without physics interest, lowering the average output rate to a level suitable for further offline analysis. The design output rate of the complete Trigger system is of the order of 200 Hz, whereas the Level-1 Trigger alone has to keep its output rate below 100 kHz [7]. The Trigger system consists of three levels, of which only the first is relevant to this paper and will be described in brief below.
The ATLAS Level-1 Trigger is a fixed latency, 40 MHz, pipelined, synchronous system, built to operate at the LHC design instantaneous luminosity of 10³⁴ cm⁻² s⁻¹ [2, 3, 4]. The Level-1 Trigger consists of three subsystems: the Level-1 Calorimeter Trigger (L1Calo), the Level-1 Muon Trigger (L1Muon) and the Central Trigger Processor (CTP). The hardware of the Level-1 Trigger is primarily based on FPGAs and custom ASICs. Including cabling, the maximum latency budget of the Level-1 electronics chain is 2.5 µs. An outline of the Level-1 trigger chain is shown in fig. 1.
In the CTP the trigger decision is made, based on thresholds received from L1Calo and L1Muon, and forwarded to the higher levels of the trigger.
The L1Calo input data comes from 7200 analog "trigger towers". The trigger towers are built with a granularity of 0.1 × 0.1 in the η/φ range, which is covered by the ATLAS electromagnetic and hadronic calorimeters. Digitization of the input signals and digital filtering are done on a mixed-signal Pre-Processor. The trigger tower signals are then forwarded to two feature processors, namely the Cluster Processor (CP) and the Jet/Energy-sum Processor (JEP). The CP's aim is to identify electron, photon, tau and hadron candidates with a transverse energy E_T above a set of programmable thresholds [5]. Simultaneously the JEP identifies jet candidates and produces global sums of total, missing, and jet-sum E_T. Inside the CP and the JEP are the Common Merger Modules (CMMs) [4]. The CMMs count the multiplicities of trigger objects and send the result to the CTP. Upon receiving the 'Level-1 Accept' signal from the CTP, the coordinates in the η/φ plane of each trigger object identified by the feature processors at Level-1 (so-called Regions of Interest, RoIs) are sent through ReadOut Drivers (RODs) on to the Level-2 Trigger.
The L1Muon input data comes from 800k Resistive Plate Chamber (RPC) strips in the barrel region and Thin Gap Chambers (TGCs) in the endcap regions. Multiplicities for six thresholds at high and low transverse momentum are measured by coincident hits in the RPC and TGC planes. The logic for multiplicity counting of the different thresholds is provided by the Muon Central Trigger Processor Interface (MUCTPI) [6].
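The multiplicity counting described above can be illustrated with a short software model. This is only a sketch of the idea, not the CMM or MUCTPI logic: the function name `count_multiplicities`, the candidate energies, and the threshold values are all hypothetical.

```python
# Illustrative model of threshold multiplicity counting.
# ASSUMPTION: the candidate energies and thresholds below are invented
# for the example; they are not values used in the ATLAS Level-1 Trigger.

def count_multiplicities(et_values, thresholds):
    """Return, for each threshold, the number of objects with E_T above it."""
    return [sum(1 for et in et_values if et > thr) for thr in thresholds]

# Hypothetical jet candidates (E_T in GeV) found in one bunch crossing.
jet_et = [12.0, 47.5, 23.0, 105.0]
# Hypothetical programmable thresholds (E_T in GeV).
jet_thresholds = [10.0, 20.0, 50.0, 100.0]

multiplicities = count_multiplicities(jet_et, jet_thresholds)
print(multiplicities)  # [4, 3, 1, 1]
```

Only these per-threshold counts reach the CTP in this model; the positions of the objects are not part of the result.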
Motivation for the upgrade
The increase in instantaneous luminosity above the design value implies that the Level-1 Trigger needs to be upgraded to cope with higher background rates and to maintain Level-1 trigger rates at the current level of less than 100 kHz [7], without unduly raising thresholds or prescaling trigger streams of physics interest.
The current Level-1 Trigger [3, 4] allows selections to be made on counts of objects. In L1Calo, for example, these objects are identified by a sliding window cluster finding algorithm (numbers of jets and clusters at various thresholds), complemented by the Missing Transverse Energy [5]. To achieve additional background rate reduction at Level-1, the topological information on jet or muon direction in space can be used.
To be able to form trigger decisions based on topological information, an entirely new element in the Level-1 Trigger is required: the Topological Processor (L1Topo). L1Topo will provide high optical input bandwidth and powerful state-of-the-art FPGAs, receiving and processing the RoI information within the Level-1 latency budget. This will provide great flexibility in the topological algorithms that can be implemented, the results of which will be sent to the CTP.
Level-1 Topological Processor

Functional Requirements
L1Topo will be a single processor crate, equipped with one or several processor modules. It will be designed with an AdvancedTCA (ATCA) form factor. The information received by L1Topo will comprise the RoI data, which consist of a description of the position of a trigger object (jet, electromagnetic cluster, or muon) along with some qualifying information, e.g. the energy sum. Currently the RoI data are transmitted to the Level-2 Trigger directly.
High bandwidth and low processing latency on the real time data path will be essential. Data will be received from the JEP, CP and MUCTPI and will be sent to the CTP via optical links.
Implementation
Fourteen high-density multi-Gb/s opto-electrical (o/e) converters (miniPods) will be employed [8]. They will be mounted as close as possible to the FPGAs to optimize signal quality on the PCB tracks. No on-board electrical duplication of the real time data path is foreseen. The FPGAs' on-chip Multi-Gb/s Transceivers (MGTs) will be run at an initial line rate of 6.4 Gb/s per channel. A simplified outline of the board design is shown in fig. 2.
It will be possible to assemble L1Topo with a choice of footprint compatible devices. Initially 2 Xilinx Virtex-7 FPGAs (XC7VX485T) will be mounted on the prototype processor module due to component availability. More powerful Virtex-7 devices (XC7VX690T) will be mounted on later copies of the module.
The back panel optical connectors will provide a capacity of 288 fibers maximum, if 4 shrouds are mounted [9, 10] . At the envisaged density of 48 fibers per connector, 160 input channels will populate 4 connectors. The aggregate bandwidth required for electron, tau, jets and muon topological information will be ≈ 1 Tb/s.
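The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers from the text, except for two labeled assumptions: a nominal bunch clock of exactly 40 MHz and 8b/10b encoding on the input links (8b/10b is used in the link tests described later, but its use on the final L1Topo links is assumed here).

```python
# Cross-check of the input bandwidth figures quoted in the text.
N_INPUT_CHANNELS = 160       # input channels (from the text)
LINE_RATE_GBPS = 6.4         # initial MGT line rate per channel (from the text)
FIBERS_PER_CONNECTOR = 48    # envisaged fiber density (from the text)
N_CONNECTORS = 4             # populated connectors (from the text)
BUNCH_CLOCK_MHZ = 40.0       # ASSUMPTION: nominal value, the text quotes ~40 MHz
ENCODING_EFFICIENCY = 0.8    # ASSUMPTION: 8b/10b encoding on the input links

# Raw aggregate input bandwidth in Tb/s.
aggregate_tbps = N_INPUT_CHANNELS * LINE_RATE_GBPS / 1000.0

# Fibers available at the envisaged density, to compare with the channel count.
available_fibers = FIBERS_PER_CONNECTOR * N_CONNECTORS

# Payload bits per channel and bunch crossing under the assumptions above.
payload_bits_per_bc = LINE_RATE_GBPS * 1e3 * ENCODING_EFFICIENCY / BUNCH_CLOCK_MHZ

print(f"aggregate bandwidth: {aggregate_tbps:.3f} Tb/s")
print(f"fibers available: {available_fibers}, channels used: {N_INPUT_CHANNELS}")
print(f"payload per channel and bunch crossing: {payload_bits_per_bc:.0f} bits")
```

With these inputs the raw aggregate bandwidth is 160 × 6.4 Gb/s, which matches the ≈ 1 Tb/s quoted in the text, and the 160 channels fit into the 192 fibers offered by four connectors.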
The clock circuitry will comprise various crystal clocks and a clock mezzanine for clock generation. A jitter cleaner will be used for signal conditioning, as the MGTs require low jitter clocks. The jitter cleaner that will be used on L1Topo is a Silicon Labs Si5326; it will allow for jitter cleaning and frequency synthesis up to multiples of the bunch clock frequency. Several stages of electrical fan-out are foreseen to drive out LVDS and CML clocks at low distortion.
Module control will be done via an Ethernet connection to the ATCA base interface, either via an embedded processor, or with an FPGA based IP stack. Initially, VME access via an SFP connector on the front panel will be used for module debug and control in the test lab.
The technology demonstrator GOLD
In 2011, the technology demonstrator module GOLD (Generic Opto Link Demonstrator) was built to resemble the components that will be used for L1Topo. The GOLD mainboard is based on the ATCA form factor, with clock generation and o/e converters located on separate mezzanines, allowing for much flexibility. Real firmware implementations of topological algorithms can be tested on the GOLD. Moreover it can be used as an optical data sink (and source) for future L1Calo modules and for standalone link tests. The following section briefly summarizes some key features. A full review of the GOLD specifications is available in [11].
Figure 3. Assembled demonstrator GOLD under test. In the text we refer to the two FPGAs on the left marked with A, B as input-FPGAs. The main-FPGA (E) cannot be seen in the picture on the right, because it is hidden by the opto-mezzanine. The opto-mezzanine can be equipped with four 10 Gb/s AVAGO o/e receivers and two transmitters.
GOLD key features
The processing power and high-speed connectivity of the GOLD are provided by three Xilinx Virtex-6 FPGAs (XC6VLX240T). Each FPGA is equipped with 24 MGTs that can run up to a maximum line rate of 6.6 Gb/s.
Various clocks are available on the GOLD. The mainboard itself has a 125 MHz crystal clock oscillator. Additionally a 40.08 MHz or a 160.32 MHz crystal clock from a clock mezzanine can be used. The clock mezzanine itself carries a mezzanine that decodes the 40.08 MHz clock of the LHC Timing, Trigger and Control (TTC) system. Among other things, the TTC system provides for the distribution of synchronous timing and control signals to the Level-1 Trigger [12]. The TTC clock is used to drive the MGTs. A jitter cleaner on the clock mezzanine is used to prepare a clock derived from the TTC clock for this purpose.
The two input-FPGAs on the top left in fig. 3 receive signals optically from the opto-mezzanine in the center. The main-FPGA is used for board control and is meant to be used as a merger, since it is connected to the input-FPGAs via 80 electrical differential lines per input-FPGA. The o/e conversion is performed by 12-fiber industry standard AVAGO receivers and transmitters, mounted on the opto-mezzanine. Electrical signals from the opto-mezzanine are fed through two FMC connectors to the GOLD mainboard.
A USB microcontroller, which supports the 480 Mb/s USB high-speed mode, is mounted on the GOLD. A JTAG interface connects the microcontroller to the FPGAs. The microcontroller design is fully compatible with the Xilinx platform USB solution.
Tests on the demonstrator at Mainz
Transceiver and AVAGO o/e converter parameters fine tuning
The MGTs and the AVAGO o/e converters used on the demonstrator board have many parameters that can be fine tuned to improve signal quality and the reliability of the data transmission. Systematic tests on 12 high speed optical channels have been carried out at Mainz. IBERT (Integrated Bit Error Ratio Tester), an IP core provided by Xilinx, can be used to adjust the settings of the MGTs and to measure the Bit Error Rate (BER). The IBERT core is controlled via a desktop application, which communicates with the core using the JTAG interface. The settings of the AVAGO o/e converters are controlled via an I²C bus, which is controlled separately from the desktop application using a mixture of custom software and firmware. At a line rate of 6.4 Gb/s, the parameters were modified to maximize the width of the "bathtub" (see fig. 4). An oscilloscope (30 GSa/s, 32 GHz) was also used to measure eye diagrams in various positions to check the signal integrity.
The error free region in the "bathtub" is about 40% of one bit period, which is sufficiently wide for error free data transmission. BER measurements have been conducted over several days and no transmission errors were found. This leads to a BER < 10⁻¹⁶ for one channel.
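To put the quoted "bathtub" opening into absolute units, the bit period at the tested line rate can be worked out directly. The short calculation below uses only the 6.4 Gb/s line rate and the approximate 40% figure from the text; the variable names are illustrative.

```python
# Width of the error-free sampling window at the tested line rate.
LINE_RATE_BPS = 6.4e9    # line rate used in the tests (from the text)
OPEN_FRACTION = 0.40     # error-free fraction of one bit period (from the text, approximate)

unit_interval_ps = 1e12 / LINE_RATE_BPS            # one bit period in picoseconds
open_width_ps = OPEN_FRACTION * unit_interval_ps   # error-free window in picoseconds

print(f"unit interval: {unit_interval_ps:.2f} ps")    # 156.25 ps
print(f"error-free width: {open_width_ps:.1f} ps")    # 62.5 ps
```

An opening of about 40% of the bit period therefore corresponds to a sampling window of roughly 60 ps.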
Latency Measurements
Due to the tight latency budget of only 2.5 µs for the whole Level-1 trigger system, it is essential to estimate the expected latency of L1Topo. A significant part of L1Topo's latency budget is consumed in the MGTs for the high speed data transmission itself. The demonstrator GOLD is equipped with MGTs that are very similar to the MGTs that will be found on L1Topo, allowing the demonstrator to be used for MGT related latency measurements. To measure the latency induced by a receiving MGT, a test setup was made as illustrated in fig. 5. The main-FPGA is used as a data source. It sends a signal on one optical channel at a line rate of 6.4 Gb/s. This signal mostly consists of the 8b/10b comma character K28.5. A different character is sent about once every 1 µs, and in parallel an electrical trigger signal is sent to an oscilloscope to indicate the start of the measurement. One of the input-FPGAs is used as a data sink and receives the signal from the main-FPGA. The signal can be measured at several points on the PCB as shown in fig. 5. Different loopback modes are available on the receiving FPGA. In these modes, the received signal either propagates through only a portion of the MGT (PMA or PCS) before being sent to the TX, or it enters the FPGA fabric, from where it is sent back by user logic. For an exact definition of the different loopback modes see [13]. The latencies measured for the different loopback modes are (1 LHC bunch tick = 25 ns):
• Far End PMA loopback: ≈ 1.5 LHC bunch ticks
• Far End PCS loopback: ≈ 3 LHC bunch ticks
• loopback in FPGA fabric: ≈ 3.5 LHC bunch ticks
• RX only (electr. output): ≈ 2.5 LHC bunch ticks
PMA and PCS loopback modes are of special interest, because these might provide a possibility for low latency data duplication in the future. The data paths of the signal entering the FPGA on the RX side and leaving the FPGA on the TX side could be identified with the oscilloscope. Cable propagation delays have been subtracted. The latency for RX only was measured by driving out an electrical trigger signal to a pin of the receiving FPGA, which could be identified with the oscilloscope (see fig. 6).
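For comparison with the overall budget, the measured latencies can be converted from bunch ticks to nanoseconds and expressed as a share of the 2.5 µs Level-1 latency budget. The sketch below uses only the approximate values listed above; the dictionary and variable names are illustrative.

```python
# Convert the measured MGT latencies from LHC bunch ticks to nanoseconds.
BUNCH_TICK_NS = 25.0         # 1 LHC bunch tick (from the text)
LEVEL1_BUDGET_NS = 2500.0    # Level-1 latency budget of 2.5 µs (from the text)

# Approximate measured latencies in bunch ticks (from the list above).
latency_ticks = {
    "Far End PMA loopback": 1.5,
    "Far End PCS loopback": 3.0,
    "loopback in FPGA fabric": 3.5,
    "RX only (electrical output)": 2.5,
}

# The full Level-1 budget expressed in bunch ticks.
budget_ticks = LEVEL1_BUDGET_NS / BUNCH_TICK_NS

for mode, ticks in latency_ticks.items():
    latency_ns = ticks * BUNCH_TICK_NS
    share = 100.0 * latency_ns / LEVEL1_BUDGET_NS
    print(f"{mode}: {latency_ns:.1f} ns ({share:.1f}% of the Level-1 budget)")
```

The whole Level-1 budget corresponds to 100 bunch ticks, so each tick spent in an MGT costs 1% of it.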
Sorting algorithm logic consumption
An initial implementation of a generic topological algorithm has been created (see fig. 7). This algorithm consists of a sort stage and a look-up table (LUT). The sort stage sorts the data from the 32 input channels and returns the two largest trigger objects (e.g. jets, clusters of electrons/taus). The LUT is used to calculate and apply cuts on topological correlations between these two objects. It can be freely configured depending on the desired topological cuts. The advantage of the LUT is that the result can be obtained within a fixed latency. Logic consumption of the algorithm in the FPGA is crucial, as we plan to run several algorithms on L1Topo in parallel. The logic slice utilization of the current implementation in a Xilinx Virtex-6 FPGA (XC6VLX240T) is 13%.
Figure 7. Scheme of the current implementation of the topological algorithm. The block labeled "COMPONENT sort" consists of a matrix of comparators to sort the input data. The block labeled "COMPONENT LUT" is a lookup table, which is used to apply topological cuts.
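The sort-and-LUT scheme described in this section can be modeled in software to make the data flow concrete. The following is a minimal behavioral sketch, not the firmware: the object format (E_T, φ bin), the 64-bin φ encoding, the Δφ cut, and all function names are assumptions made for illustration, and `heapq.nlargest` merely stands in for the comparator matrix.

```python
import heapq
from typing import List, Tuple

N_CHANNELS = 32    # number of input channels (from the text)
N_PHI_BINS = 64    # ASSUMPTION: azimuthal angle encoded in 64 bins

# ASSUMPTION: a trigger object is modeled as (E_T in counts, phi bin index).
TriggerObject = Tuple[int, int]


def sort_stage(objects: List[TriggerObject]) -> List[TriggerObject]:
    """Return the two objects with the largest E_T, leading object first."""
    return heapq.nlargest(2, objects, key=lambda obj: obj[0])


def build_delta_phi_lut(min_separation_bins: int) -> List[List[bool]]:
    """Precompute pass/fail for every pair of phi bins (hypothetical cut)."""
    lut = []
    for phi1 in range(N_PHI_BINS):
        row = []
        for phi2 in range(N_PHI_BINS):
            diff = abs(phi1 - phi2)
            delta_phi = min(diff, N_PHI_BINS - diff)  # phi wraps around
            row.append(delta_phi >= min_separation_bins)
        lut.append(row)
    return lut


def topological_decision(objects: List[TriggerObject],
                         lut: List[List[bool]]) -> bool:
    """Sort, then look up the decision for the two leading objects."""
    leading, subleading = sort_stage(objects)
    return lut[leading[1]][subleading[1]]


# Hypothetical input: 32 channels, three of them carrying an object.
objects = [(0, 0) for _ in range(N_CHANNELS)]
objects[3] = (80, 2)
objects[17] = (65, 34)
objects[25] = (20, 10)

# Hypothetical cut: the two leading objects must be at least 28 bins apart.
lut = build_delta_phi_lut(min_separation_bins=28)
decision = topological_decision(objects, lut)
print(sort_stage(objects), decision)  # [(80, 2), (65, 34)] True
```

Because the decision is a single table lookup after the sort, its cost in this model does not depend on which cut is configured, which mirrors the fixed-latency argument made in the text.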
Conclusions
This document gives an overview of the L1Topo prototype module, which is a completely new element of the ATLAS Level-1 Trigger. With L1Topo it will be possible for the first time to apply topological cuts at Level-1 using RoIs from the calorimeters and muon sub-detectors. At the time of writing, the L1Topo prototype is in design capture and the components have been procured. The prototype module will be produced by the end of November 2012. The L1Topo production modules are intended to be installed in ATLAS in 2014.
A summary of the results of the tests conducted with the L1Topo functional demonstrator, which was built in 2011 as a test bench for the baseline technologies to be implemented in the L1Topo design, is also given.
