Abstract—The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high-momentum Higgs, W, and Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx Virtex UltraScale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 276 optical fibers, with the data transferred at the 40 MHz Large Hadron Collider (LHC) clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor Field-Programmable Gate Arrays (FPGAs), monitor board health, and interface to external signals. A Prototype 1 board, which includes one ZYNQ and one Virtex-7 FPGA, has been designed for testing and verification. Now that the elementary technologies have been verified on Prototype 1, a more advanced prototype with three Virtex UltraScale FPGAs is being designed. Although the board is being designed specifically for the ATLAS experiment, it is sufficiently generic that it could be used for fast data processing at other high energy physics or nuclear physics experiments.
I. INTRODUCTION
The Large Hadron Collider (LHC) will undergo a series of upgrades that will allow luminosity increases over the next ten years; the different phases are shown in Table 1. The ATLAS [1] experiment will follow the same upgrade steps. During the so-called Phase-I upgrade, the ATLAS first-level trigger (Level-1) will be extended with a new component in the calorimeter trigger system (L1Calo): the gFEX. It is one of the new components designed to maintain trigger acceptance against increasing luminosity. The gFEX selects large-radius jets, typical of Lorentz-boosted objects, by means of wide-area jet algorithms refined by subjet information.
The L1Calo system processes signals from the electromagnetic and hadronic calorimeters. As shown in Fig. 1, the L1Calo system before this upgrade has three major subsystems (marked in green): the Cluster Processor Subsystem (CP), comprising Cluster Processor Modules (CPMs) [4] and Common Merger Extended Modules (CMXs) [5]; the Jet/Energy Processor Subsystem (JEP), comprising
Jet/Energy Modules (JEMs) [6] and CMXs; and the Preprocessor Subsystem, comprising Preprocessor Modules (PPMs). The CPM is used to identify electrons and taus, while jets and energy sums are measured in the JEMs. Three additional feature identification systems will be installed during Phase-I: the Electron Feature Extractor (eFEX) [7], the Jet Feature Extractor (jFEX) [8], and the gFEX. The eFEX and jFEX provide similar functionalities to the CPM and JEM, respectively, albeit with finer granularity and more advanced algorithms. Each of these systems consists of multiple modules that operate on limited regions of the calorimeter. The gFEX, in contrast, has the entire calorimeter data available in a single module and thus enables the use of full-scan algorithms.
The Preprocessor receives shaped analog pulses from the electromagnetic and hadronic calorimeters, digitizes and synchronizes them, identifies the bunch collision from which each pulse originated, scales the digital values to yield E_T, and prepares and transmits the data to downstream elements. In LHC Run 3, the electromagnetic calorimeter will provide L1Calo with both analog signals (for the CP and JEP) and digitized data (for the FEXes). The hadronic calorimeter will continue to send analog signals. These are digitized on the Preprocessor and then transmitted optically to the FEXes through an optical fiber plant. Initially at least, the eFEX and jFEX will operate in parallel with the CP and JEP. The older analog subsystems will be decommissioned once the performance of the FEXes has been validated.
The Phase-II upgrade project includes substantial changes to the trigger electronics, which will be installed during the so-called Long Shutdown 3 (LS3). Calorimeter input to L1Calo will then be entirely in digital format. The Preprocessor, CP, and JEP subsystems will be removed, and the FEX subsystems, with modified firmware, will be relabeled as L0Calo in a possible two-stage (L0/L1) real-time trigger. Hence, the FEX subsystems must be designed to meet both the Phase-I and Phase-II upgrade requirements, including the still-to-be-specified Phase-II timing and control signals.
Fig. 2. The red circle area (R<1) is the L1 narrow jet, and the black circle is the large-R jet found with the gFEX [9].
III. FUNCTIONALITY OF GFEX
The gFEX consists of a single module with several large Field-Programmable Gate Arrays (FPGAs) for data processing and a combined FPGA and CPU System-on-Chip (Hybrid FPGA) for control and monitoring. A special feature of the gFEX is that it receives data from the entire calorimeter, enabling the identification of large-radius jets and the calculation of whole-event observables. Each processor FPGA has 2π azimuthal (φ) coverage for a slice in pseudorapidity (η) and executes all feature identification algorithms. The processor FPGAs communicate with each other via low-latency GPIO links, while input and output to the board go via Multi-Gigabit Transceivers (MGTs). A simplified functional representation of the module is shown in Fig. 5. The gFEX is a customized ATCA module based on the PICMG® 3.0 Revision 3.0 specification [10]. The gFEX module will likely be placed in a sparsely populated ATCA shelf so that it can occupy two slots if needed: one for the board and one for cooling (e.g., large heat sinks), fiber routing, etc.
Fig. 3. Acceptance gain for boosted top quarks after adding the gFEX [11]. The blue curves show the acceptance without the gFEX. The 140 GeV gFEX trigger threshold is chosen to match the L1 J100 single-subjet turn-on curve. After adding the gFEX, the acceptance for two and more subjets is recovered and the resolution is nearly the same as that for one subjet.
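Because each processor FPGA covers the full azimuth for one pseudorapidity slice, routing a calorimeter tower to a processor FPGA reduces to a lookup on its η coordinate. Below is a minimal Python sketch of such a mapping; the three-slice split and the boundary values are illustrative assumptions, not the actual gFEX fiber mapping.

```python
def fpga_for_tower(eta: float) -> str:
    """Assign a calorimeter tower to a processor FPGA by pseudorapidity.

    The boundaries below are illustrative placeholders, not the real
    gFEX mapping: FPGAs A and B split the central region and FPGA C
    takes the forward regions on both sides.
    """
    if -2.5 <= eta < 0.0:
        return "A"   # negative-eta central slice
    elif 0.0 <= eta <= 2.5:
        return "B"   # positive-eta central slice
    else:
        return "C"   # forward calorimeters, both sides

# Example: a tower at eta = -1.2 would be processed on FPGA A.
print(fpga_for_tower(-1.2))
```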
A. Input and output interfaces
The gFEX receives data from the electromagnetic and hadronic calorimeters via optical fibers. 
Real-time data are sent to the L1 Topological Trigger (L1Topo) [12] by the three processor FPGAs, each using 12 MGTs. The data received by processor FPGAs A and B are forwarded to FPGA C, where they are combined with the data of processor FPGA C.
B. Feature Identification Algorithms
The core trigger algorithms are implemented in the firmware of the processor FPGAs. The input data, after deserialization, are organized into calibrated gTowers in the gTower-builder step; this procedure is common to all downstream algorithms. A seeded simple-cone algorithm is used for large-area, non-iterative jet finding. Seeds are defined by gTowers over a configurable E_T threshold. An illustration of the seeds identified in an event is shown in Fig. 5. The gTower E_T in a circular region surrounding each seed is summed. Portions of the jet area might extend into an η region on a neighboring processor FPGA. Part of the energy summation must therefore take place on that FPGA, necessitating the transfer of seed information over the low-latency parallel GPIOs. These partial sums are then sent back to the original FPGA and included in the final E_T of the large-R jets, as displayed in Fig. 6. Note that the jets are allowed to overlap. This enhances the efficiency for events with complex topologies where multiple energy depositions are close together, as is typically found in events containing boosted objects.
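The seeded simple-cone step described above can be modeled in a few lines of software. The sketch below is a behavioral model only, assuming gTowers as (η, φ, E_T) tuples and placeholder threshold and radius values; the real implementation is parallel fixed-point firmware.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two towers, wrapping phi into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def simple_cone_jets(towers, seed_threshold=5.0, radius=1.0):
    """Seeded, non-iterative cone jets over a list of gTowers.

    towers: list of (eta, phi, et) tuples. Seeds are towers above
    seed_threshold; each jet E_T is the sum of all tower E_T within
    `radius` of its seed. Jets may overlap, as in the gFEX algorithm.
    The threshold and radius values are illustrative placeholders.
    """
    seeds = [t for t in towers if t[2] > seed_threshold]
    jets = []
    for s_eta, s_phi, _ in seeds:
        jet_et = sum(et for eta, phi, et in towers
                     if delta_r(eta, phi, s_eta, s_phi) < radius)
        jets.append((s_eta, s_phi, jet_et))
    return jets
```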
The architecture of the gFEX permits event-by-event local pileup suppression for these large-R objects using baseline subtraction techniques [13]. The subtraction uses the energy density measured from the gTowers within each processing region, calculated on an event-by-event basis. The energy subtracted from each jet is the product of the area of the jet and the energy density of the associated region. Studies are ongoing to optimize the performance of pileup-subtracted jets.
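In formula form, the subtraction described above amounts to an area-based correction (the notation is introduced here for illustration, not taken from the source):

```latex
% Area-based pileup subtraction for a large-R jet:
E_T^{\mathrm{corr}} = E_T^{\mathrm{jet}} - \rho_{\mathrm{region}}\, A_{\mathrm{jet}}
```

where ρ_region is the per-event gTower energy density in the jet's processing region and A_jet is the jet area.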
C. Slow Control & Environmental Monitoring
A CACTUS/IPbus [14] interface is provided for high-level control of the gFEX. This allows algorithmic parameters to be set, modes of operation to be controlled, and spy memories to be read. The IPbus protocol will be implemented in the Hybrid FPGA: the standard firmware, modified to run on the FPGA part, and the software suite from CACTUS for the Linux instance running on the ARM processor.
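From the software side, IPbus register access through the CACTUS uHAL library looks roughly like the sketch below; the connection file, device ID, and register names are hypothetical placeholders, not the actual gFEX address table.

```python
import uhal

# Connection file and device/register names are illustrative only.
manager = uhal.ConnectionManager("file://connections.xml")
hw = manager.getDevice("gfex.prototype")

# Queue a write to a hypothetical algorithmic parameter
# (e.g., a gTower seed threshold).
hw.getNode("jet_algo.seed_threshold").write(50)

# Queue a read of a hypothetical spy memory, then run the transactions.
spy = hw.getNode("spy_mem").readBlock(256)
hw.dispatch()           # executes all queued IPbus transactions
print(list(spy))        # values are valid only after dispatch()
```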
The Hybrid FPGA implements an Intelligent Platform Management Controller (IPMC) [15] to monitor the voltage and current of every power rail on the gFEX. It also monitors the temperature of all FPGAs via their embedded sensors, and of any areas of dense logic via discrete sensors where needed. These data can be transmitted to an external monitoring system by the Hybrid FPGA.
If any board temperature exceeds a programmable threshold, the IPMC powers down the board payload, i.e., everything not on the management power supply. The thresholds at which this function is activated should be set above the levels at which the Detector Control System (DCS) will power down the module. Thus, this mechanism should activate only if the DCS fails. This might happen, for example, if there is a sudden rapid rise in temperature to which the DCS cannot respond in time.
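The protection logic itself is a simple threshold comparison. A behavioral sketch follows, with the sensor access, trip level, and poll period as stand-ins for the real IPMC firmware:

```python
import time

# Illustrative value only: the real thresholds are programmable and must
# sit above the DCS shutdown levels so this trip fires only if DCS fails.
PAYLOAD_TRIP_CELSIUS = 85.0

def read_fpga_temperatures():
    """Placeholder for reads of the FPGAs' embedded temperature sensors."""
    raise NotImplementedError

def power_down_payload():
    """Placeholder: drop everything not on the management power supply."""
    raise NotImplementedError

def ipmc_thermal_watchdog(poll_seconds=1.0):
    """Trip the payload power if any monitored temperature is too high."""
    while True:
        if max(read_fpga_temperatures()) > PAYLOAD_TRIP_CELSIUS:
            power_down_payload()
            break
        time.sleep(poll_seconds)
```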
IV. IMPLEMENTATION OF PROTOTYPE 1
A first prototype has been designed to verify the functionality of the chosen technologies and to test the power distribution and sequencing, the MGT link speed, and the high-speed parallel GPIOs.
As shown in Fig. 7, one Hybrid FPGA (ZYNQ) and one processor FPGA are included in Prototype 1, together with several MiniPODs [16] and MicroPODs [17], power modules, and the high-speed parallel GPIOs.
A. Power distribution and sequence
The gFEX is an ATCA module, so its power design must meet the requirements of the ATCA standard. First, the two -48 V inputs are ORed and inverted to a single +48 V by an ATCA board power input module (PIM400); the +48 V is then stepped down to 12 V by a DC-DC converter produced by General Electric; finally, all the other power rails, such as 1.0 V, 1.2 V, 1.8 V, 2.5 V, and 3.3 V, are generated from 12 V by separate DC-DC power modules.
To meet the large-current and low-ripple requirements of the Xilinx FPGAs, the LTM4630 power module is used to generate the core and MGT-related voltages.
Since the IPMC is not yet available, two ADM1066 configurable sequencing devices, which offer a single-chip solution for supply monitoring and sequencing, are used on this board.
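A behavioral sketch of the kind of sequencing the ADM1066 devices enforce is shown below; the rail order, undervoltage limits, and helper functions are illustrative assumptions, not the values programmed on the board.

```python
# Behavioral model of ordered rail bring-up with undervoltage checks.
# Rail order and limits are illustrative placeholders only.
RAIL_SEQUENCE = [
    ("12V",  11.4),   # bulk intermediate rail from the GE DC-DC converter
    ("1.0V",  0.95),  # FPGA core
    ("1.2V",  1.14),  # MGT supplies
    ("1.8V",  1.71),
    ("2.5V",  2.38),
    ("3.3V",  3.14),
]

def read_rail_voltage(name: str) -> float:
    """Placeholder for the sequencer's readback of a rail voltage."""
    raise NotImplementedError

def enable_rail(name: str) -> None:
    """Placeholder for asserting a regulator's enable pin."""
    raise NotImplementedError

def power_up():
    """Enable rails in order, waiting for each to reach its minimum."""
    for name, v_min in RAIL_SEQUENCE:
        enable_rail(name)
        while read_rail_voltage(name) < v_min:
            pass  # a real sequencer would time out and flag a fault here
```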
B. MGTs design
There are two types of optical transceivers (MiniPODs and MicroPODs) and two different MGT types (GTX and GTH) to be verified on the Prototype 1 board, so each MGT type is connected to both kinds of optical transceivers. Moreover, GTH-to-GTH, GTX-to-GTX, GTH-to-GTX, and GTX-to-GTH loopbacks are also included.
C. High speed parallel GPIOs
The high-speed parallel GPIOs are used to transfer data between FPGAs. They are required to run at no less than 480 Mb/s over a width of 50 bits. Three different 50-bit GPIO buses are implemented: the first runs from processor-FPGA High Performance (HP) banks to HP banks with an LVDS differential interface; the second from processor-FPGA HP banks to ZYNQ HP banks with an LVDS differential interface; and the last from processor-FPGA HP bank to HP bank with a single-ended HSTL interface.
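As a quick consistency check (not a figure quoted in the source), the minimum aggregate throughput of one such bus follows directly from the requirement:

```python
# 50 parallel bits, each toggling at >= 480 Mb/s:
bits = 50
rate_per_bit = 480e6              # b/s minimum requirement per line
print(bits * rate_per_bit / 1e9)  # 24.0 Gb/s aggregate per bus
```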
D. ZYNQ design
Two Gigabit Ethernet interfaces, a QSPI interface, 4 Gb of DDR3 memory, an I2C interface, a UART, and an SD card interface are designed into the ZYNQ PS (processing system).
V. TEST RESULTS OF PROTOTYPE 1
The major functionality and performance tests of Prototype 1 have been completed. The power sequencing and monitoring circuits work as programmed, and the ZYNQ interfaces, such as the SD card boot mode and QSPI boot mode, have been verified. All the hardware technologies work as expected.
A. Link speed test for MGTs
As explained in the MGT design section above, there are many different MGT loopbacks. The link speed tests were done with the IBERT tool provided by Xilinx Vivado. With all 80 GTH channels of the processor FPGA and all 16 GTX channels of the ZYNQ turned on, all links are stable at 12.8 Gb/s with no error bit detected, corresponding to a bit error rate (BER) below 10^-15. The eye diagrams shown in Fig. 8-Fig. 10 were obtained for different combinations of links, all running at 12.8 Gb/s. Comparing the eye diagrams, the GTH links are noticeably better than the GTX links, and the MiniPODs perform almost the same as the MicroPODs.
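Bounding the BER below 10^-15 with zero observed errors requires transmitting on the order of 10^15 bits per link; by the standard zero-error rule of thumb, roughly 3×10^15 bits are needed for a 95% confidence upper limit. A quick estimate of the required test time at 12.8 Gb/s:

```python
line_rate = 12.8e9            # b/s per link
target_ber = 1e-15
# With zero errors, the 95% CL upper limit on BER is ~3/N transmitted
# bits, so N must be about 3 / target_ber.
bits_needed = 3.0 / target_ber
seconds = bits_needed / line_rate
print(seconds / 3600)         # ~65 hours of error-free running per link
```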
B. High speed parallel GPIOs
The three 50-bit parallel GPIO buses were tested with different data patterns (such as PRBS) to measure their stability and the extent of the stability window for each of them.
With the IDELAYE2 delay primitive, the clock delay can be adjusted in 32 steps of about 78 ps each.
All three data buses are stable at 960 Mb/s. The stable range for the processor-FPGA HP bank to processor-FPGA HP bank LVDS and HSTL interfaces is about 0.78 ns, which is 75% of a half cycle at 480 MHz; for the processor-FPGA HP banks to ZYNQ HP banks LVDS interface, the stable range is about 0.702 ns, which is 67% of a half cycle at 480 MHz.
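These window fractions are easy to cross-check against the quoted numbers and the IDELAYE2 step size (a consistency check, not new data):

```python
clock_mhz = 480.0
half_cycle_ns = 1e3 / clock_mhz / 2   # 1.042 ns unit interval at 960 Mb/s DDR
step_ns = 0.078                       # ~78 ps per IDELAYE2 tap
print(0.78 / half_cycle_ns)           # ~0.75 -> 75% open window (LVDS/HSTL)
print(0.702 / half_cycle_ns)          # ~0.67 -> 67% open window (to ZYNQ)
print(0.78 / step_ns)                 # ~10 delay taps span the open window
```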
VI. CONCLUSION AND NEXT STEPS
Prototype 1 has successfully verified the validity of the chosen technologies: the GTX and GTH links are stable at 12.8 Gb/s, and the three high-speed 50-bit parallel GPIO buses are stable at 960 Mb/s. A more advanced prototype, which will include the full gFEX functionality and three Virtex UltraScale FPGAs, is now being designed. Currently the schematic design is complete and the routing and layout are about 40% finished.
