Abstract-The Fast TracKer (FTK) is an integral part of the trigger upgrade program for the ATLAS experiment. At LHC Run 2, which started operations in June 2015 at a center-of-mass energy of 13 TeV, the luminosity can reach up to $2\times10^{34}$ cm$^{-2}$ s$^{-1}$, and an average of 40-50 simultaneous proton collisions per beam crossing is expected. The higher luminosity demands a more sophisticated trigger system with increased use of tracking information. The FTK is a highly parallel hardware system that rapidly finds and reconstructs tracks in the ATLAS inner detector at the triggering stage. This paper focuses on the Input Mezzanine board, which is the input module of the entire FTK system.
I. INTRODUCTION
THE ATLAS experiment [1] is being upgraded to cope with the higher luminosity and higher center-of-mass energy that the Large Hadron Collider (LHC) will provide. The high instantaneous luminosity at LHC Run 2 and beyond poses challenges for the trigger system [6]. The existing ATLAS trigger system, consisting of a hardware-based Level-1 trigger and a CPU-based High Level Trigger (HLT), was designed to work well at the LHC design luminosity of $1\times10^{34}$ cm$^{-2}$ s$^{-1}$. However, after the planned luminosity upgrade, the increase in detector activity arising from simultaneous interactions (pile-up) will complicate measurements. Because of its fine resolution, tracking information is important for deciding which events accepted by the Level-1 trigger should be kept for further processing and for identifying objects that originate from pile-up. The Fast TracKer (FTK) [2] is one of the ATLAS upgrade programs intended to preserve physics sensitivity in the high pile-up environment. It is an electronics system that performs global track reconstruction after each Level-1 trigger, giving the HLT early access to tracking information. FTK uses data from the Insertable B-Layer (IBL) [3], pixel, and semiconductor tracker (SCT) detectors. It reconstructs tracks in hardware with a high degree of parallelism, so that the tracks are readily available to the HLT.
This paper focuses on the FTK Input Mezzanine (IM) cards, which are the input interface and the first processing stage of the FTK system.
II. FUNCTIONALITY OF IM
The IM functions are implemented on a 12-layer mezzanine card measuring 149 mm × 74 mm, which connects to the DataFormatter (DF) [4] motherboard through a High Pin Count FPGA Mezzanine Card (FMC) connector. The IMs receive data from the IBL, pixel, and SCT detectors, which together have about 100 million channels. The cross-section view of the tracking detectors is shown in Figure 3. Each IM receives data from the Read-Out Drivers (RODs) over up to four S-LINK optical fibers through four SFP+ connectors. Each S-LINK runs at up to 3.1 Gbps.
There are two types of IM: one uses Spartan-6 FPGAs (XC6SLX150T) to process pixel and SCT data, and the other uses Artix-7 FPGAs (XC7A200T) to process IBL/pixel and SCT data. The IMs with Spartan-6 and Artix-7 are shown in Figures 4 and 5, respectively. Each mezzanine carries two FPGAs. Each FPGA receives two links, one from IBL/pixel and the other from SCT, processes the data independently, and transmits its output to the FMC connector. In each FPGA, a clustering algorithm [5] is implemented to reduce the volume of the input data and to improve the precision of the hit spatial measurements by identifying clusters. There are two types of clustering algorithm: 2D for the IBL and pixel detectors and 1D for the SCT strips. The latter is aided by pre-clustering in the SCT Front End (FE) electronics.
Data are transferred from the IM to the DF over a Double Data Rate (DDR) source-synchronous parallel LVDS bus operated at 200 MHz. The Inter-Integrated Circuit (I2C) bus is used for slow control of the IM from the DF. Each FPGA is equipped with an 18 Mb external SRAM and a 32 Mb flash memory. The IM is powered either from the FMC connector or from an external power connector, which is used for standalone tests where no DF is present. The JTAG chain for FPGA configuration is accessible from both the FMC connector and an external connector.
A. Detailed description of the clustering algorithm
Clustering has two purposes: to reduce the volume of the received data before further processing, and to determine the cluster center in order to obtain the best spatial measurement. A 2D-clustering algorithm is used for the IBL and pixel detectors, and a 1D-clustering algorithm is used for the SCT.
For SCT data, 1D clustering is performed, and part of it is already implemented in the SCT Front-End chips. Contiguous strips are found and clustered, and then the cluster size and cluster center are passed to the DF.
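The 1D step described above can be illustrated with a short software sketch. It is not the firmware implementation: the function name `cluster_strips`, the use of plain integer strip numbers, and the unweighted mean as the cluster center are all assumptions made for illustration.

```python
from typing import List, Tuple


def cluster_strips(strips: List[int]) -> List[Tuple[float, int]]:
    """Group contiguous strip numbers and return (center, size) per cluster.

    Hypothetical helper: the center is taken as the unweighted mean of the
    strip numbers in the run, which is an assumption of this sketch.
    """
    clusters: List[Tuple[float, int]] = []
    run: List[int] = []
    for strip in sorted(set(strips)):
        # A gap in the strip numbering closes the current run.
        if run and strip != run[-1] + 1:
            clusters.append((sum(run) / len(run), len(run)))
            run = []
        run.append(strip)
    if run:
        clusters.append((sum(run) / len(run), len(run)))
    return clusters


# Three runs of contiguous strips: 10-12, 40-41, and the single strip 100.
print(cluster_strips([10, 11, 12, 40, 41, 100]))
```

Only the center and the size of each run are kept, which is where the data reduction comes from: a run of any length is passed on as a single pair of numbers.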
A multi-core, FPGA-based 2D-clustering algorithm is used for the IBL and pixel detectors. The challenge is to perform 2D cluster finding in real time. The algorithm uses a sliding-window technique with an adjustable window size in order to minimize the FPGA resources required for cluster identification. A key element of this algorithm is the ability to instantiate multiple clustering cores that work on different windows in parallel, which increases performance by making better use of the FPGA resources. In addition to this parallelization, the algorithm is pipelined, so that data preparation, clustering, and cluster post-processing are executed concurrently.
The incoming data are transformed from the native detector format into a format suited to the subsequent processing step. This pre-processing step selects, formats, and organizes the information used by the clustering algorithm, such as start/end event words, module headers/trailers, and pixel hit words. Along with the format transformation, the incoming data are realigned. The 16 front-end (FE) chips of an ATLAS pixel module are arranged in two rows of eight readout chips each and are numbered anticlockwise. The hit data are read out in the same FE sequence. This means that half of the pixel module data arrive in the reverse column order relative to the other half. The hit decoder module must restore the order of the hits, since the clustering algorithm assumes that hits are ordered by increasing column number.
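The reordering can be sketched in software as follows. This is only an illustration under stated assumptions: hits are represented as hypothetical `(fe_chip, column, row)` tuples with the column already expressed in module coordinates, chips 0-7 are assumed to arrive with increasing column, and chips 8-15 with decreasing column. The function name `restore_column_order` is not taken from the firmware.

```python
import heapq
from typing import List, Tuple

# Hypothetical hit layout: (fe_chip, column, row), column in module coordinates.
Hit = Tuple[int, int, int]


def restore_column_order(hits: List[Hit]) -> List[Hit]:
    """Return the hits of one module ordered by increasing column.

    Assumption of this sketch: hits from FE chips 0-7 arrive with increasing
    column and hits from FE chips 8-15 arrive with decreasing column.
    """
    first_half = [h for h in hits if h[0] < 8]
    second_half = [h for h in hits if h[0] >= 8]
    # Reversing the second half turns it into an increasing-column stream.
    second_half.reverse()
    # Merge the two increasing streams into a single increasing stream.
    return list(heapq.merge(first_half, second_half, key=lambda h: h[1]))


arrival = [(0, 1, 5), (0, 3, 7), (7, 140, 2), (8, 143, 160), (12, 60, 161), (15, 2, 170)]
print([h[1] for h in restore_column_order(arrival)])
```

Reversing one half and merging two already ordered streams avoids a full sort, which is closer in spirit to what a streaming hardware decoder can do than a general-purpose sort would be.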
The finder logic starts with the first received hit. It defines the cluster window ($21\times8$ pixels) with respect to the position of this first hit. The logic then loads all hits within the window. Once all the hit data are loaded, the algorithm selects the first reference hit. On each clock cycle, the hits that directly neighbor an already selected hit are also selected, until no directly neighboring hits remain. Finally, the selected hits are sent to the centroid calculator as a cluster. The cluster post-processing further reduces the data volume and improves precision by calculating the cluster centroid. One fundamental characteristic of the 2D-clustering implementation is that different clustering engines can work independently and in parallel to identify different clusters, thereby increasing performance while exploiting more FPGA resources.
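A minimal software sketch of this window-based cluster growth and of the centroid step is given below. It is a sequential stand-in for the parallel firmware, written under several assumptions: hits are `(column, row)` pairs ordered by increasing column, the window spans 21 columns starting at the first hit and 8 rows roughly centered on it, diagonal pixels count as direct neighbors, and the centroid is an unweighted mean. The names `find_clusters` and `centroid` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (column, row); hypothetical representation


def find_clusters(hits: List[Pixel], window_cols: int = 21,
                  window_rows: int = 8) -> List[List[Pixel]]:
    """Group hits into clusters using a window anchored at the first hit."""
    remaining = list(hits)  # assumed ordered by increasing column
    clusters: List[List[Pixel]] = []
    while remaining:
        c0, r0 = remaining[0]
        row_lo = r0 - window_rows // 2  # assumed placement of the window
        window = {h for h in remaining
                  if c0 <= h[0] < c0 + window_cols
                  and row_lo <= h[1] < row_lo + window_rows}
        cluster = {remaining[0]}  # reference hit
        grew = True
        while grew:
            # One pass stands in for one clock cycle: select every window hit
            # that touches an already selected hit (8-connectivity assumed).
            new = {h for h in window - cluster
                   if any(abs(h[0] - s[0]) <= 1 and abs(h[1] - s[1]) <= 1
                          for s in cluster)}
            cluster |= new
            grew = bool(new)
        clusters.append(sorted(cluster))
        remaining = [h for h in remaining if h not in cluster]
    return clusters


def centroid(cluster: List[Pixel]) -> Tuple[float, float]:
    """Unweighted mean position of a cluster (an assumption of this sketch)."""
    n = len(cluster)
    return (sum(c for c, _ in cluster) / n, sum(r for _, r in cluster) / n)


hits = [(3, 10), (4, 10), (4, 11), (9, 50), (10, 51), (30, 10)]
for cl in find_clusters(hits):
    print(cl, centroid(cl))
```

Hits that fall inside the window but are not connected to the reference hit stay in the input list and seed later clusters, so no hit is lost in this sketch. Bounding the search to a small window is what keeps the neighbor comparison cheap enough to replicate across several engines.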
III. PRODUCTION AND TEST STATUS OF IM
The final version of the IMs has been produced, and quality control tests have been performed. The tests include visual checks, electrical checks, and bit-error-rate (BER) measurements. The BER of all IMs satisfies the ATLAS experiment requirement of $10^{-15}$. All produced IMs, including a sufficient number of spares, passed the tests. The mass-produced IMs with Spartan-6 are shown in Figure 7. Installation and integration tests are ongoing. Communication with the inner-detector RODs has been confirmed for IBL, pixel, and SCT. The DDR data transfer between the IM and the DF runs at the design frequency of 200 MHz. With four IMs mounted on one DF board, stable dataflow is achieved at a 100 kHz event rate using all 16 input channels, with the clustering functionality working. For configuration and monitoring, the I2C bus is fully tested and in use. Several monitoring registers are defined and read out via I2C, and more will be added as necessary. The output of the IM hardware was compared with that of the emulation and was found to match exactly. IMs are now being installed and tested under real operating conditions. The installed IMs and DFs are shown in Figure 8. Data taking with the SCT detector was established in 2015 with an input event rate of up to 100 kHz. For real IBL/pixel data, firmware development is ongoing to cope with special data.
The first FTK track is expected soon, and FTK will start operation and commissioning by the start of the physics run in 2017.
IV. CONCLUSIONS
This paper has described the functionality and current development status of the IM, which is the input stage of the entire FTK system. The mass production of the IMs has been completed, and integration tests and staged installation are ongoing.
FTK will start data taking with full coverage (|η| < 2.5) in early 2017 and will subsequently be upgraded.
