Abstract-This paper presents an overview of the ATLAS Fast TracKer (FTK) processor, reporting the design of the system, its expected performance, and the current integration status. The FTK is an upgrade of the trigger system of the ATLAS experiment. The trigger system is designed to reduce the event rate from the proton-proton collisions occurring at 40 MHz to about 1 kHz for the expected LHC luminosity ($2\times10^{34}$ cm$^{-2}$ s$^{-1}$). To achieve this reduction, the trigger must make intensive use of particle tracking. For this purpose, a dedicated hardware tracker has been designed: the FTK processor. To achieve the required performance, FTK uses a combination of custom VLSI chips and latest-generation FPGAs, all embedded in dedicated boards, and it exploits a fully parallel architecture. FTK provides track reconstruction based on the full silicon (inner) detector, with a resolution comparable to that of the offline reconstruction and a latency of approximately 100 µs.
I. INTRODUCTION
FTK is an electronic system able to find and reconstruct particle trajectories (tracks) from the inner detector of ATLAS [1], [2]. It rebuilds the track information from 12 logical layers: 4 pixel layers, including the new Insertable B-Layer (IBL) [3], and 8 silicon microstrip tracker (SCT) layers, corresponding to the axial and stereo sides of 4 physical layers. Dual-output High-speed Optical Link (HOLA) cards have been installed, replacing the existing HOLA output mezzanine cards in the pixel and SCT Read Out Drivers (ROD). The dual HOLA generates an identical copy of the silicon data. The two copies are delivered in parallel to the FTK processor and to the Read Out System (ROS).
Because data are transmitted from the RODs at the level-1 trigger rate, FTK receives the input silicon data at full rate and high bandwidth. After processing, FTK fills the ROSs with the helix parameters and hits for all tracks with $p_T$ above threshold (footnote 1). Handling this huge amount of data is an important design challenge.
II. FTK ARCHITECTURE
FTK is a massively parallel system that can manage input data from 64 η-φ towers (footnote 2). It performs pattern recognition among a very large number of possible track combinations.
Footnote 1: The transverse momentum of the lepton with respect to the beam line.
Footnote 2: ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the beam pipe. The pseudorapidity is defined in terms of the polar angle θ as $\eta = -\ln\tan(\theta/2)$.
Essentially, the system works in two stages: (1) 8 of the 12 silicon detector layers are used to perform pattern recognition and to obtain an initial fit; (2) the found tracks are refined, and the output data are formatted to be compatible with the ATLAS protocols for the level-2 trigger system. Fig. 1 shows the data flow diagram. The pixel and SCT data are transmitted from the RODs on serial links and received by the Data Formatters (DF), which divide the data according to the geographical region of the detector and assign them to different Processing Units (PU). The DF mezzanine cards, also called Input Mezzanines (IM), perform two-dimensional cluster finding for each pixel layer. Clusters are constrained to a size of 4 pixels in the φ direction and 5 pixels in the z or r direction for the barrel and endcap, respectively. For each cluster, the DF mezzanine calculates the position of the centroid and transmits the centroid information to the PUs, which recognize patterns and refine particle tracks.
Inside each PU, data are received by the Data Organizer (DO), which converts them to a coarser resolution (Super-Strip or SS). This operation is performed to improve pattern recognition. In addition, the DOs keep the high-resolution hit information. When a candidate track (also called a road) is found, the system accesses the DO database to retrieve the high-resolution data belonging to that road.
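The clustering, centroid, and Super-Strip steps described above can be illustrated with a minimal Python sketch. It is not the IM or DO firmware: the 8-connected neighbour rule, the interpretation of the 4 × 5 constraint as a maximum extent, the function names, and the Super-Strip width `ss_width` are all assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (phi_index, z_index) of a fired pixel

def find_clusters(pixels: List[Pixel]) -> List[List[Pixel]]:
    """Group fired pixels into clusters of 8-connected neighbours (assumed rule)."""
    remaining = set(pixels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = [seed], [seed]
        while stack:
            p, z = stack.pop()
            for dp in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    neighbour = (p + dp, z + dz)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.append(neighbour)
                        stack.append(neighbour)
        clusters.append(sorted(cluster))
    return sorted(clusters)

def within_size_limit(cluster: List[Pixel], max_phi: int = 4, max_z: int = 5) -> bool:
    """Check the cluster extent against the 4 (phi) x 5 (z or r) pixel limit."""
    phis = [p for p, _ in cluster]
    zs = [z for _, z in cluster]
    return (max(phis) - min(phis) + 1 <= max_phi
            and max(zs) - min(zs) + 1 <= max_z)

def centroid(cluster: List[Pixel]) -> Tuple[float, float]:
    """Unweighted mean position of the pixels in a cluster."""
    n = len(cluster)
    return (sum(p for p, _ in cluster) / n, sum(z for _, z in cluster) / n)

def super_strip(position: float, ss_width: int) -> int:
    """Coarse bin index of a full-resolution position (ss_width is hypothetical)."""
    return int(position // ss_width)

clusters = find_clusters([(0, 0), (1, 0), (1, 1), (10, 10)])
phi_c, z_c = centroid(clusters[0])
```

In this toy input the first three pixels form one cluster and the isolated pixel forms another; the centroid is then reduced to a Super-Strip index with `super_strip`, while the full-resolution value would be kept for the later fit.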
Coarsely binned hit data, called "Super Strips" (SS), are transmitted to the AM boards, which contain the core of the FTK system: the Associative Memory (AM). An AM is a dedicated ASIC, designed in a 65 nm CMOS technology [4].
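As a rough software analogy of what the Associative Memory does, the sketch below compares the Super Strips seen in each layer with a bank of stored patterns and reports the matching patterns as roads. The hardware performs this comparison for all patterns simultaneously, whereas the loop here is sequential. The function name `find_roads` and the `min_layers` threshold are hypothetical; the text does not state how many layers must match.

```python
from typing import List, Sequence, Set

def find_roads(patterns: Sequence[Sequence[int]],
               hits_per_layer: Sequence[Set[int]],
               min_layers: int) -> List[int]:
    """Return the indices of patterns matched in at least min_layers layers.

    patterns[i][layer] is the Super-Strip index stored for that layer;
    hits_per_layer[layer] is the set of Super Strips fired in the event.
    """
    roads = []
    for index, pattern in enumerate(patterns):
        matched = sum(1 for layer, ss in enumerate(pattern)
                      if ss in hits_per_layer[layer])
        if matched >= min_layers:
            roads.append(index)
    return roads

# Toy pattern bank with 8 layers, as used in the first FTK stage.
patterns = [
    (1, 2, 3, 4, 5, 6, 7, 8),
    (1, 1, 1, 1, 1, 1, 1, 1),
    (9, 9, 9, 9, 9, 9, 9, 9),
]
hits_per_layer = [{1, 9}, {2}, {3}, {4}, {5}, {6}, {7}, {1}]
roads = find_roads(patterns, hits_per_layer, min_layers=7)
```

Here only the first pattern has enough matching layers to be reported as a road; lowering `min_layers` makes the matching more tolerant of missing hits at the cost of more roads.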
The stored patterns are determined in advance from a full ATLAS simulation of single tracks. The whole FTK system can store and simultaneously compare about 1 billion patterns. Each AM board contains 8 M patterns, and the total number of AM boards is 128. Each AM board contains 64 AM chips, split among 4 Local AM Boards (LAMB). Each chip contains $2^{17}$ patterns. It is worth pointing out that the high pattern density requires a large silicon area. In addition, a large number of input interfaces is required to distribute the input data. Hence, to reduce the I/O congestion, 2.4 Gbit/s serial links have been integrated into the AM chips. Finally, the system requires a large amount of power, because each AM board consumes 250 W. For this reason, an ad hoc cooling system has been designed.
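The pattern counts quoted above are mutually consistent, as a short arithmetic check shows. The constant names are chosen here for illustration; the total power figure is simply 128 boards times 250 W and ignores every other board in the system.

```python
PATTERNS_PER_CHIP = 2 ** 17   # patterns stored in one AM chip
CHIPS_PER_BOARD = 64          # AM chips on one AM board
LAMBS_PER_BOARD = 4           # Local AM Boards per AM board
AM_BOARDS = 128               # AM boards in the full system
BOARD_POWER_W = 250           # power drawn by one AM board

chips_per_lamb = CHIPS_PER_BOARD // LAMBS_PER_BOARD
patterns_per_board = PATTERNS_PER_CHIP * CHIPS_PER_BOARD  # 2**23, i.e. 8 M
total_patterns = patterns_per_board * AM_BOARDS           # 2**30, about 1 billion
am_power_kw = AM_BOARDS * BOARD_POWER_W / 1000            # AM boards only

print(chips_per_lamb, patterns_per_board, total_patterns, am_power_kw)
```

The "8 M" per board is therefore $2^{23}$ patterns, and "about 1 billion" is $2^{30}$ patterns.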
The roads found are the input to the Track Fitter, implemented in a modern FPGA, which refines the particle trajectories by means of a $\chi^2$ acceptance test. Fake roads are removed by the so-called Hit Warrior (HW) subsystem. When a track passes the stage-1 filter, the road number and hits are sent to the Second Stage Board (SSB), which combines the incoming data with the data from the 4 remaining silicon layers (those not used in stage 1). The track is extrapolated into the 4 additional layers, and a minimum of 3 hits in these 4 layers is required. With this criterion, duplicate track removal is applied again, but now tracks in all roads are used for the comparison. The SSB output tracks, which consist of the hits on the track, the $\chi^2$, the helix parameters, and a track quality word that includes the layers with a hit, are sent to the FTK-to-Level-2 Interface Crate (FLIC). The FLIC organizes the tracks, sends them to the High Level Trigger ROSs using the standard ATLAS protocols, and carries out monitoring functions. The FLIC crate is organized so that, in the future, global event functions can be carried out in other cards in the crate.
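The selection logic described above ($\chi^2$ acceptance, at least 3 hits in the 4 additional layers, duplicate removal) can be sketched as follows. This is a simplified illustration, not the Track Fitter or Hit Warrior firmware: the `Track` class, the `chi2_cut` value, and in particular the duplicate criterion (two tracks sharing at least `min_shared` hits, keeping the one with lower $\chi^2$) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class Track:
    chi2: float
    hits: Tuple[Optional[int], ...]  # one hit id per layer, None if missing

def enough_extra_hits(extra_hits: Sequence[Optional[int]], min_hits: int = 3) -> bool:
    """Require at least min_hits hits among the 4 additional layers."""
    return sum(h is not None for h in extra_hits) >= min_hits

def shared_hits(a: Track, b: Track) -> int:
    """Number of layers in which two tracks use the same hit."""
    return sum(1 for x, y in zip(a.hits, b.hits) if x is not None and x == y)

def remove_duplicates(tracks: List[Track], min_shared: int) -> List[Track]:
    """Keep the lowest-chi2 track of each group sharing >= min_shared hits (assumed rule)."""
    kept: List[Track] = []
    for track in sorted(tracks, key=lambda t: t.chi2):
        if all(shared_hits(track, other) < min_shared for other in kept):
            kept.append(track)
    return kept

def select_tracks(tracks: List[Track], chi2_cut: float, min_shared: int) -> List[Track]:
    """Apply the chi2 acceptance test, then remove duplicates."""
    accepted = [t for t in tracks if t.chi2 < chi2_cut]
    return remove_duplicates(accepted, min_shared)

a = Track(1.0, (1, 2, 3, 4))
b = Track(2.0, (1, 2, 3, 9))      # shares three hits with a
c = Track(3.0, (5, 6, 7, 8))
d = Track(50.0, (10, 11, 12, 13)) # fails the chi2 cut
selected = select_tracks([b, a, c, d], chi2_cut=10.0, min_shared=3)
```

In this toy example track `d` is rejected by the $\chi^2$ cut and track `b` is dropped as a duplicate of the better-fitting track `a`.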
The whole FTK system will consist of 13 crates. The DFs occupy 4 ATCA crates, which have full-mesh backplanes to facilitate the massive reorganization of the silicon hits from the readout optical fibers to the η-φ tower organization. Eight VME crates hold the AM boards and the auxiliary cards (AUX), which host the DO and the first-stage track fitters. The board positions are shown in Fig. 2 for the final configuration (TDR) and two intermediate steps (4 and 5). A final crate contains the FLIC cards.
Fig. 2. The proposed final layout of the VME crate is Step 5. It differs from the TDR one because it takes into account that the slots with reduced cooling in the crate are the ones where the SSB is placed (the SSB having a lower power dissipation than the AMB).
The whole system uses a large number of ASICs and FPGAs that must be characterized and monitored. Dedicated firmware has been written for the FPGAs.
III. CURRENT STATUS
Since 2013, the FTK has been tested and integrated at CERN. In 2013, the vertical slice test was performed to validate the functionality of the old IMs and AMs. The test was performed at 70 kHz (Fig. 3). In 2014, a global integration test was performed up to a trigger frequency of 100 kHz. At the end of 2015, we started to install boards and crates in the ATLAS USA15 counting room (Fig. 3). Currently, we are testing the chain that starts from the ROD and runs through the IM and DF, with a data flow of about 50 kHz (corresponding to a luminosity of about 5 × 10 ). Reference runs confirmed the functionality of this chain.
A. Data Formatter (DF)
The Data Formatter reorganizes the incoming data from the silicon RODs [5]. The final system is expected to have 32 DFs. The Production Readiness Review (PRR) has been passed, and all DFs have now been produced. Of the 36 DFs, 33 passed the quality tests. Shorts were found in the DC power supply of the remaining 3 DFs, and these have been fixed in the meantime. Some improvements to the Intelligent Platform Management Controller (IPMC) have been made. The tests were performed at Stanford and at CERN. All the boards have been integrated in ATCA crates within the ATLAS experiment and are fully functional.
B. Input Mezzanine (IM)
The IM is installed on the DF and performs cluster finding. For this task we use two different board designs: (1) a first board designed in Frascati (Italy) with a Xilinx Artix-7 FPGA; and (2) a second board designed in Waseda (Japan) with a Xilinx Spartan-6 FPGA. Currently, 32 IMs are installed on 8 DFs, which can be considered a close-to-final installation of 1/4 of the DF system. Fig. 4 shows the IM and DF boards.
C. Auxiliary Card (AUX)
The AUX card is installed in the rear of the VME crate. It receives and organizes the hits from DF for pattern matching. In addition, it receives track roads and performs the first stage of track fitting. The first AUX has been integrated at CERN. Fig. 5 shows the AUX card.
D. AM Board (AMB)
The AMB performs pattern recognition. It distributes the input SS to the AM chips in parallel, and it collects the roads (Fig. 6). A first AMB, containing AM05 chips, has been installed at CERN. Installation of the other boards is ongoing at CERN. In early 2016, we plan to install 3 AMBs with AM06 chips, and then to perform cooling tests inside the VME rack. This is an essential point for the final layout of the PUs. At the end of 2016, we plan to install 32 AMBs. The integration will be completed in 2018, when all 128 AUXs and AMBs will be mounted inside the VME crates.
E. Associative Memory (AM) Chip
In 2015, the AM05 chip was extensively characterized. AM05 is an Associative Memory (AM) chip containing 2k patterns based on the XORAM cell [6]. It was designed in 2014 and is able to compare patterns at a data rate of 100 MHz. Data are transmitted and received by means of SerDes IP blocks. Currently, we are characterizing the AM06, which is the final version of the ASIC, designed at the 65 nm technology node. Fig. 7 shows an AM06 chip, which contains $2^{17}$ patterns.
F. Second Stage Board (SSB)
The SSB takes the roads from the AMBs and integrates these results with the 4 missing data layers. Two different types of SSBs have been designed. Both have identical Extrapolator (EXP) and Track Fitter (TF) functions, and different HW functions to perform duplicate 12-layer track removal that takes into account the η-φ overlap regions.
1) Preliminary SSB (pSSB):
It sends its tracks to the φ neighboring fSSB in the same core crate for output to the FLIC. In addition, it sends its tracks to the neighboring fSSBs solely for use in duplicate track removal. In this way, the HW on a pSSB is essentially just a fanout.
2) Final SSB (fSSB): It receives tracks from its own track fitter and from the φ-neighboring pSSB for output to a FLIC. These tracks are output only if they are not duplicates of any tracks from the two φ-neighboring pSSBs (excluding the pSSB just mentioned) and the two φ-neighboring fSSBs.
The SSB was tested at Urbana and CERN in 2015, and it passed the PRR. Version 4 of the SSB has been sent for the production of 10 boards and 10 Rear-Transition Modules (RTMs). Mechanical and electrical tests are ongoing. Most of the effort is now devoted to firmware development. Fig. 8 shows version 3 of the SSB.
G. FTK to Level-2 Interface Card (FLIC)
The FLIC collects the final tracks from the SSBs. The boards have been produced, delivered, and tested at Argonne. In the same month, the firmware needed some adaptation for the final design. The ATCA installation for the FLIC is complete, with extra sensors connected to the IPMC. A first FLIC board (Fig. 9) has been installed at CERN. In March 2016, the two FLIC boards will be integrated at CERN.
IV. EXPECTED PERFORMANCE
When the luminosity increases, the superposition of interactions leads to so-called event pile-up. FTK simulations have been performed at high pile-up (46 and 69 average interactions per crossing). The comparison is made for several variables. The efficiency is 83% for tracks with $p_T$ = 1 GeV and rises to above 90% for $p_T$ > 10 GeV. There is a slight loss in efficiency for tracks with $p_T$ > 40 GeV. The efficiency with a pile-up of 69 is slightly lower than with the pile-up 46 sample [7].
V. CONCLUSION
In this paper we described the FTK architecture, its expected performance, and the current status of each subsystem. We are now almost at the end of the integration step at CERN. The whole system will be online in 2018 to enhance the trigger system of ATLAS.
