Abstract: An additional inner layer for the existing ATLAS Pixel Detector, called the Insertable B-Layer (IBL), is under design and will be installed by Phase I. A new front-end readout ASIC (FE-I4) will replace the previous chips in this layer. The new system features a higher readout speed (160 Mb/s per ASIC) and simplified control. The current data acquisition chains are composed of front-end and readout chips, Back-Of-Crate cards (BOCs) and ReadOut Driver cards (RODs). This paper presents a proposal for the new ROD board, which implements modern FPGAs and high-speed links with the detector and with the ATLAS TDAQ system.
I. INTRODUCTION
For the LHC Phase I upgrade, in 2016, a new pixel layer is expected to be installed in the ATLAS experiment: the Insertable B-Layer (IBL) [1]. The IBL is a fourth layer added to the present ATLAS Pixel detector [2] between a new beam pipe and the current inner Pixel layer (B-layer). Luminosity-induced inefficiencies in the existing Pixel detector will be recovered by the IBL, which will maintain robust tracking despite effects arising from luminosity, hardware lifetime and radiation. The IBL will also improve the tracking precision of the current detector for vertexing and b-tagging.
The Pixel Detector readout architecture has been designed to be fully efficient at the nominal LHC peak luminosity of 10³⁴ cm⁻² s⁻¹ and for an expected trigger rate of 100 kHz. Two bottlenecks arise in this architecture at luminosities greater than nominal, one in the front-end chip FE-I3 [3] and one in the link between the Module Controller Chip (MCC) [4] and the off-detector electronics:
• the double-column bus in the FE-I3 (sensitive to the occupancy),
• the link from the MCC to the off-detector electronics (sensitive to the product of the hit occupancy and the trigger rate).
These bottlenecks may give rise to readout inefficiencies that may impact the b-tagging efficiency. The IBL readout architecture is designed to meet the requirements of higher luminosity, which translates into larger event occupancy and transfer bandwidth.
The IBL front-end electronics features a new front-end ASIC, named FE-I4 [5]. This chip has a completely new internal architecture that meets the higher occupancy and bandwidth requirements. The FE-I4 features a new readout architecture based on 2x2 pixel regions to improve throughput, compression of hit pairs into single hit words, and an 8B/10B output data encoding scheme at 160 Mb/s. Two FE-I4 chips read out an IBL module, as shown in Fig. 1. The IBL features 224 pixel modules arranged in 14 staves at 3.3 cm from the interaction zone.
New VME [6] off-detector electronics are being designed together with the front-end electronics:
• a Back of Crate card (BOC) implementing the optical I/O interface;
• a ReadOut Driver card (ROD) implementing the data processing functionality.
This document focuses on the proposed architecture for the ROD card, which provides hardware backward compatibility for operation with the current Pixel BOC and support for an improved architecture of the off-detector readout.
II. PROPOSAL OF A NEW OFF-DETECTOR READOUT
The current ROD is a 9U VME board hosting 11 FPGAs plus 5 Texas Instruments Digital Signal Processors (DSPs) for board control, calibration data histogramming and fit operations. With a change in the firmware it could be used for data acquisition in the IBL readout chain. The open question was therefore whether the existing ROD was sufficient for the job or a new one was needed.
To reach a decision, the following points were discussed:
• The current ROD can operate with a maximum of eight 160 Mb/s input links (from the sensors) to one output S-Link [7], whereas, in order to respect the natural modularity of the IBL, thirty-two 160 Mb/s links to four S-Links have to be handled (one ROD board acquiring data from 16 IBL modules).
• The current ROD board hosts obsolete FPGA devices: the 11 Spartan-2 devices are no longer supported by the currently available design tools. These devices may also become difficult to find on the market in the near future.
• The VME bus limits the bandwidth for data exchange with the VME CPU to 4 MB/s per board.
These considerations led us to design a new ROD board, whose schematic diagram is shown in Fig. 2 together with the new BOC board concept. The main idea is to perform on-board event fragment building and histogramming, while sending histograms (or even raw data) off-board to a computer farm for fitting operations. In this way the new ROD will be able to process 4 times the data bandwidth of the current ROD (5.12 Gb/s vs 1.28 Gb/s), while delegating the fit operations to an external device. This choice should also shorten the ROD design and debug times with respect to the current one.
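The bandwidth and modularity figures quoted above follow from simple arithmetic on the link counts. The sketch below reproduces them; the constant and function names are illustrative and do not come from any ROD firmware or software.

```python
# Illustrative arithmetic only; all names are hypothetical.
LINK_RATE_MBPS = 160      # per-link input rate, Mb/s (from the text)
IBL_MODULES = 224         # pixel modules in the IBL (from the text)
CHIPS_PER_MODULE = 2      # two FE-I4 chips read out one module
MODULES_PER_ROD = 16      # one new ROD serves 16 modules


def input_bandwidth_mbps(n_links, link_rate_mbps=LINK_RATE_MBPS):
    """Aggregate input bandwidth in Mb/s for n_links front-end links."""
    return n_links * link_rate_mbps


current_rod_mbps = input_bandwidth_mbps(8)    # current ROD: 8 links
new_rod_mbps = input_bandwidth_mbps(32)       # new ROD: 32 links
gain = new_rod_mbps // current_rod_mbps

links_per_rod = MODULES_PER_ROD * CHIPS_PER_MODULE
rods_needed = IBL_MODULES // MODULES_PER_ROD

print(f"current ROD: {current_rod_mbps / 1000} Gb/s")
print(f"new ROD:     {new_rod_mbps / 1000} Gb/s (x{gain})")
print(f"links per ROD: {links_per_rod}, RODs needed: {rods_needed}")
```

With these inputs, one input link per FE-I4 chip gives 32 links per ROD, and 224 modules at 16 modules per ROD give 14 boards.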
The main proposed features are:
• Modularity corresponding to thirty-two 160 Mb/s input channels and four 160 MB/s S-Link output channels. In this way a ROD board is able to read data from 32 FE-I4 chips, leading to a required total of 14 ROD boards, which can be hosted in a single VME crate.
• Hardware compatibility with the current BOC board: ad-hoc firmware can be developed if required, with the possibility of reducing to eight 160 Mb/s input links and one output S-Link in compatibility mode with the current ROD board.
• Full data path implemented in a few large FPGAs (two Xilinx Spartan-6 devices) in order to maximize the design flexibility and simplify the communication between the components.
• Increased output bandwidth for calibration histograms. This can be achieved by using Gigabit Ethernet high-speed serial links for sending calibration data to an external PC farm. We plan to use two Gigabit Ethernet links per ROD board, increasing the bandwidth from 4 MB/s to around 200 MB/s per board. With two Gb/s links per ROD it becomes convenient to extract raw histograms as soon as they are ready and process them on commercial processors, thereby eliminating the need for DSPs on the ROD. Fitting the data on computers gives more flexibility for fit code development, since more convenient tools are available than in the DSP environment.
• The ROD board encodes data towards the external PC farm as blocks of UDP packets, each block consisting of a software-programmable number of packets. The communication with the external PC relies on a custom protocol in which the ROD board needs to receive continuous feedback from the PC. In particular, the ROD board transmits block n only after receiving from the PC the acknowledgement that all previous blocks, at least up to block n-2, have been correctly received. This is done to minimize the wait time between the transmission of different blocks and to maximize the use of the data bandwidth. A timeout-based mechanism is foreseen in case of packet loss.
• Use of an embedded PowerPC core (inside a Virtex-5 FPGA) performing system control and non-real-time functions. In particular, it can run iterative code that executes on-board scans without host intervention. The PowerPC core also controls a third Gigabit Ethernet port that can be used as a faster link for downloading front-end configuration data to the ROD. Using an embedded processor should reduce debug times, since a common simulation environment can be used for both the processor and the FPGA logic behavior. The ROD prototype under design is also equipped with a Texas Instruments DSP identical to the one in the current ROD, which can be used in case the PowerPC proves to have worse performance.
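The block-acknowledge rule described in the list above (block n is sent only once all blocks up to n-2 are acknowledged) behaves like a sliding window of two blocks. The following is a minimal sketch of that rule in discrete time ticks, not the actual ROD protocol: the function name, the fixed acknowledge delay and the one-block-per-tick sending rate are all assumptions, and the timeout and retransmission path is omitted.

```python
def simulate_block_transfer(n_blocks, ack_delay, window=2):
    """Return a list of (tick, block) send times under the window rule.

    Assumptions (hypothetical, for illustration only):
    - at most one block is sent per tick;
    - the acknowledge for a block arrives ack_delay ticks after it is sent;
    - no packets are lost, so no timeout handling is modelled.
    """
    sent_at = {}            # block number -> tick at which it was sent
    highest_acked = -1      # all blocks <= highest_acked are acknowledged
    schedule = []
    next_block = 0
    tick = 0
    while next_block < n_blocks:
        # Register, in order, every acknowledge that has arrived by now.
        while (highest_acked + 1 in sent_at
               and sent_at[highest_acked + 1] + ack_delay <= tick):
            highest_acked += 1
        # Block n may go out once blocks up to n - window are acknowledged.
        if next_block - window <= highest_acked:
            sent_at[next_block] = tick
            schedule.append((tick, next_block))
            next_block += 1
        tick += 1
    return schedule


fast_ack = simulate_block_transfer(4, ack_delay=1)
slow_ack = simulate_block_transfer(6, ack_delay=3)
stop_and_wait = simulate_block_transfer(3, ack_delay=3, window=1)
print(fast_ack)
print(slow_ack)
print(stop_and_wait)
```

In this toy model, a short acknowledge delay lets the sender transmit on every tick, while a longer delay introduces stalls that are still shorter than with a stop-and-wait scheme (window of one), which is the motivation the text gives for the n-2 rule.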
III. HISTOGRAMMING BLOCK
As for the Pixel Detector, the calibration of the IBL detector is performed by repeating relatively short series of 100-1000 events, recorded while injecting a known charge into each pixel, with different settings of the front-end parameters (e.g. different thresholds or different pre-amplifier feedback currents). This sequence, called a calibration scan, generates a very large number of events, which have to be analyzed in order to extract the histogram showing how many times each pixel has registered a hit for a given setting. For this reason, in calibration mode, the data stream coming from the sensor is not sent over the S-Link, but pre-processed in the ROD, where the relevant histograms are produced. At the end of the scan, the histograms are transferred to an external farm via Gbit Ethernet for fitting and archiving.
A novel approach to histogramming and analysis on the ROD is proposed (see Fig. 3). The histogramming block receives the hit addresses and the Time over Threshold (ToT) information, a number that specifies how long the hit signal stays over a predefined threshold value. While the current ROD implements histogramming on DSPs, the new ROD executes the calibration loops on the Spartan-6 FPGAs, accumulating the per-pixel occupancies, sums of ToT and sums of ToT² and using internal RAM and external SSRAM (Synchronous Static RAM) for data buffering. The histograms are eventually transferred via Gbit Ethernet as UDP packets to an off-line high-performance computer.
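The three per-pixel accumulators named above (occupancy, sum of ToT, sum of ToT²) can be illustrated with a short software model. This is only a sketch of the bookkeeping, not the FPGA implementation: the class and method names are hypothetical, and the step that derives a mean and spread of ToT from the sums is an assumed example of what the off-line computer could do with them.

```python
import math
from collections import defaultdict


class PixelHistogrammer:
    """Software model of per-pixel calibration accumulators (hypothetical)."""

    def __init__(self):
        self.occupancy = defaultdict(int)   # number of hits per pixel
        self.tot_sum = defaultdict(int)     # sum of ToT per pixel
        self.tot2_sum = defaultdict(int)    # sum of ToT squared per pixel

    def fill(self, column, row, tot):
        """Accumulate one hit at pixel (column, row) with the given ToT."""
        key = (column, row)
        self.occupancy[key] += 1
        self.tot_sum[key] += tot
        self.tot2_sum[key] += tot * tot

    def mean_and_sigma(self, column, row):
        """Derive mean ToT and its spread from the three sums.

        Returns None for a pixel without hits. This derivation is an
        assumed off-line step, not something done on the ROD.
        """
        key = (column, row)
        n = self.occupancy.get(key, 0)
        if n == 0:
            return None
        mean = self.tot_sum[key] / n
        variance = self.tot2_sum[key] / n - mean * mean
        return mean, math.sqrt(max(variance, 0.0))


hist = PixelHistogrammer()
for tot in (4, 6, 8):          # three hits on the same pixel
    hist.fill(3, 7, tot)
result = hist.mean_and_sigma(3, 7)
print(hist.occupancy[(3, 7)], result)
```

Keeping only three running sums per pixel means the storage per pixel is fixed regardless of the number of injected events, which fits the use of on-chip RAM and external SSRAM mentioned in the text.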
IV. CONCLUSION
The architecture of the new ROD board for the readout of IBL is presented. The board is currently under design and a prototype is expected by early 2011.
