Abstract-We have developed a slice prototype of the full TileCal readout chain based on prototype modules and off-the-shelf components. As different module prototypes are developed and become available, they can replace earlier prototypes or emulators in the chain. Due to its modular and flexible structure, the prototype can adapt to changing requirements. This will allow most of the final functionality to be developed and tested before the hardware design is finalized.
I. INTRODUCTION
PLANS for upgrade phase 2 of the ATLAS hadronic calorimeter (TileCal) aim at full digital readout of all data to the counting room. The concept uses modern FPGAs containing large amounts of logic resources, generous connectivity and built-in multi-gigabit serial transceivers for communication. An early slice prototype for the upgraded Tile Calorimeter readout chain has been developed based on development boards and prototype hardware. The on-detector logic is emulated using an off-the-shelf evaluation board equipped with a XILINX Virtex 6 FPGA [1]. Analog data generated in a front-end prototype board [2] are digitized [3], encoded with the GBT protocol [4] and transmitted off detector via optical fiber at 4.8 Gbit/s, using built-in multi-gigabit transceivers (GTX). The off-detector electronics are emulated by a second evaluation board, also equipped with a Virtex 6 FPGA [5]. Here the data are received, derandomized, pipelined and read out using a PCIe Gen 2 interface connected to a standard PC. Commands and clock signals are distributed to the on-detector board, again using the GBT protocol and optical fiber. To remain compatible with the current TileCal readout, the TTCrx functionality was also implemented. The entire prototype design can be controlled using software on the PC. This slice prototype is used as a platform for developing possible implementations of the future readout and trigger electronics for the TileCal, including hardware, firmware and test software. The firmware structure has been kept as generic as possible, to allow the design to be ported to future hardware implementations.
II. COMPONENTS AND REQUIREMENTS
The main structure of the test setup is shown in Fig. 1.
The system is divided into two main parts (off- and on-detector), which communicate via a high-speed serial link.
The communication between on- and off-detector electronics is realized using high-speed serial links. Here the built-in FPGA gigabit transceivers (GTX) provide a convenient solution, and configuring them is straightforward. The most critical parameter for operating them is the clock signal used to drive the transmitter and receiver, because the clocking scheme chosen in the design can strongly affect the signal quality. For this reason the key issue for successful communication between on- and off-detector electronics is the distribution of a high-quality global clock signal. This ensures not only that both sides run completely synchronously, but also that the signal quality itself allows operation with a very low bit error rate.
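As a rough cross-check of the 4.8 Gbit/s line rate, the following Python sketch assumes the commonly documented GBT frame of 120 bits sent once per 40 MHz clock period (4 header, 4 slow control, 80 data and 32 forward error correction bits). The field widths and the rounded 40 MHz clock are assumptions taken from general GBT documentation, not values stated in this text.

```python
# Assumed GBT frame layout (bits per field); not taken from this paper.
GBT_FRAME_FIELDS = {"header": 4, "slow_control": 4, "data": 80, "fec": 32}

# Rounded LHC bunch-crossing clock; the real machine clock is close to 40.08 MHz.
LHC_CLOCK_HZ = 40_000_000


def frame_bits(fields=GBT_FRAME_FIELDS):
    """Total number of bits in one frame."""
    return sum(fields.values())


def line_rate_bps(fields=GBT_FRAME_FIELDS, clock_hz=LHC_CLOCK_HZ):
    """Raw serial rate when one frame is sent per clock period."""
    return frame_bits(fields) * clock_hz


def payload_rate_bps(fields=GBT_FRAME_FIELDS, clock_hz=LHC_CLOCK_HZ):
    """User bandwidth carried by the data field alone."""
    return fields["data"] * clock_hz


if __name__ == "__main__":
    print(f"frame size:   {frame_bits()} bits")
    print(f"line rate:    {line_rate_bps() / 1e9:.2f} Gbit/s")
    print(f"payload rate: {payload_rate_bps() / 1e9:.2f} Gbit/s")
```

Under these assumptions the raw rate matches the 4.8 Gbit/s quoted for the link, while a third of it is spent on header, slow control and error correction.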
A. Commercial Components
Development of the firmware was performed in steps according to functionality and hardware. At first, a very reduced set of functionality was implemented using almost solely commercial off-the-shelf components. These parts are then gradually replaced by custom components, which provide the base for implementing new functionality. Starting with the ML605 and the PCIe Gen2 evaluation platform, we evaluated the distribution of clock signals, commands and data from one board to another. Expanding this setup with the ADC mezzanine card [6] from Linear Technology made it possible to gain experience with the ADC to be used later, and to adapt the firmware.
For e/o conversion we used SFP modules, which unfortunately were only specified for transmission rates up to 3.75 Gbit/s. We had to use these modules for our purpose because the ML605 does not support SFP+. A 100 m multi-mode fiber was used to connect the boards.
Fig. 1. Schematic of the test slice structure including gigabit transceivers (1), the GBT protocol (2) and the application logic (3) for every side.
Fig. 2. The test slice setup using only off-the-shelf components including one ML605 development board (1), one ADC mezzanine card (2), one front-end board (3) and a PCIe Gen2 evaluation platform (4).
B. MainBoard
To evolve this first test system to model the future TileCal more closely, and to increase functionality, the MainBoard shown
Picture of the MainBoard prototype developed at Stockholm University including power supply (1), four 12 bit ADCs (2), two 14 bit ADCs (3), one clock distributor (4), two 12 bit DACs (5), one FMC-HPC connector (6) and four 40 pin connectors (7) to the front-end boards.
voltage for every ADC input.
C. DaughterBoard
To move further towards a possible demonstrator implementation of the test slice it was also necessary to replace the processing FPGA board with a custom solution.
For this reason we developed the DaughterBoard, which is shown in Fig. 4. It is intended to serve as a processing board in a future application in TileCal, and is thus designed for redundancy and high-throughput data readout. Two separately programmable FPGAs are connected in a multipoint topology to the MainBoard, ensuring that no data will be lost even if one FPGA fails and has to be reprogrammed. The power supply was also divided into two independent parts to maintain redundancy in case of failure. The up-links from the off-detector emulator and the data readout are implemented using SFP+ modules. Later, the SNAP12 transmitter will be used to transfer data coming from both FPGAs using only one ribbon cable.
III. FIRMWARE SETUP
The firmware setup is divided into two parts, each with a dedicated function. The first part emulates the off-detector functionality by implementing a general set of functions that allow the reception of commands and data either from the user side or the on-detector electronics. It also distributes the clock signal to the on-detector electronics along with a stream of commands for control. The second part emulates the on-detector electronics by implementing a set of functions more specific to the ATLAS Tile Calorimeter, which will be described later.
A. Off-detector
The firmware implemented on the off-detector part of the test slice is composed of five main parts, shown in
Picture of the DaughterBoard prototype developed at Stockholm University including power supply (1), two Virtex6 FPGAs (2), two SFP+ modules (3), one SNAP12 connector (4) and one FMC-HPC connector (5) on the bottom of the board.
Using Direct Memory Access (DMA) the user is able to transfer commands to and receive data from the off- and on-detector electronics. This interface is realized purely in hardware and does not need an embedded system for correct operation, so the achievable data rates can be much higher. This part of the off-detector application logic is the only one that uses a clock from a different source (the PCIe endpoint). Data integrity is ensured by using FIFOs between the PCIe endpoint and the rest of the logic, which runs with the LHC clock.
To increase bandwidth the DMA had to be implemented in a streaming mode and therefore a new driver had to be developed. This new driver was also developed to make the 
B. On-detector
The firmware on the on-detector side is composed of four main parts, shown in Fig. 5. These parts form the on-detector application logic implemented in the FPGA situated on the ML605.
1) Clock manager:
The clock manager on the on-detector side is connected to the recovered clock of the GTX receiver, whose frequency corresponds to the one used to drive the GTX at the off-detector side. This frequency is used to synthesize the three frequencies described earlier plus one used to drive the transmitting part of the GTX. This clock signal is very sensitive to jitter and design constraints and is therefore routed in a special way described later in this document.
2) Control unit:
This side of the test system is controlled through commands sent from the off-detector board to the control unit. Here the commands are decoded and the corresponding actions performed, such as reprogramming the data acquisition unit, the SPI interface or the connected peripheral hardware. It is also possible to monitor the behavior of the system and send the data back using the reserved bits in the slow control field of the GBT protocol.
3) SPI program unit: The peripheral hardware used in this test setup is programmable using an SPI interface or µWire. For this reason an SPI program unit was implemented using a block RAM. The contents used for programming are stored there and can be modified by the control unit. In this way the system can be adapted automatically to changing requirements.
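The idea of a RAM-backed SPI program unit can be illustrated with a small behavioral model in Python. This is only a sketch: the class name, the word width, the memory depth and the MSB-first bit order are hypothetical choices made for illustration and do not describe the actual firmware.

```python
class SpiProgramUnit:
    """Behavioral model: configuration words live in a RAM and are shifted out serially.

    All parameters are illustrative assumptions, not values from the real design.
    """

    def __init__(self, depth, word_bits=16):
        self.word_bits = word_bits
        self.ram = [0] * depth  # stands in for the FPGA block RAM

    def write(self, address, value):
        """Control-unit side: overwrite one stored configuration word."""
        mask = (1 << self.word_bits) - 1
        self.ram[address] = value & mask

    def shift_out(self, address):
        """SPI side: return the bits of one word, most significant bit first."""
        word = self.ram[address]
        return [(word >> i) & 1 for i in reversed(range(self.word_bits))]

    def program_all(self):
        """Shift out every stored word in address order."""
        return [self.shift_out(a) for a in range(len(self.ram))]


if __name__ == "__main__":
    unit = SpiProgramUnit(depth=2, word_bits=8)
    unit.write(0, 0xA5)  # hypothetical register value
    unit.write(1, 0x3C)
    for bits in unit.program_all():
        print("".join(str(b) for b in bits))
```

Keeping the words in a writable memory rather than in fixed logic is what lets the control unit change the peripheral configuration without rebuilding the firmware.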
4) Data acquisition unit:
One of the largest parts of this design is the data acquisition unit. Here the serial input data stream coming from the 12 bit ADCs (LTC2264-12) is parallelized and shifted bit-wise to match the correct output format. The data rate is 480 Mbit/s for every channel, transmitted using LVDS. Each input also has to be adapted in terms of latency because of different wire lengths on the connected hardware. It is important that this part is easily scalable, to ensure compatibility with custom hardware added later.
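The parallelize-and-shift step can be sketched in Python as a software model. It assumes that word alignment is found by trying every bit offset until a known training pattern appears in all words; the pattern value, the MSB-first bit order and the function names are hypothetical and are not taken from the firmware. The quoted 480 Mbit/s is consistent with 12 bits per sample at a 40 MHz sampling clock.

```python
WORD_BITS = 12  # resolution of the ADC samples


def serialize(samples, word_bits=WORD_BITS):
    """Turn samples into a flat list of bits, most significant bit first."""
    bits = []
    for sample in samples:
        for i in reversed(range(word_bits)):
            bits.append((sample >> i) & 1)
    return bits


def deserialize(bits, bitslip=0, word_bits=WORD_BITS):
    """Drop `bitslip` leading bits, then pack the rest into parallel words."""
    usable = bits[bitslip:]
    words = []
    for start in range(0, len(usable) - word_bits + 1, word_bits):
        value = 0
        for bit in usable[start:start + word_bits]:
            value = (value << 1) | bit
        words.append(value)
    return words


def find_bitslip(bits, pattern, word_bits=WORD_BITS):
    """Return the bit offset at which every word equals the training pattern."""
    for slip in range(word_bits):
        words = deserialize(bits, slip, word_bits)
        if words and all(word == pattern for word in words):
            return slip
    return None  # no alignment found


if __name__ == "__main__":
    pattern = 0b101001011100      # hypothetical training pattern
    junk = [0, 1, 1, 0, 1]        # unknown start-up offset of the link
    stream = junk + serialize([pattern] * 8)
    print("bitslip:", find_bitslip(stream, pattern))
```

Because the alignment search is per channel, the same routine can be repeated for each input, which mirrors the scalability requirement described above.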
IV. THE PROTOTYPE HARDWARE
Building up the test slice is being done stepwise, by first using off-the-shelf components and then successively replacing them with custom parts. During this process the firmware could easily be adapted in every step, and previously used test procedures could be reused in the updated system. In parallel, new functionality was developed according to the needs and also the test procedures were expanded. Using this approach we were able to identify problems at an early stage.
In summary, this process can be divided into two main stages of modification: one using the MainBoard and ML605, and the other using the MainBoard and DaughterBoard.
A. Using MainBoard and ML605
In the first stage of modification the MainBoard was connected to the ML605 together with two front-end boards, as shown in Fig. 7. Unfortunately, because of layout restrictions we were not able to equip the front-end connectors on the other side of the board. The firmware was adapted according to the requirements and extended in terms of programmability and number of channels. To avoid additional components, the clocking scheme should use the recovered clock directly, without external cleaning. Doing so is not without risk, and is not recommended by XILINX. However, our studies have shown that it is possible to use the recovered clock directly to establish stable communication using the GBT protocol and to send data to the off-detector board. For example, a composition of four independently recorded data streams is shown in Fig. 8, where every data stream was phase shifted by 90 degrees with respect to the global LHC clock to improve the sampling.
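One plausible reading of this composition is that the four recordings of a repetitive signal are interleaved, so that the combined trace has four sample points per clock period. The Python sketch below illustrates that reading only; the function names, the 25 ns clock period and the ordering of the streams by increasing phase are assumptions, not details given in the text.

```python
def interleave_phases(streams):
    """Merge equally long streams, taking one sample from each per clock period.

    streams[k] is assumed to be recorded with a phase shift of k * 360 / len(streams) degrees.
    """
    n_samples = min(len(stream) for stream in streams)
    merged = []
    for i in range(n_samples):
        for stream in streams:
            merged.append(stream[i])
    return merged


def sample_times_ns(n_samples, n_phases=4, period_ns=25.0):
    """Time of each merged sample, assuming a 25 ns clock period."""
    return [
        (i + phase / n_phases) * period_ns
        for i in range(n_samples)
        for phase in range(n_phases)
    ]


if __name__ == "__main__":
    # Four hypothetical recordings, shifted by 0, 90, 180 and 270 degrees.
    recordings = [[10, 50], [20, 40], [30, 30], [40, 20]]
    for t, value in zip(sample_times_ns(2), interleave_phases(recordings)):
        print(f"{t:6.2f} ns  {value}")
```

With four phases, the effective sample spacing in this model drops from one clock period to a quarter of it.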
To verify the quality of the recovered clock signal and the stability of the system, jitter measurements and bit error tests were performed. From these we extracted design recommendations to improve the clock quality for direct usage with the GTX. To ensure the best performance when using the recovered clock, it is necessary to avoid using a global or regional clock buffer on the path between the recovered clock output and the MMCM. This is possible because there is a direct path between the MMCM and the GTX in the same region. Using a clock buffer for this path increased the jitter of the clock in every case we implemented. We also tried to determine the quality of the clock used to drive the GTX transmitter. The lowest bit error rate, 4 × 10^-11, was achieved using so-called High Performance Clock paths from the MMCM to the GTX transmitter. This rate is still very high and not acceptable for a final application. We are confident that this is because we use SFP modules designed only for up to 3.75 Gbit/s.
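To put the measured bit error rate in perspective, the following Python sketch converts it into an average time between errors, assuming the link runs at the 4.8 Gbit/s quoted in the introduction. It also applies the standard zero-error estimate N = -ln(1 - CL) / BER for the number of error-free bits needed to claim a given upper limit on the bit error rate. The target of 10^-12 and the 95 % confidence level are illustrative assumptions, not requirements stated in this paper.

```python
import math

LINE_RATE_BPS = 4.8e9   # link rate quoted in the introduction
MEASURED_BER = 4e-11    # best bit error rate reported above


def mean_seconds_between_errors(ber, line_rate_bps=LINE_RATE_BPS):
    """Average time between bit errors for a given bit error rate."""
    return 1.0 / (ber * line_rate_bps)


def bits_for_confidence(target_ber, confidence=0.95):
    """Error-free bits needed to claim BER < target_ber at the given confidence."""
    return math.ceil(-math.log(1.0 - confidence) / target_ber)


if __name__ == "__main__":
    gap = mean_seconds_between_errors(MEASURED_BER)
    print(f"one bit error every {gap:.1f} s on average")

    target = 1e-12  # hypothetical target, chosen for illustration
    n_bits = bits_for_confidence(target)
    hours = n_bits / LINE_RATE_BPS / 3600
    print(f"{n_bits:.3e} error-free bits ({hours:.2f} h) for BER < {target:g}")
```

At the measured rate an error would occur every few seconds, which illustrates why the result is described as not acceptable for a final application.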
B. Using MainBoard and DaughterBoard
In the next step of the development process, the ML605 will finally be replaced by the DaughterBoard, which is currently in the debugging phase. This next firmware test platform is shown in Fig. 10 . The firmware, including ADC readout and programming, was implemented and tested successfully. High speed communication is still under test and will be available soon. Both FPGAs receive valid data, which means that the multipoint topology in which all signals are connected to both FPGAs is working well.
V. CONCLUSION
We successfully implemented firmware that supports the main functionality needed for a slice prototype for the upgraded readout electronics of TileCal. Furthermore, we established design constraints that can be used to implement gigabit transmission in a Virtex6 FPGA without additional hardware for clock cleaning. Additional tests have to be performed using the SFP+ modules to lower the bit error rate to a more reliable level.
We also developed and partially tested custom hardware that is closer to the requirements of a future TileCal upgrade.
