Abstract: During 2014, upgrade demonstrator electronics will be installed in a Tile Calorimeter drawer to gain long-term experience with the redundant electronics proposed for the full upgrade scheduled for 2022.
I. INTRODUCTION
The electronics of the scintillating hadronic calorimeter of ATLAS [1] (TileCal) need to be upgraded, both to extend the detector lifetime and to cope with the higher particle fluxes expected from the planned LHC luminosity upgrades.
TileCal is partitioned into four cylindrical segments (Figure 1): two segments forming the central barrel and an extended barrel segment at each end. Each TileCal segment is made up of 64 wedge-shaped modules, built of interleaved steel plates and scintillating plastic tiles. Energetic hadrons create showers in the steel, and the scintillating tiles sample the energy of these showers.
The light output of the scintillators is collected by wavelength-shifting fibers and brought to the rear beam of each module, where the photomultiplier tubes (PMTs) and the detector electronics are housed in extractable drawers. To provide angular and depth segmentation, the scintillating tiles in each module are grouped into cells. Each cell is read out by a pair of PMTs, one for each side of the module.
In the current system, the TileCal electronics produce analog sums of cells forming projective trigger towers pointing towards the interaction point, which are sent to the Level-1 calorimeter trigger. Each tower covers Δη × Δφ = 0.1 × 0.1 in pseudorapidity η and azimuthal angle φ. Digital data with full resolution and granularity are read out only for events that pass the Level-1 trigger, which reduces the bandwidth required to bring data off the detector. Figure 2 shows a schematic view of TileCal.
In the new, upgraded system all TileCal data from every bunch crossing will be digitized and read out directly to the electronics cavern. This will allow more detailed information to be sent to the Level-1 trigger as needed. This upgrade is made possible by advances in FPGA-based data processing and high-speed optical link technologies. The new on-detector electronics will comprise multiple boards: PMT pulses will be received and shaped in Front-End Boards (FEBs), one for each PMT; the FEB outputs are then received and digitized on MainBoards (MB), one for every 12 FEBs; the data are then sent off-detector via high-speed links on the attached DaughterBoard (one per MB) with minimal delay.
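As an illustration of the board hierarchy described above (one FEB per PMT, one MainBoard per 12 FEBs, one DaughterBoard per MainBoard), the following Python sketch counts the boards needed for one drawer. The names `DrawerBoards` and `boards_for_drawer` are hypothetical, and the figure of 48 PMT positions per drawer is an assumption inferred from Section IV (four Mini-Drawers per drawer, each half serving six PMTs).

```python
from dataclasses import dataclass

FEBS_PER_MAINBOARD = 12  # one MainBoard digitizes the outputs of 12 FEBs


@dataclass(frozen=True)
class DrawerBoards:
    """Board counts for one drawer in the upgraded readout (illustrative)."""
    febs: int
    mainboards: int
    daughterboards: int


def boards_for_drawer(n_pmts: int) -> DrawerBoards:
    febs = n_pmts                                   # one Front-End Board per PMT
    mainboards = -(-febs // FEBS_PER_MAINBOARD)     # ceiling division
    daughterboards = mainboards                     # one DaughterBoard per MainBoard
    return DrawerBoards(febs, mainboards, daughterboards)


# Assumption: 48 PMT positions per drawer (4 Mini-Drawers x 12 PMTs).
print(boards_for_drawer(48))
```

Under this assumption a drawer needs four MainBoard/DaughterBoard pairs, i.e. one pair per Mini-Drawer.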
Service interventions on the detector will become increasingly problematic as the ATLAS detector becomes activated over time by exposure to high radiation levels. To reduce the need for such interventions, the system must be designed for the highest possible reliability. Either ASICs or FPGAs are needed to perform the fairly complex functionality required, and the latter are preferred for reasons of cost and flexibility. Because the radiation levels expected at the TileCal drawers are moderate compared with those in the inner parts of the detector, we have chosen to base the design on Kintex-7 FPGAs from Xilinx [2]. We do not expect the total ionizing dose to cause problems, owing to the small feature size of the 28 nm CMOS technology used in Kintex-7: the dose accumulated over 10 years of high-luminosity running is expected to be well below 10 krad, whereas Kintex-7 devices have been shown to withstand ∼300 krad [7, p.19]. However, Single Event Upsets (SEUs) will certainly occur and must be mitigated.
To gain experience and test the reliability of the new concepts, we are developing a hybrid Demonstrator that can be used to evaluate the new upgraded TileCal readout design while operating transparently within the current system. We plan to outfit one full TileCal module with the hybrid Demonstrator electronics during the 2013-2015 LHC shutdown.
II. EXPECTED RADIATION ENVIRONMENT
The ATLAS detector layout in Figure 1 also shows the contours of expected radiation levels inside the detector. Radiation levels decline with increasing radial distance from the beam pipe, and are relatively low near the TileCal electronics drawers as long as one avoids the gap region between the central and extended barrel segments.
To estimate the SEU rate of our TileCal Demonstrator, we used simulations provided by the Radiation Background Task Force [3], which predict with good accuracy the rates of the various radiation types in each part of the detector. The expected hadron fluxes shown in Figure 1 are based on the nominal luminosity, L = 10³⁴ cm⁻² s⁻¹, expected in 2015 after the first shutdown period.
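An SEU rate estimate of this kind is commonly obtained by folding the simulated hadron flux with a per-bit upset cross-section measured in beam tests. As a sketch, assuming a single effective cross-section for hadrons above the upset threshold (a simplification; the symbols below are ours and are not taken from the simulations), the rate and daily count are:

```latex
R_{\mathrm{SEU}} = \Phi_{h}\,\sigma_{\mathrm{bit}}\,N_{\mathrm{bits}},
\qquad
N_{\mathrm{day}} = R_{\mathrm{SEU}} \times 86\,400~\mathrm{s\,day^{-1}}
```

Here Φ_h is the hadron flux at the board position (cm⁻² s⁻¹), σ_bit is the upset cross-section per bit (cm²) and N_bits is the number of sensitive bits, so R_SEU comes out in upsets per second.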
The dose level is highest in the gap region (one order of magnitude higher than elsewhere in the Tile drawer), and the DaughterBoards nearest to the gap region will be exposed to ∼1 Gy/year (100 rad/year). To minimize their radiation exposure, critical components are placed as far as possible from the gap region.
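The dose figures above can be checked against the Kintex-7 tolerance quoted in Section I with a short calculation. This sketch assumes the worst-case ∼1 Gy/year rate stays constant over 10 years of operation; it is an indicative estimate, not a radiation qualification.

```python
GY_TO_RAD = 100.0          # 1 Gy = 100 rad
KINTEX7_TID_KRAD = 300.0   # tolerance quoted in Section I [7, p.19]


def accumulated_dose_krad(dose_rate_gy_per_year: float, years: float) -> float:
    """Total ionizing dose in krad for a constant dose rate."""
    return dose_rate_gy_per_year * years * GY_TO_RAD / 1000.0


worst_case = accumulated_dose_krad(1.0, 10)  # DaughterBoards nearest the gap region
margin = KINTEX7_TID_KRAD / worst_case

print(f"10-year dose: {worst_case:.1f} krad")
print(f"margin vs. Kintex-7 tolerance: {margin:.0f}x")
```

Even at the gap-adjacent position the accumulated dose (about 1 krad) stays an order of magnitude below the 10 krad bound and far below the tested tolerance.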
III. RELIABILITY STRATEGY
Several methods are available to achieve the required reliability of the TileCal readout system. The existing system has been redesigned with respect to mechanics, power distribution, electronics, and data integrity.
Mechanics
The drawers in the rear of each module have been redesigned to make them easier to extract from the detector. Each drawer, which is currently divided into two mechanical parts during extraction, will instead be segmented into four independent parts, called Mini-Drawers. Cables will be organized in flexible carriers, allowing them to be disconnected easily. A failing Mini-Drawer can be replaced more easily than a drawer in the current design. This minimizes the radiation exposure of personnel working in the ATLAS cavern, and modules can be serviced outside the detector after their induced radioactivity has decayed.
Power
Instead of the current daisy-chained power distribution, which has proved prone to voltage drops, each drawer will be outfitted with independent and redundant power supplies.
Although it requires more space for cabling, this solution should ensure a better-controlled power distribution by managing the different voltages locally with Point-of-Load regulators.
Electronics
Reliable, radiation-tolerant components have been selected throughout the boards according to the strict recommendations for testing components and subsystems issued by the ATLAS management [4], taking the predicted radiation levels into consideration. The component placement was also chosen so that sensitive components are located in low-radiation regions.
To avoid single-point failure modes, the readout chain uses redundancy for most critical components. The firmware is also designed with redundancy in mind, using appropriate error mitigation techniques to reduce the impact of SEUs.
IV. IMPLEMENTATION OF RELIABLE READOUT SYSTEM
A. System wide
The smallest unit of the upgraded system covers half of a Mini-Drawer and serves six PMT channels, reading out six calorimeter cells from one side of the scintillator module. The other half serves the six PMTs that read out the same cells from the other side of the module.
There are 16 such Mini-Drawers along the full length of the detector, grouped four-by-four into full drawers. The PMTs …
Fig. 3. TileCal Demonstrator schematic. This includes on-detector summing cards (blue) that are used to ensure compatibility with the current readout system.
Readout and slow control (downlink) are duplicated on the DaughterBoard, and for increased reliability the two halves are interconnected to allow full readout if one transceiver fails. In short, the MainBoard and DaughterBoard contain two separate but functionally identical parts: each part serves the same set of calorimeter cells, but with different PMTs. If one side fails, data can still be read out, albeit with reduced precision.
A schematic of the entire Demonstrator system is shown in Figure 3.
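The fallback behaviour described above, where each cell is seen by two PMTs on independent readout halves, can be sketched as follows. The function name is hypothetical, and the sketch assumes that each PMT reading is independently calibrated to the full cell energy; the actual TileCal calibration convention may differ.

```python
from typing import Optional


def cell_energy(pmt_a: Optional[float], pmt_b: Optional[float]) -> Optional[float]:
    """Combine the two PMT readings of one cell.

    A reading is None when the readout half carrying that PMT has failed.
    Assumes both readings are calibrated to the full cell energy.
    """
    if pmt_a is not None and pmt_b is not None:
        return 0.5 * (pmt_a + pmt_b)  # both halves alive: average the two PMTs
    if pmt_a is not None:
        return pmt_a                  # side B lost: fall back to side A alone
    if pmt_b is not None:
        return pmt_b                  # side A lost: fall back to side B alone
    return None                       # both halves lost: cell not read out


print(cell_energy(10.0, 12.0))
print(cell_energy(None, 12.0))
print(cell_energy(None, None))
```

For two independent readings with equal noise, averaging reduces statistical fluctuations by roughly √2, which is why a single-sided readout still works but with lower precision.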
B. Board wide
The Demonstrator DaughterBoard consists of two symmetric halves, each with one Kintex-7 FPGA. The FPGAs are configured via an optical link and the radiation-hard GBTX protocol chip [5].
The optical transceivers chosen are the radiation-tolerant Molex 10 Gbps QSFP modules [6].
A schematic of the DaughterBoard with all the major components and data paths is shown in Figure 4. Each Kintex-7 FPGA is connected to both QSFPs, and the two FPGAs are also interconnected to reduce single-point failure modes.
V. FPGA-SPECIFIC RELIABILITY
FPGAs are more susceptible to radiation-induced SEUs than ASICs, since an FPGA needs considerably more circuitry, including its SRAM configuration memory, to implement the same number of logic gates. However, owing to recent performance increases and price reductions, they now provide a viable alternative to ASICs in high-energy physics (HEP) applications.
The flexibility of FPGAs is extremely valuable, because their firmware can easily be updated, whereas an ASIC must be redesigned, refabricated and replaced. Based on the radiation level estimation above and tests made by other groups [7, pp. 11-12], we expect an SEU rate of 3.8 · 10⁻⁵ upsets/s, or 3.2 upsets/day, for the configuration memory and 1.1 · 10⁻⁵ upsets/s, or 0.9 upsets/day, for the block RAM. Several techniques are available to mitigate these radiation-induced errors.
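The conversion between the per-second and per-day rates quoted above, and the corresponding mean time between upsets, is simple arithmetic; the sketch below approximately reproduces the quoted per-day figures (the last digit differs slightly because the per-second rates are themselves rounded). The helper names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def per_day(rate_per_second: float) -> float:
    """Expected number of upsets per day for a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def mean_time_between_upsets_h(rate_per_second: float) -> float:
    """Mean time between upsets in hours (inverse of the rate)."""
    return 1.0 / rate_per_second / 3600.0


for name, rate in [("configuration memory", 3.8e-5), ("block RAM", 1.1e-5)]:
    print(f"{name}: {per_day(rate):.2f} upsets/day, "
          f"one upset every {mean_time_between_upsets_h(rate):.1f} h")
```

A configuration-memory upset roughly every seven hours sets the time scale the scrubber must comfortably beat.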
Two independent and redundant Kintex-7 FPGAs read the digitized PMT data from their corresponding MainBoard halves. The FPGAs are configured via JTAG over an optical link, using the radiation-hard GBTX interface chip [5]. This chip implements the GBT protocol, which uses CRC, forward error correction (FEC) and encoding of the data stream to reduce errors. Two JTAG chains are implemented, one for each half. Each chain consists of the GBTX chip, a Kintex-7 on the DaughterBoard and two Altera Cyclone IV FPGAs on the MainBoard. It is also possible to configure one FPGA from its companion.
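To illustrate the error-detection part of such a link protocol, the sketch below appends a generic CRC-32 (from Python's standard `zlib` module) to a payload and shows that a single flipped bit is detected. This is not the GBT frame format, its CRC polynomial or its FEC; note that a CRC only detects errors, whereas FEC can also correct them.

```python
import zlib


def frame_with_crc(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload (generic, not the GBT format)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_is_valid(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the trailing 4 bytes."""
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


frame = bytearray(frame_with_crc(b"\x12\x34\x56\x78\x9a\xbc"))
print(frame_is_valid(bytes(frame)))  # True
frame[2] ^= 0x01                     # flip one bit in transit
print(frame_is_valid(bytes(frame)))  # False
```

CRC-32 detects every single-bit error, so a receiver can reliably discard or re-request a corrupted frame.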
The FPGAs use internal scrubbing of the configuration memory. This allows single-bit errors and adjacent two-bit errors to be repaired, and uncorrectable multi-bit errors to be flagged. However, multi-bit errors are not as rare as simply multiplying single-bit error probabilities would suggest, since, for example, minimum ionizing particles can leave tracks through several FPGA logic elements [7, p.14]. In the case of an uncorrectable error, the neighbouring FPGA is notified and can initiate partial or full reconfiguration from the off-detector system. By connecting an external memory to the scrubber, it is possible to repair multi-bit upsets by replacing the damaged frame rather than reconfiguring the whole FPGA. This takes much less time but requires more logic, which will be placed in the companion FPGA, assumed to be working correctly.
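The distinction between correctable single-bit errors and flagged uncorrectable errors can be illustrated with a single-error-correcting, double-error-detecting (SECDED) code. The sketch below is a textbook extended Hamming(8,4) code on 4-bit words; it is only an analogy for the scrubber's frame ECC, which uses much wider codewords, and the Kintex-7 correction of adjacent two-bit errors is not modelled.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Extended Hamming(8,4): index 0 = overall parity, indices 1..7 = Hamming code."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d          # data bits at non-power-of-2 positions
    code[1] = code[3] ^ code[5] ^ code[7]           # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]           # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]           # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2                     # overall parity for double-error detection
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return ('ok' | 'corrected' | 'uncorrectable', data) without modifying the input."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos                         # XOR of set positions locates a single error
    overall_odd = sum(c) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        c[syndrome] ^= 1                            # single error (syndrome 0: the parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None                # even parity, nonzero syndrome: double error
    return status, c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)


word = encode(0b1011)
word[5] ^= 1
print(decode(word))  # ('corrected', 11)
word[6] ^= 1
print(decode(word))  # ('uncorrectable', None)
```

An "uncorrectable" result corresponds to the case where the scrubber flags the error and the companion FPGA must restore the frame from a golden copy.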
Given the expected upset rate, frame-by-frame reconfiguration should not be needed, but the option exists. If the scrubber itself is upset, reconfiguration will be triggered by a watchdog. The status of the scrubber is also sent to the neighbouring FPGA via a serial interface. Triple modular redundancy (TMR) will be used for selected FPGA components. This approach, however, is very resource-consuming and should not be used unnecessarily.
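The core of TMR is a 2-out-of-3 majority voter placed after three copies of the same logic. The sketch below is a software model of a bitwise voter; in firmware the voter is implemented in LUTs, and the function name is hypothetical. It also shows why TMR is costly: every protected register is tripled and a voter is added.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


golden = 0b1010_1100
upset = golden ^ 0b0000_0100          # one copy hit by an SEU in bit 2
print(bin(tmr_vote(golden, upset, golden)))
```

A single upset in any one copy is outvoted; only simultaneous upsets of the same bit in two copies can propagate, which is why the scrubber should keep clearing errors between upsets.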
The estimated upset rate is well within what the scrubber can handle. Because of scrubbing, CRC checks and memory interleaving, uncorrectable multi-bit upsets are expected to occur at a rate 10-100 times lower. Furthermore, only around 10% of the FPGA configuration bits are expected to be essential, since the design will use only a fraction of the available FPGA resources.
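Memory interleaving reduces uncorrectable multi-bit upsets by storing physically adjacent cells in different logical words, so that a burst spanning neighbouring cells shows up as several single-bit errors, each correctable on its own. The sketch below uses simple bit interleaving with a hypothetical interleaving factor of 4; the actual physical layout of Kintex-7 memories is not modelled.

```python
from typing import Tuple


def physical_to_logical(phys_bit: int, n_words: int) -> Tuple[int, int]:
    """Map a physical bit position to (word index, bit index within word).

    With bit interleaving, consecutive physical bits belong to consecutive words.
    """
    return phys_bit % n_words, phys_bit // n_words


N_WORDS = 4           # hypothetical interleaving factor
burst = [10, 11]      # two-bit upset in physically adjacent cells
words_hit = [physical_to_logical(b, N_WORDS)[0] for b in burst]
print(words_hit)      # [2, 3]: each word sees a single, correctable error
```

Any burst no longer than the interleaving factor hits each word at most once.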
VI. OUTLOOK
Initial radiation testing will be conducted with firmware that implements an optical-link loopback with an error counter, configuration scrubbing, block-memory readback and flip-flop utilization.
The tests will show whether our SEU rate estimate is correct and whether our current mitigation strategy is sufficient. If not, additional mitigation techniques will be required.
There are many ways to mitigate radiation-induced errors, as outlined above. They are all complementary, and it is important to strike a balance between the desired reliability and over-engineering.
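The loopback test with an error counter can be modelled as follows: a pseudo-random bit sequence is transmitted, looped back, and compared bit by bit, counting mismatches. The PRBS7 pattern (x⁷ + x⁶ + 1) is an assumed choice, since the test pattern used in the firmware is not specified here, and the injected bit flips are placeholders rather than measured data.

```python
from typing import Iterable, Iterator


def prbs7(seed: int = 0x7F) -> Iterator[int]:
    """Generate the PRBS7 sequence (x^7 + x^6 + 1) one bit at a time; seed must be nonzero."""
    state = seed & 0x7F
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_errors(tx: Iterable[int], rx: Iterable[int]) -> int:
    """Error counter: number of positions where the looped-back bit differs."""
    return sum(t != r for t, r in zip(tx, rx))


N_BITS = 10_000
gen = prbs7()
tx = [next(gen) for _ in range(N_BITS)]
rx = list(tx)
for pos in (123, 4567):   # hypothetical SEU-induced flips in the loopback path
    rx[pos] ^= 1
errors = count_errors(tx, rx)
print(errors, errors / N_BITS)  # 2 0.0002
```

In a beam test the error count divided by the transmitted bits gives a bit error rate that can be compared with the expected upset rate.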
