ABSTRACT: We present a scheme to upgrade the CMS HCAL front-end electronics in 2015-16. The HCAL upgrade is required to handle a major luminosity increase of the LHC, which is expected for 2017. This paper focuses on the requirements for the new electronics and on the proposed solutions. The requirements include increased channel count, additional timing capabilities, and additional redundancy in a harsh environment which is constrained by the existing system. The proposed solutions range from chip level to system level. They include the development of a new ADC ASIC, the evaluation and use of circuits from other developments, the evaluation of commercial FPGAs, better thermal design and improvements in the overall architecture.
Introduction
The existing CMS [1] hadron calorimeter (HCAL) Front-End Electronics (FEE) is used to read out three different partitions: HCAL Barrel (HB), HCAL Endcap (HE), HCAL Outer (HO). A modified version of the HCAL FEE is used to read out three more sub-detectors: HCAL Forward (HF), Castor, Zero-Degree Calorimeter (ZDC). They use a mixture of photo-sensor technologies: HB, HE and HO were built with Hybrid Photo-Detectors and are going to be upgraded with Silicon Photomultipliers; the remaining detectors use Photo-Multiplier Tubes that are also going to be upgraded. This paper often refers implicitly to the HB case, which has the highest density of channels in the FEE.
In the present system, a single channel consists of a chain of a photo-sensor, a charge-integrating ADC, framing logic and the transmission over an optical link. Overall there are ∼10⁴ channels. A Readout Module handles 18 channels; four Readout Modules are plugged into a Readout Box and are controlled by a Clock and Control Module via a custom backplane. Each Readout Box is served by two cables for the power supplies and is linked to the control room by 24 data fibers, 1 or 2 fibers for calibration and 2 fibers reserved for control.
The physics motivations to upgrade the FEE are beyond the scope of this paper; they are described in references [2] and [3]. The upgrade of the HCAL FEE is constrained to keep the existing infrastructure intact, including the Readout Boxes (enclosure and backplane), the power cables, the cooling pipes and the optical fiber connections with the off-detector electronics. The photo-sensors generate fast current pulses. Presently, these signals are digitized at 40 MHz by a charge-integrating ADC ASIC called QIE8. The A/D conversion is non-uniform, so that a wide dynamic range is encoded efficiently [7]. The digital outputs of QIE8 ASICs are combined in groups of three, serialized and transmitted over optical fibers to the counting room.
The upgraded FEE will have three major improvements in signal processing. The first improvement is an increase in the number of channels, in order to allow longitudinal segmentation of the detector and redundant readout. In the HB case the number of channels per Readout Box increases from 72 to 256 while keeping the existing Readout Box. The second improvement is better timing information. This is needed for background rejection and improved characterization of signals, and it is achieved with a TDC scheme that will give a raw resolution of ∼2 ns. This value has to be compared with the 25 ns granularity of the existing timing information. The third improvement is the increase in the number of bits in the charge-ADC, as discussed later.
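The HB channel counts quoted above can be cross-checked against the module layout given elsewhere in the paper (four Readout Modules per Readout Box, 18 channels per existing module, 64 per upgraded module). The short Python sketch below only restates that arithmetic; the constant and function names are illustrative and not part of any HCAL software.

```python
# Channel counts per Readout Box in the HB case, from figures quoted in the text.
MODULES_PER_BOX = 4            # Readout Modules plugged into one Readout Box
CHANNELS_PER_MODULE_OLD = 18   # existing Readout Module
CHANNELS_PER_MODULE_NEW = 64   # upgraded Readout Module


def channels_per_box(channels_per_module: int, modules: int = MODULES_PER_BOX) -> int:
    """Total number of channels read out by one Readout Box."""
    return channels_per_module * modules


old_channels = channels_per_box(CHANNELS_PER_MODULE_OLD)
new_channels = channels_per_box(CHANNELS_PER_MODULE_NEW)
increase = new_channels / old_channels

print(f"existing: {old_channels}, upgraded: {new_channels}, ratio: {increase:.2f}")
```

The two totals reproduce the 72 and 256 channels stated above, and the ratio is slightly below 4.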
Redundancy
The existing FEE has one Clock-and-Control Module (CCM) per Readout Box with no redundancy. A failure of a CCM causes the loss of a significant number of detector channels, and such an occurrence would be a severe fault. The upgraded system must be robust against individual failures of (part of) a CCM. The desired lifetime of the CCMs is 15 years.
Radiation and magnetic field
The upgraded FEE must survive an integrated luminosity of 3000 fb⁻¹ [3]. The radiation levels can be estimated with the CMS unofficial calculator [4] or by scaling up the values from table 5 of reference [12]. The two estimates differ by less than a factor of 2. Taking worst-case estimates and then multiplying them by a safety factor of 5, the resulting target radiation environment is shown in table 1.
The magnetic field will be 4 T as in the existing system.
Power and cooling
Readout Modules and CCMs are cooled by contact with the metal enclosure of the Readout Box, which in turn is water-cooled. This is part of the infrastructure that will not change. The existing FEE generates ∼90 W of heat per Readout Box, but the cooling infrastructure can remove 200 W. This value is the upper limit of the power consumption for the upgrade. The operating temperature of the FEE is monitored and can vary between 15 °C and 45 °C. The cooling system can have a temporary fault. This is detected within a few minutes and it leads to powering off the FEE. The new FEE should be robust to these occurrences.
Architecture and building blocks
Given the strong constraints listed in paragraph 1, the architecture of the upgraded FEE cannot be too different from the present FEE. On the other hand, there is a significant change in the overall architecture, which will be discussed in section 3.5.
Most of the work is focused on developing and integrating new building blocks. We plan to regulate the voltages with the DC-DC converter described in [5]. The optical communication with the off-detector systems will use the chip-set from the GBT project [6]. In particular the GBT serializer-deserializer chip (called GBTX) has a line rate of 4.8 Gb/s.
Charge integration and digital conversion
In the existing FEE, the QIE8 ASIC [7] performs the charge integration, digitization and encoding of a wide dynamic range into a floating point format. The upgrade plans rely on a version (QIE10), presently under design, with the following new or modified features:
1. improved radiation tolerance
2. input impedance and sensitivity (3 fC) matched to the new photo-sensors over a greater dynamic range (330 pC)
3. number of encoded, 40 MHz ADC bits increased to 8
4. circuitry to insert time-markers or other word-alignment strategies
5. 320 Mb/s serialized output data. This has two advantages compared to the parallel output of QIE8: it allows reducing the power consumption and it requires fewer connections on the PCB pack. The serial output should be SLVS in order to be compatible with the GBTX
6. output ("signal-over-threshold" or, less likely, "constant fraction discriminator") for the TDC
7. 4 channels in parallel on a single chip (as opposed to a single channel in the QIE8)
8. clock phase adjustments
9. additional supply voltage (3.3 V). This allows saving power by generating the 3.3 V on an off-chip switching regulator
10. migration to a finer process technology (AMS 0.35 µm SiGe BiCMOS)
Figure 1. A channel of the upgraded electronics.
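Two of the numbers in the list above can be related by simple arithmetic. The Python sketch below estimates how many bits a uniform (linear) ADC would need to cover 330 pC with a 3 fC step, which illustrates why a floating point encoding into 8 bits is attractive, and it checks that 8 bits sampled at 40 MHz correspond to 320 Mb/s. Interpreting the 320 Mb/s figure as the rate of a single channel is our assumption; the function names are illustrative.

```python
import math

SENSITIVITY_C = 3e-15        # smallest charge step quoted in the list (3 fC)
FULL_SCALE_C = 330e-12       # dynamic range quoted in the list (330 pC)
ADC_BITS = 8                 # encoded bits per sample
SAMPLE_RATE_HZ = 40_000_000  # one sample per 25 ns


def linear_bits_needed(full_scale_c: float, lsb_c: float) -> int:
    """Bits a uniform ADC would need to span full_scale_c with step lsb_c."""
    return math.ceil(math.log2(full_scale_c / lsb_c))


def serial_rate_bps(bits_per_sample: int, sample_rate_hz: int) -> int:
    """Bit rate of one channel when every sample is shipped serially."""
    return bits_per_sample * sample_rate_hz


print("linear ADC bits needed:", linear_bits_needed(FULL_SCALE_C, SENSITIVITY_C))
print("serial rate (Mb/s):", serial_rate_bps(ADC_BITS, SAMPLE_RATE_HZ) / 1e6)
```

Under these assumptions a uniform conversion would need about 17 bits per sample, against the 8 encoded bits of the QIE10.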
FPGA
Commercial-grade FPGAs are fabricated with one of the following technologies: antifuse, RAM or Flash memory. Antifuse-based FPGAs show good radiation tolerance and they are used in the existing HCAL FEE. Their main disadvantage is that they are one-time programmable. RAM-based FPGAs show poor radiation tolerance; in particular, their configuration ("firmware") can be corrupted by radiation. Mitigation techniques exist (scrubbing, etc.) but are complex and require extra power and parts. Moreover, RAM-based FPGAs have high power consumption. Flash-based FPGAs are relatively new and show good radiation tolerance.
Actel offers two families of flash-based FPGAs: ProASIC3 and Igloo [11]. ProASIC3 radiation tolerance is described in the literature [8]. According to available test results [9], Igloo FPGAs show a similar tolerance. The Igloo family is characterized by lower power consumption. The effects of radiation on Actel flash-based FPGAs include a decrease in the maximum operating frequency, an increase in power consumption, and SEUs in the user logic and in the I/Os. Remarkably, the FPGA configuration is not upset by radiation [9]. Up to a radiation dose of 30 krads, Actel Igloo FPGAs do not show an increase in power consumption, while their maximum clock frequency is reduced by about 35% [10]. Functional failure has been observed [10] at 30 krads on ProASIC3 and at 40 krads on Igloo FPGAs.
The proposed FPGA is from the Igloo family of Actel FPGAs. These nonvolatile FPGAs are reprogrammable 1000 times as long as TID < 15 krads. In order to mitigate the SEUs in the user logic, triple modular redundancy (TMR) should be used in the FPGA design ("firmware"). Moreover, triplication of critical FPGA I/Os should also be considered. In order to validate the Igloo FPGAs in a radiation environment as close as possible to our target, we are preparing tests of the device at the IRRAD-6 facility at CERN.
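The principle of triple modular redundancy can be illustrated with a bitwise majority voter over three copies of a register. The sketch below is a behavioural model in Python, not firmware: in the FPGA the voter would be written in a hardware description language, and the function names here are ours.

```python
from typing import Tuple


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote of three redundant copies of a register."""
    return (a & b) | (a & c) | (b & c)


def vote_and_flag(a: int, b: int, c: int) -> Tuple[int, bool]:
    """Return the voted value and whether any copy disagrees with it."""
    voted = majority(a, b, c)
    mismatch = (a != voted) or (b != voted) or (c != voted)
    return voted, mismatch


# A single upset bit in one copy is outvoted by the other two copies.
good = 0b1010
upset = good ^ 0b0100          # flip one bit to model an SEU
print(vote_and_flag(good, upset, good))
```

A single upset in one copy is corrected and flagged. If the same bit is upset in two copies before the error is repaired, the voter returns the wrong value, which is why the mismatch flag is useful for refreshing the corrupted copy.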
TDC
TDC plans include producing a discriminated pulse out of the QIE10, and then digitizing that pulse in the FPGA (figure 1). Taking into account the performance degradation due to TID, the Igloo FPGA can run with a 160 MHz clock (4 times the system clock). Such a clock allows a TDC with a resolution of one clock period, 6.25 ns, which does not meet our goal. In order to reach a resolution of ∼2 ns the design shall use fractional clock period techniques, while allowing excursions in temperature. An additional system-level consideration is that the rise-time of the analog signal (input of the QIE10) depends on the amplitude of the signal itself, so the signal-over-threshold operation gives a timing resolution that has some intrinsic limitations. For this reason the TDC does not need to be extremely precise. At the same time, we are considering extracting two values from the signal-over-threshold: the positive crossing and the negative crossing. Combining these two values can improve the estimation of the timing of the signal from the photo-sensor. A limitation of this approach is the bandwidth available to transmit data off-detector. We will therefore evaluate the possibility of doing TDC only on certain channels or bunch crossings. The ability to tune the TDC in the FPGA will play a key role.
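The relation between clock frequency and TDC bin width can be made explicit with a short calculation. The Python sketch below assumes, purely for illustration, that the fractional clock period technique consists of sampling the discriminated pulse with several evenly spaced phases of the 160 MHz clock; the text does not specify which technique will be adopted, and the function names are ours.

```python
TDC_CLOCK_HZ = 160e6   # FPGA clock, 4 times the 40 MHz system clock
TARGET_NS = 2.0        # resolution goal quoted in the text


def tdc_bin_ns(clock_hz: float, phases: int = 1) -> float:
    """Bin width in ns when sampling with `phases` evenly spaced clock phases."""
    return 1e9 / (clock_hz * phases)


def min_phases(clock_hz: float, target_ns: float) -> int:
    """Smallest number of phases whose bin width does not exceed target_ns."""
    phases = 1
    while tdc_bin_ns(clock_hz, phases) > target_ns:
        phases += 1
    return phases


print("single-phase bin (ns):", tdc_bin_ns(TDC_CLOCK_HZ))
needed = min_phases(TDC_CLOCK_HZ, TARGET_NS)
print("phases needed:", needed, "-> bin (ns):", tdc_bin_ns(TDC_CLOCK_HZ, needed))
```

With a single phase the bin is the 6.25 ns quoted above. Under the multi-phase assumption, three phases give about 2.08 ns and four phases give about 1.56 ns, so four phases would be the smallest count that does not exceed a strict 2 ns target.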
Readout module
An upgraded Readout Module will contain 64 of the channels shown in figure 1. As the number of channels increases almost by a factor of 4, the major issues are heat dissipation (discussed in section 3.6) and signal density. The present module is composed of three identical cards which plug into three backplane connectors. In order to fit more channels we envision a module (figure 2) with four identical cards, each card holding four QIE10 chips (i.e. 16 channels). In order to plug these four cards into the three backplane connectors, an adapter board is needed. Eight GBTX chips will be on the Digital Transmitter Board. A few FPGAs (Actel Igloo) will implement the TDC and control the QIE10 and the GBTX. The TDC outputs of the FPGAs must be fed to the SLVS inputs of the GBTX. The SLVS levels are not supported by any presently available FPGA, so there is an interface issue. It seems possible to drive an LVDS signal into an SLVS input, but this reduces the safety margins. The most appropriate solution is to use a level translator chip. The FPGA side of the translator should support either LVDS or LV-CMOS. In order to minimize power consumption we used values from table 2-15 of [11] and we found that the optimal solution is an LV-CMOS translator with the Igloo FPGA driving 1.8 V CMOS outputs at 80 Mb/s. The translator chip could be powered with 1.5 V, provided that its CMOS inputs are 1.8 V tolerant. Such a chip should be placed near the FPGA in order to minimize the length of the CMOS lines, while the SLVS levels can drive lines longer than a meter. Finally, each GBT output link should carry the ADC data (64 bits at 40 MHz) and the TDC data (18 to 24 bits at 40 MHz, depending on the size of the GBT user data field) from eight channels.
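The per-link data budget described above can be tallied as follows. The Python sketch uses only numbers quoted in the paper (4.8 Gb/s line rate, 40 MHz bunch-crossing rate, eight channels per link, 8 ADC bits per channel, 18 to 24 TDC bits per link). It computes the raw number of line bits per bunch crossing, which is an upper bound and not the size of the GBT user data field; the names are illustrative.

```python
LINE_RATE_BPS = 4_800_000_000  # GBTX line rate
BX_RATE_HZ = 40_000_000        # bunch-crossing (sampling) rate
CHANNELS_PER_LINK = 8          # 64 channels shared among eight GBTX
ADC_BITS_PER_CHANNEL = 8       # encoded QIE10 bits per sample


def line_bits_per_crossing() -> int:
    """Raw bits on the line per bunch crossing (includes any protocol overhead)."""
    return LINE_RATE_BPS // BX_RATE_HZ


def payload_bits(tdc_bits: int) -> int:
    """ADC plus TDC bits that one link must carry per bunch crossing."""
    return CHANNELS_PER_LINK * ADC_BITS_PER_CHANNEL + tdc_bits


print("raw line bits per crossing:", line_bits_per_crossing())
for tdc_bits in (18, 24):
    per_channel = tdc_bits / CHANNELS_PER_LINK
    print(f"TDC {tdc_bits} bits -> payload {payload_bits(tdc_bits)} bits, "
          f"{per_channel:.2f} TDC bits per channel")
```

The payload is 82 to 88 bits per crossing against 120 raw line bits, and the TDC allocation works out to between 2.25 and 3 bits per channel per crossing, which is consistent with the bandwidth limitation mentioned for the TDC.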
A redundant control scheme
Presently the off-detector electronics sends commands to the CCMs via TTC links [3] and there are no connections between CCMs. In the new version the TTC links will be replaced by GBT links. In order to avoid the single point of failure mentioned in section 2.2, we are studying a control system with two levels of redundancy: inside a CCM and between CCMs. The first level is achieved by duplicating some of the components. In case of a failure of a single component, appropriate multiplexing logic should switch to the spare component. The second level of redundancy is achieved by allowing each CCM to take over the control of one of its neighbor CCMs. This will be implemented via a ring architecture that connects all CCMs, as illustrated in figure 3. In figure 3 the connections from the off-detector electronics are the GBT links, while the new connections from CCM to CCM (blue lines in the figure) create a ring meant for redundant operations. Figure 4 exemplifies how the ring can be effective: if there is a failure in the link from the off-detector electronics (including the fiber and the interface part in the CCM), a neighbor CCM will replace the faulty circuitry. Another feature of the new CCM is that it should be possible to reprogram the FPGAs remotely. A CCM prototype is presently under design.
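The second level of redundancy can be modelled as a simple assignment problem on a ring. The Python sketch below is a toy model under our own assumptions: CCMs are numbered 0 to N-1, CCM i is connected to CCMs i-1 and i+1 (modulo N), a CCM can help only if its own off-detector link works, and it can take over at most one neighbor. The greedy assignment is illustrative and is not the control logic of the actual CCM.

```python
from typing import Dict, Set


def takeover_plan(n_ccm: int, failed_links: Set[int]) -> Dict[int, int]:
    """Map each CCM with a failed off-detector link to the neighbor that takes over.

    CCMs that cannot be rescued (both neighbors failed or already busy)
    are left out of the returned plan.
    """
    plan: Dict[int, int] = {}
    busy: Set[int] = set()  # healthy CCMs already controlling a neighbor
    for ccm in sorted(failed_links):
        for neighbor in ((ccm + 1) % n_ccm, (ccm - 1) % n_ccm):
            if neighbor not in failed_links and neighbor not in busy:
                plan[ccm] = neighbor
                busy.add(neighbor)
                break
    return plan


print(takeover_plan(6, {2}))        # a single failure is covered by a neighbor
print(takeover_plan(6, {2, 3}))     # two adjacent failures use the outer neighbors
print(takeover_plan(6, {1, 2, 3}))  # the middle CCM has no healthy neighbor
```

In this model any single link failure is recoverable, as are two adjacent failures. Three adjacent failures leave the middle CCM without control, which marks the limit of a ring with nearest-neighbor connections only.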
Thermal design
The limit on the power dissipation was discussed in section 2.4. Moreover, there is the problem of heat transfer. The infrastructure does not foresee the possibility of water cooling inside the Readout Box. Cooling fans will not be used because of the magnetic field. For the Readout Module, we are studying a design where the highest power parts (GBTX, FPGA, voltage regulators) are close to the heat exchange plates (Digital Transmitter Board in figure 2). The heat exchange plates of the Readout Box are on top and bottom.
We are also studying how the PCB can transfer heat more efficiently. While there are solutions based on special manufacturing processes, we prefer the solution described below, as it is based on standard PCB manufacturing. In the preferred solution, the PCB has exposed metal around the edge of the board on the signal layers of the board. The exposed metal of these layers is connected with a row of tightly spaced vias and transfers heat from the core of the board to the edge. The left side of figure 5 shows the edge of the board with all of the layers stacked; note that the ground and voltage planes are held ∼0.3 mm away from the edge of the board. The right side of figure 5 shows the exposed metal from the top of the board. The board would be mounted on a heat sink in contact with the exposed metal, in order to transfer the heat to the chassis. This solution may apply to both the Readout Module and the Clock-and-Control Module.
Voltage and power
An estimate of the power consumption is given in figure 1. One can see that the power budget is close to the upper limit (total). Therefore we intend to use switching regulators ("DC-DC converters", [5]), which are more efficient than the linear regulators used in the present FEE. Assuming 80%-efficient converters, they supply 160 W (80% of the 200 W limit) to the rest of the FEE, of which 134 W are already taken by the QIE10 and GBT chips.
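The power budget in this paragraph can be written out explicitly. The Python sketch below takes the 200 W cooling limit as the total input power per Readout Box, applies the assumed 80% converter efficiency, and subtracts the 134 W attributed to the QIE10 and GBT chips. The variable and function names are illustrative.

```python
COOLING_LIMIT_W = 200.0       # heat the infrastructure can remove per Readout Box
CONVERTER_EFFICIENCY = 0.80   # assumed DC-DC converter efficiency
QIE10_AND_GBT_W = 134.0       # consumption attributed to QIE10 and GBT chips


def delivered_power_w(input_w: float, efficiency: float) -> float:
    """Power available downstream of the converters."""
    return input_w * efficiency


delivered = delivered_power_w(COOLING_LIMIT_W, CONVERTER_EFFICIENCY)
converter_loss = COOLING_LIMIT_W - delivered
remaining = delivered - QIE10_AND_GBT_W

print(f"delivered: {delivered:.0f} W, converter loss: {converter_loss:.0f} W, "
      f"left for all other parts: {remaining:.0f} W")
```

Under these figures about 26 W remain for all other components, which shows how tight the budget is.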
The existing FEE is supplied with two different voltages (5 V and 6.5 V). A new FEE equipped with DC-DC converters can be supplied at a higher voltage (∼10 V) in order to reduce cable losses, which would otherwise be unacceptable.
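The benefit of a higher supply voltage follows from the resistive loss in the cables: for a fixed delivered power P over a cable of resistance R, the current is P/V and the loss is (P/V)²R, so the loss scales as 1/V². The Python sketch below illustrates this scaling. It is a simplified model that ignores the voltage drop along the cable, and the cable resistance used in the example is a hypothetical placeholder, not a measured value.

```python
def cable_loss_w(load_power_w: float, supply_voltage_v: float,
                 cable_resistance_ohm: float) -> float:
    """Resistive loss in the cable for a given delivered power and voltage."""
    current_a = load_power_w / supply_voltage_v
    return current_a ** 2 * cable_resistance_ohm


def loss_ratio(v_old: float, v_new: float) -> float:
    """Loss at v_new relative to loss at v_old, for the same power and cable."""
    return (v_old / v_new) ** 2


HYPOTHETICAL_R_OHM = 0.05  # placeholder cable resistance, for illustration only

for v_old in (5.0, 6.5):
    print(f"{v_old} V -> 10 V: loss ratio {loss_ratio(v_old, 10.0):.3f}")
print("loss at 200 W, 10 V:", round(cable_loss_w(200.0, 10.0, HYPOTHETICAL_R_OHM), 1), "W")
```

In this model, moving from 6.5 V to 10 V at the same power reduces the cable loss to about 42% of its previous value, and moving from 5 V reduces it to 25%.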
