It is anticipated that the LHC will deliver Pb+Pb collisions at a minimum bias interaction rate of about 50 kHz after the second long shutdown of the LHC in 2018. This will be roughly two orders of magnitude greater than the current data recording rate capability of the ALICE experiment. Therefore a major upgrade of the ALICE detector is planned for the next shutdown to enable ALICE to record data at the full Pb+Pb minimum bias interaction rate delivered by the LHC. A new point-to-point readout system for the electromagnetic calorimeter (EMCal) of ALICE has been developed to replace the legacy readout bus. It essentially accomplishes this goal and is being installed during the current LHC shutdown (2013-2014). The new readout uses the existing EMCal front end electronics yet provides more than an order of magnitude decrease in the readout time, to about 21 μs, with modest cost and effort.
Introduction
ALICE (A Large Ion Collider Experiment) [1] at CERN is a general purpose detector designed to study the physics of strongly interacting matter (the quark-gluon plasma) produced in nucleus-nucleus collisions at the Large Hadron Collider (LHC) [2, 3]. After the next long shutdown of the LHC in 2018 (LS2), the Pb beam luminosity is expected to be increased to $6 \times 10^{27}$ cm$^{-2}$ s$^{-1}$, corresponding to a minimum bias Pb+Pb interaction rate of about 50 kHz [5].
The data taking rate of the full ALICE detector is currently limited by the readout capabilities of the front-end electronics (FEE) of the slowest detector subsystem to about 500 events per second [4] for minimum bias Pb+Pb collisions, and to about 1 kHz for p+p collisions. Since much of the ALICE physics programme is based on the measurement of rare but soft processes, large minimum bias data samples are desired. Therefore ALICE is preparing a significant upgrade of its readout capabilities to further exploit the physics potential provided by the increased Pb+Pb interaction rate expected after LS2 [5].
The readout of the ALICE electromagnetic calorimeter EMCal [6] is significantly limited by the use of a readout bus. The EMCal readout takes about 270 μs, for a maximum readout rate of about 3.7 k events per second. An upgrade of the EMCal readout has been implemented which decreases the readout time by more than an order of magnitude without the need to replace the existing EMCal FEE boards. This upgrade is achieved with the following hardware and firmware modifications:
Replace the readout bus with point-to-point data links between the FEE boards and the readout concentrators: this solution is based on the Scalable Readout Unit (SRU) of the Scalable Readout System developed within the CERN RD51 [7] collaboration. The point-to-point data link here refers to the communication connection between a FEE board and the SRU. Each FEE board has a dedicated point-to-point link with the SRU, so ten FEE boards are read out concurrently rather than sequentially over a bus, reducing the readout time by roughly an order of magnitude.
Implement a readout suppression algorithm in the FEE FPGA firmware to suppress Low Gain (LG) channels when they are not needed: the readout of the LG channel can be suppressed unless the associated High Gain (HG) channel is saturated. Since signals that saturate HG channels are rare, this can save nearly a factor of two in the readout time.
The above solutions have been implemented with the following hardware and firmware developments:
Upgrade of the FPGA firmware on the FEE for the communication with the SRU and for the LG readout suppression algorithm.
Custom FPGA firmware on the SRU for application to the EMCal readout.
The SRU [7, 8] is a readout concentrator developed in collaboration with the RD51 project. The DTC daughter card and the custom FPGA firmware for the SRU and FEE boards are discussed in the sections below.
The EMCal readout

The EMCal detector and its previous readout system
The EMCal [9] is a shashlik-type sampling calorimeter that consists of ten full size super modules (SMs) and two 1/3-size SMs already installed and operated in the ALICE experiment. The EMCal coverage will be extended with six 2/3-size and two 1/3-size EMCal SMs to be installed during the 2013-2014 long shutdown of the LHC (LS1). A full size SM consists of 1152 readout towers. An individual EMCal tower is read out with an avalanche photodiode and preamplifier mounted on the tower. The preamplifier signal is split into energy and trigger shaper channels on the Front End Electronics (FEE) [10] boards. The energy shaper signals are sampled at 10 MHz with 10-bit resolution using the ALTRO chips [11] designed for the ALICE TPC (Time Projection Chamber) [12]. Prior to digitization, each energy signal is split into a HG and a LG channel, each shaped separately, with a gain ratio of 16 to provide an effective dynamic range of 14 bits.
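The 14-bit figure follows from the gain ratio: a factor of 16 between the HG and LG channels adds log2(16) = 4 bits of range on top of the 10-bit ADC. A minimal sketch of that arithmetic, ignoring pedestals and noise (the function name is illustrative and not taken from the EMCal software):

```python
import math

def effective_dynamic_range_bits(adc_bits: int, gain_ratio: float) -> float:
    """Bits of range covered by a high-gain/low-gain pair digitized at the same ADC resolution."""
    return adc_bits + math.log2(gain_ratio)

print(effective_dynamic_range_bits(10, 16))  # 14.0
```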
The trigger signals of 2 × 2 towers are summed and transmitted to a Trigger Region Unit (TRU) module [13] where the 2 × 2 tower sums are digitized and processed in an FPGA [14]. With respect to the readout system, the TRU may optionally include the trigger primitive data in the data stream, using the same format as the FEE boards. Therefore, for the purposes of the readout discussion of this paper, the TRU is equivalent to a FEE board. Each full EMCal SM requires 3 TRUs and 37 FEE boards, where one FEE board is used to read out reference channels of the EMCal LED-based monitoring system. In order to simplify the description and figures, only the FEE boards are mentioned in the following discussion.
In the previous readout system, each EMCal SM has two independent readout partitions. The topology of one fully independent readout partition is shown in Fig. 1 . Each readout partition has one Readout Control Unit (RCU) [15] , with one Detector Control System (DCS) daughter card, and one Source Interface Unit (SIU) [16] daughter card, each mounted on the RCU. Event data, triggers, and commands are transmitted over the Gunning Transceiver Logic (GTL)-based readout bus between the RCU and multiple FEE boards. Each RCU can drive up to two independent GTL buses, with ten FEE cards on each GTL bus. The FEE boards are configured through the DCS card by the ALICE DCS system. The raw data from the FEE boards are concentrated into sub-events in the RCU and transmitted to the ALICE DAQ system by the SIU over a 2 Gbps optical fiber link [17] . The RCU and DCS modules are physically the same as those used by the ALICE TPC, and the SIU modules are used for all ALICE detector systems.
The data volume and readout limitations
The readout of 640 ALTRO channels within 10 FEE boards on a single GTL bus takes place sequentially. Addressing one channel on the GTL bus takes at least 0.5 μs, so addressing 640 channels takes at least 320 μs. With a sparse readout strategy that reads only those ALTRO channels with hits, the readout time of the EMCal over the GTL bus was reduced to about 270 μs, which was adequate for recent ALICE data taking but far from the readout goal of 50 kHz after LS2.
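The bus timing above is simple arithmetic, sketched here to make the gap to the 50 kHz goal explicit. The function names are illustrative; the only inputs are the per-channel addressing time and the readout times quoted in the text.

```python
def bus_addressing_time_us(n_channels: int, t_address_us: float = 0.5) -> float:
    """Lower bound on the sequential bus readout time: one addressing cycle per channel."""
    return n_channels * t_address_us

def max_event_rate_hz(readout_time_us: float) -> float:
    """Highest event rate sustainable when each event occupies the readout for readout_time_us."""
    return 1e6 / readout_time_us

print(bus_addressing_time_us(640))     # 320.0 (all channels addressed)
print(round(max_event_rate_hz(270)))   # 3704  (sparse readout over the GTL bus)
print(round(max_event_rate_hz(20)))    # 50000 (readout time needed for 50 kHz)
```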
The EMCal detector records fifteen 10-bit time samples per readout channel per event. They are compressed by discarding samples close to the reference level (pedestal) that contain no useful information ("zero suppressed") [11]. Since the timing information would be lost by the removal of a variable number of suppressed samples between samples with valid data, two 10-bit flags of sequential time bins with data are added after each cluster. In order to prevent an increase of the event size caused by the added flags, consecutive clusters of samples that have fewer than three intervening time bins are merged, without suppression of the intervening samples.
Therefore, the minimum and maximum number of 10-bit words of a hit channel is 5 (3 samples + 2 flags) and 17 (15 samples + 2 flags), respectively. The 10-bit data words are then converted into 32-bit raw data words. Each 32-bit word contains three 10-bit data words plus two flag bits. If the number of 10-bit data words is not a multiple of 3, a "0" pattern is inserted to complete the last 32-bit word. In addition, a 32-bit trailer word that contains the channel address and word count is added for each readout channel. Thus, the minimum and maximum byte count for a hit channel is 12 (three 32-bit words) and 28 (seven 32-bit words), respectively.
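The packing rule can be checked with a short sketch. It assumes a single cluster per channel (so exactly two flag words), which is the case behind the quoted minimum and maximum; the function name is illustrative.

```python
def hit_channel_bytes(n_samples: int, n_flag_words: int = 2) -> int:
    """Bytes for one hit channel: 10-bit words packed three per 32-bit word, plus one 32-bit trailer."""
    n_words10 = n_samples + n_flag_words
    n_words32 = -(-n_words10 // 3)      # ceiling division; the last word is zero-padded
    return 4 * (n_words32 + 1)          # +1 for the channel trailer word

print(hit_channel_bytes(3))    # 12
print(hit_channel_bytes(15))   # 28
```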
The data of one readout partition are packed into a single sub-event. The sub-event consists of an ALICE Common Data Header (CDH, 32 bytes), a payload of 20 FEE boards or 1280 channels (Fig. 1), and a trailer (36 bytes). The maximum event size per readout partition is then:
$$N_{max} = 32 + 1280 \times 28 + 36 = 35908 \text{ bytes} \qquad (1)$$
Table 1 lists the measured average EMCal event sizes and the estimated maximum number of occupied channels in p+p and Pb+Pb collisions per readout partition. The average size of the EMCal physics events is typically less than 15% of $N_{max}$, and less than 20% of channels have hits in minimum bias Pb+Pb collisions.
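As a rough cross-check of these sizes, the sketch below computes the largest possible sub-event and applies the hit-channel estimate quoted with Table 1, which assumes every hit channel has the minimum size of 12 bytes. The constant and function names are illustrative, and the 3140-byte event is a hypothetical input, not a measured value.

```python
CDH_BYTES = 32                  # ALICE Common Data Header
TRAILER_BYTES = 36              # sub-event trailer
CHANNELS_PER_PARTITION = 1280   # 20 FEE boards x 64 channels
MIN_BYTES_PER_HIT_CHANNEL = 12
MAX_BYTES_PER_HIT_CHANNEL = 28

def max_event_size_bytes() -> int:
    """Sub-event size when every channel is hit with all fifteen samples kept."""
    return CDH_BYTES + CHANNELS_PER_PARTITION * MAX_BYTES_PER_HIT_CHANNEL + TRAILER_BYTES

def estimated_hit_channels(event_size_bytes: int) -> float:
    """Upper estimate of hit channels, assuming each contributes the minimum 12 bytes."""
    return (event_size_bytes - CDH_BYTES - TRAILER_BYTES) / MIN_BYTES_PER_HIT_CHANNEL

print(max_event_size_bytes())                                   # 35908
print(estimated_hit_channels(3140))                             # 256.0 (hypothetical event size)
print(estimated_hit_channels(3140) / CHANNELS_PER_PARTITION)    # 0.2
```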
EMCal readout upgrade solution
Replace the GTL bus with point-to-point links
In the previous system, the readout of 640 ALTRO channels within 10 FEE boards on a single GTL bus takes place sequentially. After the replacement of the GTL bus with point-to-point links between the FEE boards and the readout concentrator, the FEE readout time can be reduced by reading out all of the FEE boards concurrently. This solution is based on the SRU developed in collaboration with the Scalable Readout System project [7] of CERN RD51.
In order to retain compatibility with the existing ALICE online system and the off-line decoding software, and since the bandwidth of the ALICE DAQ system is not a limiting factor for the EMCal readout (see Section 4), the EMCal readout partition organization and its interfaces to the ALICE online system are unchanged, as illustrated in Fig. 2. Each SRU provides the two readout partitions of a full size EMCal SM. As described in Ref. [8], the SRU board integrates a TTCrx (LHC trigger, timing, and control receiver) [18] which can receive trigger and timing information from the ALICE trigger system. It also has three SFP+ ports directly connected to the FPGA's high speed serial transceivers for serial data transport at up to 5 Gbps. One additional SFP+ port provides a 10 GbE link. For the EMCal application, one of these transceivers is used for the Ethernet connection to the ALICE DCS system, and the other two transceivers are used for the two DDL links to the ALICE DAQ system. The functionalities of the DCS and SIU boards in the previous system are implemented in the FPGA firmware of the SRU. Each SRU has 40 point-to-point links for the 40 FEE boards of the two readout partitions of a full size EMCal SM. These links are designated as DTC (Data, Trigger, Clock, and Control) links. Event data, triggers, clock, and commands are transmitted over the DTC link between the SRU and each FEE board. The maximum bandwidth of a DTC link on the SRU is 2 Gbps.
Table 1. The measured average size in bytes ($N_{event}$) of various types of events, and the associated number of hit channels ($N_{ch}$), per readout partition of the EMCal detector. The number of hit channels is $N_{ch} \approx (N_{event} - 68)/12$, where 68 is the number of bytes of the event header and trailer and 12 is the minimum number of data bytes per hit channel.
In the EMCal application, the bandwidth of the DTC link is conservatively limited to 20 Mb/s due to the hardware capability of the rather outdated FEE FPGA (Altera ACEX 1K Family EP1k100QC208-3) and because it is sufficient to ensure that the DTC link does not limit the EMCal data throughput (see Section 4).
The SRU interconnects with each FEE board through a custom DTC daughter card, which was designed for the EMCal FEE board and provides interface compatibility between the SRU and the existing EMCal FEE board. Fig. 3 shows the simplified diagram of the FEE board before and after installation of the DTC daughter card. The DTC daughter card mainly consists of an RJ45 port, an LVDS driver, and a power switching circuit. It mounts on the FEE board by making use of existing test-point holes into which pins and sockets have been inserted, allowing the DTC daughter card to be plugged on without soldering. About 600 DTC daughter cards have been produced and installed on the EMCal FEE boards.
Suppression of low gain readout
Each EMCal tower energy signal is split into HG and LG channels, which are shaped separately with a gain ratio of 16. The LG channel data is used in the offline analysis only when the associated HG channel has saturated. The concept of the LG readout suppression algorithm is to check the HG signals in real time in the FEE FPGA and then omit the ALTRO readout of the associated LG channel if the HG signal is not saturated. For low energy signals, the HG channel information is sufficient. The EMCal offline analysis experience shows that the LG channels are needed only very rarely. Therefore, the LG readout suppression algorithm can save readout time by entirely eliminating the readout of nearly half of the readout channels.
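The decision logic can be illustrated in software, although the real algorithm runs in the FEE FPGA firmware. This is a minimal sketch under stated assumptions: the saturation criterion (any sample at the 10-bit full scale of 1023) and all names are hypothetical, since the text does not specify the threshold used.

```python
ADC_FULL_SCALE = 1023  # largest value of a 10-bit sample; assumed saturation criterion

def channels_to_read(hg_samples_by_tower: dict, saturation_level: int = ADC_FULL_SCALE) -> list:
    """Return (tower, gain) pairs to read: always HG, and LG only if the HG signal saturated."""
    readout = []
    for tower, samples in hg_samples_by_tower.items():
        readout.append((tower, "HG"))
        if max(samples) >= saturation_level:
            readout.append((tower, "LG"))   # HG saturated, so the LG data is needed
    return readout

# Tower 0 has a small signal; tower 1 saturates the HG channel.
print(channels_to_read({0: [50, 200, 120], 1: [400, 1023, 1023]}))
# [(0, 'HG'), (1, 'HG'), (1, 'LG')]
```

When no HG channel saturates, only half of the channels are read, which is the source of the nearly twofold saving.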
Implementation and test results
The above solutions have been implemented for the EMCal readout using the SRU of the CERN RD51 project, the EMCal specific DTC adapter card, and the custom FPGA firmware for the FEE and SRU for the EMCal application.
The FEE firmware has been upgraded with the following new modules shown in Fig. 4:
ALTRO readout module: in the previous readout system, the ALTRO data was read out directly by the RCU through the GTL bus. This new module performs the ALTRO readout procedure inside the FEE FPGA.
Data decode and format conversion module: this module decodes the ALTRO data and converts it into the EMCal raw data format.
HG saturation check module: this module checks the saturation status of the HG channel in real time and provides the decision to the ALTRO readout module about whether or not to read out the associated LG channel.
DTC interface module: this module implements the custom DTC protocol between the EMCal FEE board and the SRU. It decodes triggers and commands from the SRU, and transmits status and event data to the SRU.
The FPGA firmware in the SRU includes functions provided by two sets of RCU, DCS, and SIU boards in the previous EMCal readout system. Pipelining is used in both the FEE and the SRU firmware to minimize overhead in the readout process. Fig. 5 shows the simplified readout data flow from the ALTRO to the ALICE DAQ system.
The event data is transmitted through three stages: (1) from the ALTRO chips to the FEE FPGA via the ALTRO bus; (2) from the FEE FPGA to the SRU through the DTC links; (3) from the SRU to the DAQ through the DDL link. The events are buffered in the FEE and the SRU, allowing the three stages to operate in parallel. Since the three stages are decoupled, the maximum event readout rate $R_{max}$ of the SRU system for the EMCal detector can be estimated as follows:
$$R_{max} = \frac{1}{\max(t_{ALTRO\_32},\, t_{DTC},\, t_{DDL})} \qquad (2)$$
where $t_{DDL}$ is the time to transmit the data over the DDL link, $t_{DTC}$ is the time to transmit the data over the DTC link, and $t_{ALTRO\_32}$ is the time to read out event data from 32 HG channels in the ALTRO chips of a single FEE board, which can be written as follows:
$$t_{ALTRO\_32} = \sum_{i=1}^{n} (t_{ai} + t_{di}) \qquad (3)$$
where $t_{ai}$ ($i = 1, \ldots, n$) is the time spent to address channel $i$, which requires about 0.5 μs, and $t_{di}$ ($i = 1, \ldots, n$) is the time spent to transmit the variable number of data words of channel $i$. The time to transmit the FEE data over the DTC link is calculated as follows:
$$t_{DTC} = \frac{N_{FEE}}{f_{DTC}} \qquad (4)$$
where $N_{FEE}$ is the number of bytes of data from all channels of a single FEE board and $f_{DTC}$ is the bandwidth of the DTC link. The time to transmit the SRU data over the DDL link to the DAQ is given by the following:
$$t_{DDL} = \frac{N_{event}}{f_{DDL}} \qquad (5)$$
where $N_{event}$ is the size of the event in bytes to be transmitted over the DDL link (as given in Table 1) and $f_{DDL}$ is the bandwidth of the DDL link. Further improvement in the EMCal readout speed would require redesign and replacement of the EMCal FEE, at significant cost and effort. For event sizes larger than 3.6 kB, the transmit times over the existing DDL links, $t_{DDL}$, will limit the maximum event readout rate (the top panel of Fig. 6). If necessary, this limitation can be alleviated by future firmware changes in the SRU to use the available 10 GbE link (shown as solid circles), or to upgrade the DDL link speed to 5 Gbps (both under consideration in ALICE).
Fig. 3. Simplified diagram of the FEE board before and after installation of the DTC daughter card.
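Because the three stages are buffered and run concurrently, the slowest stage sets the sustainable event rate. The following is a minimal sketch of that estimate, not the authors' calculation. All function names are illustrative, bandwidths are assumed to be given in bits per second, and the per-channel data times and the DTC and DDL times in the example are hypothetical inputs. Only the 0.5 μs addressing time and the 19.3 μs minimal ALTRO readout time come from the text.

```python
def altro_readout_time_us(data_times_us, t_address_us=0.5):
    """Sum of addressing time plus data transfer time over the channels read on one FEE board."""
    return sum(t_address_us + t_data for t_data in data_times_us)

def transfer_time_us(n_bytes, bandwidth_bits_per_s):
    """Time to move n_bytes over a link of the given bandwidth (protocol overhead ignored)."""
    return n_bytes * 8 / bandwidth_bits_per_s * 1e6

def max_rate_hz(t_altro_us, t_dtc_us, t_ddl_us):
    """With decoupled, pipelined stages the slowest stage limits the event rate."""
    return 1e6 / max(t_altro_us, t_dtc_us, t_ddl_us)

# ALTRO-limited case: 19.3 us from the text, with hypothetical shorter link times.
print(round(max_rate_hz(19.3, 10.0, 14.4)))   # 51813
# Link-limited case: a hypothetical 25 us DDL transfer dominates instead.
print(round(max_rate_hz(19.3, 10.0, 25.0)))   # 40000
```

A real estimate would use effective link throughput rather than the nominal line rate, which is why the sketch takes the stage times as explicit inputs.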
The bottom panel of Fig. 6 shows the measured total event readout time with the SRU for different event sizes. The measurement was done with a laboratory test setup shown in Fig. 7 using the ALICE DAQ system with SRU readout of 20 EMCal FEE boards (one readout partition) with DTC cards mounted. The different event sizes were produced by configuring the FEE with different numbers of time samples and zero-suppression thresholds. With the latest SRU and FEE firmware the total readout time for minimum bias Pb+Pb event sizes is ~21.4 μs. The measured readout times are somewhat larger than expected, which can be attributed to readout overhead (observed to be ~2 μs per event) that was not included in the estimation, and may be improved with further firmware optimization.
During LS1, the DTC daughter cards have been mounted on all of the FEE boards and the SRU readout has been implemented and tested on all of the installed EMCal SMs. The additional EMCal SMs being installed during this shutdown will be commissioned with the SRU readout. With the SRU readout the EMCal can be read out either upon the receipt of the ALICE minimum bias trigger, up to almost 50 kHz, or upon rare triggers, such as the high energy shower or jet triggers provided by EMCal, which remain available unchanged with the new EMCal readout.
Conclusion
The upgrade of the EMCal readout system uses point-to-point links between FEE boards and a new readout concentrator, the Scalable Readout Unit of the Scalable Readout System of the RD51 project. A plug-in DTC daughter card has been designed to preserve the compatibility with the existing EMCal hardware. The Low Gain readout suppression algorithm, the ALTRO readout function, and the custom DTC protocol have been implemented through a FEE FPGA firmware upgrade. The functions of the ALICE detector control system and DAQ data link boards of the previous readout system have been implemented in the FPGA firmware of the EMCal SRU to provide full compatibility with the present ALICE online system.
Full readout chain tests of the new system demonstrate a readout time of 21.4 μs for EMCal event sizes expected for minimum bias Pb+Pb collisions, which may be reduced with further fine-tuning of the firmware. While this is more than an order of magnitude improvement over the previous readout system, it is ultimately limited by the minimal readout time of the ALTRO chips (19.3 μs) on the FEE boards. The new SRU-based readout system has already been installed on the EMCal during LS1. It nearly attains the ALICE goal for the period following the 2018 shutdown to be able to record data at the anticipated 50 kHz minimum bias Pb+Pb interaction rate.
Fig. 7. The test setup of the newly developed readout system for the EMCal detector with 20 FEE boards (one previous RCU-based readout partition) connected.
