on behalf of the CMS DT group
Abstract-The readout server (ROS), which constitutes the second level of the CMS drift tubes (DT) subdetector readout architecture, is a complex VME 9U board, currently placed in the CMS cavern, in the racks on one side of the detector wheels.
Results show that the ROS board could become a limiting factor due to the event processing time.
The capabilities of currently available FPGAs allow most of the ROS functions (input deserialization, input buffering, data processing and multiplexing, slow control interface, test-mode operation) to be incorporated into a single device. In particular, the Virtex-6 and Spartan-6 series are being considered. In both families, fully automatic asynchronous deserialization is carried out by the gigabit transceivers, which are not suitable for our application because of their minimum data rate and limited availability. Nevertheless, asynchronous data reception can be achieved by using the dedicated deserializers present in each of the I/O tiles, plus some additional logic and clocking resources.
The improved performance of these devices (as compared with current ROS technology) allows the event processing time to be reduced, increasing the maximum operating frequency of the system.
I. INTRODUCTION
The outermost layer of the CMS detector, the muon system, is in charge of identifying muons and accurately determining the trajectories they describe as they are deflected by the magnetic field. In the barrel, the position of the muon is measured by the drift tubes (DT). In the DT cells, the distance to the central anode is calculated from the drift time of the electrons, which are produced by the ionization of the gas and accelerated by the intense electric field. The resulting current pulse is amplified and shaped in the front-end boards (FEB), and its time stamp is digitized in the readout boards (ROB), whose data are sent over copper links into a ROS board [3], situated in racks close to the detector.
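As a minimal illustration of the drift-time measurement described above, the sketch below converts a drift time into a distance from the anode wire. It assumes a constant drift velocity of about 55 µm/ns (an illustrative value, not a calibration constant from this paper) and a drift time that has already been corrected for the time offset of the cell; the function name is hypothetical. Note that a single cell only yields the distance to the wire, not the side on which the muon passed.

```python
# Hypothetical sketch: distance to the anode wire from a drift time,
# assuming a constant (illustrative) drift velocity.
DRIFT_VELOCITY_UM_PER_NS = 55.0  # assumed value, not a CMS calibration constant


def drift_distance_mm(drift_time_ns: float,
                      v_drift_um_per_ns: float = DRIFT_VELOCITY_UM_PER_NS) -> float:
    """Distance (mm) travelled by the ionization electrons in drift_time_ns."""
    if drift_time_ns < 0:
        raise ValueError("drift time must be non-negative")
    return drift_time_ns * v_drift_um_per_ns / 1000.0  # um -> mm


if __name__ == "__main__":
    # A hit with a 200 ns drift time lies about 11 mm from the wire
    # (on either side: a single cell cannot tell left from right).
    print(f"{drift_distance_mm(200.0):.1f} mm")  # 11.0 mm
```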
The ROS is in charge of carrying out basic processing of the information and multiplexing these data into a single 800 Mbps fiber-optic link. This link is routed through tunnels to the CMS counting room, approximately 60 m away. The ROS links are received by 5 device-dependent units (DDU) that further pack and process this information and hand it over to the data acquisition (DAQ) system for subsequent storage and analysis in the Grid.
II. MOTIVATION
The readout system was designed in the late 1990s, and was optimal in view of the expected data volume, the technology availability and cost at the time, and the tight restrictions imposed by CMS on the electronic design (space, tolerance to magnetic fields and radiation, power dissipation, reliability). The expected data volume, however, will increase with the planned luminosity upgrade of the LHC, which is currently known as High-Luminosity LHC (HL-LHC).
During the ROS design phase, various analyses were performed [4] to estimate the expected occupancy of the detector from Monte Carlo simulations. The study showed that for the LHC's nominal luminosity [...]. With current luminosity levels, however, the number of muons per event is low, and the ROS board is performing properly, with no foreseeable problems in the short term.
Under the luminosity conditions of HL-LHC, however, both background hits and muon frequency are expected to increase by a factor of 10. For 25 evenly distributed background hits per event, the maximum Level-1 Accept (L1A) rate would decrease to 120 kHz. Adding the delay caused by the processing of muons, whose frequency will approach 1 per event per sector, we obtain a processing time estimate that exceeds the threshold set by the 100 kHz L1A rate.
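The background-only figure above can be turned into a per-event time budget with simple arithmetic. The sketch below assumes, as a simplification not stated in the text, that the maximum L1A rate is the inverse of the mean per-event processing time (buffering is ignored); the function names are hypothetical. Under that assumption, muon processing that adds more than roughly 1.7 µs per event on average would push the ROS beyond the 100 kHz threshold.

```python
# Simplified model (assumption): the ROS processes one event at a time,
# so the maximum sustainable L1A rate is the inverse of the mean
# per-event processing time. Buffering/derandomization is ignored.


def time_per_event_us(l1a_rate_khz: float) -> float:
    """Mean time per event (us) corresponding to a given L1A rate (kHz)."""
    return 1e3 / l1a_rate_khz


def max_l1a_rate_khz(processing_time_us: float) -> float:
    """Maximum L1A rate (kHz) sustainable with a given per-event time (us)."""
    return 1e3 / processing_time_us


background_us = time_per_event_us(120.0)  # background hits alone: 120 kHz limit
budget_us = time_per_event_us(100.0)      # time available at the 100 kHz L1A rate
muon_margin_us = budget_us - background_us

print(f"background only: {background_us:.2f} us/event")            # 8.33
print(f"budget at 100 kHz: {budget_us:.2f} us/event")               # 10.00
print(f"left for muon processing: {muon_margin_us:.2f} us/event")   # 1.67
```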
III. UPGRADE TO THE DT READOUT SYSTEM
As a result of the marginal ROS processing time expected during HL-LHC, a new version of the board is being designed.
In the new architecture, the ROS boards are to be placed in the counting room, with the data being converted from copper to optical links at the current ROS location.
The ROS board is being redesigned accordingly, and the first tests of the improved design are being implemented in [...]. Two high-integration MTP 12-fiber optical receiver modules and a dual LC receiver module are to be used. The output serialization and laser driving tasks are currently carried out by CERN's GOL (Gigabit Optical Link) IC [5], which plugs into the main ROS board on a mezzanine board that also includes the VCSEL.
In the next two sections we discuss our main conclusions regarding the two most critical issues of this new design: link deserialization and multiplexing performance.
A. Asynchronous deserialization
Although the LHC clock is available both in the cavern and in the counting room, and could be used for sampling the input data, it was decided to make the deserialization asynchronous, with automatic clock recovery, as is currently the case in the ROS board. This improves the reliability of the link, since slight frequency differences between transmitter and receiver are tolerated.
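In both Virtex-6 and Spartan-6, fully automatic serial data reception with clock recovery is restricted to the gigabit transceivers, which are available in reduced number and whose minimum data rate makes them unsuitable for our links. Asynchronous reception therefore has to be built from the dedicated deserializers (SERDES) of the I/O tiles, plus some additional logic.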
In the Virtex-6 architecture, the transmission clock can easily be recovered from the data received from the ROB, which include both a stop and a start bit in every 12-bit word. The data stream is fed to a clock buffer that is enabled during the stop bit and disabled during the start bit, producing a low-duty-cycle signal that is fed to a mixed-mode clock manager (MMCM) in order to generate the auxiliary clocks.
Porting the Virtex-6 deserialization scheme described above to the Spartan-6 architecture is not possible. The configuration of the input-output blocks (IOBs) in Spartan-6 does not allow the undelayed input signal to be output to the FPGA fabric when it is being used as an input to the SERDES.
Although this problem could be circumvented by routing each link into two different IOBs, we found that clock recovery was not possible due to limitations in the performance of the Spartan-6 clock buffers and clock management resources.
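The Virtex-6 clock recovery described above relies on the fact that the stop-to-start boundary recurs at a fixed position in every word, regardless of the data. The sketch below models this behaviourally in Python; the word layout (start bit 0, ten data bits, stop bit 1) and the assumption that word alignment is already known are hypothetical, and the model says nothing about the actual MMCM configuration. It shows that the data-dependent edges of the raw stream are irregularly spaced, while the gated signal has exactly one rising edge every 12 bits, a 1/12 duty-cycle reference at the word rate from which a clock manager can synthesize faster clocks.

```python
import random

# Hypothetical framing, for illustration only: each 12-bit ROB word is
# modelled as a start bit (0), 10 data bits and a stop bit (1).
WORD_BITS = 12
START_BIT, STOP_BIT = 0, 1


def make_stream(n_words: int, rng: random.Random) -> list:
    """Concatenate n_words hypothetical ROB words with random data bits."""
    stream = []
    for _ in range(n_words):
        data = [rng.randint(0, 1) for _ in range(WORD_BITS - 2)]
        stream += [START_BIT] + data + [STOP_BIT]
    return stream


def gated_signal(stream: list) -> list:
    """Model of the gated clock buffer: the input is passed through only
    during the stop bit (word alignment assumed known), so the output is
    high for one bit out of every 12."""
    return [bit if i % WORD_BITS == WORD_BITS - 1 else 0
            for i, bit in enumerate(stream)]


def rising_edges(signal: list) -> list:
    """Indices where the signal goes from 0 to 1."""
    return [i for i in range(1, len(signal))
            if signal[i - 1] == 0 and signal[i] == 1]


if __name__ == "__main__":
    rng = random.Random(1)
    stream = make_stream(50, rng)
    raw = rising_edges(stream)
    gated = gated_signal(stream)
    edges = rising_edges(gated)
    print("raw edge spacings:", sorted({b - a for a, b in zip(raw, raw[1:])}))
    print("gated edge spacings:", {b - a for a, b in zip(edges, edges[1:])})  # {12}
    print("gated duty cycle:", sum(gated) / len(gated))  # 1/12, about 0.083
```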
In the Spartan-6 architecture, each SERDES includes a phase detector that can be used to dynamically adjust the amount of input delay to ensure optimal data sampling, with a clock that is asynchronous to the data's source clock.
The input stream is sampled at four times the data rate, as shown in Fig. 2. Consecutive samples are compared, and an inequality marks the presence of a data transition in the interval between those two ("edge") samples. It is then safe to assume that, by taking either of the remaining two ("eye") samples as the bit's value, the chosen sample is no more than 1/4 of a bit away from the center of the eye. This accuracy can be improved to 1/8 of a bit if we are able to identify which one of the two "eye" samples is closer to the center of the eye. In order to do so, we measure the time elapsed between shifts in the "edge" samples (i.e., the moments when transitions start to be detected between a different pair of samples). This time, together with the time elapsed since the last shift in the "edge" samples, is used to select which of the two "eye" sample candidates is actually closer to the center of the eye: since the transition drifts at a constant rate, it reaches the middle of its interval halfway between two shifts, and from that moment on the other "eye" sample becomes the closer one. A schematic of this process is provided in Fig. 2. Note that if the two clocks are exactly the same, there should be no changes in the samples between which the transition is detected, and therefore there is no way of determining which one of the two "eye" samples is closer to the eye center.
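The selection rule described above can be explored with a behavioural simulation. The Python sketch below is an illustration under stated assumptions, not the FPGA implementation: ideal jitter-free random data, a constant relative frequency offset `eps` between the clocks, and hypothetical names throughout. It compares always using the first "eye" sample with the Δt/2 rule, and reports the distance, in unit intervals (UI), between the chosen sample and the true bit center. Errors are only collected once two edge shifts have been seen, i.e., once Δt is known; with `eps = 0` no shift ever occurs and the function returns `None`, mirroring the remark about identical clocks.

```python
import random


def simulate(eps: float, n_bits: int = 20000, seed: int = 0, t0: float = 0.1):
    """4x oversampling of random data with relative clock offset eps.

    Returns (max_fixed, max_tracked, mean_fixed, mean_tracked) in UI, or
    None if the edge samples never shift (e.g. eps == 0).
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    step = (1.0 + eps) / 4.0          # local sample spacing, in data bits (UI)

    def t(i: int) -> float:           # time of local sample i, in UI
        return t0 + i * step

    n_samples = int((n_bits - 2) / step)
    samples = [bits[int(t(i))] for i in range(n_samples)]

    k = None            # interval (0..3) in which transitions are being seen
    last_shift = None   # sample index at which the latest shift was seen
    period = None       # samples between the two latest shifts (the text's Δt)
    forward = True      # True if the latest shift moved to the next interval
    fixed_err, tracked_err = [], []

    for i in range(1, n_samples):
        if samples[i] == samples[i - 1]:
            continue                      # no transition between these samples
        left = i - 1                      # "edge" samples are left and left + 1
        k_new = left % 4
        if k is not None and k_new != k:  # transition moved to another interval
            if last_shift is not None:
                period = i - last_shift
            last_shift = i
            forward = (k_new - k) % 4 == 1
        k = k_new
        if period is None:
            continue                      # drift rate not yet known
        # After a forward shift the transition sits near the left edge sample,
        # so the earlier eye sample (left + 2) is best for the first Δt/2 and
        # the later one (left + 3) afterwards; the reverse after a backward shift.
        past_half = (i - last_shift) >= period / 2
        offset = 3 if forward == past_half else 2
        center = int(t(left + 1)) + 0.5   # true center of the bit that starts here
        fixed_err.append(abs(t(left + 2) - center))
        tracked_err.append(abs(t(left + offset) - center))

    if not tracked_err:
        return None
    return (max(fixed_err), max(tracked_err),
            sum(fixed_err) / len(fixed_err),
            sum(tracked_err) / len(tracked_err))


if __name__ == "__main__":
    max_f, max_t, mean_f, mean_t = simulate(eps=1e-3)
    print(f"first eye sample only: max {max_f:.3f} UI, mean {mean_f:.3f} UI")
    print(f"Δt/2 selection rule:   max {max_t:.3f} UI, mean {mean_t:.3f} UI")
    print(simulate(eps=0.0))  # None: identical clocks give no edge shifts
```

With `eps = 1e-3`, the fixed choice should show errors approaching 1/4 UI, while the tracked choice should stay close to 1/8 UI, slightly above it because a shift is only noticed at the next data transition.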
The ability of this scheme to lock to the incoming data stream while varying the difference between the remote and local clocks was tested. Two different data patterns were used: a pseudorandom (PR) data stream, and the signal generated by a ROB board in the absence of valid data, which has two transitions in every 12-bit word. Lock was maintained for clock frequency differences up to 7.5‰ for the PR data and 5‰ for the ROB data. This difference arises because the ROB stream has only one transition every 6 bits on average, while the PR data has more. It was also observed that, for the ROB data, lock is lost abruptly at a 5‰ clock frequency difference, because it is a deterministic signal. For the PR data, the period between losses of lock decreases gradually as the frequency difference is increased. As a reference, the LHC clock frequency is expected to vary by at most ±3.5 kHz while the energy of the beams is ramped up, which translates into a relative difference smaller than 0.1‰ (twice as much if the variation is considered relative to the lowest or highest frequency). Consequently, our system is able to stay locked under the expected LHC clock frequency variations.
[...] current ROS (2 BX/hit). The approximate processing time for [...]
Fig. 2. Schematic view of the deserialization process. In (a), the data stream is shown along with the sampling points, at a rate 4 times that of the data. In (b), the data stream has been wrapped to ease visualization of the transition detection and the consequent selection of the two samples closest to the eye center in each cycle. In (c), the selection of the sample that is within 1/8 bit of the eye center is illustrated: initially (t0) the transition is detected between two samples, and at t1 it moves to a different interval. After Δt = t2 - t1, the transition moves again, and Δt/2 after this time, at t3, the optimum eye position is moved to the next sample.
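The numbers quoted for the lock tests can be cross-checked with a short sketch. It assumes a nominal LHC clock of about 40.079 MHz, and models the pseudorandom pattern as a PRBS-7 sequence (x^7 + x^6 + 1) and the ROB idle word as a start bit, ten zero data bits and a stop bit; the text does not specify the actual PR pattern or the ROB bit polarities, so both are hypothetical. Both frequency ratios stay well below the 5‰ limit measured with ROB data, and the idle ROB word offers roughly a third of the transitions of the PR stream for the phase-tracking logic to work with.

```python
LHC_CLOCK_HZ = 40.079e6   # approximate LHC bunch-crossing clock frequency
MAX_DEVIATION_HZ = 3.5e3  # maximum variation quoted in the text


def per_mille(delta_hz: float, ref_hz: float) -> float:
    """Relative frequency difference in per mille."""
    return 1e3 * delta_hz / ref_hz


def prbs7(n: int, seed: int = 0x7F) -> list:
    """n bits of a PRBS-7 (x^7 + x^6 + 1) sequence from a Fibonacci LFSR."""
    state = seed & 0x7F
    out = []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(bit)
        state = ((state << 1) | bit) & 0x7F
    return out


def transition_density(bits: list) -> float:
    """Fraction of bit boundaries with a transition, counted cyclically."""
    n = len(bits)
    return sum(bits[i] != bits[(i + 1) % n] for i in range(n)) / n


# Hypothetical ROB idle word: start bit 0, ten 0 data bits, stop bit 1.
ROB_IDLE_WORD = [0] * 11 + [1]

if __name__ == "__main__":
    print(f"one-sided: {per_mille(MAX_DEVIATION_HZ, LHC_CLOCK_HZ):.3f} per mille")      # 0.087
    print(f"extremes:  {per_mille(2 * MAX_DEVIATION_HZ, LHC_CLOCK_HZ):.3f} per mille")  # 0.175
    print(f"ROB idle transitions/bit: {transition_density(ROB_IDLE_WORD):.3f}")  # 0.167
    print(f"PRBS-7 transitions/bit:   {transition_density(prbs7(127)):.3f}")     # 0.504
```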
The new design will be able to handle the multiplexing workload expected during HL-LHC operation, guaranteeing its proper operation at the expected detector occupancy.
ACKNOWLEDGMENT
We would like to thank Cristina Fernandez Bedoya and Carlos Willmott for their significant contribution to the work presented in this paper.
