The TOTEM experiment at the LHC at CERN will measure the total proton-proton cross section with a precision of 1 %, as well as elastic proton scattering and diffractive dissociation, at a centre-of-mass energy of 14 TeV. This article is a brief report on the TOTFed motherboard, which is the main part of the on-line Data Acquisition (DAQ) system of TOTEM. The firmware is designed to collect data from the particle detectors, build a consistent event frame and send it to the counting room. The data processing is divided between 7 FPGAs. To accomplish this, specialized interfaces such as VME64x, S-Link 64 and the Gigabit Optical Link (GOL) are used. Since a significant part of the electronics is specific to the LHC machine, the FPGA firmware implementation required a deep understanding of the Data Acquisition System, followed by intensive study and debugging of the developed solutions.
INTRODUCTION
The TOTEM (TOTal Elastic Scattering and Diffraction Measurement) experiment [1, 2] will measure the total proton-proton cross section and will study elastic scattering and diffractive dissociation at the Large Hadron Collider (LHC) [3]. More specifically, TOTEM focuses on:
• the total cross-section with an absolute error of 1 mb *, using the luminosity-independent method. This requires the simultaneous measurement of the elastic pp scattering down to a four-momentum transfer of −t ≈ 10⁻³ GeV² and of the inelastic pp interaction rate with an adequate acceptance in the forward region;
• elastic proton scattering over a wide range of momentum transfer, up to −t ≈ 10 GeV²;
• diffractive dissociation, including single, double and central diffraction topologies, using the forward inelastic detectors in combination with one of the large LHC detectors.
TOTEM is the smallest experiment at the LHC and is hosted by a larger one, the Compact Muon Solenoid (CMS). From the technical point of view, this means that the data, trigger and command formats should be compatible with the CMS detector. The experiment consists of two tracking telescopes, T1 and T2 [5], and the so-called "Roman Pots", placed on both sides of the interaction point where the accelerated particles collide at the target centre-of-mass energy of 14 TeV.
Architecture of the readout chain
The readout chain of TOTEM is presented in Figure 2. Sixteen VFATs are connected electrically to one GOH. The fibres from twelve GOHs form a bundle of optical fibres that goes directly to an OptoRx on the TOTFed. The TOTFed collects data from at most three OptoRx devices and builds an event frame which contains all the tracking information from the detector region serviced by that TOTFed. The data are buffered on the motherboard and wait for a readout via VME (Versa Module Eurocard) [7]. When the readout is initiated, the data are sent to a group of local PCs that filter and store the information.
The data flow starts when the VFATs receive a trigger signal, the Level One Accept (L1A). This signal is generated by a dedicated system called the Timing, Trigger and Control CMS interface (TTCci) [8]. The L1A is delivered to the VFATs by the Control and Clock System, which, roughly speaking, translates the TTC information into simple electrical signals.
* Millibarn; the barn is a unit used in nuclear physics to express the cross-sectional area of nuclei. 1 b = 10⁻²⁸ m² (roughly the size of a uranium nucleus).
TOTFED
The TOTFed is a ten-layer motherboard designed according to the VME specification. The size of the board is 402 x 366.7 mm, which is the largest format (9U) provided by the standard. Figure 3 presents the TOTFed motherboard and Figure 4 presents its scheme. Each OptoRx, where the data flow starts, is connected directly to the associated Main via two separate data buses: the S-Link 64 bus and the Opto-local bus. The S-Link 64 bus is a local implementation of the interface, which allows 64 bits of data to be sent in a single word. Data are sent in one direction, from the master (an OptoRx) to the slave (a Main), clocked at 40 MHz. In the future, the S-Link 64 bus will be routed to an output connector and passed directly to a storage device in the counting room. The Opto-local bus (16 bits) is used to configure the FPGA from the VME level and to read the device status.
The Mains and the VME share a 32-bit bus, named the Local bus, which is controlled by the VME FPGA. It is used to read data from the VFAT chips as well as to configure the TOTFed components and obtain their status.
The VME and the optional mFEC connector are linked together by a separate bus of 32 bits, which is used by the Trigger version of the TOTFed and is not covered by this document. However, it is possible to extend the Local bus with an additional device on the mFEC connector by writing a specific value to one of the registers of the VME.
The authors designed and implemented the firmware of the VME FPGA (Chapter 3), the Main FPGAs (Chapter 4) and the OptoRx FPGAs (Chapter 5). This firmware is the scope of this thesis and is reported in the following chapters.
TTC signals distribution on the TOTFed
The bottom of Figure 4 presents the TTC path. An optical signal containing Timing, Trigger and Control System (TTC) information goes to the TTCrx. Decoded commands are sent independently to the VME and the three OptoRxs. The TTCrx also provides a 40 MHz reference clock to the QPLL (Quartz Crystal Based Phase-Locked Loop) [9]. The QPLL is an ASIC that acts as a jitter filter and a clock multiplier for the LHC bunch-crossing frequency (40.08 MHz). It is designed to meet the LHC radiation requirements. The QPLL is the source of the clock signal for all synchronous devices hosted by the TOTFed: the three OptoRxs, the three Mains, the Merger (the FPGA used in the Trigger version of the TOTFed) and the VME. The electrical paths to the Mains are specially tuned to obtain a coherent signal. The other devices are not equipped with such functionality. Although the QPLL is able to multiply the clock frequency, the distributed clock frequency is 40.08 MHz because of the physical limitations of the motherboard. However, all FPGAs are equipped with internal PLLs, so it is possible to generate higher frequencies inside the programmable devices.
THE VME
The VME uses one of the smallest FPGAs of Altera's Cyclone family: the EP1C4F400C8. Detailed information about the device can be found in the data sheet provided by the manufacturer [10]. Although the speed grade of the device is 8, the Cyclone features were sufficient to meet the full set of objectives:
• acting as a bridge between the VME64 bus and the TOTFed;
• being a controller for simple interfaces such as I2C, One Wire and JTAG;
• implementing the TTS state machine;
• supporting the TTCrx chip.
The architecture presented in Figure 5 meets the demands listed above. The major block is a state machine designed according to the VME bus specification [11]. It translates VME data onto the internal buses (the VME internal bus and the Local bus) and vice versa. Moreover, the FSM splits the device into two parts: on its left, each component is synchronized to the 80 MHz clock provided by the PLL; on its right, the signals are not synchronized because TOTEM uses an asynchronous variant of the VME bus. The most heavily used path is the one between the Local bus and the VME bus, because it carries the data acquisition traffic. However, from time to time there is a need to use the simple interfaces connected to the Internal bus.
Figure 5. The VME architecture.
The TTCrx Receiver decodes the event counter (EC) and the bunch counter (BC) from a 4-bit line controlled by the TTCrx chip. The I2C Controller provides access to the TTCrx registers. The One Wire Controller is used to obtain information from the embedded temperature sensors. The JTAG controller makes it possible to boundary-scan the TOTFed. This is a very handy feature because it allows an FPGA to be programmed remotely. The TTS machine, shown at the bottom of Figure 5, generates the busy state for the TTS system based on the busy signals from the OptoRxs.
THE MAIN
The Main uses an FPGA from Altera's Stratix family: the EP1S20F780C7. This is a well-equipped device designed for digital processing and capable of satisfying the requirements of most applications. The manufacturer characterizes it as "low-power high-performance" [12]. Today its features seem somewhat dated, especially if one compares its process node (130 nm for Stratix) with that of current top processors; however, the design development was never obstructed by any device timing limitations. The FPGA used as the Main has 18 460 LEs, about 1.6 Mb of internal memory, 10 DSP blocks, 80 fast embedded multipliers and 6 PLLs. Importantly, it offers 586 user I/O pins, which is crucial for a device that links several parallel buses.
The Main is a device that mediates between the VME and the OptoRx. Its main goals are to:
• receive data through the S-Link interface in the CMS FED Common Data Format and buffer them for a readout via the VME interface;
• allow access to the registers of the OptoRx from the Local bus via the so-called Opto-local bus interface;
• generate the TTS signals.
Figure 3 shows that each Main has at its disposal 18 MB of SRAM with the No Bus Latency (NoBL) architecture and a USB 2.0 port. These features are not supported by the current version of the firmware; however, there are plans to implement the required controllers in future releases. This will make it possible to enlarge the FIFO space and will enable local data readout through a fast interface, since the S-Link 64 interface can be used only within the CMS system.
The functional scheme of the Main is shown in Figure 6.
THE OPTORX
The OptoRx uses the most advanced FPGA in the DAQ of the TOTEM experiment: the EP2SGX60EF1152C5. The device, like the rest of the chips, is manufactured by Altera and belongs to a third generation of FPGAs, the Stratix II GX family. A hallmark of the EP2SGX is that it is equipped with high-speed serial transceivers able to work at speeds of up to 6.375 Gbps [13].
The objectives of the OptoRx are to:
• receive data from an optical link at a speed of 800 Mbps;
• build the events according to the CMS Common Data Format;
• buffer data in internal FIFOs;
• support the local S-Link interface [14];
• enable communication with the Main through the Opto-local bus.
Figure 7. The scheme of the OptoRx.
Figure 7 presents the scheme of the OptoRx. All the parts are built around one large processing block. Thanks to the emulator components, the implementation makes it possible to test each channel path on the TOTFed without a real data source, which is crucial, especially in the laboratory. Once the board is installed in the experiment, the emulators also make it possible to determine quickly, in case of problems, whether a fault is related to the TOTFed or to another part of the experiment. The Opto-local bus controller handles the communication between the OptoRx and the VME bus (through the Main). It is used to configure the OptoRx in the correct operational mode. The output of the processing block is routed to the Main via the local S-Link 64 interface mentioned before.
Data Processing
In the authors' opinion, the most notable component is the processing component. It consists of two blocks (Figure 8). The first one synchronizes the incoming data from the 12 GX receivers. The second one builds the S-Link packet according to the CMS Common Data Format. Each entity is clocked at the same frequency as the LHC machine, 40 MHz.
Synchronization block
The synchronization block is a vector of 12 basic cells grouped in fours. Each group is called a subframe and contains data from four GOHs (in normal mode).
The incoming data are buffered in the synchronization FIFO, which holds 2k words of 17 bits each. The input multiplexer selects the data source for the built-in memory from among a header, a data payload and a trailer. The data from the GX receivers are registered. This approach cuts the data paths and allows the fitter to find an optimum placement for the logic close to the FIFO, which is no longer tied to the GX transceiver. The primitives (a flip-flop and an 'and' gate) at the top of Figure 9 generate, from the original and the delayed data valid signals, a single strobe indicating the end of the current frame. This information is used not only by the state machine from the figure but also, since it is stored as the MSB in the synchronization FIFO, by the following block in the processing entity. Additionally, the basic cell provides the Fifo almost full flag, which is high when the space available at the write port of the synchronization FIFO is less than 194 words, the equivalent of one VFAT frame with the header and the trailer.

The basic synchronization cell is controlled by a Mealy state machine. Its state transitions are presented in Figure 10. At the beginning, in the idle state, the FSM checks the data valid, fiber active and fifo stop signals. The fiber active signal is the logical 'and' of fiber enable, a flag set by the user to enable a specific channel in the data acquisition, and fiber status, which indicates whether the GX receiver is synchronized to the GOH data source. The fifo stop signal is asserted if any of the basic synchronization cells has space for fewer than 194 words (the logical 'or' of the Fifo almost full flags from all basic synchronization cells). This prevents the system from acquiring incomplete events. If data valid is asserted, the fiber is active and all synchronization cells are able to collect one more GOH frame, the FSM writes the header to the FIFO and goes to the payload state. Here it buffers the subsequent data words until it sees the end of frame or the synchronization Fifo full flag (asserted when only one word, reserved for the trailer, is left). In both cases it goes to the trailer state, writes the appropriate data and finally returns to the idle state.
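The transitions of the basic synchronization cell described above can be sketched in a few lines of Python. This is only a behavioural illustration, not the firmware itself: the function and state names are hypothetical, and only the FIFO depth (2k words), the 194-word threshold and the transition conditions are taken from the text.

```python
from enum import Enum, auto

FIFO_DEPTH = 2048   # synchronization FIFO depth in words (2k, from the text)
FRAME_WORDS = 194   # one VFAT frame plus header and trailer (from the text)


class State(Enum):
    IDLE = auto()
    PAYLOAD = auto()
    TRAILER = auto()


def fiber_active(fiber_enable: bool, fiber_status: bool) -> bool:
    """Logical 'and' of the user enable flag and the receiver sync status."""
    return fiber_enable and fiber_status


def fifo_almost_full(words_used: int) -> bool:
    """High when fewer than FRAME_WORDS words are free in the FIFO."""
    return FIFO_DEPTH - words_used < FRAME_WORDS


def fifo_stop(almost_full_flags) -> bool:
    """Logical 'or' of the almost-full flags of all basic cells."""
    return any(almost_full_flags)


def next_state(state, data_valid, active, stop, end_of_frame, fifo_full):
    """Next state of one cell; the header/trailer writes are only noted in comments."""
    if state is State.IDLE:
        if data_valid and active and not stop:
            return State.PAYLOAD   # the header is written on this transition
        return State.IDLE
    if state is State.PAYLOAD:
        if end_of_frame or fifo_full:
            return State.TRAILER   # stop buffering payload words
        return State.PAYLOAD
    return State.IDLE              # TRAILER: trailer written, back to idle
```

Because fifo stop is a global 'or', a single nearly full cell keeps every cell in the idle state, which is how the design avoids starting an event that could not be stored completely.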
S-Link packet preparation block
The second block in the processing entity is in charge of preparing the S-Link packet. Figure 11 presents the diagram of the component. The input multiplexer chooses the data source for an embedded S-Link FIFO, which is capable of storing 4k words of 65 bits each. The MSB of the data word from the FIFO is reserved for the UCTL (User Control word) signal, according to the S-Link 64 specification [14]. The flag is asserted by the Event Builder FSM together with the first and the last word of a frame. The remaining bits are used as data.
The major multiplexer selects between a header, a trailer and the three subframe sources related to the synchronization blocks. Part of the last word in the event frame is the TTS status, which is determined by a TTS decoder. If the values of the internal Event counter and of the Event counter from the TTCrx Receiver differ, the decoder indicates the out-of-synchronization TTS state. Otherwise, if the busy line from the Main FPGA is asserted, the decoder passes this information to the trailer (TTS state busy). Finally, if none of the previous conditions is satisfied, the decoder remains in the ready state. The header and the trailer use information from other components (such as the TTCrx Receiver) to fill the appropriate data fields (Table 1).
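The priority of the TTS decoder can be written down as a short sketch. The function name and the string labels are illustrative assumptions; the text does not give the actual TTS status encoding, so none is shown here.

```python
def tts_state(internal_event_counter: int,
              ttcrx_event_counter: int,
              main_busy: bool) -> str:
    """Return the TTS status placed in the trailer (labels are hypothetical)."""
    # Highest priority: the two event counters disagree.
    if internal_event_counter != ttcrx_event_counter:
        return "out_of_sync"
    # Next: the Main FPGA signals that it is busy.
    if main_busy:
        return "busy"
    # Neither condition holds.
    return "ready"
```

A counter mismatch therefore masks the busy line: an OptoRx that has lost synchronization reports that state even if the Main is busy at the same time.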
The configuration of the system allows for skew between data from different GOH frames. For this reason, the Event Builder FSM needs feedback on whether a subframe has already been received. To obtain it, the negated empty flag of each basic cell of the synchronization block within one subframe is compared with the respective Fiber active bit. Since the transmission of a GOH frame is continuous and both the synchronization block and the S-Link preparation block work with the same clock, as soon as all synchronization FIFOs that correspond to the enabled channels within one subframe have received the first word from the GX receiver, the assigned Subframe ready flag is asserted and the FSM starts to copy the data to the S-Link FIFO.
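A minimal sketch of the Subframe ready comparison for one group of four cells follows. The function name is hypothetical, and the extra condition that at least one fiber must be active is an assumption added here so that a fully disabled subframe is never reported as ready; the text does not state how that case is handled.

```python
def subframe_ready(empty_flags, fiber_active_bits) -> bool:
    """Compare the negated empty flags with the fiber active bits of one subframe."""
    not_empty = [not flag for flag in empty_flags]
    # Assumption: a subframe without any active fiber is never ready.
    if not any(fiber_active_bits):
        return False
    # Ready when every active channel holds data and no inactive channel does.
    return not_empty == list(fiber_active_bits)
```

With this comparison the FSM does not wait for disabled or unsynchronized fibers, so a subframe with only some channels in use can still be built.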
As usual, a word counter counts the number of words in the current event. Its values are stored in an Event Size Fifo capable of buffering 16 results (10 bits each). They are used by the Readout FSM, which is the master of the local implementation of the S-Link 64 interface [15]. The signals driving the internal components are generated using the Mealy approach. The default state is idle, in which the FSM checks whether the Event Size Fifo is not empty. If there are any data in the FIFO, the read strobe of an internal counter is asserted and the FSM goes to the loadCounter state. In loadCounter, the value from the Event Size Fifo is loaded into the counter and, on the next rising edge of the clock, the state machine moves to the transfer state. It returns to the idle state when the counter has counted down to 1. To react correctly to the LFF (Link Full Flag) and LDOWN (Link Down) signals, the FSM can move from the transfer state to the pause state if either of the signals is low. As soon as they are back high, the state machine resumes the data transfer. This satisfies the requirement of the specification [14] that a transfer be suspended at most two clock cycles after the LFF has been asserted (this implementation needs one).

The state machine that prepares the S-Link packet and stores it in the FIFO is somewhat more complicated (Figure 13). It also uses the Mealy approach. At the beginning, in the idle state, it waits for any subframe. However, the subframes have priorities, and subframe number one has the highest. If the Subframe 1 ready flag is asserted and the S-Link FIFO is not full, the FSM drives the read signal of the first four synchronization queues, writes the header to the S-Link FIFO, starts counting words and goes to the subframe1 state. If this condition is not satisfied, the FSM checks the Subframe 2 ready flag. However, before taking any further action it also needs the anyFiberActive(1) signal, the negated logical 'or' of the first four fiber active bits. This protects the system against scattering data from the same physical collision over several event frames. It means that even when the Subframe 2 ready flag is asserted, if any of the first four fibers has been enabled by the user and the FPGA has managed to synchronize to these channels, the state machine will wait for data from them. The analogous situation occurs for the subframe3 condition (Figure 13); in that case the FSM takes into account the anyFiberActive signal related to the first eight fibers.
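The subframe priority applied by the Event Builder FSM in its idle state can be illustrated as follows. The function name and the state labels are hypothetical. The text states the "S-Link FIFO is not full" condition only for subframe 1; applying it to all three branches is an assumption of this sketch.

```python
def next_from_idle(subframe_ready, fiber_active_bits, slink_fifo_full) -> str:
    """Pick the next Event Builder state from idle.

    subframe_ready    -- three flags, one per subframe
    fiber_active_bits -- twelve flags, one per fiber (fibers 0-3 form subframe 1)
    """
    if slink_fifo_full:          # assumption: checked before every branch
        return "idle"
    if subframe_ready[0]:
        return "subframe1"
    # Subframe 2 may start an event only if no fiber of subframe 1 is active.
    if subframe_ready[1] and not any(fiber_active_bits[:4]):
        return "subframe2"
    # Subframe 3 may start an event only if none of the first eight is active.
    if subframe_ready[2] and not any(fiber_active_bits[:8]):
        return "subframe3"
    return "idle"
```

The guard on the lower-priority subframes is what keeps data from one collision in a single event frame: the FSM waits for an active higher-priority subframe instead of opening a frame without it.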
The lists of actions performed by the FSM in the subframe1, subframe2 and subframe3 states are similar. It writes subsequent data from the appropriate synchronization FIFOs into the S-Link FIFO, counting words and waiting until the MSBs (which indicate the end of frame) of the source queues match the pattern of the fiber active vector related to the subframe. Routing the MSBs to the read ports of the respective synchronization FIFOs results in a flexible solution that allows GOH frames of different sizes (for example, because of an error) to be stored within the same subframe. As soon as the FSM has finished copying the data of one subframe, it checks the subFrameReady flags of the remaining ones, maintaining the priority. If any other subframe is ready, it asserts the appropriate read flags and goes to the corresponding state. If not, the state machine moves to the trailer state. Here, a trailer is written to the S-Link FIFO with the event and bunch counter values. The number of words is stored in the event size queue. The FSM asserts the read acknowledge strobe for the TTCrx Receiver and goes back to the idle state.

The S-Link header and trailer (Figure 14) are added by the S-Link preparation block in the OptoRx and contain the information needed by the software to manage the data. Since the OptoRx handles the readout of 192 VFATs (16 VFATs per GOH), a simple solution is to format the data in three data subframes, each containing 64 VFATs. Every subframe is tagged by the header and the trailer of the GOH frame, added by the synchronization blocks (marked in yellow), to identify the source of the data. The VFAT payload (marked in green) consists of 192 serial bits. Table 1 describes the data fields used in Figure 14.

Each VME crate is assigned an independent computer which collects data and masters the VME bus, so, to reduce the software development effort, the TOTFed is polled by the PC to get the data. To obtain a continuous and permanent readout, the data path on the TOTFed has several levels of buffering composed of FIFOs that use the embedded memory of the FPGAs.
This makes it possible to work with a burst trigger, where the average frequency of incoming events is 1 kHz with a Poisson distribution, so a few frames can arrive very close to each other. Table 2 presents the depths of the successive queues on the processing path.
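To give a feeling for how often two triggers arrive close together, the following sketch uses the standard property that the gaps between events of a Poisson process are exponentially distributed. The function name is hypothetical; the 1 kHz rate is the one quoted above, and the 14.6 µs window is the maximum build time quoted in the summary, used here only as an example value.

```python
import math


def prob_next_within(rate_hz: float, window_s: float) -> float:
    """Probability that the next Poisson event arrives within window_s seconds."""
    # For exponential gaps: P(gap < t) = 1 - exp(-rate * t).
    return 1.0 - math.exp(-rate_hz * window_s)


# Example: 1 kHz average trigger rate, 14.6 microsecond window.
p_close = prob_next_within(1000.0, 14.6e-6)
```

For these values the probability is of the order of one percent, so closely spaced frames are rare but far from negligible at 1 kHz, which is consistent with the need for several levels of buffering.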
Data processing
The number of events buffered at once by the TOTFed is also limited by the depth of the event size FIFOs. For the S-Link Packet buffer in the OptoRx, the maximum number of events is 16, whilst the Data Fifo in the Main limits this value to 256. However, reaching this limit would indicate a problem with abnormally short data frames and an error in the Data Acquisition.
SUMMARY
The on-line part of the DAQ of the TOTEM experiment, at the first stage of commissioning, should be able to work with a 1 kHz trigger. The processing part of the data path is pipelined and needs at most 14.6 µs to build a
Table 3. Average speed of the DEBUG COUNTER readout.
To check the theoretical speed under laboratory conditions, the author performed a test by reading the value of the DEBUG COUNTER of each FPGA in the VME block transfer mode (1000 bytes per block). The results are presented in Table 3. Furthermore, as the counter changes its value with each access, this was a simple way to check the data consistency.
To evaluate whether the speed requirement is satisfied, one should take into account the results related to the Mains, because the data collection software reads the FIFOs of these devices. The test confirmed that the designed solution meets the bandwidth limit. Figure 15 presents the average speed of the Roman Pot motherboard [16] readout. The test was performed under laboratory conditions using the VBT-325C VME Bus Analyzer [17] to measure the performance. The result confirms the values from the previous test. In the background, the beginning of the OptoRx frame, together with the header, can be seen.

TOTEM started regular runs in March 2010 with a centre-of-mass energy of 7 TeV per collision. For the moment, it is hard to evaluate the performance of the front-end driver in the LHC experiment because of the very low trigger frequency (at most 100 Hz): the TOTEM experiment is still in the commissioning phase and the T1 telescope is not yet installed in the LHC tunnel. The data are nevertheless consistent. Figure 16 presents one of the first tracks observed in the Roman Pot detector. Each trapezium corresponds to a silicon detector of the Roman Pot placed in the vicinity of the beam. The blue lines represent the tracks of particles after the collision.
