ACCEPTED: January 11, 2007 PUBLISHED: January 22, 2007
Overview
The Global Trigger (GT) is the final step of the CMS Level-1 Trigger [1], [2] and [3]. It consists of several VME boards mounted in a VME 9U crate together with the Global Muon Trigger (GMT) boards and the central Trigger Control System (TCS) [3], [4] and [5].
The rightmost 4 VME slots (21-18) contain the GMT boards, slot 17 the Global Trigger Front-End (GTFE) readout board, and slot 16 the TIM board, which broadcasts the clock and fast control signals to all boards. In slots 15-13 Pipelined Synchronized Buffer (PSB) boards receive trigger data from the Global Calorimeter Trigger. Slot 12 is free and in slot 11 the Global Trigger Logic (GTL) module calculates up to 128 Algorithms, which are combined into Final_OR signals on the FDL board in slot 10. In slot 9 a PSB board receives 'Technical Trigger' signals (see below) and sends them to the FDL board. The Final_OR signals go to the central Trigger Control board (TCS) in slot 8 and are transmitted via 'L1AOUT' output modules in slots 6 and 7 to the TTC system of CMS (see also figure 1). Over the same path, the TCS also sends out fast control signals to the subdetectors: bunch counter reset (at the start of a new LHC orbit), start of run, stop of run, orbit counter reset, event counter reset, resynchronize (to resynchronize subdetectors in case of an error) etc.
For every LHC bunch crossing the GT decides to reject or to accept a physics event for subsequent evaluation by the High Level Trigger. During normal physics data taking the decision is based on trigger objects, which contain information about energy or momentum, location and quality. In addition, special trigger signals, so-called Technical Triggers, delivered by the subsystems can also be used. Such triggers may, for instance, be based on cosmic muons, and simply enter the final trigger menu without being processed in the trigger logic. The trigger objects are received from the Global Calorimeter Trigger (GCT) and the Global Muon Trigger (GMT). The input data coming from these subsystems are first synchronized to each other and to the LHC orbit and then sent via the crate backplane to the Global Trigger Logic module, where the trigger algorithm calculations are performed. For each quadruplet of "particle-like" input channels (4 µ, 4 non-isolated and 4 isolated e/γ, 4 central and 4 forward jets, 4 τ-jets) Particle Conditions are applied. A condition for a group of up to 4 particles of the same type may require that E_T or p_T is above a threshold, that the particles are within a selected window in η or in φ, or that the absolute difference in η and/or φ between two particles is within a required range. In addition, so-called 'Delta Conditions' can calculate relations in η and φ between two particles of different kinds. Conditions can also be applied to the trigger objects total E_T, missing E_T and H_T, the sum of the transverse energies of the highest-E_T jets. It is also possible to trigger on jet multiplicities.
Several Particle and Delta Conditions are then combined by a simple combinatorial logic (AND-OR-NOT) to form Algorithms. Each Particle Condition bit can be used either as a trigger or as a veto condition. Each of the 128 possible Algorithms applied during a given data taking period represents a complete physics trigger requirement and is monitored by a rate counter. As a last step, the Algorithms are combined by a final OR function to generate an 'L1_Accept' signal that starts the Data Acquisition System and the High Level Trigger software. All Algorithms can be prescaled to limit the overall Level-1 trigger rate. Eight final ORs are provided in parallel to operate subdetectors independently for tests and calibration.
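As an illustration of the AND-OR-NOT combination described above, the following Python sketch evaluates one Algorithm from a set of Condition bits and forms the final OR. The condition names and the algorithm expression are hypothetical and are not taken from an actual trigger menu; in the Global Trigger this logic is implemented in firmware.

```python
# Hypothetical Condition bits for one bunch crossing (names are illustrative only).
conditions = {
    "single_mu_pt10": True,
    "double_eg_et5": False,
    "hard_jet_et100": False,
}


def algo_lepton_no_hard_jet(c):
    """(single muon OR double e/gamma) AND NOT hard jet.

    The last Condition bit is used as a veto, the first two as trigger bits.
    """
    return (c["single_mu_pt10"] or c["double_eg_et5"]) and not c["hard_jet_et100"]


def final_or(algorithm_bits):
    """The Level-1 decision is the OR of all Algorithm bits."""
    return any(algorithm_bits)


algorithm_bits = [algo_lepton_no_hard_jet(conditions)]
l1_accept = final_or(algorithm_bits)
```

Setting the hypothetical `hard_jet_et100` bit to `True` turns the same Condition into a veto and the Algorithm bit becomes false.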
In case of a readout request ('L1A' signal) the Global Trigger is read out like any other subsystem. The L1A signals arrive via the TTC network and are broadcast by the Timing board (TIM) to all other boards, including those of the Global Muon Trigger, where the arrival time of the L1A signal is translated into the corresponding Ring Buffer address. On each board a Readout Processor circuit extracts data from the Ring Buffers, adds format and synchronization words and sends the event record to a readout module, the Global Trigger Front-End board (GTFE). The incoming data are checked there, combined with GMT data into a GMT-GT event record and sent via an S-Link64 interface [7] to the CMS Data Acquisition.
JINST 2 P01006
Synchronisation of input signals
The Global Calorimeter Trigger (GCT) sends calorimeter trigger objects over fast 1.28 Gbps serial links to three PSB input boards.
A PSB board contains four Infiniband connectors [14], each receiving two serial data streams. A DS92LV16 Serializer/Deserializer chip from National Semiconductor [6] converts a serial data stream back to 16-bit parallel data at 80 MHz, corresponding to 32 trigger bits at the CMS running frequency of 40 MHz. Thus, one Infiniband connector receives 64 trigger bits per bunch crossing, containing the data of one 'quadruplet' as mentioned above. In total, one PSB board transmits the data of 4 quadruplets to the Logic Board (GTL) via the backplane. Two PSB boards are required to receive from the calorimeter trigger system the 5 quadruplets mentioned above, plus the energy summary information and jet multiplicity numbers, each of them also combined into 64-bit groups per bunch crossing. The last 64-bit group is used to receive data from the TOTEM detector as explained below. A third PSB board is foreseen for future upgrades.
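The bit counts quoted above follow from simple arithmetic, reproduced here as a short Python check. All numbers are taken from the text; the variable names are chosen for this sketch only.

```python
BITS_PER_WORD = 16            # parallel width after deserialization
WORD_RATE_HZ = 80_000_000     # 80 MHz word rate per serial stream
BX_RATE_HZ = 40_000_000       # 40 MHz bunch crossing frequency
STREAMS_PER_CONNECTOR = 2     # two serial streams per Infiniband connector
CONNECTORS_PER_BOARD = 4      # four Infiniband connectors per PSB board

# Payload rate of one serial stream: 16 bit x 80 MHz = 1.28 Gbps.
payload_rate_bps = BITS_PER_WORD * WORD_RATE_HZ

# Two 16-bit words arrive per bunch crossing on each stream.
bits_per_bx_per_stream = BITS_PER_WORD * (WORD_RATE_HZ // BX_RATE_HZ)

# One connector carries one quadruplet (64 bits) per bunch crossing.
bits_per_bx_per_connector = bits_per_bx_per_stream * STREAMS_PER_CONNECTOR

# One PSB board therefore handles 4 quadruplets (256 bits) per bunch crossing.
bits_per_bx_per_board = bits_per_bx_per_connector * CONNECTORS_PER_BOARD
```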
As the precise arrival time of the data bits is unknown, the SYNC chip on the PSB board first samples the input bits 4 times per 12.5 ns tick to find the switching point of the input data. Normally the sample furthest away from the switching time is selected and transmitted [2]. Then the SYNC chip delays the trigger data for a programmable time and sends the data as 80-MHz GTL+ signals over the backplane to the GTL board.
Phase selection and delay adjustment are done separately for each 16-bit stream to compensate for any time skew between cables and link chips. The SYNC chip also writes the input data into Ring Buffers and, in parallel, into SPY memories. The Ring Buffers keep data for some time until an L1A signal arrives. Then the Readout Processor (ROP) moves data belonging to the L1A signal from the Ring Buffer into a Derandomizing Memory and transfers them embedded in a formatted record to the GTFE board.
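The phase selection can be pictured with the following Python sketch. It is a software model under simplifying assumptions, not the SYNC chip firmware: the input is taken to be a list of 4-times oversampled bit values, the switching point is estimated from where value changes accumulate, and ties between equally distant phases are resolved in favour of the lower phase index. The function names are hypothetical.

```python
def find_switch_phase(oversampled, phases=4):
    """Return the phase p at which most value changes are first seen,
    i.e. the data switch between sample phase p-1 and sample phase p."""
    counts = [0] * phases
    for i in range(1, len(oversampled)):
        if oversampled[i] != oversampled[i - 1]:
            counts[i % phases] += 1
    return max(range(phases), key=lambda p: counts[p])


def select_sampling_phase(switch_phase, phases=4):
    """Pick the sample phase furthest (circularly) from the switching time.

    The switching time is assumed to lie halfway between phase
    switch_phase - 1 and phase switch_phase.
    """
    edge = switch_phase - 0.5

    def circular_distance(p):
        d = abs(p - edge) % phases
        return min(d, phases - d)

    # max() returns the first maximum, so ties go to the lower phase index.
    return max(range(phases), key=circular_distance)
```

With 4 samples per tick two phases are always equally far from the switching point, so the tie-breaking rule is a free choice of this sketch.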
A counter provides the write address for the Ring Buffer and is reset by the BCRES (bunch-counter reset) signal, which is common for all subdetectors of the experiment. The Ring Buffer has been synchronized correctly to the LHC orbit when the first data word of the first bunch crossing is written into the first memory address. The synchronization procedure uses an 8k SPY/SIM memory running in parallel to the Ring Buffer, to store data of a full LHC orbit. A software program starts this memory to acquire data of a number of orbits and plots the number of trigger data received per bunch crossing to allow comparison with the LHC gap structure. Thus one can check if the data of the first bunch crossing were really written into the first address. If not, the delay for BCRES has to be adjusted accordingly. In addition, BC0-data (data of bunch crossing zero) are flagged by a special sequence in bit 15 of the trigger objects.
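The comparison with the LHC gap structure can also be done numerically. The Python sketch below is one possible approach and is an assumption of this text, not the procedure of the monitoring software: it rotates the occupancy histogram read from the SPY memory against the expected filling pattern and returns the shift with the largest overlap. The function name and the inputs are hypothetical; a real LHC orbit has 3564 bunch crossings.

```python
def find_write_offset(occupancy, expected_filled):
    """Return the memory address at which the data of bunch crossing 0 appear.

    occupancy[a]       -- number of trigger objects seen at SPY address a
    expected_filled[b] -- True/1 if bunch crossing b contains colliding bunches

    A result of 0 means the Ring Buffer is aligned with the LHC orbit;
    any other value is the amount by which the BCRES delay is off.
    """
    n = len(occupancy)
    if len(expected_filled) != n:
        raise ValueError("histogram and filling pattern must have equal length")

    def overlap(shift):
        return sum(occupancy[(b + shift) % n]
                   for b in range(n) if expected_filled[b])

    return max(range(n), key=overlap)
```

The method relies on the filling pattern not being periodic within the orbit, which the gaps in the LHC bunch structure provide.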
During data acquisition a local monitoring program (which runs not on the data acquisition system but on a separate computer) can force the SPY memory to run continuously and to stop in case of an L1A signal to check the history of the input data.
In test mode, software can load the SPY/SIM memory with test data or simulated input data to send them instead of real data.
A PSB module may accept up to 64 parallel LVDS input signals via RJ45 connectors at 40 MHz frequency instead of using the (serial) data arriving over one of the Infiniband cables. This choice is made by setting a programmable register over VME. Up to 16 bits are reserved for trigger signals of the TOTEM detector to include it into the CMS data acquisition. The parallel data are sampled 4 times per bunch crossing to synchronize them to the local clock signal. Then they are interlaced into an 80-MHz data stream and transmitted and monitored instead of one of the quadruplets. The synchronization circuit exists for each group of 4 parallel input bits.
One dedicated PSB module receives Technical Trigger bits as parallel LVDS data and sends them directly to the Final_OR circuit in the Final Decision Logic board (FDL).
Trigger logic
On the 'Global Trigger Logic' board (GTL) three programmable receiver chips accept the 80-MHz trigger data and distribute them to two Condition Chips (COND). Each Condition chip receives all input data, converts them to 40-MHz objects, applies Trigger Conditions and combines the results into up to 64 Algorithms. The Algorithm bits are sent as parallel signals via short flat cables to the Final Decision Logic board (FDL) located in the adjacent slot.
As each COND chip receives all trigger bits, all kinds of logical relations between the trigger data could be implemented. Only latency requirements and chip resources restrict the number and type of triggers. If necessary, resources could be increased by replacing the Stratix EP1S40 chip with the EP1S60 from Altera [8].
Algorithms and conditions
To implement the Algorithm logic, small predefined VHDL modules are used to compose more complex trigger requirements. 'Single Particle Templates' and 'Correlation Templates' were defined for 'particle' groups (muons, electron/gamma showers, jets). A Single Particle Template (SPT) compares p_T or E_T against thresholds and checks if the particle is inside an η and/or φ window. For muons the required Isolation, MIP and Quality bits are checked in addition, and another p_T threshold can be set for isolated muons. An isolation bit is set when the muon comes from a region where the transverse energy deposition in the calorimeter is below a certain threshold. The MIP bit is set when the muon comes from a region where the energy deposited in the calorimeter corresponds to that of a Minimum Ionizing Particle (MIP). This information is sent from the Calorimeter Trigger to the Global Muon Trigger. Quality bits are set according to the number of redundant muon systems where the muon was detected.
A Correlation Template (CT) compares the differences |∆η| and |∆φ| between two particles of the same type against thresholds and checks the charge bits for muons.
To make a 'Condition', the required SPT is instantiated four times so that it is applied to all four 'particles' and, if asked for, the CT is also instantiated, as illustrated in figure 4. Then the results go to a combinatorial logic circuit to find 'n out of 4' particles fulfilling the requirements set by the SPTs and CT. Four Condition types are defined for each 'particle' group:
  1s: to find one particle out of 4
  2s: to find two particles out of 4
  2wsc: to find two particles out of 4, correlated in η and φ
  4s: to find four particles out of 4
When three objects are required for a particular algorithm, the unused sub-condition is set to trivial values (e.g. E_T = 0 GeV, 0° < φ < 360° etc.).
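The 'n out of 4' logic can be sketched in Python as follows. This is a software model under stated assumptions, not the VHDL templates: particles are represented as dictionaries with physical values instead of hardware index scales, thresholds are treated as inclusive (so that a trivial threshold of 0 accepts every particle), φ windows do not wrap around, and all function and field names are hypothetical.

```python
from itertools import permutations


def spt(particle, et_min, eta_range, phi_range):
    """Single Particle Template: threshold plus eta and phi windows."""
    return (particle["et"] >= et_min
            and eta_range[0] <= particle["eta"] <= eta_range[1]
            and phi_range[0] <= particle["phi"] <= phi_range[1])


def condition(particles, templates, correlation=None):
    """True if distinct particles can be assigned to all templates.

    len(templates) is the 'n' in 'n out of 4' (1s, 2s, 4s); passing a
    correlation function for two templates models the 2wsc type.
    """
    for combo in permutations(particles, len(templates)):
        if all(spt(p, **t) for p, t in zip(combo, templates)):
            if correlation is None or correlation(*combo):
                return True
    return False


def delta_eta_within(max_delta):
    """Correlation Template restricted to |delta eta| for brevity."""
    return lambda a, b: abs(a["eta"] - b["eta"]) <= max_delta
```

For example, `condition(particles, [tight, loose])` models a 2s Condition with two different threshold sets, and adding `correlation=delta_eta_within(2.0)` turns it into a 2wsc Condition.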
Conditions for the total transverse energy, the hadron transverse energy, the missing transverse energy and 12 numbers of jets above different thresholds consist only of comparators. As a last step the Condition bits are combined by a simple combinatorial logic to form a trigger Algorithm. All Condition bits can be used either as trigger or as veto bits.
Figure 4. Algorithm composed by conditions.
To run the trigger Algorithms, the p_T or E_T thresholds of existing Conditions are loaded into registers using VMEbus instructions.
When designing a new trigger setup, first the Algorithms and Conditions are defined with a Java program that runs on all machines. Its output file uses the XML (Extensible Markup Language) format and is read by a C++ program that generates the variable VHDL files for the new Conditions and Algorithms. The new VHDL files are merged with the fixed code and used by the "Quartus" software (from Altera) to generate a new firmware version. The new firmware must then be loaded to run the new trigger setup. Several firmware versions will be defined to handle data taking as well as calibration and testing periods.
Final_OR logic
The 'Final Decision Logic' board (FDL) receives 128 Algorithm bits from the GTL board and 64 Technical Trigger bits from a dedicated PSB board. Rate counters monitor each trigger bit and pre-scalers reduce the average rate if required.
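A pre-scaler with an attached rate counter can be modelled as below. The sketch assumes that the rate counter counts every occurrence of the bit before pre-scaling and that a pre-scale factor N passes every N-th occurrence; both are assumptions of this example, and the class name is hypothetical.

```python
class PrescaledBit:
    """Software model of one trigger bit with rate counter and pre-scaler."""

    def __init__(self, prescale):
        if prescale < 1:
            raise ValueError("prescale factor must be at least 1")
        self.prescale = prescale
        self.rate_count = 0   # occurrences seen before pre-scaling (assumption)
        self._since_last = 0  # occurrences since the last accepted one

    def clock(self, bit):
        """Process one bunch crossing; return the pre-scaled bit."""
        if not bit:
            return False
        self.rate_count += 1
        self._since_last += 1
        if self._since_last >= self.prescale:
            self._since_last = 0
            return True
        return False
```

With `prescale=1` every occurrence passes, so the bit is effectively not pre-scaled.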
The CMS data acquisition system (DAQ) can be divided into 8 DAQ-partitions to test and calibrate parts of the readout and trigger electronics in parallel. Therefore the FDL board combines all or a subset of the Algorithm and Technical Trigger bits into 8 Final_OR signals, one for each DAQ-partition, to trigger the DAQ-partitions independently of each other. Mask bits are used to include the Algorithm and Technical Trigger bits in the Final_OR gates. For the Technical Trigger bits there are also veto-mask bits to inhibit Final_OR signals. The Final_OR signals go to the central trigger control board (TCS) that forwards them, when allowed, as 'Level 1 Accept' (L1A) signals to the front-end electronics to read the data of the bunch crossing that has generated the trigger signal as well as the data of the specified number of bunch crossings before and after.
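The mask and veto logic for the Final_OR signals can be sketched as follows. The bit vectors are modelled as Python integers (128 Algorithm bits, 64 Technical Trigger bits). The polarity is an assumption of this sketch: a mask bit of 1 includes the corresponding trigger bit, and a veto-mask bit of 1 lets the corresponding Technical Trigger bit inhibit the Final_OR. Function and parameter names are hypothetical.

```python
def final_or(algo_bits, tech_bits, algo_mask, tech_mask, tech_veto_mask):
    """Final_OR of one DAQ-partition for one bunch crossing."""
    if tech_bits & tech_veto_mask:
        return False  # an enabled Technical Trigger veto inhibits the decision
    return bool((algo_bits & algo_mask) or (tech_bits & tech_mask))


def final_ors(algo_bits, tech_bits, partitions):
    """Evaluate all partitions; each entry of `partitions` holds the three masks."""
    return [final_or(algo_bits, tech_bits, **masks) for masks in partitions]
```

Each of the 8 DAQ-partitions would have its own set of masks, so the same trigger bits can fire one partition and leave another untouched.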
The FDL board can be read out like any other front-end electronics module. It also contains Ring Buffer memories, which store all trigger bits. When an L1A arrives, a Readout Processor (ROP) copies data of the correct bunch crossing into a Derandomizing Buffer, embeds them into a formatted record and sends the record via a Channel Link interface [6] and the backplane to the Global Trigger readout (GTFE) board.
As on the other boards, so-called SIM/SPY memories allow either spying on all Algorithm, Technical Trigger and Final_OR bits or inserting simulated bits for tests. In spy mode the SIM/SPY memories run in parallel to the Ring Buffer so that latency and synchronization to the LHC orbit can be checked and adjusted.
Data acquisition
The FDL and the PSB input boards move all trigger input data into Ring Buffers to keep them until an L1Accept signal arrives. The Ring Buffers are implemented as dual port memories inside the FPGA chips and accept the data of one full orbit. On the one side a constant write enable signal writes the trigger data of every bunch crossing into the memory. At the end of an orbit the write address returns to the first location, overwriting old data but keeping the history until an L1A signal arrives after the local latency. The local latency is the time between trigger data passing through and the time when the L1A generated by these data returns.
A delayed BCRES signal resets the counter that provides the write address to write data synchronously to the LHC orbit into the Ring Buffers. To adjust the BCRES delay correctly, the software can read SPY memories running in parallel to the Ring Buffers to see if the data of the first bunch crossing in an LHC orbit were written into the first memory word. On the other side of the Ring Buffers a counter provides the read address which lags behind the write address by the amount of the local latency minus 1 bunch crossing (BC), so that an L1A signal starts to read data words from the BC before the one that has generated this L1A signal. The L1A signal is extended to 3 BC and is applied as a read signal to the Ring Buffers and at the same time as a write-enable signal to the Derandomizing Buffer FIFOs to move data of 3 bunch crossings per event from the Ring Buffers into the Derandomizing Buffer FIFOs. When for debugging purposes 5 BC per event are read, the reset signal for the read counter is delayed by 1 BC less and the L1A signal is extended to 5 BC.
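The Ring Buffer readout can be modelled in software as shown below. This is a simplified sketch: it computes the first read address directly from the latency instead of modelling the separate read counter and its pipeline delays, and it defines the latency as the number of writes between the triggering crossing and the arrival of the L1A (a latency of 1 points at the most recently written word). The class and method names are hypothetical; 3564 is the number of bunch crossings per LHC orbit.

```python
ORBIT_LENGTH = 3564  # bunch crossings per LHC orbit


class RingBuffer:
    """Circular memory written once per bunch crossing."""

    def __init__(self, depth=ORBIT_LENGTH):
        self.mem = [None] * depth
        self.write_addr = 0

    def bcres(self):
        """Bunch counter reset: the next word goes to the first address."""
        self.write_addr = 0

    def write(self, word):
        self.mem[self.write_addr] = word
        self.write_addr = (self.write_addr + 1) % len(self.mem)

    def read_event(self, latency, n_bx=3):
        """Return n_bx words centred on the crossing written `latency` writes ago.

        n_bx=3 gives the triggering crossing plus one before and one after;
        n_bx=5 models the debugging mode with two crossings on each side.
        """
        depth = len(self.mem)
        trigger_addr = (self.write_addr - latency) % depth
        first = trigger_addr - n_bx // 2
        return [self.mem[(first + i) % depth] for i in range(n_bx)]
```

The latency must be large enough that the crossings after the triggering one have already been written when the event is read.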
A readout processor (ROP) designed as a state machine reads the Derandomizing Buffer FIFOs' data of one event, wraps them with format words and creates an event record. The ROP is located either in the same chip or in a ROP chip if the board contains multiple FPGAs.
First the ROP sends a 'read FIFO' command to all Derandomizing Buffer FIFOs on the board or in the chip, respectively, to store all data words and their BC-numbers in registers. Then it sends the format words (24-bit event number, board identifier etc.) to the Channel Link [6], fetches one 16-bit word after the other from the FIFO registers and transmits them also to the Channel Link. The ROP then sends the next 'read FIFO' command to the FIFOs and repeats this procedure for the data of the next two BCs. Finally the ROP sends an 'End_of_Record' word to the Channel Link and then switches to an 'IDLE' code to keep the link alive.
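The sequence of words produced by the ROP can be illustrated with a small Python function. The record layout used here is purely hypothetical: the split of the 24-bit event number into two 16-bit words, the position of the board identifier and BC-number, and the value of the End_of_Record marker are assumptions made for this sketch and do not reproduce the actual Global Trigger record format.

```python
END_OF_RECORD = 0xFFFF  # placeholder value, not the real End_of_Record code


def build_record(event_number, board_id, bx_data):
    """Assemble one event record as a list of 16-bit words.

    bx_data -- list of (bc_number, [16-bit data words]), one entry per
               bunch crossing read out (normally 3 per event).
    """
    record = [
        (event_number >> 16) & 0x00FF,  # upper 8 bits of the 24-bit event number
        event_number & 0xFFFF,          # lower 16 bits of the event number
        board_id & 0xFFFF,              # board identifier
    ]
    for bc_number, words in bx_data:    # one 'read FIFO' step per crossing
        record.append(bc_number & 0xFFFF)
        record.extend(w & 0xFFFF for w in words)
    record.append(END_OF_RECORD)
    return record
```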
Readout board (GTFE)
The Global Trigger Readout Board (GTFE = Global Trigger Front End) receives event records sent as Channel Link data via the backplane from the boards in the crate. The readout processor chip (ROP_DAQ) receives event records from the GMT, the FDL, the TCS and all PSB boards and sends them as a standard Global Trigger event via an SLINK64 mezzanine board [7] to a Readout Unit (RU) of the CMS data acquisition system [9]. The ROP_EVM chip uses identical control logic and receives event records from the TCS and FDL boards, adds GPS time and other LHC-beam information [11] and sends the compiled record via a second SLINK64 to the Event Manager [9] of the CMS data acquisition system, which manages the event flow in the Readout Unit Builder where event fragments are combined into complete events.
The GPS time and other information (Turn Count Number, Machine Mode, LHC fill number etc.) from the LHC Beam Control are received via an optical fiber and a TTCrq mezzanine board [12] from the LHC Beam Synchronous Timing System (BST) [13] .
The firmware is implemented in Xilinx Virtex-2 FPGA chips mounted on mezzanine boards [10] as shown in figure 5.
The GMT and the GT boards send the event records via 28-bit Channel Links [6] to the readout processing chips on the GTFE board. Bits 15-0 carry trigger data going into the input FIFOs, bits 23-16 could carry private monitoring data going into separate Monitoring memories, and bits 27-24 carry control bits going to the control logic that detects the begin and end of records. As long as IDLE data words are being received, the input FIFOs remain inactive, but as soon as data with valid control bits start arriving the input FIFOs store the data until an "End_of_Record" word has been received.
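The bit assignment of the 28-bit Channel Link word given above can be expressed as a small decoding function. The field boundaries are taken from the text; the function and field names are chosen for this sketch, and the meaning of individual control bit patterns is not modelled.

```python
def decode_channel_link_word(word):
    """Split a 28-bit Channel Link word into its three fields."""
    if not 0 <= word < (1 << 28):
        raise ValueError("Channel Link words are 28 bits wide")
    return {
        "data": word & 0xFFFF,            # bits 15-0: trigger data
        "monitor": (word >> 16) & 0xFF,   # bits 23-16: private monitoring data
        "control": (word >> 24) & 0xF,    # bits 27-24: control bits
    }
```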
The FIFOs are configured so that the output width is 4 times the input width, reordering the 16-bit trigger data into 64-bit words for the SLINK64 and thus replacing a 4-to-1 multiplexer. The FIFOs can keep more than 20 events, are written at 40 MHz and are read using an 80-MHz clock. When the FIFOs become 75% full, a 'Warning' flag is sent to the Trigger Control board to reduce the trigger rate.
When the first active FIFO has received an 'End_of_Record' word, a Crate Readout Controller (Crate-ROC) implemented as a state machine applies the standard HEADER word to the SLINK64. It then reads the event data of the first input FIFO and sends them also to the SLINK64. Then it reads the event data of the next active input FIFO and continues until the last one. To check if the Crate-ROC is going to combine data of the same L1A trigger signal, the common signals L1A, BCRES and Event Counter Reset are used to create for every event a reference Event number and BC-number. A comparator circuit checks if the number of events sent by the GMT and GT boards since the last 'Event Number Reset' signal agrees with the reference number. Any mismatch causes an error bit to be inserted into the EVENT_STATUS byte of the event record. However, the event transmission continues until the end.
Finally, the Crate-ROC appends as the last word the Event Status, the updated CRC (cyclic redundancy check) number and the Event length. During the transmission the CRC and Event Status are updated but the Event length is preloaded via VME. This is because it is constant and depends only on the number of bunch crossings per event and on the number of boards which contribute data.
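The width conversion performed by the input FIFOs, four 16-bit words in and one 64-bit word out, can be modelled in Python as shown here. The word order inside the 64-bit word is an assumption of this sketch (first 16-bit word in the least significant position); the text does not specify it.

```python
def pack_64(words16):
    """Combine consecutive groups of four 16-bit words into 64-bit words."""
    if len(words16) % 4:
        raise ValueError("number of 16-bit words must be a multiple of 4")
    packed = []
    for i in range(0, len(words16), 4):
        w0, w1, w2, w3 = (w & 0xFFFF for w in words16[i:i + 4])
        # Assumed ordering: w0 occupies bits 15-0, w3 occupies bits 63-48.
        packed.append(w0 | (w1 << 16) | (w2 << 32) | (w3 << 48))
    return packed
```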
The Crate-ROC transmits data as long as there are records in the FIFOs and as long as the SLINK64 is ready. When the SLINK64 returns a 'full' flag, the Crate-ROC simply waits until the SLINK64 becomes ready again. When the off-time is too long, the board FIFOs could overflow. So, when the 75% level is reached, the Crate-ROC sends a flag to the central trigger control system either to reduce the trigger rate or to inhibit triggers altogether. When running with an 80-MHz clock, a normal GT/GMT event (200 words of 64 bits) is transmitted within 2.5 µs. Even when running only at 40 MHz, the event would be transferred within 5.0 µs, which corresponds to a maximum rate of 200 kHz and still exceeds the required 100-kHz average event rate.
When the SLINK64 for the Event Manager becomes unable to receive events, the ROP_EVM chip sends a status signal directly to the TCS board to stop all DAQ-partitions.
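The transfer times quoted above follow from one 64-bit word being sent per clock cycle. The short Python check below reproduces them; the event size of 200 words and the 100-kHz requirement are taken from the text, while the function names are chosen for this sketch.

```python
EVENT_WORDS = 200            # 64-bit words in a normal GT/GMT event
REQUIRED_RATE_HZ = 100_000   # required average Level-1 event rate


def transfer_time_us(clock_mhz, words=EVENT_WORDS):
    """Time to send one event at one word per clock cycle, in microseconds."""
    return words / clock_mhz


def max_event_rate_hz(clock_mhz, words=EVENT_WORDS):
    """Highest sustainable event rate if the link is never busy."""
    return clock_mhz * 1e6 / words
```

At 80 MHz this gives 2.5 µs per event (400 kHz), and at 40 MHz 5.0 µs per event (200 kHz), both above the required 100 kHz.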
In both ROP chips a dual port memory spies all event data which are sent to the SLINK64. The SPY memory can also be used to insert test data instead of readout data to test the reliability of the SLINK64. The other side of the SPY memory is accessed by VME software.
Summary
The Global Trigger boards have been built, and all except the GTFE readout board are being integrated into CMS. Four of the PSB boards have been tested and production of the others has started. The complete trigger chain has been tested with cosmic muon data and the Global Trigger is functioning according to specification. 
List of acronyms

