Abstract-The LHCb experiment is currently being installed at the Large Hadron Collider at CERN (Geneva, Switzerland). A trigger system is required to reduce the amount of data stored for offline analysis. The Level-0 Decision Unit (L0DU) is the central part of the first trigger level. It is a full custom 16-layer board using advanced FPGAs in BGA packages. The L0DU receives information from the Level-0 sub-triggers (432 bits @ 80 MHz), which transmit the data via high-speed optical links running at 1.6 Gb/s. The processing is implemented using a 40 MHz synchronous pipelined architecture. It performs a simple physics algorithm to compute the Level-0 trigger decision at 40 MHz, in order to reduce the data flow down to 1 MHz for the next trigger level. The internal design of the processing FPGA is mainly composed of a Partial Data Processing (PDP) unit and a Trigger Definition Unit (TDU). The aim of the PDP is to adjust the clock phase, perform the time alignment, prepare the data for the TDU and monitor the data processing. The TDU is flexible and allows all the trigger conditions to be fully reconfigured through the Experiment Control System (ECS) without re-programming the FPGAs.
I. INTRODUCTION
The Large Hadron Collider (LHC) is the next-generation accelerator, built in the former LEP tunnel at CERN (Organisation européenne pour la recherche nucléaire), where CP violation will be studied. Proton-proton (pp) collisions will occur at a rate of 40 MHz at four interaction points where particle detectors are located.
A. The LHCb spectrometer
The LHCb experiment [1] is designed to study CP violation and other rare phenomena using the production of hadrons containing b quarks. The geometry of the LHCb detector is specific: it is a single-arm spectrometer. The detector layout after the LHCb optimization [2] is given in Fig. 1. It consists of the Vertex Locator (VELO) [3], the Trigger Tracker (TT) [4] [5], the dipole magnet [6], two Ring Imaging Cherenkov detectors (RICH1, RICH2) [7], three tracking stations (T1-T3), the Calorimeter system [8] and the Muon system [9]. To avoid major civil construction, the detector has been adapted to the existing experimental hall used by the DELPHI experiment of the LEP era. LHCb is 20 m long and 10 m wide.
The LHCb experiment is designed to operate at an average luminosity of 2×10^32 cm^-2 s^-1, much lower than the maximum design luminosity of the LHC, which makes radiation damage more manageable. A further advantage is that at this luminosity the crossings are dominated by single interactions, which facilitates triggering and reconstruction by ensuring low channel occupancy. Due to the LHC bunch structure and the low luminosity, the rate of crossings with interactions visible to the spectrometer is about 10 MHz, which the trigger system must reduce to about 2 kHz, the rate at which events are written to storage for further offline analysis. The trigger can be emulated from the data written to storage, which gives an additional handle on trigger efficiencies and possible systematics.
B. The trigger system
The trigger system [10] is composed of two levels, as shown in Fig. 2: the Level-0 and the High Level Trigger (HLT). The Level-0 is implemented in full custom electronics, while the HLT runs on a farm of commodity processors. The information is collected by the Level-0 Decision Unit (L0DU) to select events. Events can be rejected on the basis of global event variables such as charged-track multiplicities and the number of interactions reconstructed by the Pile-Up system. This ensures that the selection is based on b signatures and that high-multiplicity events do not occupy a disproportionate fraction of the data-flow bandwidth.
All the Level-0 triggers are fully synchronous: their latency depends neither on occupancy nor on history. All Level-0 electronics is implemented in full custom boards.
The purpose of the HLT is to reduce the rate down to 2 kHz using the data of all subdetectors. The measurements aimed at by LHCb require very high precision, so systematic errors must be controlled to a very high degree. Among the 2 kHz of HLT-accepted events, a large fraction is dedicated to very precise calibration and monitoring of the detector and its capabilities.
The generic HLT algorithm refines candidates found by the Level-0 trigger. It is divided into four independent alleys: one for muons, one for muons and hadrons close to each other, one for hadrons, and one for electrons, γ and π0. The choice of alleys is steered by the Level-0 decision. Each alley consists of four main steps:
* Level-0 confirmation;
* fast tracking using VELO and TT information to locate primary vertices and to find tracks with a large PT and a large impact parameter;
* a more precise measurement of the PT of the previously found tracks using the TT stations, and a search for secondary vertices;
* selection criteria specific to the alley.

The Level-0 Decision Unit collects all information from the Level-0 components to form the Level-0 trigger decision.
The latency of the Level-0, i.e. the time elapsed between a pp interaction and the arrival of the Level-0 trigger decision at the front-end electronics, is fixed to 4 μs [11]. This time includes the time-of-flight, cable lengths and all delays in the front-end electronics, leaving 2 μs for the processing of the data in the Level-0 trigger to derive a decision.
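Since the latency is fixed, it can be read as a pipeline depth. Assuming the nominal 25 ns LHC bunch spacing (one cycle of the 40 MHz clock), a short calculation converts the latencies quoted in this paper, including the 500 ns L0DU budget, into clock cycles:

```python
# Level-0 latency budget expressed in 25 ns clock cycles (bunch crossings).
# Assumption: nominal LHC bunch spacing of 25 ns, i.e. the 40 MHz clock.
BX_PERIOD_NS = 25


def to_bunch_crossings(latency_ns: int) -> int:
    """Convert a latency in ns into a whole number of 25 ns clock cycles."""
    cycles, remainder = divmod(latency_ns, BX_PERIOD_NS)
    if remainder:
        raise ValueError(f"{latency_ns} ns is not a whole number of cycles")
    return cycles


total_l0 = to_bunch_crossings(4000)       # fixed Level-0 latency: 4 us
l0_processing = to_bunch_crossings(2000)  # left for Level-0 processing: 2 us
l0du_budget = to_bunch_crossings(500)     # L0DU budget after the latest input
print(total_l0, l0_processing, l0du_budget)  # 160 80 20
```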
The Level-0 trigger provides a decision for each bunch crossing with a fixed latency; the architecture is therefore pipelined and massively parallel.
The purpose of the Level-0 Decision Unit (L0DU) is to compute the L0 trigger decision from the information provided by the L0 sub-triggers, in order to reduce the data flow from 40 MHz down to 1 MHz for the next trigger level.
To this end, the L0DU receives an event summary at 40 MHz from the L0 calorimeter selection board, the L0 Muon trigger processor and the L0 Pile-Up system, each with its own latency. A physics algorithm is then applied to select events, and the L0DU trigger decision is delivered to the Readout Supervisor (ODIN) [12], which takes the final decision (L0Accept) on whether to accept the event. The L0DU trigger decision is encoded in a 16-bit explanation word (RSDA). For each event, an "L0-Block" data record is built and sent to the High Level Trigger (HLT) when the L0DU receives an L0Accept signal from ODIN via the Timing and Fast Control (TFC) system.
A strong emphasis has been put on flexibility, so that different algorithms can be configured through the Experiment Control System (ECS) with the same programmed architecture. Special triggers can be implemented with specific arithmetic and logical computations. Downscaling of the L0 trigger decisions (i.e. reducing the accepted rate of a trigger channel) and changes to the conditions and parameters of the decision (algorithm, thresholds, downscaling factors, ...) are possible. Performance monitoring and statistical analyses are done both in hardware and in software. The reason for the decision is encoded in an explanation word (L0DUrpt) in the L0-Block data, which is sent to the Data Acquisition (DAQ).
Special care has been taken to ease the operation and debugging of this unit. In addition, an internal test bench based on pre-synthesized RAM is implemented, which makes it possible to check the behaviour of the L0DU at the LHCb pit. A dedicated test bench has also been developed to stress the L0DU and test the reliability of the system. It is based on an optical pattern generator board able to send test patterns over 18 LHC cycles.
B. Input and output data
The input data consist of several "candidates", which correspond to the few detected particles with the highest transverse energy or momentum (ET, PT). The calorimeter sends one electron, one photon, one neutral pion, one global neutral pion and the highest hadron as candidates. The muon detector sends the two highest muons for each quarter of the detector, while the vertex detector transmits the global condition used for the veto calculation (no trigger is allowed in case of multiple collisions). In addition, the L0DU receives the total energy deposited in the calorimeter and the SPD detector occupancy. The data are sent as 32-bit words that include a bunch crossing identification number, which allows the incoming data to be time-aligned since each source has its own latency.
A total of 864 bits (Table I) at a frequency of 40 MHz is expected at the input of the L0DU. The total latency of the L0 trigger is 4 μs, and the L0DU latency budget is 500 ns, counted from the latest-arriving sub-trigger data. The L0DU transmits the decision as a 16-bit word at 40 MHz to the trigger distribution system ODIN. This word contains the decision and a bunch crossing identification number. An additional bit is used to request a forced trigger for debugging and slow control purposes. Another bit supports the timing-tuning functions of the trigger distribution system: it is set when a given decision pattern is detected over 5 events (the current event and the two events before and after it), such as an isolated decision or collision. The last bit is a status bit that indicates to ODIN that the L0DU decision could be erroneous.
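The paper lists the fields of the decision word but not their positions. The sketch below packs and unpacks such a word under a purely hypothetical layout; the 12-bit BCID width is an assumption, chosen because it covers the 3564 bunch slots of an LHC orbit and makes the fields add up to exactly 16 bits.

```python
# Hypothetical layout of the 16-bit L0DU decision word sent to ODIN.
# Only the list of fields comes from the text; bit positions and the
# 12-bit BCID width are assumptions (1 + 12 + 1 + 1 + 1 = 16 bits).
DECISION_BIT = 0
BCID_SHIFT, BCID_BITS = 1, 12
FORCE_BIT = 13    # forced trigger request
TIMING_BIT = 14   # timing-tuning pattern detected
STATUS_BIT = 15   # decision may be erroneous


def pack_decision_word(decision, bcid, force=False, timing=False, status_error=False):
    """Assemble the 16-bit word from its fields (hypothetical layout)."""
    if not 0 <= bcid < (1 << BCID_BITS):
        raise ValueError("BCID does not fit in the assumed 12-bit field")
    word = int(decision) << DECISION_BIT
    word |= bcid << BCID_SHIFT
    word |= int(force) << FORCE_BIT
    word |= int(timing) << TIMING_BIT
    word |= int(status_error) << STATUS_BIT
    return word


def unpack_decision_word(word):
    """Split a 16-bit word back into its fields (same hypothetical layout)."""
    return {
        "decision": bool((word >> DECISION_BIT) & 1),
        "bcid": (word >> BCID_SHIFT) & ((1 << BCID_BITS) - 1),
        "force": bool((word >> FORCE_BIT) & 1),
        "timing": bool((word >> TIMING_BIT) & 1),
        "status_error": bool((word >> STATUS_BIT) & 1),
    }


word = pack_decision_word(True, bcid=3563, timing=True)
print(unpack_decision_word(word))
```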
C. I/O format
All the L0 sub-triggers transmit their data via high-speed optical links running at 1.6 Gb/s. Seventeen single optical fibers are expected from the L0 sub-triggers, and 7 spare optical links have been added. These fibers are connected to 2 patch panels, each of which converts 12 single optical channels into a fiber ribbon connected to the L0DU.
The L0DU implements two Agilent HFBR-782BE optical transceivers, each connected to one ribbon. Then, 24 TLK2501 devices from Texas Instruments deserialize the data. On the receiving side, each 32-bit data word is seen as two 16-bit words at 80 MHz.
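The link figures quoted above are consistent with each other if, as assumed here, the TLK2501 serial links use 8b/10b encoding: a 16-bit word at 80 MHz then becomes 20 line bits, i.e. 1.6 Gb/s on the fiber, while the useful payload equals that of one 32-bit word at 40 MHz.

```python
# Assumption: the TLK2501 links use 8b/10b encoding, so every 16-bit word
# transferred at 80 MHz occupies 20 bits on the fiber.
WORD_BITS = 16
WORD_RATE_HZ = 80e6
LINE_BITS_PER_WORD = WORD_BITS * 10 // 8   # 8b/10b: 16 data bits -> 20 line bits

payload_rate = WORD_BITS * WORD_RATE_HZ          # useful data rate per link
line_rate = LINE_BITS_PER_WORD * WORD_RATE_HZ    # serial rate on the fiber

# Two 16-bit words at 80 MHz carry the same payload as one 32-bit word at 40 MHz.
assert 32 * 40e6 == payload_rate
print(f"payload {payload_rate / 1e9:.2f} Gb/s, line {line_rate / 1e9:.2f} Gb/s")
# payload 1.28 Gb/s, line 1.60 Gb/s
```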
The decision word is sent to ODIN over a point-to-point 16-bit LVDS link using a 3M pack connector (34 pins) and a twisted-pair ribbon cable with 17 pairs.
D. L0DU setup
The L0DU is implemented as a plug-in module on the standard LHCb DAQ board (TELL1) (see Fig. 4). The main data flow, which consists of sending the decision word to ODIN at 40 MHz, does not go through the TELL1 board. The L0DU uses the TELL1 board for the DAQ interface, ECS access, power supply and JTAG chain access.
E. Architecture
For each data source, the "Partial Data Processing" (PDP) system performs a specific part of the algorithm: it re-phases the data with the system clock, performs the time alignment and prepares the data for the algorithm. In this first step, the pipeline architecture is optimized for each L0 sub-trigger and implements pre-processing such as the search for the three highest-PT muons. Then, the "Trigger Definition Unit" (TDU) processes the information to form a set of trigger conditions, which are combined to compute the L0DU decision (see Fig. 5).
All trigger channels are individually downscaled and then ORed to obtain the L0DU decision.
The 16-bit decision word is then sent to ODIN. Each algorithm implemented in the L0DU is based on multiple conditional choices derived from physics criteria, which allows specific physics events to be selected. A trigger channel, like (1) and (2), is made of elementary conditions built from arithmetic and logical operators (+, -, >, <, =, ≠). The global decision is obtained by taking the logical OR of all defined trigger channels.
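The AND/OR structure of the decision can be modelled in a few lines. In this sketch the variable names, thresholds and channel definitions are hypothetical and do not reproduce channels (1) and (2); only the structure (elementary comparisons, AND inside a channel, OR across channels) follows the text, and downscaling is left out.

```python
import operator
from dataclasses import dataclass

# Comparison operators listed in the text (">", "<", "=", and "!=" for ≠).
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq, "!=": operator.ne}


@dataclass(frozen=True)
class Condition:
    """Elementary condition: <variable> <op> <threshold>."""
    variable: str   # name of an L0 quantity (hypothetical names below)
    op: str
    threshold: int

    def __call__(self, event: dict) -> bool:
        return OPS[self.op](event[self.variable], self.threshold)


def channel_fires(conditions, event) -> bool:
    # A trigger channel is the logical AND of its elementary conditions.
    return all(cond(event) for cond in conditions)


def global_decision(channels, event) -> bool:
    # The global decision is the logical OR of all trigger channels.
    return any(channel_fires(ch, event) for ch in channels)


# Hypothetical channels and thresholds, for illustration only.
electron = [Condition("electron_et", ">", 130), Condition("spd_mult", "<", 280)]
dimuon = [Condition("sum_muon_pt", ">", 40)]
event = {"electron_et": 150, "spd_mult": 300, "sum_muon_pt": 55}
print(global_decision([electron, dimuon], event))  # True (dimuon channel fires)
```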
B. Flexible architecture
The principle is to use pre-synthesized logic cells selected by a switching matrix and a Programmable Logic Device (PLD) structure. The flexible architecture makes it possible to configure the inputs used in the decision algorithm, to select the logical and arithmetic operators, and to set the thresholds of the elementary conditions. Dedicated software with a Graphical User Interface (GUI) is used to configure and parametrize the decision algorithm.
The Partial Data Processing prepares the data for the decision algorithm: for each sub-trigger source (Calo, Muon and Pile-Up), it extracts the candidates and the global variables. Each of them is duplicated so that it can be used several times in different trigger channels with specific thresholds. In addition to the candidates and global variables provided by the sub-triggers, the flexible architecture implements a data production module, which produces new variables locally for use in the decision algorithm. An elementary data production block is composed of a switching matrix and a selectable arithmetic or logical operator block. A data production block can produce either a new data value or a single bit that indicates, for example, the presence of a candidate in a specific region of the detector.
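A data production block can be pictured as a multiplexer followed by a configurable operator. The following is a software model under stated assumptions: the input list, the index-based switching matrix and the operator names are hypothetical and do not describe the L0DU register interface.

```python
import operator

# Hypothetical model of one elementary data production block: a switching
# matrix selects two inputs by index, then a configured operator combines them.
ARITH_OPS = {"add": operator.add, "sub": operator.sub}
LOGIC_OPS = {"and": operator.and_, "or": operator.or_}


def data_production(inputs, sel_a, sel_b, op_name):
    """Build a new value from two selected inputs.

    inputs       -- available L0 quantities (candidates, global variables, flags)
    sel_a, sel_b -- switching-matrix settings (indices into `inputs`)
    op_name      -- operator chosen by configuration
    """
    a, b = inputs[sel_a], inputs[sel_b]
    if op_name in ARITH_OPS:
        return ARITH_OPS[op_name](a, b)          # new data value
    if op_name in LOGIC_OPS:
        return int(LOGIC_OPS[op_name](a, b))     # can serve as a single-bit flag
    raise KeyError(f"unknown operator {op_name!r}")


# Illustrative inputs: two muon PT values and two region flags (hypothetical).
inputs = [42, 17, 1, 0]
print(data_production(inputs, 0, 1, "add"))  # 59
print(data_production(inputs, 2, 3, "or"))   # 1
```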
The Trigger Definition Unit defines the decision algorithm. Its first step sets the elementary conditions by configuring the logical operator selection module and setting the threshold of each elementary condition. The second step forms the different trigger channels via the "AND" network. The decision is then obtained by taking the logical "OR" of the trigger channels defined in the algorithm.
The architecture is not specific to a particular algorithm and is fully configurable without re-programming the FPGA. The structure, and the step-by-step way in which the algorithm is composed, make it flexible: new trigger channels can easily be added. The design can also evolve: unforeseen functionalities can be added without modifying the global architecture, although this requires re-programming the FPGA.
IV. FUNCTIONALITIES
A. Synchronisation
All the data coming from each sub-trigger must be synchronized with the local TFC clock of the L0DU. Each Level-0 sub-trigger sends its data synchronized with the global clock delivered by the TFC system, but through a different path. The incoming data are re-phased with the local clock of the L0DU by a dual-port FIFO: the input data are written with the clock recovered from each optical link and read with the L0DU local clock.
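The paper does not detail the FIFO internals. A common technique for dual-clock FIFOs, assumed here for illustration, is to exchange the read and write pointers between the two clock domains in Gray code, so that only one bit changes per increment and a pointer sampled in the other clock domain is never badly corrupted:

```python
def to_gray(n: int) -> int:
    """Binary -> Gray code: successive values differ in exactly one bit."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Gray code -> binary, by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# Hypothetical 16-entry FIFO: 4-bit pointers that wrap around.
PTR_BITS = 4
for ptr in range(1 << PTR_BITS):
    nxt = (ptr + 1) % (1 << PTR_BITS)
    changed_bits = to_gray(ptr) ^ to_gray(nxt)
    assert bin(changed_bits).count("1") == 1   # one bit flips, even on wrap-around
    assert from_gray(to_gray(ptr)) == ptr
```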
B. Time alignment
The L0DU receives the information from the Calorimeter, Muon and Pile-Up sub-triggers at 40 MHz, but the data arrive at different times, whereas the decision must be computed on time-aligned data. A specific procedure is implemented to determine and compensate the latency of each L0 sub-trigger link.
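One way to picture the latency compensation, assuming that each sub-trigger tags its words with the BCID at which the data were produced (the input words do carry a bunch crossing identification number), is to measure each link's latency against the local BCID and delay every link to match the slowest one. The function names and latency values below are illustrative, not measurements:

```python
LHC_BUNCH_SLOTS = 3564  # BCID range of one LHC orbit


def measured_latency(local_bcid, received_bcid, orbit=LHC_BUNCH_SLOTS):
    """Latency of a link in bunch crossings, from the BCID carried in its data
    (modulo the orbit length, so BCID wrap-around is handled)."""
    return (local_bcid - received_bcid) % orbit


def alignment_delays(latencies):
    """Extra pipeline delay per link so that all links line up with the
    latest-arriving one."""
    latest = max(latencies.values())
    return {link: latest - lat for link, lat in latencies.items()}


# Illustrative BCID readings taken when the local BCID is 100.
lat = {
    "calo": measured_latency(100, 73),    # 27 crossings
    "muon": measured_latency(100, 70),    # 30 crossings
    "pileup": measured_latency(100, 75),  # 25 crossings
}
print(alignment_delays(lat))  # {'calo': 3, 'muon': 0, 'pileup': 5}
```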
C. Search for the three highest-PT muons
Other functionalities are also implemented on the L0DU, such as the search for the three highest-PT muons. The muon trigger sends 8 muon candidates to the L0DU, but only the three with the highest PT are used in the decision algorithm. This functionality is implemented in the PDP.
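The selection can be done with independent pairwise comparisons, which suits a pipelined FPGA design. The sketch below uses a rank-based approach chosen for illustration; the paper does not say how the PDP implements the search:

```python
def three_highest(pts):
    """Return the input indices of the three highest-PT candidates.

    Each candidate's rank is the number of candidates that beat it. All
    pairwise comparisons are independent (28 distinct pairs for 8 inputs),
    so in hardware they could be evaluated in parallel. Equal PT values are
    resolved by input index, so each rank 0..n-1 is taken exactly once.
    """
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three candidates")
    ranks = [
        sum(1 for j in range(n) if pts[j] > pts[i] or (pts[j] == pts[i] and j < i))
        for i in range(n)
    ]
    return [ranks.index(r) for r in range(3)]


# Illustrative PT values (arbitrary units) for the 8 muon candidates.
print(three_highest([12, 40, 7, 40, 0, 25, 3, 18]))  # [1, 3, 5]
```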
D. Debugging, monitoring and commissioning tools
The L0DU implements a stand-alone test mode based on internal memories synthesized in the FPGA. Test patterns can be injected to emulate the data acquired from the Level-0 triggers. The computation results are stored in an internal memory and compared with a RAM containing the expected results in order to make the diagnosis. All the internal memories can also be used in a spy mode, which stores the data coming from the Level-0 sub-triggers together with the corresponding L0DU computation results. The behaviour of the L0DU can then be checked against a software simulation.
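The diagnostic step boils down to a word-by-word comparison between the captured results and the expected results. A minimal sketch, modelling both memories as Python lists (an assumption) with illustrative contents:

```python
def compare_results(computed, expected):
    """Return the addresses where the captured L0DU results differ from
    the expected results preloaded in RAM."""
    if len(computed) != len(expected):
        raise ValueError("result and reference memories have different depths")
    return [
        addr
        for addr, (got, ref) in enumerate(zip(computed, expected))
        if got != ref
    ]


# Illustrative memory contents: the word at address 1 is wrong.
captured = [0x8001, 0x0002, 0x8003]
reference = [0x8001, 0x8002, 0x8003]
print(compare_results(captured, reference))  # [1]
```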
Many control operations are implemented in each FPGA to monitor the links and the behaviour of the L0DU: error detection mechanism, error counters, time alignment monitoring, demultiplexing error detection and a snooping mechanism. All the trigger channels and the global decision are monitored in order to tune the rate of each trigger channel and of the global decision.
Other specific tools are also implemented for the commissioning of the L0DU at the LHCb pit. They mainly qualify all the links connected to the L0DU with a Bit Error Rate (BER) measurement, and test the DAQ path and the time alignment.
V. L0DU BOARD
A first prototype was assembled and tested at the beginning of 2002 [13] [14]. It was a simplified version with neither ECS nor Timing, Trigger and Control (TTC) connections, and with a reduced number of inputs and outputs. The inputs and outputs were in LVDS format at 40 MHz, via Ethernet CAT 5+ cables and RJ45 connectors.
Due to the number of interconnections between the L0 sub-triggers and the L0DU, it was decided to use optical links. Furthermore, it was decided to implement the L0DU as a TELL1 mezzanine in order to benefit from the ECS and DAQ paths. A new prototype has been designed, and all the functionalities required for the L0DU are now implemented.
A. Design
The processing of the L0DU is done by two Stratix FPGAs in BGA packages, because of the high number of interconnections with the optical part of the board. Each FPGA receives the output data buses of 12 TLK2501 deserializers. One of the two processing FPGAs centralizes the information coming from the L0 sub-triggers and implements the definition of the L0DU algorithm. A dedicated FPGA controls the board. This control FPGA is connected to the TTC mezzanine, the ECS from the TELL1, a USB interface and the processing FPGAs, in order to deliver the fast control signals and the ECS.
The processing FPGAs are Stratix EP1S25-F1020-C7 devices in BGA packages, and the control FPGA is a Stratix EP1S10-F780-C7 in a BGA package.
Fig. 7. Block diagram of the L0DU board.
B. Board layout development
The board layout (Fig. 8) is very complex due to the high density and the high frequency of the signals, up to 1.6 Gb/s (tr, tf: 110 ps) or 80 MHz (tr, tf: 1 ns). Special care has been taken to route the board, taking into account the simulation results (signal integrity checking (Fig. 9), reflection, crosstalk and line impedance qualification) for each critical part of the layout:
* wires between the optical transceivers and the deserializers;
* wires between the deserializers and the processing FPGAs;
* interconnection bus between the processing FPGAs;
* clock network distribution.
An example of a simulation done with Specctra Quest using the IBIS models of the TLK2501 and FPGA buffers is shown in Fig. 9. The following layout rules were applied:
* keeping the traces as short as possible;
* using controlled impedance;
* keeping the traces of each differential pair identical to prevent signal skew;
* careful design of the power supply and ground planes.
C. The L0DU PCB
The PCB is a 16-layer, class 6 board (length: 366.7 mm, width: 150 mm, thickness: 2 mm). The optical part occupies only 32% of the total board area (length: 200 mm, width: 100 mm).
D. L0DU processing
The first part of the internal design of the processing FPGA is an optical module that resynchronises the data to the local system clock and, for each optical link, converts the 16-bit input data flow at 80 MHz into 32-bit words at 40 MHz. This function also implements synchronisation error detection and the data demultiplexing.
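The 16-to-32-bit conversion can be modelled as pairing consecutive words. Which half arrives first, and flagging an odd word count as a synchronisation problem, are assumptions of this sketch; the paper specifies neither the word order nor the error-detection rule:

```python
def demultiplex(words16, msb_first=True):
    """Merge pairs of 16-bit words (80 MHz stream) into 32-bit words
    (40 MHz stream). The word order (msb_first) is an assumption."""
    if len(words16) % 2:
        raise ValueError("odd number of 16-bit words: possible loss of synchronisation")
    if any(not 0 <= w <= 0xFFFF for w in words16):
        raise ValueError("input word wider than 16 bits")
    merged = []
    for first, second in zip(words16[0::2], words16[1::2]):
        hi, lo = (first, second) if msb_first else (second, first)
        merged.append((hi << 16) | lo)
    return merged


print([hex(w) for w in demultiplex([0xDEAD, 0xBEEF, 0x0000, 0x0001])])
# ['0xdeadbeef', '0x1']
```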
Then, a specific step is dedicated to latency compensation and time alignment between the different sub-trigger data. Data production, data formatting and an internal patch panel are introduced to prepare and select the data used in the Trigger Definition Unit.
Thresholds are applied to the data to form the elementary conditions used in the algorithm. The elementary conditions are combined by an AND network to form the trigger channels, which are then downscaled. The L0DU decision is obtained by applying an OR network to the downscaled trigger channels.
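The paper does not say how downscaling is implemented. A simple counter-based prescaler, which keeps one out of every N positive decisions of a channel, is assumed in this sketch of the downscale-then-OR stage; the class and function names are hypothetical:

```python
class Downscaler:
    """Counter-based prescaler (hypothetical implementation): keeps 1 out
    of every `factor` positive decisions of a trigger channel."""

    def __init__(self, factor: int):
        if factor < 1:
            raise ValueError("downscaling factor must be >= 1")
        self.factor = factor
        self.count = 0

    def __call__(self, channel_fired: bool) -> bool:
        if not channel_fired:
            return False
        self.count += 1
        if self.count == self.factor:
            self.count = 0
            return True
        return False


def l0du_decision(channel_bits, downscalers):
    # Evaluate every downscaler (a list, not a generator) so that no counter
    # is skipped by short-circuiting, then OR the downscaled channels.
    downscaled = [ds(bit) for ds, bit in zip(downscalers, channel_bits)]
    return any(downscaled)


# Channel 0 never fires; channel 1 fires on every crossing with factor 4.
ds = [Downscaler(1), Downscaler(4)]
decisions = [l0du_decision([False, True], ds) for _ in range(8)]
print(decisions)  # [False, False, False, True, False, False, False, True]
```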
E. ECS and Software
The current prototype gives access to the L0DU through the I2C bus of the embedded PC of the TELL1 board. A software control system for the L0DU is being designed to be integrated in the ECS architecture of the experiment. Following the LHCb rules, the software interface is based on PVSS and uses the JCOP control framework for the integration of the hardware. The role of the software is to send commands and settings to the L0DU system and to read back information: configuration and parametrization of the L0 decision processing, online monitoring of the L0DU behaviour, and debug operations.
VI. L0DU TEST BENCH
Testing the L0DU, and building its test bench, is as complex as designing the unit itself. A specific pattern injection board (GPL) has been developed to emulate the L0 sub-triggers and to characterize the links used. This board can emulate physics runs and failures such as loss of synchronisation of an optical link or erroneous data.
A. GPL functional description
The GPL board can send test patterns on 24 optical fibers running at 1.6 Gb/s. The pattern can be a fixed pattern, a counter or the content of a RAM. The board can also acquire the L0 decision word sent at 40 MHz. The GPL registers and RAM can be accessed through a USB interface or through ECS access via the L0DU plugged on the TELL1 board.
The GPL is compatible with the TTC system for synchronisation and with the standard LHCb crate and can be installed in the pit.
B. GPL board design
The GPL board (see Fig. 11) relies on three Stratix FPGAs: two are used for processing and one for controlling the board. The GPL implements 12 external memories able to store test patterns covering 18 LHC cycles, 2 clock networks and two optical channels. The GPL board has an emulated ODIN input and a USB interface for control. Special care has been taken with the clock network, the jitter filtering and the board place and route.
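Interpreting an "LHC cycle" as one LHC orbit of 3564 bunch slots (an assumption), 18 cycles correspond to 64152 words per link, which fits in a 16-bit address space. The sketch below models the three pattern sources described for the GPL; the 32-bit word width and the looping RAM playback are hypothetical:

```python
LHC_BUNCH_SLOTS = 3564                 # bunch slots in one LHC orbit
PATTERN_DEPTH = 18 * LHC_BUNCH_SLOTS   # 64152 words for 18 orbits
WORD_MASK = 0xFFFFFFFF                 # assumed 32-bit pattern words


def gpl_pattern(mode, length, fixed=0, ram=None):
    """Hypothetical model of the GPL pattern sources for one link."""
    if mode == "fixed":
        return [fixed & WORD_MASK] * length
    if mode == "counter":
        return [i & WORD_MASK for i in range(length)]
    if mode == "ram":
        if not ram or len(ram) > PATTERN_DEPTH:
            raise ValueError("RAM pattern empty or deeper than 18 LHC cycles")
        # Assumed behaviour: the RAM content is replayed in a loop.
        return [ram[i % len(ram)] & WORD_MASK for i in range(length)]
    raise ValueError(f"unknown pattern mode {mode!r}")


assert PATTERN_DEPTH == 64152 and PATTERN_DEPTH < 2**16
print(gpl_pattern("ram", 5, ram=[0xA, 0xB]))  # [10, 11, 10, 11, 10]
```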
C. Jitter budget and qualification
The deserializers need a reference clock with a jitter below 40 ps. Special care has been taken with the jitter budget of the clock network, particularly for the clock path that includes the additional delay chip, whose output jitter had to be filtered. The reference clock of each deserializer has been qualified to verify that its jitter is within specification (18 ps < jitter < 30 ps, Fig. 12).
D. Eye diagram measurement
To qualify the optical design and the optical links, the effect of optical attenuation on the transmission quality has been measured with 6 dB and 9 dB of attenuation (Fig. 13 and Table II).
Fig. 11. The GPL board.
The GPL board is a 16-layer, class 6 board with characteristics similar to those of the L0DU.
E. Test bench set up
The test setup (Fig. 14) consists of the GPL, the L0DU, the TTC system for clock distribution and two PCs for slow control. One PC controls the GPL board via USB, and the other controls the L0DU via an Ethernet cable and performs the data acquisition via a Gigabit Ethernet board.
VII. TEST AND RESULTS

A. Link qualification
All the interfaces and functionalities of the L0DU have been tested and validated: DAQ interface, ECS and JTAG access, ODIN link and all the optical links. The Bit Error Rate (BER) of each optical link has been qualified and is below 10^-12, as required for the LHCb experiment. The ODIN link has also been qualified with a BER below 10^-12.
B. Processing
Different types of algorithms have been implemented and tested on the board. This prototype, close to the final version, made it possible to evaluate the resources and the limitations of the board.
C. Final version of the L0DU
The limitations of the L0DU prototype were due to the logic resources of the FPGAs implemented on the board. The final version uses FPGAs with more logic resources in the same package, EP1S60F1020. This makes it possible:
* to add interconnections between the two processing FPGAs;
* to add spare connections with the TELL1 board;
* to increase the dynamic range of the monitoring counters;
* to extend the possibilities of the flexible architecture.
VIII. CONCLUSION
The PCB, the interfaces and the functionalities of the L0DU board have been tested and validated, both in the laboratory with the dedicated test bench and at the LHCb pit (Fig. 16) during the commissioning of the LHCb detector. The L0DU provides a new tool for particle physics: its flexible architecture makes it possible to select different types of events with the same programmed architecture. The board layout is very complex due to the high frequency and the high density of the signals. Many Specctra Quest simulations were performed before manufacturing, which made it possible to produce such a design without any hardware failure. The board is being used for the commissioning phase of the LHCb detector, and the L0DU system is operational for the engineering run.
