Preprint typeset in JINST style - HYPER VERSION
Level-1 jet trigger hardware for the ALICE
electromagnetic calorimeter at LHC
O. Bourrion∗, R. Guernane, B. Boyer, J.L. Bouly and G. Marcotte
Laboratoire de Physique Subatomique et de Cosmologie,
Université Joseph Fourier Grenoble 1, CNRS/IN2P3, Institut Polytechnique de Grenoble,
53, rue des Martyrs, Grenoble, France
E-mail: olivier.bourrion@lpsc.in2p3.fr
ABSTRACT: The ALICE experiment at the LHC is equipped with an electromagnetic calorimeter
(EMCal) designed to enhance its capabilities for jet measurement. In addition, the EMCal enables
triggering on high-energy jets. Building on the previous development made for the PHOton
Spectrometer (PHOS) level-0 trigger, a dedicated electronics upgrade was designed to allow fast
triggering on high-energy jets at level-1. This development was made possible by the latest
generation of FPGAs, which can handle an instantaneous incoming data rate of 26 Gbit/s and
process it in less than 4 µs.
KEYWORDS: L1 trigger; EMCAL; ALICE.
∗Corresponding author.
arXiv:1010.2670v3 [physics.ins-det] 8 Dec 2010
Contents

1. Overview
1.1 Supermodule electronics
1.2 TRU L0 algorithm
2. Trigger requirements and hardware development motivations
3. Solution implemented: the Summary Trigger Unit (STU)
3.1 Custom serial protocol implementation
3.2 Trigger algorithms description
4. Conclusion
1. Overview
The ALICE (A Large Ion Collider Experiment) detector at the LHC will carry out comprehensive
measurements of high energy nucleus-nucleus collisions, in order to study the phase transition
between confined matter and the Quark-Gluon Plasma (QGP). For this purpose, ALICE has been
upgraded with a large acceptance electromagnetic calorimeter (EMCal) [1], providing the neutral
portion of the jet energy measurement and an efficient and unbiased trigger for high energy jets.
The calorimeter consists of 12288 towers of layered Pb-scintillator, arranged in modules of
2×2 towers. EMCal is composed of 10 regular supermodules (SM), completed by 2 one-third
supermodules. A regular SM is made of 24 strips of 12 modules, while the one-third SMs will be
equipped with 24 strips of 4 modules.
1.1 Supermodule electronics
Each tower features a Charge Sensitive Preamplifier (CSP) coupled to an Avalanche PhotoDiode
(APD) that collects the light created during the interaction. The electronics needed to read out
one supermodule are distributed over 2 crates, see fig. 1. The crates are mostly occupied by
front-end (FEE) cards [2], whose primary purpose is to perform the tower signal readout, thanks
to the ALTRO chips [3]. Their secondary purpose is to build the fastOR signals: each of those
is an analogue sum of the 4 CSP signals of one module, fed through a fast shaper in order to
minimize latency. The analogue fastOR signals are transferred to 3 Trigger Region Units (TRU) [4],
which continuously digitize them at the machine bunch crossing rate (40 MHz) and compute the
local L0 trigger. The Readout Control Unit (RCU) [5] is in charge of reading out the data of both
board types and transferring them to the DAQ.
Figure 1. One supermodule electronics: 2 crates housing 36 FEE cards (32 towers per FEE) and
3 TRUs on a GTL bus, read out by 2 RCUs (1152 towers in total).
1.2 TRU L0 algorithm
After digitization, each fastOR is digitally integrated over a sliding time window of 4 samples.
The results of these operations are continuously fed to 2×2 spatial sum processors, which compute
the energy deposited in patches of 4×4 towers (i.e. 2×2 fastORs), see fig. 2. Each patch energy is
constantly compared to a minimum bias threshold; whenever the threshold is crossed and the
maximum of the peak has been found, a local L0 trigger is fired. In preparation for the L1
algorithm, the time integrated sums are also stored in a circular buffer for later retrieval.
2. Trigger requirements and hardware development motivations
In order to meet the data recording objectives, the required rejection factor is of the order of
10-20 for Pb+Pb and ∼3000 for p+p (smaller event size, but higher collision rate). In the Pb+Pb
case, most of the rejection will be provided by the High Level Trigger (a processor farm), whereas
in the p+p case the latter is ineffective and most of the rejection must be achieved at the L0/L1
level, in hardware. A new hardware development was therefore mandatory: first, to build the
global L0 trigger, which is an OR of the 32 L0 triggers calculated locally by the TRUs; secondly,
to compute two kinds of L1 triggers: the L1-gamma trigger, which uses the same patch size as L0
but without the inefficiencies displayed by the L0 (i.e. 2×2 patches spanning several TRU regions
can be computed), and the L1-jet trigger, which is built by summing the energy over a sliding
window of 2×2 subregions (1 subregion = 4×4 fastORs = 8×8 towers), see fig. 2. Both require
aggregating the data of all TRUs. It is also desirable to maintain the selectivity of the L1 triggers
across collision centralities, in order to discriminate the interesting signal from the background;
the thresholds must therefore be corrected with the multiplicity information made available by
the V0 detector [6].
3. Solution implemented: the Summary Trigger Unit (STU)
The STU hardware (fig. 3) is FPGA-based and is designed for data concentration, thanks to
custom serial link connections with all TRUs. The STU also features a Trigger, Timing and
Control (TTC) interface [7] for receiving the machine reference clock and the trigger messages.
The interface with V0 allows an event-by-event threshold computation, according to a second
order fit of the EMCal energy as a function of the V0 information (which is itself dependent on
the PMT HV): A·V0² + B·V0 + C.
Figure 2. Flat view of one supermodule, constituted of three neighboring TRU regions, showing
the size and arrangement of the possible trigger patches (L0 patch, L1 photon patch, L1 jet patch)
in the η-φ plane; 8 towers correspond to 4 fastORs, and the 48-tower supermodule width to
24 fastORs. Each square depicts a module (fastOR) signal.
Figure 3. Picture of the Summary Trigger Unit, equipped with the Xilinx XC5VLX110FF1153.
Visible are the DCS interface, the TTCRq mezzanine, the V0 interface, the DDL interface, the
trigger outputs and the 32 TRU inputs, arranged in connectors labelled T0-B1 through T38-B39
(T0-B1 means top is input 0, bottom is input 1).
When a confirmed L2 trigger is received, STU data (TRU time sums, triggering patch position,
thresholds used) are read out via a Detector Data Link (DDL) [8] on a per-event basis. For all of
those, multi-event buffering is implemented. The Detector Control System (DCS) interface is
based on a transformer-less Ethernet interface board, allowing remote FPGA configuration as
well as experiment configuration (thresholds, delays, etc.).
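Returning to the threshold computation mentioned above: it reduces to a single polynomial
evaluation per trigger type. A minimal sketch, where A, B and C stand for the configurable fit
coefficients (placeholder names; their values are not given in the text):

def l1_threshold(v0_charge, A, B, C):
    # Second-order fit of the EMCal energy vs. the V0 charge information.
    return A * v0_charge**2 + B * v0_charge + C

In practice one such evaluation would be performed per threshold (photon and jet), each
presumably with its own coefficient set configured through the DCS interface.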
3.1 Custom serial protocol implementation
The main motivation for developing this custom serial link was the desire to reuse the TRU
design made for the PHOton Spectrometer (PHOS), which featured a spare RJ45 connector
directly linked to its FPGA. The trigger timing constraints drove the design in the same direction:
minimizing the transmission latency, together with functional requirements such as using the STU
as a low jitter reference clock distributor for the TRUs and forwarding the local L0s to the STU to
feed its global OR, called for a proprietary solution. The choice was thus made to use a 4-pair
LVDS link transported over CAT7 Ethernet cables, which have the appropriate impedance and
feature low signal attenuation and low skew between pairs. Pair usage is as follows: one pair is
dedicated to the LHC reference clock transfer to the TRU, another is used by the TRUs to forward
their local L0 candidates, and the 2 remaining pairs are used for synchronous serial data transfer
without any encoding. Each data pair runs at 400 Mb/s, the transfer clock being the LHC clock
multiplied by 10. With this very light protocol, the latency is only the sum of the cable delay and
of the bit transmission time. Each TRU simultaneously sends its 96 values of 12-bit coded
time-integrated fastOR data to the STU at 800 Mb/s; the transmission latency is thus 1.44 µs.
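The quoted latency can be cross-checked with a back-of-the-envelope computation (cable delay
excluded):

payload_bits = 96 * 12               # 1152 bits of fastOR time sums per TRU
link_rate_bps = 2 * 400e6            # two data pairs at 400 Mb/s each
print(payload_bits / link_rate_bps)  # 1.44e-06 s, i.e. 1.44 us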
The link synchronization is performed before each start of run by a Finite State Machine
implemented in the FPGA. It proceeds in 2 steps. First, the data phase alignment takes place; it
relies on the fine granularity delay feature available on each individual data path of the Virtex-5
FPGA (up to 64 steps of 78 ps). All tap values are scanned in order to identify the zone where
data reception is stable, and the central value of that zone is then applied. Secondly, character
framing is performed to associate each incoming bit with the correct deserialized word.
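The tap selection step can be summarized by the following sketch; sample_is_stable is a
hypothetical stand-in for the per-tap stability test actually performed by the FSM.

def best_tap(sample_is_stable, n_taps=64):
    # Scan all delay taps (steps of 78 ps on the Virtex-5), track the
    # longest run of taps giving stable reception, return its center.
    best_start, best_len = 0, 0
    start, length = 0, 0
    for tap in range(n_taps):
        if sample_is_stable(tap):
            if length == 0:
                start = tap
            length += 1
            if length > best_len:
                best_start, best_len = start, length
        else:
            length = 0
    return best_start + best_len // 2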
3.2 Trigger algorithms description
The L1 trigger processing starts upon reception of the confirmed L0 on the RCU side. The trigger
information is then passed to the TRUs, which select the appropriate time-integrated values in
their circular buffers and transmit them to the STU. Meanwhile, the V0 detector transfers the
A-side and C-side charge information to the STU via the direct optical link. The thresholds for
photon and jet patches are immediately computed and made available before the actual patch
processing starts. In the STU, one photon processor (fig. 4) is implemented per TRU reception
link. For the A side, the data are mirrored as they are written into the reception buffer, i.e.
fastOR #95 is written in the first memory position and fastOR #0 in the last. This compensates
for the physical mirroring of the supermodules, which are inserted from opposite sides.
Only 2 columns of 4 processors are needed per region, because there are 4 possible overlapping
patch positions in φ and 2 in η. Each processor is an object performing 4 accumulations followed
by one threshold comparison right after the fourth loading. All regions are processed
synchronously and in parallel, the data being dispatched to the photon processors according to
their geographical position in the map. Some processors (orange in fig. 4) are additionally fed
with the neighboring region data. For instance, patch processor E0 processes [0, 1, 4, 5],
[8, 9, 12, 13], ..., [88, 89, 92, 93], while O0 processes [4, 5, 8, 9], ..., [92, 93, 0(A), 1(A)].
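The quoted index sequences can be reproduced by assuming a column-major fastOR numbering
(index = 4 × column + row) over the 4×24 region; the helper below is a hypothetical
reconstruction of the scheduling, restricted to patches whose rows lie inside the region (the fourth
φ position, crossing into the neighboring region, is omitted for brevity).

def photon_patch_indices(row, parity, n_cols=24):
    # One processor's patch sequence: 'parity' is 0 for the even (E)
    # column of processors, 1 for the odd (O) one; 'row' is 0..2.
    patches = []
    for col in range(parity, n_cols, 2):
        base = 4 * col + row
        nxt = 4 * ((col + 1) % n_cols) + row  # O processors wrap into region A
        patches.append([base, base + 1, nxt, nxt + 1])
    return patches

Under this assumption, photon_patch_indices(0, 0) yields [0, 1, 4, 5], [8, 9, 12, 13], ...,
[88, 89, 92, 93] as quoted for E0, and photon_patch_indices(0, 1) ends with [92, 93, 0, 1], the
last two indices belonging to the neighboring region A.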
In parallel to this processing, and in preparation for the jet patch processing, the data are also fed
to a 4×4 integrator which builds the 6 subregions of each region. This operation simply consists
in carrying out, 6 times, 16 successive accumulations followed by a memory write. Once all the
subregion information is built, EMCal can be represented as a rectangle of 12 rows × 16 columns
of subregions. At this stage, only one jet processor is necessary. It is very similar to those used
for photon processing in a region; owing to the new geometry, 2 columns of 11 processors of 2×2
objects are needed (the objects here being subregions instead of fastORs), and 165 different
patches are computed.
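The patch count follows directly from the stated geometry: a 2×2 sliding window over a 12×16
grid of subregions has 11×15 = 165 positions. A minimal sketch:

import numpy as np

def jet_patches(subregions):
    # All overlapping 2x2 sums over the 12x16 subregion grid.
    s = np.asarray(subregions)  # shape (12, 16)
    return s[:-1, :-1] + s[1:, :-1] + s[:-1, 1:] + s[1:, 1:]  # shape (11, 15)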
Figure 4. On top, one TRU region with its fastOR numbering (4 rows × 24 columns, numbered
column-wise from 0 to 95), surrounded by its neighboring region. Below, one photon patch
processor, containing the deserializers (2 × 400 Mb/s data inputs), the mirroring dual-port RAM
(96 words of 12 bits, whose read pointer can reorder the A-side data), the data distribution FSM,
the 2 columns of 4 photon processors (E0-E3 for even patch indices, O0-O3 for odd ones, the
latter also fed with neighboring region data) and the subregion integrator feeding the jet trigger
processor. Every 4 accumulations, each photon processor compares its result to the V0-dependent
threshold and generates an indexed trigger if required.
4. Conclusion
The STU has been installed for a year; all the system interfaces have been validated, and the
custom serial protocol has been demonstrated to operate in real conditions with intensive readout.
The L0/L1 triggers are commissioned, and data validation with the p+p runs taken at fixed
threshold is in progress. The good trade-off between parallel and serial computation made it
possible to use only ∼50% of the FPGA internal logic and left a timing margin of ∼700 ns for the
processing of the 2961 photon 2×2 patches and the 165 jet 64×64 patches. These provisions
allow future algorithm upgrades or modifications.
References

[1] EMCal Technical Design Report, CERN-LHCC-2008-014.
[2] H. Muller et al., Front-end electronics for the ALICE calorimeters, Nucl. Instrum. Meth. A 617
(2010) 369-371, doi:10.1016/j.nima.2009.09.022.
[3] ALTRO chip documentation, http://ep-ed-alice-tpc.web.cern.ch/ep-ed-alice-tpc/altro_chip.htm
[4] H. Muller et al., Hierarchical trigger of the ALICE calorimeters, Nucl. Instrum. Meth. A 617
(2010) 344-347, doi:10.1016/j.nima.2009.06.097.
[5] Gutierrez et al., The ALICE TPC readout control unit, IEEE NSS Conference Record (2005)
575-579.
[6] Technical Design Report on Forward Detectors: FMD, T0 and V0, CERN-LHCC-2004-025.
[7] TTC documentation web site, http://ttc.web.cern.ch/TTC/intro.html
[8] DDL documentation web site, http://alice-proj-ddl.web.cern.ch/alice-proj-ddl/
[9] ALICE Central Trigger Processor web site, http://epweb2.ph.bham.ac.uk/user/krivda/alice/
