CMS regional calorimeter trigger high speed ASICs by Smith, W H et al.
CMS REGIONAL CALORIMETER TRIGGER HIGH SPEED ASICs
W. H. Smith, P. Chumney, S. Dasu, M. Jaworski, J. Lackey
Physics Department, University of Wisconsin, Madison, WI 53706 USA
Abstract
The CMS regional calorimeter trigger system detects
signatures of electrons/photons, taus, jets, and missing
and total transverse energy in a deadtimeless pipelined
architecture. This system contains 20 crates of custom-
built electronics. Much of the processing in this system is
performed by five types of 160 MHz digital ASICs. These
ASICs have been designed in the Vitesse submicron high-
integration gallium arsenide gate array technology. The
five ASICs perform data synchronization and error
checking, implement board level boundary scan, sort
ranked trigger objects, identify electron/photon candidates
and sum trigger energies. The design and status of these
ASICs are presented.
1. CMS CALORIMETER L1 TRIGGER
The CMS level 1 trigger decision is based in part upon
local information from the level 1 calorimeter trigger
about the presence of physics objects such as photons,
electrons, and jets, as well as global sums of ET and
missing ET  (to find neutrinos) [1].
For most of the CMS ECAL, a 5 x 5 array of PbWO4
crystals is mapped into trigger towers. In the rest of the
ECAL there is somewhat lower granularity of crystals
within a trigger tower. There is a 1:1 correspondence
between the HCAL and ECAL trigger towers. The trigger
tower size is equivalent to the HCAL physical towers,
0.087 x 0.087 in η x φ. The φ size remains constant in ∆φ
and the η size remains constant in ∆η out to an η of 2.1,
beyond which the η size increases.
The electron/photon trigger uses a 3x3 trigger tower
sliding window technique which spans the complete
coverage of the CMS electromagnetic calorimeter [2].
Two independent streams are considered: non-isolated and
isolated electrons/photons. The non-isolated identification
requires a large energy deposit in one or two adjacent
ECAL trigger cells, a narrow lateral shower profile (the
energy spread in η strips of 5 crystals in the central ECAL
cell of the 3x3 trigger tower window) and small H/E in the
central trigger cell of the 3x3 window. The isolated
electrons/photons additionally require small energy in the
ECAL cells surrounding the central cell of the 3x3 window
and small energy in the HCAL cells surrounding the central
cell of the 3x3 window.
The jet trigger uses the transverse energy sums (ECAL
+ HCAL) computed in calorimeter regions (4x4 trigger
towers). Jets and τs are characterized by the transverse
energy ET in 3x3 calorimeter regions (12x12 trigger
towers). For each calorimeter region a τ-veto bit is set if
there are more than two active ECAL or HCAL towers in
the 4x4 region. A jet is defined as ’tau-like’ if none of the
9 calorimeter region τ-veto bits are set.
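The τ-veto logic described above can be sketched in a few lines of Python. This is only an illustration: the activity threshold ACTIVE_ET, the function names, and the reading of "more than two active ECAL or HCAL towers" as a test applied to each calorimeter separately are assumptions, not details given in the text.

```python
# Hypothetical activity threshold (in compressed ET counts); not specified in the text.
ACTIVE_ET = 2

def tau_veto(ecal, hcal, threshold=ACTIVE_ET):
    """Return True if a 4x4 region has more than two active ECAL or HCAL towers.

    ecal, hcal: 4x4 nested lists of tower ET values for one calorimeter region.
    """
    n_ecal = sum(et > threshold for row in ecal for et in row)
    n_hcal = sum(et > threshold for row in hcal for et in row)
    return n_ecal > 2 or n_hcal > 2

def is_tau_like(veto_bits_3x3):
    """A jet built from 3x3 regions is tau-like only if none of its 9 veto bits is set."""
    assert len(veto_bits_3x3) == 9
    return not any(veto_bits_3x3)
```

A narrow deposit confined to one or two towers leaves the veto bit clear, while a broad deposit sets it and prevents the enclosing jet from being labelled tau-like.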
2. CALORIMETER TRIGGER HARDWARE
The calorimeter level 1 trigger system, shown in
Figure 1, receives digital trigger sums from the front-end
electronics system, which transmits energy on an eight bit
compressed scale. The data for two trigger towers is sent
on a single link with eight bits apiece, accompanied by
five bits of error detection code and a “fine-grain” bit for
each trigger tower characterizing the energies summed into
it, i.e. isolated energy for the ECAL or an energy deposit
[Figure 1 labels: input on 4K 1 Gb/s serial Cu links, each carrying 2 x (8 bits EM or HAC energy plus 1 bit "fine structure") + 5 bits error detection code, for 72 φ x 56 η towers every 25 ns; outputs include the 4 highest isolated Et e/γ, the 4 highest non-isolated Et e/γ, and the 4 highest jets and taus.]
Figure 1.  Overview of Level 1 Calorimeter Trigger
The calorimeter regional crate system uses 20 regional
processor crates covering the full detector. Eighteen crates
are dedicated to the barrel and two endcaps. These crates
cover the region |η|<3. One special crate covers both HF
Calorimeters that extend missing ET and jet finding
coverage to |η|<5. The remaining crate collects regional
information from these 19 trigger crates and clusters their
regions to find jets and taus. It also continues the
summation tree to provide sums of ET in various φ
regions.
Each calorimeter regional crate transmits to the
calorimeter global trigger processor its four highest-ranked
isolated and four highest-ranked non-isolated electrons. The
cluster crate sends
its 9x4 highest energy central and forward jets and tau
candidates along with information about their location and
sum ET for the 18 φ regions covered by it. The global
calorimeter trigger then forms Ex and Ey using look-up
tables and sums the energies, separately sorts the
electrons, jets and taus, and sends the top four calorimeter-
wide candidates, as well as the total calorimeter missing
and sum ET to the CMS global trigger. The muon
isolation and identification bits formed using the HCAL
information are passed to the global muon crates via the
global calorimeter trigger.
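The formation of Ex and Ey from the ET sums of the 18 φ regions can be illustrated as follows. This is a minimal sketch, not the hardware implementation: the table contents, the choice of the region centre as the reference angle, the use of floating point instead of integer look-up table entries, and the function name missing_et are all assumptions.

```python
import math

N_PHI = 18  # number of phi regions, as stated in the text

# Hypothetical look-up tables: cos/sin at the assumed centre of each 20-degree phi region.
COS_LUT = [math.cos(math.radians(20.0 * (i + 0.5))) for i in range(N_PHI)]
SIN_LUT = [math.sin(math.radians(20.0 * (i + 0.5))) for i in range(N_PHI)]

def missing_et(et_by_phi):
    """Return (Ex, Ey, missing ET, total ET) from the ET sums of the 18 phi regions."""
    assert len(et_by_phi) == N_PHI
    ex = sum(et * c for et, c in zip(et_by_phi, COS_LUT))
    ey = sum(et * s for et, s in zip(et_by_phi, SIN_LUT))
    return ex, ey, math.hypot(ex, ey), sum(et_by_phi)
```

A φ-symmetric energy flow gives a missing ET close to zero, whereas energy concentrated in a single φ region gives a missing ET equal to that region's ET.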
Eighteen crates of the Calorimeter Regional Trigger use
three custom board designs that are dedicated to receiving
and processing data from the barrel and endcap
calorimeters. In these crates there are seven rear mounted
Receiver cards, seven front-mounted Electron Isolation
cards, and one front-mounted Jet Summary card for a total
of 15 processor cards per crate. These cards and an
additional clock and control card are plugged into a custom
“backplane” which provides 160 MHz point-to-point links
between the cards. A VME bus is also provided to these
cards using high-density connectors in the top 3U section
of the backplane. In addition there are two slots with
standard VME backplane connectors for crate processor and
monitoring cards.
 The 19th crate covering the forward calorimeter houses
special cards that use portions of circuitry of the Receiver
and Jet Summary cards to drive the signals out for
forming jets and ET sums. The 20th cluster crate is
similar to the 18 barrel and endcap crates but uses a
different backplane and a set of cluster processor cards that
implement the jet and tau finding algorithms and ET
sums. These cards and backplane are based on the same
technology used in the other crates.
 The regional calorimeter trigger crate, shown
schematically in Figure 2, has a height of 9U and a depth
of approximately 700 mm [3]. The front section of the
crate is designed to accommodate 280 mm deep cards,
leaving the major portion of the volume for 400 mm deep
rear-mounted cards.
The Receiver Card synchronizes the input data and
passes it through look-up tables to separately linearize the
energies into the number of bits needed for electron
identification and energy triggers.  Data in parallel form is
shared with the neighboring crates at 80 MHz.  The entire
system operates in lock step after this stage at 160 MHz.
The energies are then summed in 4 x 4 trigger tower
regions.
 The data for the electron identification logic, which
includes both that received on the serial cables and that
received on inter-crate cables, are transferred to the
Electron Identification cards plugged into the front side of
the backplane.  The 4 x 4 sums are transferred to the
Jet/Summary card plugged into the center of backplane on
Figure 2. Schematic view of a typical Calorimeter
Level 1 Regional crate.
The Electron Isolation card implements its algorithm in
the Isolation ASIC. The candidate electrons are ranked and
top candidates are passed to the Jet/Summary card. The
Jet/Summary card sorts the electron and jet candidates in
the crate to output the top four candidates of each kind on
a cable to the global trigger. It also calculates sums of Ex,
Ey and Et for transmission to the Global Calorimeter
Trigger (GCT) cards.  The GCT sorts objects and sums
energies to obtain the final output of the calorimeter
trigger which is used by the Global Trigger together with
the muon trigger data to provide the final trigger decision.
3. DIGITAL ASICS
The five digital ASICs developed for the regional
calorimeter trigger are the Phase ASIC, the Adder ASIC, the
Boundary Scan ASIC, the Sort ASIC and the Isolation ASIC.
They were produced in Vitesse FX™ and GLX™ gate
arrays utilizing their sub-micron high integration Gallium
Arsenide MESFET technology. Except for the 120 MHz
TTL input to the Phase ASIC, all ASIC I/O is at 160
MHz ECL.
The Phase ASIC is designed to receive four channels of
parallel data from a Vitesse VSC7216 4-channel serial-to-parallel
1.2 GBaud copper receiver. Each channel of data arrives at
120 MHz, eight bits wide, in 3 cycles for each 25 ns bunch
crossing. This provides a 24-bit frame at 40
MHz that contains the 18 bits of data described above and
5 bits of error detection code, with one bit in reserve. A
block level diagram of the ASIC is shown in Figure 3.
The single clock for four channels is derived from the
Vitesse Receiver. Data is transmitted from each Receiver
channel along with two status bits and an error bit.  The
status can be used to determine whether the link is in
setup mode or data transmission mode. The input stage of
the Phase ASIC is a 44 bit wide FIFO that is six frames
deep. The FIFO accommodates minor phase shifts
between the transmitter and local clocks.
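The framing arithmetic above (three 8-bit words per 25 ns crossing giving 24 bits: 18 data bits, 5 error detection bits and one reserved bit) can be made concrete with a short sketch. The bit ordering within the frame is not given in the text, so the layout used here, like the function names, is purely hypothetical.

```python
def assemble_frame(byte0, byte1, byte2):
    """Combine the three 8-bit words of one bunch crossing into a 24-bit frame.

    Assumes the first word received holds the most significant bits.
    """
    for b in (byte0, byte1, byte2):
        assert 0 <= b <= 0xFF
    return (byte0 << 16) | (byte1 << 8) | byte2

def unpack_frame(frame):
    """Split a 24-bit frame into fields, using a hypothetical bit layout:

    bits 23..15  tower A (8-bit energy + fine-grain bit)
    bits 14..6   tower B (8-bit energy + fine-grain bit)
    bits  5..1   5-bit error detection code
    bit   0      reserved
    """
    tower_a = (frame >> 15) & 0x1FF
    tower_b = (frame >> 6) & 0x1FF
    return {
        "energy_a": tower_a >> 1, "fine_a": tower_a & 1,
        "energy_b": tower_b >> 1, "fine_b": tower_b & 1,
        "edc": (frame >> 1) & 0x1F,
        "reserved": frame & 0x1,
    }
```

Whatever the real ordering, the field widths add up as 2 x (8 + 1) + 5 + 1 = 24 bits per crossing.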
[Figure 3 signal labels: DIN(0:1), DIN(2:3), DIN(4:5), DIN(6:7), each with CLK; Phase Cntrl blocks; DOUT(0:3), DOUT(4:7) and ERR(1:4); bus widths marked 8 and 18.]
Figure 3. Block level diagram of the Phase ASIC.
The FIFO is followed by a circuit that sets the
proper phase between the incoming data and the local
bunch-crossing clock. This circuit makes use of status
information from the VSC7216 to set the final phase.
Once properly phased, the data and error bits can be
separated into 18 bits of data (two channels) and 5 bits of
Hamming code. The Hamming code is recomputed from
the data and compared with the received Hamming code
bits. This Hamming code catches all single and double bit
errors and most other multi-bit errors. The data leave the
Phase ASIC at 160 MHz in two data channels with 9 bits
apiece, and one error channel, also 9 bits. The error bits
are made up of the transmitted EDC along with a subset
of the status bits from the VSC7216 and an overall error
indicator. The status bits from the VSC7216 provide
sufficient information to determine the state of the serial
links at any point in time.
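The recompute-and-compare check can be illustrated with a generic shortened Hamming code. The actual generator used in the Phase ASIC is not given in the text, so the column assignment below and the function names are assumptions. Any such code with distinct non-zero columns has minimum distance 3 and therefore flags every single and double bit error when used for detection only, which is consistent with the property quoted above.

```python
# Hypothetical column assignment: the 18 data bits get distinct 5-bit patterns
# that are neither zero nor a power of two (those belong to the check bits).
COLUMNS = [c for c in range(1, 32) if c & (c - 1)][:18]

def hamming5(data18):
    """Compute 5 check bits for an 18-bit data word (shortened Hamming code)."""
    assert 0 <= data18 < (1 << 18)
    check = 0
    for bit, column in enumerate(COLUMNS):
        if (data18 >> bit) & 1:
            check ^= column
    return check

def link_error(data18, received_check):
    """Recompute the code from the received data and compare with the received bits."""
    return hamming5(data18) != received_check
```

Errors of three or more bits can cancel in the comparison, which is why only most, not all, of the remaining multi-bit errors are caught.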
As there are four input channels, each handling two
towers per crossing, each of the two output channels carries
four towers of information per crossing. The outputs are
clocked at 160 MHz.
The last storage element of the Phase ASIC is
implemented as a loadable counter. During normal
operation the counter will be loaded with data every 6.25 ns.
During testing the counter can be reset and enabled to
count synchronously with the rest of Phase ASIC
outputs. The counter outputs will address the look-up
tables just as detector data would. The combination of
these counters and look-up tables can be used to provide
any data pattern necessary to test the remainder of the
Trigger Processor system. The error outputs will be idle
during testing.
The Phase ASIC has a JTAG controller and scan cells
on all the outputs. The data on link errors is zeroed so that
loss of individual links does not inhibit data taking. The
broken links will be re-synchronised periodically.
However, link error flags from the Phase ASICs are
counted, and the counts are readable over VME by the local
crate processor for monitoring.
The Adder ASIC is designed to add eight 11-bit numbers
(including the sign) in 25 ns, while providing bits for
arithmetic and input overflows. Vitesse has produced it in
0.6 µm H-GaAs technology. The Adder ASIC consists of
approximately 11,000 cells, uses 4 W and has been tested
to 200 MHz, considerably above the 160 MHz
requirement.
The Adder ASIC provides a 4-stage pipeline with eight
input operands and one output operand. There are three
stages of adder tree, with an extra level of storage added to
ensure chip processing is isolated from the I/O. The ASIC
uses 4 bit adder macro cells to implement twelve bit wide
adders. Eleven bits are wired, left justified, to each operand
of an adder. The LSB of each adder is internally set to
ZERO. The MSB is treated as a sign bit. Therefore,
although the adder tree may be constructed from three 4 bit
adders, the width of the operand data paths has been
limited to eleven bits. An Adder ASIC chip is designated
as “master” if it is in the top rank of the adder tree and as
“slave” if it is further down. Masters can generate Tower
overflow (TOV), but slaves can only propagate TOV.
Both masters and slaves can generate and propagate
arithmetic overflow/underflow (AOV). These bits are
appended to each input and output operand, making all
operands 13 bits wide. TOV becomes the twelfth bit of
the output result and AOV the thirteenth bit.
A block diagram of the Adder ASIC is shown in Figure
4. The top of the adder tree is composed of four 12-bit
adders and includes the logic required to detect and
propagate TOV and AOV. All eight of the TOV bits are
ORed together and all four of the AOV bits are ORed
together to form two separate overflow bits that are
forwarded with the data in the pipeline.
[Figure 4 label: input cells with boundary scan.]
Figure 4. Adder ASIC.
The second stage contains two more 12-bit adders and
includes the logic needed to propagate TOV and to detect
and propagate AOV. From this point on, TOV is
forwarded down the pipeline from register to register.
AOV is generated in the same manner as in the first stage
and the resulting two bits are ORed with the AOV from
the previous stage.
The third stage contains the final adder as well as a
continuation of the TOV/AOV circuitry. The register at
this level is the last storage element before the ASIC
output. TOV and AOV are stored along with the operand.
The last register is presented to one input of a 2:1
multiplexer before leaving the chip through the boundary
scan cells and pads. The other side of the multiplexer is
fed by an 8:1 multiplexer which passes any one of the
eight input operands, less the two overflow bits, to the
output of the ASIC.
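The behaviour of the three-stage adder tree with its two overflow flags can be modelled in software. The sketch below makes several assumptions that the text does not confirm: operands are treated as 11-bit two's-complement values, an overflowing sum wraps around, incoming AOV bits are ORed into the result, and TOV is only propagated (as in a "slave" chip). The names add11 and adder_asic are illustrative.

```python
LO, HI = -1024, 1023  # range of an 11-bit two's-complement operand

def add11(a, b):
    """Add two 11-bit signed values; return (wrapped sum, arithmetic overflow flag)."""
    s = a + b
    aov = not (LO <= s <= HI)
    wrapped = (s - LO) % 2048 + LO  # wrap to 11 bits (assumed behaviour on overflow)
    return wrapped, aov

def adder_asic(operands):
    """Three-stage adder tree over eight (value, tov, aov) input operands.

    Returns (sum, tov, aov): TOV is the OR of all input tower-overflow bits,
    AOV is the OR of the input AOV bits and of every overflow in the tree.
    """
    assert len(operands) == 8
    values = [v for v, _, _ in operands]
    tov = any(t for _, t, _ in operands)
    aov = any(a for _, _, a in operands)
    while len(values) > 1:  # stages of 4, 2, then 1 adders
        stage = [add11(values[i], values[i + 1]) for i in range(0, len(values), 2)]
        values = [v for v, _ in stage]
        aov = aov or any(f for _, f in stage)
    return values[0], tov, aov
```

For example, eight operands of 100 sum to 800 with both flags clear, while two operands of 1000 in the same first-stage adder set AOV.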
The Boundary Scan ASIC has several functions.
Firstly, it provides control for board level boundary scan
functions. Secondly, it provides drivers for sending data
over the point-to-point links on the backplane and inter-
crate cables. Thirdly, it provides simple algorithms needed
for manipulating data, e.g., to reduce the corner tower data
from 7 bits to 3 bits while ensuring that the setting of
any upper bits in input saturates the 3-bit scale.
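The saturating 7-bit to 3-bit reduction mentioned above amounts to the following, assuming that the three least significant bits are the ones retained (the function name is illustrative):

```python
def saturate_7_to_3(value):
    """Reduce a 7-bit value to 3 bits, saturating if any of the upper 4 bits is set."""
    assert 0 <= value < 128
    return 0b111 if value >> 3 else value & 0b111
```

Values 0 to 7 pass through unchanged, and anything larger maps to 7 rather than wrapping around.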
The Isolation ASIC, shown in Figure 5, handles four
electromagnetic energies on a 7-bit scale along with the
corresponding Veto bit, every 6.25 ns. Nearest neighbors
are also included in the data flow. During the first cycle of
every crossing the four neighboring energies from the
adjacent 4 x 4 region are also strobed into the ASIC.
The neighbors along either edge of the 4 x 4 region are
also included, two at a time during each 6.25 ns period.
Finally, the last cycle strobes in the four neighboring
towers of the bottom edge. Thus, in one bunch crossing
time, a total of 36 towers are clocked into the Isolation
ASIC.
[Figure 5 signal labels: A(1:4)MAX, B(1:4)MAX, C(1:4)MAX, D(1:4)MAX; TEA, TEB, TEC, TED; BEA, BEB, BEC, BED; bus widths marked 8.]
Figure 5. Isolation ASIC logic
The main data flow of the Isolation ASIC processes the
data through three separate blocks. The purpose of the first
of these, the Input Staging, is to receive the data at the
time when it is available and change the time relationship
to one suitable for the processing that follows. At the
beginning of a crossing, the first row of the 4 x 4 array is
available, along with the top edge. The signal Cycle 1
selects the Top Edge input on the right hand multiplexer.
After the first 6.25 ns clock, the first rank of registers
contains one of the towers in the 4 x 4 array (a reference
tower) along with its top neighbor. The leftmost register
in the top rank is undefined at the beginning of the
sequence. After a second clock cycle, the reference tower is
in the middle register of the bottom rank of registers and
its top neighbor is in the right hand register. The left-
most register in the bottom rank contains the next
successive reference tower, as does the middle register in
the top rank. This value is the bottom nearest neighbor
for the first reference tower. The sequence continues
through to the cycle where the last reference tower in a
column of 4 towers is clocked into the middle register in
the bottom rank. During the same cycle the Bottom Edge
data is available from the neighboring card. It is clocked
into the bottom left register during Cycle 1 at the
beginning of the next sequence.
The Input Staging block places each reference tower and
its neighbors in the same time frame. The remaining
blocks in the chip can now handle the processing in
parallel. The function of the Add/Compare block is to
form four sums between a reference tower and its top,
bottom, left and right neighbors. While the sums are
being formed, four comparisons are made to
determine for each pair of towers whether the reference
tower is larger than or equal to its neighbor (equality
check). When a reference tower and its neighbor satisfy the
equality check the sum of the pair is enabled to the Find
Max block. When the sum is disabled, a value of zero is
passed on to the next block.
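The Add/Compare step for a single reference tower can be written compactly. This is a software sketch of the behaviour described above, not the ASIC logic itself, and the function name is illustrative:

```python
def add_compare(ref, top, bottom, left, right):
    """Form the four two-tower sums for one reference tower.

    A sum is enabled only if the reference tower is >= that neighbour;
    otherwise zero is passed on, as described in the text.
    """
    return [ref + n if ref >= n else 0 for n in (top, bottom, left, right)]
```

Gating the sum on the comparison means that a given pair of unequal towers contributes only through the larger of the two, so the same two-tower cluster is not reported twice.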
The next to last stage in processing the electromagnetic
information is the Find Max block. The four sums are
presented, in parallel, to two comparators. The outputs of
these comparators are used to select the maximum of each
pair, which are placed in intermediate storage. These two
maxima are presented to a single comparator during the
next clock cycle. The output of this comparator is the
maximum two-tower sum for an individual reference
tower. The single maximum from the original four values
is stored in a register. The Veto bits are stored with each
of these sums. A final stage of logic sorts through all 16
maxima generated over a bunch crossing time and places
that value, along with its Vetoes, on the outputs of the
ASIC. The total latency for the electromagnetic data path
is 12 x 6.25 ns or 3.0 bunch crossing times.
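The two-level comparison in the Find Max block, and the latency arithmetic quoted above, can be checked with a short sketch (the function and variable names are illustrative):

```python
def find_max(sums):
    """Two-level comparator tree: max of each pair, then max of the two winners."""
    a, b, c, d = sums
    first = a if a >= b else b     # first comparator
    second = c if c >= d else d    # second comparator, same clock cycle
    return first if first >= second else second  # single comparator, next cycle

CLOCK_NS = 6.25                        # one 160 MHz clock period
latency_ns = 12 * CLOCK_NS             # 75 ns
latency_crossings = latency_ns / 25.0  # 3 bunch crossings of 25 ns
```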
The Sort ASIC finds the four largest of eight 6-bit
values. Six bits is sufficient to handle both the ET sums
and the electron candidates. Figure 6 is an illustration of
the major functional blocks that make up the ASIC.
Rather than try to design an ASIC that handles eight 6-bit
operands in parallel, it was decided to shift the data in,
four operands at a time, over two 6.25 ns cycles.
 The algorithm implemented within the Sort ASIC is
based on a simple rotation of operands. The eight operands
are divided into two groups of four. The operands are
compared in pairs between the two groups, with the larger
of the two taking over the position of the left-hand
member of the pair. This comparison is performed in four
stages with a rotation of compared pairs occurring between
each stage. By the end of the fourth stage a sufficient
number of comparisons have been made to ensure the four
largest values are in the left-hand group. In order to save
steps, and thus minimize the total latency, these four
values are not placed in any rank order.
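The rotation scheme can be modelled as four compare-exchange stages. In this sketch the direction of the rotation and the function name are assumptions. The property that matters is that every right-hand slot is paired once with every left-hand slot, which is enough to guarantee that the four largest values end up in the left-hand group, in no particular order.

```python
def top_four_of_eight(values):
    """Return the four largest of eight values, in no particular order.

    Four compare-exchange stages; the right-hand group is rotated by one
    position between stages (the rotation direction is an assumption).
    """
    assert len(values) == 8
    left, right = list(values[:4]), list(values[4:])
    for _ in range(4):
        for i in range(4):
            if right[i] > left[i]:      # larger value takes the left-hand slot
                left[i], right[i] = right[i], left[i]
        right = right[1:] + right[:1]   # rotate the pairing for the next stage
    return left
```

Each stage needs only four comparators working in parallel, and skipping the final ranking saves the extra comparison stages a full sort would require.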
[Figure 6 labels: Register/DeMux, Register, MAX 4 (operands 0 to 7), Clock160; bus widths marked 18 and 10 (4 x 10 bits).]
Figure 6. Sort ASIC Logic.
4. CONCLUSIONS
The CMS regional Calorimeter high-speed ASIC
prototypes have been submitted for manufacture. They
implement revised trigger algorithms. The Adder ASIC
has been tested and its production finished. The 160 MHz
ECL I/O of these ASICs enables the construction of a
compact Level 1 calorimeter trigger with low latency.
This work is supported by the United States
Department of Energy and the University of Wisconsin.
5. REFERENCES
[1] J. Varela et al., Preliminary specification of the
baseline calorimeter trigger algorithms, CMS-TN/96-010.
[2] CMS HCAL TDR, CERN/LHCC 97-31, 20 June
1997; CMS ECAL TDR, CERN/LHCC 97-33, 15
December 1997.
[3] J. Lackey et al., CMS Calorimeter Level 1
Regional Trigger Conceptual Design, CMS NOTE-
1998/074 (1998).
