Readout Driver for the ATLAS Liquid Argon calorimeters by Cleland, W E
READOUT DRIVER FOR THE ATLAS LIQUID ARGON CALORIMETERS
W. E. Cleland, for the ATLAS Collaboration
Abstract
 The Readout Driver (ROD) for the Liquid Argon
calorimeter front-end electronics of the ATLAS
detector is described. The ROD receives triggered data
from 256 calorimeter cells. It must derive the precise
energy and timing of calorimeter signals from discrete
samplings of the pulse. In addition, it performs
monitoring and formats the digital stream for the
succeeding element in the readout chain. Data arrive
over two 1.28 Gbit/s fiber-optic links at a 100 kHz
event rate (25 kbit/event). Principles of the design are
discussed, along with simulations of data processing.
1.  INTRODUCTION
In ATLAS [1] there are three basic types of liquid
argon calorimeters: the electromagnetic calorimeters (barrel and
end-cap), the hadronic end-cap calorimeter and the
forward calorimeters. The front-end electronics for all
of these calorimeters is essentially identical, the
differences being confined to the amplification stage
upstream of the shaping amplifier. After shaping, the
signals are stored in a switched capacitor array (SCA),
and upon receipt of a Level 1 trigger, the samples
relevant to the event are digitized on the front-end
board (FEB). These digitizations are transmitted to the
Readout Driver (ROD) module, whose function is to
extract the parameters of interest for each calorimeter
cell and pass these data to the Readout Buffer (ROB)
module, the first element in the data acquisition chain.
A simplified diagram of this part of the readout chain
is shown in Figure 1.
Figure 1: Simplified diagram of the portion of the
readout chain involving the ROD.
In the ATLAS FEB, which treats 128 calorimeter
channels, the signals are amplified, shaped and then
stored  as analog levels in the  SCA. Upon receipt of a
Level 1 trigger signal, a 12-bit ADC digitises the
appropriate samples. Three gain scales are employed,
requiring three shaper channels and three SCA
channels for each calorimeter cell. All samples are
digitized on a common gain scale, which is chosen
event-by-event by examining the amplitude of the
sample closest to the peak of the signal. Each ADC
digitises the signals from 8 calorimeter channels, and
the results are sent over an optical fiber to the ROD
module. The fiber contains data from all 16 ADCs in
the FEB; 32 bits (2 bits/ADC) are sent every 25 ns,
giving a transmission rate of 1.28 Gbit/s.
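The link arithmetic above can be checked in a few lines. The following is a minimal sketch in Python; the constant names are ours, and the per-event figure simply divides the bandwidth of the two links feeding one ROD by the 100 kHz design trigger rate, so it is an upper bound on the event size rather than a description of the actual data format.

```python
# Worked check of the FEB-to-ROD link arithmetic (all names are illustrative).
ADCS_PER_FEB = 16           # ADCs on one front-end board
CHANNELS_PER_ADC = 8        # calorimeter channels digitized by each ADC
BITS_PER_ADC_PER_TICK = 2   # bits each ADC contributes per transfer
TICK_NS = 25                # one transfer every 25 ns

channels_per_feb = ADCS_PER_FEB * CHANNELS_PER_ADC
bits_per_tick = ADCS_PER_FEB * BITS_PER_ADC_PER_TICK
link_rate_bit_s = bits_per_tick * 1_000_000_000 // TICK_NS

# Upper bound on the event size for a ROD fed by two links at 100 kHz.
LINKS_PER_ROD = 2
TRIGGER_RATE_HZ = 100_000
max_bits_per_event = LINKS_PER_ROD * link_rate_bit_s // TRIGGER_RATE_HZ

print(channels_per_feb, bits_per_tick, link_rate_bit_s, max_bits_per_event)
```

With these inputs the link rate comes out at 1.28 Gbit/s and the bound at 25.6 kbit per event, in line with the figure quoted in the abstract.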
2. THE ROD MODULE
2.1 Overview
A single ROD module receives data from two front-
end boards, consisting of  (typically) 5 samples from
256 channels. The processors in the module calculate
energy and timing information from these data, and in
most cases, discard the raw data. The ROD also
performs monitoring tasks, and during calibration runs,
it executes a signal averaging task and sends averaged
data to a local processor, which then calculates
calibration constants for the channels belonging to that
module. The ROD modules will be housed in a
Readout Crate, which will in all likelihood be a 9U
VME crate with a dedicated host processor.  The ROD
system will be a highly specialized distributed
computing resource for the ATLAS detector. It will
consist of about 800 modules, each of which services
up to 256 calorimeter channels. The total computing
power of this resource will be approximately 4 x 10^15
arithmetic operations per second.
2.2 The Basic Algorithm
 The most important function of the ROD is to
determine the energy E and time T (relative to the
nominal bunch crossing timing) from the digitized
samples, along with a parameter Q (such as chi-
square), which indicates how closely the samples
follow the known waveform. Secondary functions
include updating histograms of these quantities and
performing certain monitoring functions for some
small fraction of the events.
A typical shaped liquid argon waveform is shown
in Figure 2, along with samples
spaced by 25 ns. A general technique to estimate E and
T in an accurate and computationally efficient manner
is that of optimal filtering [2], in which the desired
quantity is expressed as a sum of the samples
multiplied by predetermined weights. The weights are
chosen by requiring that the uncertainty in the estimated
quantity be minimized while satisfying certain
constraints. In our case, where there are two quantities
to be determined, the procedure involves
simultaneously minimising the uncertainty in both
quantities. The expressions are:
  $E = \sum_i a_i S_i, \qquad E\,T = \sum_i b_i S_i,$
where the sum extends over all of the samples $S_i$, and
where $a_i$ and $b_i$ are the weights. From the structure of
these formulae, one sees that the error in the amplitude
is amplitude independent, whereas the error in the time
varies inversely with the amplitude. For this reason it
makes sense to calculate T only for those channels
with E above some threshold value Eth. The quality-of-
fit parameter will most likely be a simplified
expression for chi-square (i.e., one that ignores
correlations between the different terms):
in which gi is the expected waveform normalized to
unity. Since this calculation involves knowing both E
and T, it will also be performed only for the case where
E is above the threshold value. Once E, T, and Q are
found, the corresponding bins for general histograms
of these quantities are calculated and incremented. In
addition, if special histograms are required for specific
calorimeter cells that are being monitored, these
histograms are also incremented.
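To make the role of the weights concrete, the sketch below computes a set of weights and applies them to a simulated pulse. It is an illustration under stated assumptions, not the procedure used for ATLAS: the noise is taken to be uncorrelated with equal variance in every sample (so the noise autocorrelation matrix drops out), the pulse shape is a toy unipolar function rather than the real shaped liquid argon signal, and the shaping time, sample times and function names are hypothetical. The constraints imposed are sum(a_i g_i) = 1 and sum(a_i g'_i) = 0 for the energy weights, and sum(b_i g_i) = 0 and sum(b_i g'_i) = -1 for the time weights, where g'_i is the slope of the waveform at sample i.

```python
import math

TAU_NS = 50.0                                   # hypothetical shaping time
SAMPLE_TIMES_NS = [25.0, 50.0, 75.0, 100.0, 125.0]

def pulse(t):
    """Toy pulse normalized to unit peak (not the real LAr shape)."""
    return (t / TAU_NS) * math.exp(1.0 - t / TAU_NS) if t > 0 else 0.0

def pulse_slope(t):
    """Analytic time derivative of the toy pulse."""
    return (1.0 - t / TAU_NS) * math.exp(1.0 - t / TAU_NS) / TAU_NS if t > 0 else 0.0

def white_noise_weights(g, dg):
    """Minimum-variance weights for uncorrelated, equal-variance noise."""
    q1 = sum(x * x for x in g)
    q2 = sum(y * y for y in dg)
    q3 = sum(x * y for x, y in zip(g, dg))
    det = q1 * q2 - q3 * q3
    a = [(q2 * x - q3 * y) / det for x, y in zip(g, dg)]
    b = [(q3 * x - q1 * y) / det for x, y in zip(g, dg)]
    return a, b

g = [pulse(ts) for ts in SAMPLE_TIMES_NS]
dg = [pulse_slope(ts) for ts in SAMPLE_TIMES_NS]
a, b = white_noise_weights(g, dg)

# Noiseless pulse of amplitude 1000 counts arriving 1 ns late.
true_e, true_t = 1000.0, 1.0
samples = [true_e * pulse(ts - true_t) for ts in SAMPLE_TIMES_NS]

e_hat = sum(ai * si for ai, si in zip(a, samples))
t_hat = sum(bi * si for bi, si in zip(b, samples)) / e_hat
print(f"E = {e_hat:.1f}, T = {t_hat:.3f} ns")
```

Because the method linearizes the waveform in T, the reconstructed values agree with the true ones only to first order in the time shift; the small residual bias seen here shrinks as the shift goes to zero.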
 Thus the basic algorithm is as follows:
• calculate E for all channels
• if  E > Eth, calculate T and Q
• update general histograms
• calculate quantities for monitoring specific
channels
In simulation, these steps have been implemented for
the C6202 DSP, and execution times for each stage of
the algorithm are given in Section 5 below.
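The steps listed above can be sketched as a per-event loop. This is a high-level Python illustration of the control flow only, not the DSP implementation: the five-sample pulse shape, the weights, the threshold, the binning and the function name are all hypothetical, and the quality measure Q is an assumed form (the sum of squared residuals to a first-order model of the pulse).

```python
from collections import Counter

# Toy five-sample pulse shape and its slope per ns (hypothetical values).
G = [0.5, 1.0, 0.5, 0.0, 0.0]
DG = [0.02, 0.0, -0.02, 0.0, 0.0]
# Weights consistent with G and DG for uncorrelated noise (also hypothetical).
A = [1 / 3, 2 / 3, 1 / 3, 0.0, 0.0]
B = [-25.0, 0.0, 25.0, 0.0, 0.0]
E_THRESHOLD = 100.0      # stand-in for E_th, same units as E
E_BIN_WIDTH = 50.0       # illustrative histogram binning

def process_event(event, e_hist, t_hist, monitored=frozenset()):
    """event maps channel number -> list of five samples."""
    results, monitor_log = {}, {}
    for channel, s in event.items():
        e = sum(a * x for a, x in zip(A, s))            # E for all channels
        t = q = None
        if e > E_THRESHOLD:                             # T and Q only above threshold
            t = sum(b * x for b, x in zip(B, s)) / e
            # Assumed quality measure: squared residuals to a first-order model.
            q = sum((x - e * (g - t * dg)) ** 2 for x, g, dg in zip(s, G, DG))
            t_hist[round(t)] += 1                       # 1 ns bins (illustrative)
        e_hist[int(e // E_BIN_WIDTH)] += 1              # general histograms
        if channel in monitored:                        # per-channel monitoring
            monitor_log[channel] = list(s)
        results[channel] = (e, t, q)
    return results, monitor_log

e_hist, t_hist = Counter(), Counter()
event = {7: [460.0, 1000.0, 540.0, 0.0, 0.0], 8: [10.0, 20.0, 10.0, 0.0, 0.0]}
results, monitor_log = process_event(event, e_hist, t_hist, monitored={7})
print(results[7], results[8])
```

In this example channel 7 is above threshold and receives values of E, T and Q, whereas channel 8 is below threshold and only its energy is computed and histogrammed.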
2.3 Design Considerations
The maximum Level 1 trigger rate for ATLAS is
currently specified as 75 kHz, but an upgrade to 100
kHz is considered a strong possibility, and hence we
use the latter figure as a design parameter for the ROD.
Since the processing time per event depends on the
fraction of cells with energy above the threshold, there
can be considerable fluctuations from module to
module. For this reason, a derandomizing buffer will
be required to reduce the system dead time. One model
of the process indicates that in order to keep the dead
time below 0.5%, it is necessary to buffer 7 events [3].
We plan for each ROD to serve two FEBs, or 256
calorimeter channels. With a Level 1 trigger rate of
100 kHz, the average processing time per event cannot
exceed 10 microseconds. Fortunately only a small
fraction of the cells contain significant energy deposits
in each event, which reduces considerably the
computing power required. As mentioned above, the
energy is calculated for each channel, but the time
value is computed only for channels with energy
significantly above the noise, since it is only for these
channels that the measurement is meaningful.
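The effect of a derandomizing buffer on dead time can be explored with a toy queueing simulation. The sketch below is not the model of reference [3] and is not expected to reproduce its numbers: it assumes Poisson-distributed triggers with a mean spacing of 10 microseconds, an exponentially distributed processing time with a hypothetical mean of 5 microseconds, a single processor serving events in order, and a buffer depth that counts the event being processed. All names and parameter values are ours.

```python
import random
from collections import deque

def dead_fraction(depth, n_events=100_000, mean_gap_us=10.0,
                  mean_service_us=5.0, seed=1):
    """Fraction of triggers lost because the buffer is full on arrival."""
    rng = random.Random(seed)
    pending = deque()      # completion times of events currently held
    now = 0.0
    lost = 0
    for _ in range(n_events):
        now += rng.expovariate(1.0 / mean_gap_us)      # next trigger arrives
        while pending and pending[0] <= now:           # release finished events
            pending.popleft()
        if len(pending) >= depth:
            lost += 1                                  # buffer full: trigger lost
            continue
        start = pending[-1] if pending else now        # wait for earlier events
        pending.append(start + rng.expovariate(1.0 / mean_service_us))
    return lost / n_events

for depth in (1, 2, 4, 7):
    print(depth, dead_fraction(depth))
```

Under these assumptions the lost fraction falls steeply as the buffer depth grows, which is the qualitative behaviour that motivates the buffer; the depth actually required depends on the real distribution of processing times.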
Given the design criteria listed above, there are several
possible approaches to the ROD design. One could
imagine using a multiply-accumulate chip, which is
optimized to perform the calculation we require. Or
one can even imagine designing and building a special-
purpose ASIC to carry out the task. Our preference,
however, is to evaluate solutions using programmable,
commercially available processors which can perform
our algorithm efficiently but have limited general
computing capability. A natural choice is the Digital
Signal Processor (DSP), a device whose technology is
advancing at a rapid pace. We plan to study such
devices in detail to ascertain if they are able to perform
the task in the required time before examining more
ambitious solutions. Likewise, we plan to use off-the-
shelf components for all ancillary elements in the
ROD, in order to minimise design effort.
Figure 2: Typical shaped calorimeter signal with
samples (dots) spaced by 25 ns.
2.4 Integer vs. Floating Point DSP
Since we are dealing with quantities which cover many
orders of magnitude in size (E, for example, can range
from tens of MeV to several TeV), it is natural to
consider floating point DSPs. However, detailed
investigation of the system of digitization we
plan to use (12-bit ADC operating on 3 gain scales)
indicates that integer DSPs, which are in general both
faster and cheaper, are also adequate in this case. As
long as 16-bit constants are used in the calculation, the
effects of rounding in the determination of both E
and T are completely dominated by ADC quantization.
The division of E×T by E to obtain T is an
inconvenience, of course, but this can be handled to
adequate accuracy by table lookup. Thus we plan to
investigate both integer and floating point DSPs for
possible use in the ROD.
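One possible way to replace the division by a table lookup is sketched below. It is an illustration only, with a hypothetical table size and fixed-point scaling: E is normalized to a 9-bit mantissa, the 8 bits following the leading one select a precomputed reciprocal, and the quotient is then obtained with one multiplication and one shift. Python integers are unbounded, so the word-length limits of a real integer DSP are not modelled here.

```python
TABLE_BITS = 8     # table indexed by the 8 bits following the leading 1 of E
FRAC_BITS = 15     # fixed-point precision of the stored reciprocals

# RECIP[i] approximates 2**(FRAC_BITS + TABLE_BITS) / (256 + i + 0.5),
# i.e. the reciprocal at the midpoint of each mantissa interval.
RECIP = [round((1 << (FRAC_BITS + TABLE_BITS)) / ((1 << TABLE_BITS) + i + 0.5))
         for i in range(1 << TABLE_BITS)]

def divide_by_lookup(et, e):
    """Approximate et / e for integer et and positive integer e, without dividing."""
    if e <= 0:
        raise ValueError("e must be positive")
    n = e.bit_length()
    # Normalize e to a (TABLE_BITS + 1)-bit mantissa in the range 256..511.
    if n > TABLE_BITS + 1:
        mantissa = e >> (n - TABLE_BITS - 1)
    else:
        mantissa = e << (TABLE_BITS + 1 - n)
    recip = RECIP[mantissa - (1 << TABLE_BITS)]
    sign = -1 if et < 0 else 1
    # e ~ mantissa * 2**(n - TABLE_BITS - 1), hence the total shift below.
    return sign * ((abs(et) * recip) >> (FRAC_BITS + n - 1))

for et, e in [(2000, 1000), (123456, 789), (-50000, 4095)]:
    print(et, e, divide_by_lookup(et, e), et / e)
```

With a 256-entry table the relative error of the quotient is bounded by roughly 0.2%, set by the truncation of E to 9 significant bits; a larger table or a linear interpolation between entries would reduce it further.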
3. ROD DEMONSTRATOR PROJECT
3.1 Purpose and Scope
In order to demonstrate the capability of candidate
DSPs and to understand more clearly the design
problems of the ROD, ATLAS has decided to pursue a
ROD Demonstrator Project. The project involves the
construction of a motherboard in the 9U VME64x
format, into which can be plugged up to four
daughterboard processing units (PUs). These
processing units will contain one or more DSPs and
will be used to process calorimeter data, either fed in
from an artificial source or from a calorimeter module
running in the test beam. The plan calls for
implementing the FEB-ROD-ROB chain, so that all of
the required functionality can be tested.
3.2 Hardware
The ROD Demonstrator Board is a VME64x 9U board
with a custom P3 backplane. It accepts up to four
Processing Units, which may be of different types.
Input data may be supplied to the board from up to two
FEBs or through the VME backplane.  An input is also
provided for the timing and trigger information (TTC)
in the format that will be used in ATLAS. Output can
be to a ROB module or through the VME backplane.
A schematic diagram of the board is shown in Figure 3.
The Processing Unit (PU) is a small (85x185 mm)
daughterboard containing one or more DSPs plus any
external memory required, input and output buffers,
and an interface to the motherboard. The tasks
performed by the PUs are expected to be very similar
to the tasks required of the ROD in ATLAS. Currently
two PUs are being designed, one based on a floating
point DSP (the SHARC of Analog Devices) and the
other based on an integer DSP (the C6202 of Texas
Instruments). As promising newer DSPs become
available, we expect to add them to the
project.
Figure 3: Schematic diagram of the ROD Demonstrator board.
4. A DESIGN EXAMPLE
4.1 Motivation
In order to illustrate the type of studies that are needed
to qualify a DSP for adoption for the ROD, we discuss
a concrete example in some detail. For this we choose
the design of the PU based on the Texas Instruments
C6202 DSP.  It is not unlikely that another DSP will
eventually be used, given the time scale for the
ATLAS experiment, which begins in 2005.
4.2 The TI C6202 DSP
The C6202 is an integer processor with many of the
features that are required by the ATLAS ROD. It can
operate at clock speeds up to 250 MHz and has eight
independent functional units (six ALUs and two 16-bit
multipliers), permitting it to execute eight 32-bit
instructions per cycle. The internal 128 kilobyte data
memory is somewhat limited for our purposes, since
we need to (a) buffer the input data, (b) carry out
calculations, and (c) store histograms, so we plan to
augment it with an external dual-port memory. The
unit has both an external memory interface and an
expansion bus, which offers a convenient interface to
an FPGA.
4.3 Design Sketch of the Processing Unit
Figure 4: Simplified schematic diagram of the
Processing Unit based on the Texas Instruments
C6202 DSP. The items shown in dashed boxes are
external to the Processing Unit. The interface to the
VME system is not shown.
In Figure 4 we show a block diagram of the processing
unit. The data from the FEB and the TTC are brought
in on the left, where they enter an FPGA. This device
performs certain routine checks on the input data and
transfers them into a fast dual-port memory, which is
connected to the external memory bus of the DSP. This
configuration permits the data to be processed in the
DSP without performing a transfer to its internal
memory.
4.4 Ancillary Circuitry
There are four logical units in the Processing Unit in
addition to the DSP: (1) Input FPGA, (2) Dual-port
Memory, (3) Output FPGA, and (4) FIFO. The Input
FPGA receives input data from both the FEB and the
TTC (Trigger, Timing and Control) modules. The
former contains the digitized data from the calorimeter
whereas the latter contains information about the
trigger (bunch crossing, trigger type, etc.). The two
types of data are combined into one record by the
FPGA, which also performs parity checks, and stored
in the dual-port memory (the input buffer) until it is
processed by the DSP. Once the DSP finishes the
processing of the event, the results are written to the
output FPGA, which formats the output data stream
and puts it into the FIFO, which is the output buffer
memory for the ROD. Because this unit will be used in
test situations, where there may be large fluctuations
in event size, the input and output buffers are larger
(corresponding to about 100 events of average size)
than will probably be required for ATLAS.
5. QUANTITATIVE STUDIES
5.1 Benchmark Code
We have developed a set of tasks which each of the
DSPs should perform in order to be able to make
comparisons between them and also establish their
viability as candidates for the final ROD system. This
code does not include all of the tasks that will be
performed by the DSPs in ATLAS, only the most time-
critical ones. The event header is first checked for
errors in data format or parity. The basic
operations performed are:
• check event header
• read five input samples and gain scale
• fetch weights
• calculate E
• if E > Eth, calculate T and Q
The above steps are performed for each channel
assigned to the DSP. In our current model of the ROD,
four DSPs are used in each module, so 64 calorimeter
channels are assigned to each DSP.
5.2 Timing Studies for TI C6202
The benchmark code described above has been
implemented for the TI C6202 processor, using the
simulator provided by Texas Instruments.  To improve
the calculation time, it was found necessary to break
the code into small loops, minimising branching and
conditional statements. Hand coding was used to
optimise the code further. Let
NE represent the number of cells which have E>Eth
(for this study Eth was chosen to be twice the noise of
the calorimeter cell, or about 100 MeV). Table 1 gives
the number of DSP cycles taken for each section
of the algorithm as a function of NE, and Table 2
gives an estimate of how this translates into execution
times for events with different numbers of cells above
threshold. Here we see that even if one-third of the
cells are above threshold (a rare occurrence), the
execution time is equal to the average spacing between
events at the design trigger rate. This indicates that the
TI C6202 would in fact meet the requirements for the
ROD. More recent results, arising from further
optimisation of the algorithm, show that the execution
times given here are overestimates, which strengthens
this conclusion.
Table 1: Number of cycles required by TI C6202 for
each stage of the algorithm.
Calculation                   Cycles
General histograms            39 + 16*NE
Monitoring selected cells     48 + 13*NE
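The cycle counts in Table 1 translate into execution times once a clock frequency is fixed. The sketch below does this for the histogramming and monitoring stages only, assuming the maximum C6202 clock of 250 MHz quoted in Section 4.2; the function names are ours, and the cycle budget is simply the 10 microsecond event spacing at 100 kHz expressed in clock cycles.

```python
CLOCK_MHZ = 250          # assumed clock: the maximum quoted for the C6202
CELLS_PER_DSP = 64       # channels assigned to one DSP

def histogram_cycles(n_e):
    """Cycles for the general histograms, from Table 1."""
    return 39 + 16 * n_e

def monitoring_cycles(n_e):
    """Cycles for monitoring selected cells, from Table 1."""
    return 48 + 13 * n_e

def cycles_to_us(cycles, clock_mhz=CLOCK_MHZ):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_mhz

budget_cycles = 10 * CLOCK_MHZ   # 10 us between events at 100 kHz

for n_e in (0, 8, 21, CELLS_PER_DSP):
    h, m = histogram_cycles(n_e), monitoring_cycles(n_e)
    print(n_e, h, m, cycles_to_us(h), cycles_to_us(m), budget_cycles)
```

For example, with one-third of the 64 cells above threshold (NE = 21), histogramming takes 375 cycles, or 1.5 microseconds, out of a budget of 2500 cycles per event.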
Table 2: Execution times of the TI C6202 DSP  (in
microseconds) as a function of number of cells
above threshold, out of a total of 64 cells, and
whether or not monitoring functions are being
performed.









5.3 Monte Carlo Studies
The ATLAS Monte Carlo chain is used to produce
artificial output data for the FEB, and this output has
been fed to the benchmark code for the ROD described
above. The first case to be studied is for the shower of
a 50 GeV electron, for the case of only thermal noise
(zero luminosity) and for pileup noise at low and high
luminosity.  It was found that the energy and width of
the reconstructed distribution are consistent with the
expected values for each case, and that the introduction
of integer arithmetic has no measurable effect on the
results.  We expect to use this program for a variety of
purposes, such as checking the algorithm in the
reconstruction of different types of showers, estimating
with more precision the parameters that enter into
the execution time of the algorithm, and understanding
how these parameters depend upon event type.
6. SUMMARY
We have described the technical requirements for the
Readout Driver for the liquid argon calorimeters in
ATLAS. From our studies to date, it appears that
commercial DSPs can meet the needs for this device.
We are carrying out a demonstration project in which
several DSPs, both integer and floating point, will be
evaluated for their suitability in ATLAS. We give an
example of the conceptual design and the simulation
results for one of the processing units being built for
this project, which is based on the Texas Instruments
C6202 DSP. Simulation of benchmark code indicates
that the speed of this processor is close to meeting our
requirements, and initial Monte Carlo results indicate
that the algorithm used produces acceptable results for
the case studied.
7. REFERENCES
1. ATLAS Liquid Argon Calorimeter TDR,
CERN/LHCC/96-41, ATLAS TDR 2, 15 Dec. 1996
2. W. E. Cleland and E.G. Stern, Nuclear Instruments
and Methods A338 (1994) 467
3. ATLAS LAr TDR, ibid, p. 425
