Off-detector electronics for a high-rate CSC detector by Dailing, J et al.
Off-Detector Electronics for a High-Rate CSC Detector
Dailing, J.¹, Drego, N.¹, Gordeev, A.², Gratchev, V.², Hawkins, D.¹, Kandasamy, A.², Lankford, A.¹,
Li, Y.¹, O'Connor, P.², Pier, S.¹, Polychronakos, V.², Schernau, M.¹, Stoker, D.¹,
Tcherniatine, V.², Toledano, B.¹, Vetter, K.²
¹ University of California, Irvine, CA, USA
² Brookhaven National Laboratory, Upton, NY, USA
Abstract
The off-detector electronics system for a high-rate
muon Cathode Strip Chamber (CSC) is described. The
CSCs are planned for use in the forward region of the
ATLAS muon spectrometer. The electronics system
provides control logic for switched capacitor array
analog memories on the chambers and accepts a total
of nearly 295 Gbit/s of raw data from 64 chambers. The
architecture of the system is described as are some
important signal processing algorithms and hardware
implementation details.
I. THE ATLAS CSC SYSTEM
The ATLAS CSC system is designed to measure
high momentum muons in the forward regions (2.1 <
|η| < 2.7) with high resolution in a high radiation
environment.  The CSC system consists of 64
chambers with half of the modules in each direction.
Each CSC module has four layers, providing a
precision measurement in the (radial) bend direction
and a coarser measurement of the transverse
(azimuthal) coordinate. Each module has 768
“precision coordinate” channels and 192 “transverse
coordinate” channels.  The total channel count for the
CSC system is 61,440 [1].
Due to severe radiation levels in the CSC
environment, a minimum of the CSC electronics will
be located on the detector. The on-detector electronics
amplifies and shapes the cathode strip signals, and
stores the pulse height information during the level 1
trigger latency. Upon receipt of a “level 1 trigger
accept” (LVL1 Accept), multiple time samples are
digitized and transmitted via high-speed fiber-optic G-
Links to off-detector electronics. Sampling and
digitization are performed on-detector and are
controlled by the off-detector electronics. The off-
detector electronics receives the digitized samples,
rejects out-of-time hits, and suppresses hits below
threshold, except those adjacent to hits that exceed the
threshold. Data from hit clusters are assembled and
processed by the off-detector electronics. The
processed data are transmitted to the ATLAS
Trigger/DAQ System for further processing.
II. THE CSC ELECTRONICS SYSTEM
A. The on-detector electronics
The CSC on-detector electronics [2] resides on
Amplifier-Storage Module (ASM) boards. Each strip is
connected to a Preamplifier and Shaper (P/S) which
makes a bipolar pulse with a 140 ns shaping time to
mitigate pile-up effects. The shaped pulses are sampled
every 50 ns, and the analog pulse height information is
stored in a Switched Capacitor Array (SCA) for the
duration of the level 1 trigger latency.  Only data close
in time to valid LVL1 Accepts are digitized.
The on-detector electronics for each CSC module
consists of five ASM boards, each handling data
collection for 192 strips. The digital data on each ASM
board are transmitted via two “down” data stream
fiber-optic G-Links to the off-detector electronics for
data processing. Clock and control signals are sent to
the ASM board via one “up” G-Link connection.
Upon receipt of a LVL1 Accept, four time samples for
each strip are digitized and read out. The two “down”
data stream G-Links will each run at 40 Mword/s with
16 bit/word.
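The channel and link figures quoted above can be
cross-checked with a few lines of arithmetic. The
sketch below uses only numbers stated in the text; the
variable names are illustrative, and the result is the
nominal capacity of the "down" links, not the actual
data volume.

```python
# Link-budget sketch built only from figures quoted in the text.
CHAMBERS = 64
PRECISION_STRIPS = 768          # per chamber
TRANSVERSE_STRIPS = 192         # per chamber
ASM_BOARDS_PER_CHAMBER = 5
STRIPS_PER_ASM = 192
DOWN_LINKS_PER_ASM = 2
WORD_RATE_HZ = 40_000_000       # 40 Mword/s per "down" G-Link
BITS_PER_WORD = 16

channels_per_chamber = PRECISION_STRIPS + TRANSVERSE_STRIPS
# Five ASM boards of 192 strips must cover every channel of a chamber.
assert channels_per_chamber == ASM_BOARDS_PER_CHAMBER * STRIPS_PER_ASM
total_channels = CHAMBERS * channels_per_chamber

link_bps = WORD_RATE_HZ * BITS_PER_WORD
asm_bps = DOWN_LINKS_PER_ASM * link_bps
down_links = CHAMBERS * ASM_BOARDS_PER_CHAMBER * DOWN_LINKS_PER_ASM
total_capacity_bps = down_links * link_bps

print(f"channels per chamber : {channels_per_chamber}")
print(f"total channels       : {total_channels}")
print(f"per-link capacity    : {link_bps / 1e6:.0f} Mbit/s")
print(f"per-ASM capacity     : {asm_bps / 1e9:.2f} Gbit/s")
print(f"down links in system : {down_links}")
print(f"total link capacity  : {total_capacity_bps / 1e9:.1f} Gbit/s")
```

The total of 61,440 channels agrees with the channel
count given in Section I.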
B. The off-detector electronics
The off-detector electronics, shown in Figure 1,
consists primarily of Sparsifier and Readout Driver
(ROD) modules. The total digitized data collected from
the 64 CSC modules is 295 Gbit/s at a trigger rate of
100 kHz. The Sparsifiers reduce the raw data stream
reaching the Readout Drivers (RODs) by suppressing
strip signals below a threshold cut and by rejecting out-
of-time signals. The four time samples retrieved from
each strip provide pulse shape information which
allows rejection of signals not centered in the timing
window. The total suppression factor is expected to be
in the range 70 to 175, based on beam test and Monte
Carlo studies.
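The quoted raw data rate can be reproduced from the
channel count, the four samples per strip, and the
100 kHz trigger rate. The sample width is not stated
in the text; the sketch below assumes 12 bits per
sample, a value chosen only because it reproduces the
quoted 295 Gbit/s. The function names are
illustrative.

```python
# Raw and sparsified data-rate estimate.
TOTAL_CHANNELS = 61_440          # from Section I
SAMPLES_PER_TRIGGER = 4          # time samples read out per strip
TRIGGER_RATE_HZ = 100_000        # LVL1 Accept rate
ASSUMED_BITS_PER_SAMPLE = 12     # ASSUMPTION: not stated in the text


def raw_rate_bps(bits_per_sample=ASSUMED_BITS_PER_SAMPLE):
    """Digitized data rate summed over all chambers, in bit/s."""
    return (TOTAL_CHANNELS * SAMPLES_PER_TRIGGER
            * bits_per_sample * TRIGGER_RATE_HZ)


def sparsified_rate_bps(raw_bps, suppression_factor):
    """Data rate remaining after the Sparsifiers, in bit/s."""
    return raw_bps / suppression_factor


raw = raw_rate_bps()
low = sparsified_rate_bps(raw, 175)    # strongest quoted suppression
high = sparsified_rate_bps(raw, 70)    # weakest quoted suppression

print(f"raw rate        : {raw / 1e9:.1f} Gbit/s")
print(f"after sparsifier: {low / 1e9:.2f} to {high / 1e9:.2f} Gbit/s")
```

Under this assumption the Sparsifiers would pass a few
Gbit/s in total to the RODs.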
Each Sparsifier module contains one SCA
Controller, implemented in a large FPGA, and 10
digital signal processors (DSPs), each of which
sparsifies the data from one ASM board. Thus each of
the 32 Sparsifiers receives data from two CSC
modules. On each Sparsifier there are 20 G-Link
receivers for data from the 10 on-detector ASM boards,
and 10 G-Link transmitters that send clock and control
signals, generated by the SCA controller, up to the
ASM boards. The sparsified data are assembled for
each CSC, and are sent to the RODs via backplane
connections.
Figure 1: Block diagram of the data acquisition system for the Cathode Strip Chambers. The Sparsifier and Readout Driver (ROD)
modules are located off-detector. Diagram labels: 5 DSPs per chamber (P = precision strips, T = transverse strips); 1 DSP per chamber + 1 host DSP; TIM; ROL; VME bus.
Each ROD module contains eight DSPs which
process data from eight CSC modules, as shown in
Figure 1. An additional host DSP manages the overall
operation of the ROD and provides an interface to the
ROD Crate Controller (RCC). The ROD checks data
integrity, applies calibration constants to the data,
performs further out-of-time rejection, and applies a
neutron rejection algorithm to the data stream. The
processed data are then sent via Readout Links (ROLs)
to Readout Buffers (ROBs) in the ATLAS
Trigger/DAQ System for level 2 trigger processing.
The ROD also checks data for errors, maintains
statistics, and fills histograms for detector monitoring.
C. The VME Crate Configuration
The off-detector electronics is housed in three
VME crates.  Two VME crates are needed for the 32
Sparsifier modules. The VME crate housing the ROD
modules is sandwiched between the two Sparsifier
crates. This configuration is convenient for
implementing the backplane connections between the
Sparsifier and ROD modules. A Timing Interface
Module (TIM) in each VME crate receives and
delivers clock and control signals to and from the
corresponding modules. Each crate contains a VME
bus extension board (MXI) which connects the VME
buses between crates. A single ROD Crate Controller
(RCC) is then sufficient to control and coordinate data
acquisition for the whole CSC system. Alternatively,
each crate could contain an RCC with Ethernet
connections between the RCCs.  The RCC is an off-
the-shelf VME single-board computer which, by
communicating with the ATLAS Trigger/DAQ
System, is responsible for starting and stopping data
acquisition and for sampling events from the CSC
detector for monitoring purposes.
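The module counts implied by this configuration can be
tallied as follows. The number of RODs is not stated
explicitly; the sketch infers it from eight chambers
per ROD, and the even split of Sparsifiers across the
two crates is likewise an assumption.

```python
# Module tally for the three-crate configuration.
CHAMBERS = 64
CHAMBERS_PER_SPARSIFIER = 2      # stated in the text
CHAMBERS_PER_ROD = 8             # stated in the text
SPARSIFIER_CRATES = 2            # stated in the text

sparsifiers = CHAMBERS // CHAMBERS_PER_SPARSIFIER
rods = CHAMBERS // CHAMBERS_PER_ROD             # inferred, not quoted
# ASSUMPTION: Sparsifiers are split evenly between the two crates.
sparsifiers_per_crate = sparsifiers // SPARSIFIER_CRATES
sparsifiers_per_rod = sparsifiers // rods       # backplane fan-in per ROD

print(f"Sparsifiers           : {sparsifiers}")
print(f"RODs (inferred)       : {rods}")
print(f"Sparsifiers per crate : {sparsifiers_per_crate}")
print(f"Sparsifiers per ROD   : {sparsifiers_per_rod}")
```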
III. READOUT DRIVER ARCHITECTURE
A. Features of Sparsifiers and RODs
The Sparsifiers perform zero-suppression and
rejection of wrong-time signals. They also control the
SCAs on the on-detector ASM boards and initiate the
digitization of the signals. The RODs, on the other
hand, build the event fragments, apply calibration
constants, and handle the ATLAS-wide timing, trigger,
and control signals. Additionally, the RODs manage
electronics calibration, diagnostics, and detector
monitoring. The ROD and the Sparsifier have in
common the requirement to process large amounts of
data in limited time. Both have to respond to errors in
the data in real time. Consequently, with firmware and
software appropriate to each, the ROD and Sparsifier
can share similar hardware.
The ROD assembles data from several Sparsifiers
into one event fragment. Data from the Sparsifiers are
first buffered in the ROD's input FIFOs. The captured
data stream is interpreted and checked for errors.
Histograms are accumulated for detector monitoring.
Pedestals are subtracted and gain constants are applied.
The data are reformatted and stored in the output
buffer, from which they can be accessed to apply
additional algorithms to reject wrong-time pulses and
neutron hits. Finally, the remaining data are sent out of
the ROD via the Readout Link.
B. The DSP Module
The main data processing and storage element of
the ROD and Sparsifier is the “DSP module”, a small
plug-in board containing a DSP, off-chip memory, and
an FPGA.  We have selected the Texas Instruments
TMS320C6202 DSP, which contains a large on-chip
data RAM for buffering and runs at a clock rate of
250 MHz. Its instruction set contains bit manipulation
instructions which are ideally suited to interpreting the
raw data streams. The DMA controllers of the DSP can
move data into or out of data memory with little or no
impact on the CPU performance.
Figure 2 shows a block diagram of the ROD
architecture. Sparsified data from an entire CSC
module are processed in one of eight “decoders”. Each
decoder is implemented as one DSP module. An
additional DSP module, the “host”, manages overall
operation and provides an interface to the ROD Crate
Controller (RCC) via the VME bus. The processed data
are passed on to the Data Exchange, a bus that connects
all DSPs with the Readout Link. Data on this bus flow
only to the Readout Link, not from one DSP to
another. In fact, the decoder DSP modules do not
communicate with each other, but only with the host
DSP. The host DSP executes commands issued by the
RCC, and the decoders execute commands given by
the host DSP.
The FPGA in the DSP module converts the serial
bit stream from the Sparsifier into 32-bit words that are
transferred to the input FIFOs in the DSP's memory.
After receiving the level 1 ID from the host DSP, each
decoder DSP processes the data and stores the
processed data in its output buffer. The host DSP
creates a header and a trailer for the current event and
starts a DMA sequence when all decoders have
finished processing their part of the event fragment.
The DMA sequence transfers the header, the processed
data, and the trailer onto the Data Exchange. A FIFO
[...] Readout Link. The processing power of the DSPs
makes it possible to run algorithms that apply
calibration constants and further reduce the data
volume. Since each decoder processes data from an
entire four-layer CSC, pattern recognition algorithms
may be used to reject isolated neutron hits. In addition,
rejection of wrong-time signals may be improved by
cutting on the smallest drift time of hits associated with
a track.
Figure 2: Block diagram of the ROD. Dataflow direction is indicated by the large arrow.
During data taking, the decoder DSPs accumulate
histograms to monitor the CSCs. The host DSP has
access to histograms stored in the decoder DSPs'
memories and makes them available to the RCC upon
request. The decoders also maintain error counts,
which are copied by the host DSP into VME-readable
memory.
The host DSP initiates decoding by sending the
event ID to the decoder DSPs, then checks their
progress and builds a header and a trailer for the event.
When all decoders have finished an event, the host
DSP starts a DMA process that transfers the event
onto the Data Exchange. During detector calibration,
the host DSP issues commands for the generation of
calibration pulses and trigger signals. Histograms of
detector response can be reduced by the decoder DSPs
or transferred verbatim to the RCC.
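The host/decoder coordination described above can be
sketched in a few lines. This is a minimal model, not
the DSP firmware: the Decoder class, the marker words,
and the header and trailer layouts are all
hypothetical, and the DMA transfer is represented by
simple list concatenation.

```python
from dataclasses import dataclass, field

# Hypothetical marker words; the real event format is not given in the text.
HEADER_MARKER = 0xEE1234EE
TRAILER_MARKER = 0xEE5678EE


@dataclass
class Decoder:
    """Stand-in for one decoder DSP and its output buffer."""
    chamber: int
    output: dict = field(default_factory=dict)   # level-1 ID -> data words

    def process(self, level1_id, words):
        # Placeholder for calibration and rejection: data pass through.
        self.output[level1_id] = list(words)

    def finished(self, level1_id):
        return level1_id in self.output


def build_fragment(level1_id, decoders):
    """Host-side assembly: header, decoder data in order, trailer.

    Returns None while any decoder is still working on the event,
    mirroring the host DSP waiting before it starts the DMA sequence.
    """
    if not all(d.finished(level1_id) for d in decoders):
        return None
    payload = []
    for d in decoders:
        payload.extend(d.output.pop(level1_id))
    header = [HEADER_MARKER, level1_id]
    trailer = [TRAILER_MARKER, len(payload)]
    return header + payload + trailer


decoders = [Decoder(chamber=i) for i in range(8)]
for d in decoders[:-1]:
    d.process(42, [d.chamber, d.chamber])
incomplete = build_fragment(42, decoders)    # one decoder still busy
decoders[-1].process(42, [7, 7])
fragment = build_fragment(42, decoders)      # complete event fragment
```

Keeping the decoders independent of one another, as in
the sketch, matches the statement that decoder DSPs
communicate only with the host.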
The host DSP coordinates the decoding efforts,
provides a command interface to the RCC and
manages the detector configuration. This part of the
prototype software has been benchmark-tested, and
further development efforts are continuing.
We have completed a preliminary layout of the
DSP module plug-in board which satisfies physical
space constraints and signal integrity requirements.
The layout is currently being modified to accommodate
the new Spartan-II FPGA, which will improve data
transfer to the DSP memory by providing more FIFO
buffer space. Additionally, the larger FIFOs make it
possible to use the DSP modules in the Sparsifier.
IV. SPARSIFICATION ALGORITHM STUDIES
We are performing studies of readout of the
ATLAS CSCs. The first goal of the present studies is
to demonstrate that data volume can be reduced by
simple algorithms implemented in the Sparsifier. Data
must be sparsified both in time, suppressing signals
that are not coincident with the trigger, and in space,
suppressing channels with signals below threshold. The
second goal is to demonstrate that large backgrounds
of neutrons can be rejected by pattern recognition
algorithms running in the RODs. Neutrons should be
suppressed before data from muon tracks is transmitted
to the level 2 trigger. We are also studying algorithms
for calibration and detector performance monitoring.
Data from the CSC RODs consist of digitized
signals from clusters of adjacent strips which are
coincident with the trigger and which exceed a
threshold cut. Below-threshold strips adjacent to a strip
passing the threshold cut are also included in the
clusters.  These clusters, typically five strips wide, are
used by the off-line analysis to reconstruct the
positions of incident tracks. Signals from strips
between the clusters are digitized as well, but
suppressed by the Sparsifier logic. This data reduction
is necessary to transmit the meaningful data within the
available bandwidth. Even at an average flux of 1500
Hz/cm², five times higher than expected, the
probability for a hit cluster per beam crossing in one
layer is only 3/8.
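The spatial part of the sparsification, a threshold
cut that also keeps below-threshold neighbours, can be
illustrated as follows. This is a sketch under stated
assumptions: only the immediate neighbour on each side
is kept, the cut is applied to one peak value per
strip, and the function names and pulse heights are
invented for the example.

```python
def select_strips(peak_values, threshold):
    """Indices of strips kept by the threshold-plus-neighbour rule.

    ASSUMPTION: "adjacent" means the immediate neighbour on each side.
    """
    n = len(peak_values)
    keep = set()
    for i, value in enumerate(peak_values):
        if value > threshold:
            for j in (i - 1, i, i + 1):
                if 0 <= j < n:
                    keep.add(j)
    return sorted(keep)


def group_clusters(kept):
    """Group sorted strip indices into runs of adjacent strips."""
    clusters = []
    for i in kept:
        if clusters and i == clusters[-1][-1] + 1:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


# Invented pulse heights for twelve neighbouring strips of one layer.
peaks = [2, 3, 1, 8, 40, 95, 60, 9, 2, 1, 3, 2]
kept = select_strips(peaks, threshold=10)
clusters = group_clusters(kept)
print(clusters)
```

In this example three strips exceed the threshold and
one neighbour on each side is retained, giving a single
cluster five strips wide.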
In addition to suppressing channels without hits, the
Sparsifier also suppresses signals which are not in time
with the trigger. Rejection of these wrong-time pulses
is essential for limiting the bandwidth of the data
stream.
The sparsification algorithm operates on the four
consecutive time samples of each channel. These
samples are spaced 50 ns apart and are called A, B, C and
D. The peak of an in-time signal of average drift time
lies between samples A and C. Triggers can occur in
any beam crossing (25 ns spacing). If the trigger occurs
in phase with the 50 ns sampling clock, then the
sampling is called “even”.  Otherwise, it occurs
between two sampling periods and is called “odd”.
Information about this sampling phase is used by the
Sparsifier to correct for the resulting 25 ns signal shift.
The delay between the trigger and the start of sampling
is adjusted until the signal peak occurs at the time of
the B sample for even sampling. For odd sampling, the
peak occurs halfway between B and C. The even and
odd sampling is illustrated in Figure 3.  In order to
reject channels without hits, a threshold is applied to
the biggest sample B. This threshold is adjustable to
optimize selection efficiency and rejection rates.
Figure 3: CSC waveforms for even (top) and odd
(bottom) sampling of beam test data.
For even sampling, wrong-time pulses can be
rejected by requiring a rising slope between samples A
and B and a falling slope between samples B and C.
This requirement (B>A and B>C) results in an
acceptance window that is two beam crossings wide.
This width of 50 ns is somewhat larger than the
maximum drift time of 35 ns, ensuring high selection
efficiency over the entire range of drift times.
For odd sampling, the acceptance window has to be
shifted by 25 ns to compensate for the shift in the data.
The requirement becomes (C>A and B>D) and selects
waveforms which peak between the times of the B and
C samples.
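The threshold and timing requirements for the two
sampling phases can be written compactly. The sketch
below implements the conditions exactly as stated
(threshold on B; B>A and B>C for even sampling; C>A
and B>D for odd sampling). The function names and
sample values are illustrative, and the samples are
assumed to be pedestal-subtracted so that the bipolar
undershoot appears as negative values.

```python
def in_time(a, b, c, d, even):
    """Timing requirement on the four samples A, B, C, D."""
    if even:
        # Peak at B: rising A -> B and falling B -> C.
        return b > a and b > c
    # Odd sampling: window shifted by 25 ns, peak between B and C.
    return c > a and b > d


def accept(samples, threshold, even):
    """Apply the threshold cut on B, then the timing requirement."""
    a, b, c, d = samples
    return b > threshold and in_time(a, b, c, d, even)


THRESHOLD = 10                          # illustrative value

on_time_even = (20, 100, 60, -10)       # peaks at B
late_even = (5, 20, 100, 60)            # peaks at C: one sample late
on_time_odd = (10, 80, 85, 20)          # peaks between B and C
early_odd = (100, 60, -10, -30)         # peaks at or before A
noise = (1, 3, 2, 0)                    # below threshold

for name, samples, even in [
    ("on-time, even", on_time_even, True),
    ("late, even", late_even, True),
    ("on-time, odd", on_time_odd, False),
    ("early, odd", early_odd, False),
    ("noise, even", noise, True),
]:
    print(f"{name:14s} accepted: {accept(samples, THRESHOLD, even)}")
```

Both conditions use only comparisons between samples,
which suits implementation on the Sparsifier DSPs.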
A study of beam test data shows that the selection
algorithm is 98.4% efficient within a 35 ns window for
a hit rate of 5000 Hz/cm², five times the expected rate at
|η|=2.7.  The effectiveness of this algorithm results
from the excellent noise performance of the chambers.
Studies of algorithm execution time in the ROD
versus flux have been initiated. These studies are
performed by running candidate algorithms on a DSP
evaluation module using simulated data.  Studies
include algorithms for inter-strip calibration and
pedestal subtraction, and for neutron rejection.
Random hits due to neutrons are expected to be about
as numerous as muon track hits, despite the low neutron
sensitivity (<10⁻⁴) of the CSCs, which results from
their small gas volume and the absence of hydrogen in
the Ar/CO₂/CF₄ operating gas mixture.  Rejection of
neutron hits by the
ROD would significantly reduce its data output rate
and the data processing needed downstream.
The simulation used to study prototype code
generates muon tracks which cross all four layers of a
CSC module, as well as single-layer neutron hits. The
number of hits and tracks depends on the mean flux
and the muon-to-neutron hit ratio, currently fixed at
1:1. The simulation generates the bit stream that would
be transmitted from the Sparsifier to the ROD for 5000
events. These data are stored in a buffer from which
they are fed at a rate of 320 Mbit/s via DMA into the
input buffer of the DSP.
The calibration algorithm is written in assembly
language and the neutron rejection algorithm is written
in C.  The calibration routine checks the data in the
input buffer for errors and extracts the ADC values. It
loads the calibration constant and pedestal value for the
current channel from memory and applies them to the
ADC value. The result is reformatted and written as a
16-bit half-word into the output buffer.
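The per-channel calibration step can be sketched as
follows. The production routine is written in DSP
assembly; this Python version only illustrates the
arithmetic. The signed 16-bit output encoding, the
rounding, the clamping, and the layout of the constants
table are assumptions, as are all names.

```python
def calibrate(adc, pedestal, gain):
    """Subtract the pedestal, apply the gain, clamp to signed 16 bits."""
    value = round((adc - pedestal) * gain)
    return max(-32768, min(32767, value))


def pack_halfword(value):
    """Encode a signed value as an unsigned 16-bit half-word."""
    return value & 0xFFFF


def unpack_halfword(word):
    """Decode a 16-bit two's-complement half-word."""
    return word - 0x10000 if word & 0x8000 else word


def calibrate_event(adc_by_channel, constants):
    """Calibrate every channel of an event.

    constants maps channel -> (pedestal, gain); this table layout
    is hypothetical.
    """
    output = []
    for channel, adc in adc_by_channel:
        pedestal, gain = constants[channel]
        output.append(pack_halfword(calibrate(adc, pedestal, gain)))
    return output


constants = {0: (100, 1.0), 1: (100, 1.05)}      # invented constants
event = [(0, 90), (1, 1100)]                     # (channel, ADC value)
halfwords = calibrate_event(event, constants)
print([unpack_halfword(w) for w in halfwords])
```

A signed encoding is used here so that the negative
undershoot of the bipolar pulse survives pedestal
subtraction.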
The neutron rejection algorithm reads these half-
words from the output buffer and fills a binary map of
the hit channels in each layer. Clusters which are not
spatially associated with clusters in other layers are
assumed to be due to neutrons. These hits are removed
from the output buffer. At 1500 Hz/cm² the algorithm
retains all of the simulated muon hits and rejects 94%
of the neutron hits.  This reduces the output link
bandwidth by almost 50%. Work is continuing to tune
this algorithm by refining the neutron cluster sizes and
pulse sizes and by including photon backgrounds.
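
The layer-association idea can be illustrated with
per-layer bitmaps. This is a simplified sketch, not the
prototype C code: the strip count per layer, the
one-strip search road, and the requirement of a match
in at least one other layer are all assumptions. The
last function shows why rejecting 94% of the neutron
hits at a 1:1 muon-to-neutron ratio gives a reduction
of almost 50%, assuming equal data size per hit.

```python
N_LAYERS = 4
N_STRIPS = 192          # illustrative strip count per layer


def layer_maps(hits):
    """Build one integer bitmap per layer from (layer, strip) hits."""
    maps = [0] * N_LAYERS
    for layer, strip in hits:
        maps[layer] |= 1 << strip
    return maps


def find_clusters(bitmap):
    """Return (first, last) strip ranges of contiguous set bits."""
    clusters, start = [], None
    for s in range(N_STRIPS + 1):
        hit = s < N_STRIPS and (bitmap >> s) & 1
        if hit and start is None:
            start = s
        elif not hit and start is not None:
            clusters.append((start, s - 1))
            start = None
    return clusters


def road_mask(first, last, road):
    """Bitmask covering a cluster widened by `road` strips per side."""
    lo = max(0, first - road)
    hi = min(N_STRIPS - 1, last + road)
    return ((1 << (hi - lo + 1)) - 1) << lo


def reject_neutrons(hits, road=1, min_other_layers=1):
    """Keep only clusters matched by hits in other layers."""
    maps = layer_maps(hits)
    kept = []
    for layer in range(N_LAYERS):
        for first, last in find_clusters(maps[layer]):
            mask = road_mask(first, last, road)
            matches = sum(1 for other in range(N_LAYERS)
                          if other != layer and maps[other] & mask)
            if matches >= min_other_layers:
                kept.extend((layer, s) for s in range(first, last + 1))
    return kept


def output_reduction(muon_to_neutron_ratio, neutron_rejection):
    """Fraction of output removed, assuming equal data size per hit."""
    neutron_fraction = 1 / (1 + muon_to_neutron_ratio)
    return neutron_fraction * neutron_rejection


muon = [(layer, s) for layer in range(N_LAYERS) for s in (10, 11, 12)]
neutron = [(2, 40), (2, 41)]            # isolated single-layer cluster
kept = reject_neutrons(muon + neutron)
print(len(kept), output_reduction(1.0, 0.94))
```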
V. CONCLUSIONS
The off-detector electronics of the ATLAS CSC
system has been described.  The conceptual design of the CSC
off-detector electronics was approved by the review
committee on August 7, 2000. The DSP module
prototype has been developed. Other design entries are
in development.
REFERENCES:
[1] ATLAS Muon Spectrometer Technical Design Report,
CERN/LHCC/97-22.
[2] Performance and Radiation Tolerance of the
ATLAS CSC On-Chamber Electronics, LEB 2000.
