Abstract-The Liquid Argon calorimeters play a central role in the ATLAS experiment. The environment at the LHC places challenging demands on their read-out system. To measure particles and provide trigger signals with high precision, the detector signals are processed in several stages before reaching the Data Acquisition system (DAQ). Signals from the calorimeter cells are received by front-end boards, which sample and digitize the incoming pulse. Read-Out Driver (ROD) boards further process the data at a trigger rate of up to 75 kHz. An optimal filtering procedure is applied to maximize the signal-to-noise ratio. The ROD boards calculate the energy, time and quality of the detector pulse with high precision and send these results to the DAQ. In addition, the RODs monitor the data. The architecture of the ATLAS Liquid Argon detector read-out is discussed, in particular the design and functionality of the ROD board. Performance results obtained with ROD prototypes as well as experience from complete test setups with final production boards are reported.
I. INTRODUCTION
THE ATLAS [1] [2] experiment is a general-purpose detector designed to exploit the full physics potential of the Large Hadron Collider (LHC) at CERN (Geneva, Switzerland). When completed, the detector will measure proton-proton collisions at a centre-of-mass energy of 14 TeV, as well as heavy-ion collisions. Central components of the ATLAS detector are the Liquid Argon (LAr) calorimeters, which consist of four sub-detectors [3]: the electro-magnetic barrel (EMB), the electro-magnetic end-caps (EMEC), the forward calorimeters (FCAL) and the hadronic end-caps (HEC). In total, about 196000 calorimeter cells are read out. The high bunch-crossing frequency of 40 MHz, the large dynamic range of energies between 50 MeV and 3 TeV, and the required very good energy resolution are some of the main challenges for the LAr calorimeter read-out electronics [3]-[6].
II. READ-OUT ARCHITECTURE
When a charged particle traverses the liquid argon of the calorimeters, an ionization current is measured on the copper electrodes of the read-out cells. The pulse height is proportional to the energy deposited by the particle, and the current decreases linearly with time. The read-out processes the detected signal in several stages, as shown in Fig. 1. The analog signal is received by the Front-End Boards (FEBs), which are mounted directly on the detector in a radiation environment. Each of the 1536 FEBs sends the digitized pulse via an optical link to the Read-Out Drivers [7], which are installed in a radiation-free room next to the detector cavern. Each Read-Out Driver (ROD) is connected to 8 FEBs and processes the signals of 1024 detector cells. Transition Modules (TMs) send the data through another optical link to the PC-based DAQ, where the data fragments are collected, the event building is performed, and the data are finally recorded.
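The channel counts quoted above are mutually consistent. A short Python check (the constant names are ours, chosen for illustration) reproduces the number of cells per ROD, the number of RODs and the total channel count:

```python
# Consistency check of the read-out granularity quoted in the text.
N_FEB = 1536            # Front-End Boards
CELLS_PER_FEB = 128     # detector cells read out by one FEB
FEBS_PER_ROD = 8        # FEBs connected to one ROD

cells_per_rod = FEBS_PER_ROD * CELLS_PER_FEB   # cells handled by one ROD
n_rod = N_FEB // FEBS_PER_ROD                  # RODs needed for all FEBs
n_channels = N_FEB * CELLS_PER_FEB             # total read-out channels

print(cells_per_rod, n_rod, n_channels)  # 1024 192 196608
```

The 196608 read-out channels are in line with the "about 196000" cells quoted in Section I, and the 192 RODs match Section IV-A.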
III. FRONT-END ELECTRONICS
The FEBs contain the electronics for the first step of the signal processing. Each FEB receives the signals of 128 LAr detector cells. The triangular ionization-current pulse is first amplified by a preamplifier array; only in the HEC are the preamplifiers installed directly on the detector instead of on the FEBs. The preamplifier output is connected to shaper chips, which contain a filter. The shaper amplifies the pulse, splits it into three gain scales with fixed gain ratios, and applies a bipolar shaping function to each scale. The signal is then sampled at 40 MHz in Switched Capacitor Array (SCA) chips, which store the samples in analog form during the level-1 trigger latency of up to 2.5 μs. Fig. 2 shows the resulting pulse shape after shaping and sampling.

Fig. 1. Layout of the LAr calorimeter read-out. The signal pulses measured in the electro-magnetic barrel (EMB), the electro-magnetic end-caps (EMEC), the forward calorimeters (FCAL), and the hadronic end-caps (HEC) are sent by the Front-End Boards (FEB) and Read-Out Drivers (ROD) via optical links to the Data Acquisition system (DAQ).

0018-9499/$20.00 © 2006 IEEE
When the level-1 trigger decision arrives, the optimal gain scale is selected on an event-by-event basis. The signal is then digitized by a 12-bit Analog-to-Digital Converter (ADC), which, together with the gain selection, covers the required 16-bit dynamic range over the whole energy interval. The FEB data are finally sent via a 1.6 Gbit/s optical link to the RODs.
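The idea behind gain selection can be illustrated with a minimal sketch. It assumes a simple rule, choosing the most sensitive gain whose sample stays below a near-saturation threshold; the threshold value, the gain names and the function `select_gain` are hypothetical and not taken from the FEB design.

```python
# Hypothetical sketch of event-by-event gain selection (not the FEB firmware).
ADC_BITS = 12
ADC_MAX = (1 << ADC_BITS) - 1   # 4095 counts: full scale of a 12-bit ADC
THRESHOLD = 3800                # hypothetical "close to saturation" cut, in ADC counts
assert THRESHOLD < ADC_MAX

# Gain scales ordered from most to least sensitive (illustrative names).
GAINS = ("high", "medium", "low")


def select_gain(peak_counts):
    """Return the most sensitive gain whose peak sample is below THRESHOLD.

    peak_counts maps each gain name to the ADC counts of its peak sample.
    If every gain is close to saturation, the low gain is used, since it
    covers the largest signals.
    """
    for gain in GAINS:
        if peak_counts[gain] < THRESHOLD:
            return gain
    return "low"


print(select_gain({"high": 1200, "medium": 130, "low": 13}))    # high
print(select_gain({"high": 4095, "medium": 2500, "low": 260}))  # medium
```

Small signals keep the resolution of the high gain, while large signals fall back to a less sensitive scale instead of saturating the 12-bit ADC; this is how a single 12-bit converter can serve a wider dynamic range.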
The time constant of the FEB shapers is chosen to minimize the overall noise level. The two main contributions are electronic noise, which decreases with increasing time constant, and so-called pile-up noise, which increases, as can be seen in Fig. 3. Pile-up noise is caused by overlapping signals from background events. Since a shaped pulse lasts about 300-600 ns, there are up to 24 overlapping (pile-up) events, given the LHC bunch-crossing rate of 40 MHz (25 ns between crossings).

Fig. 3. Noise level as a function of the peak shaping time t. The electronic noise decreases with larger t, while the pile-up noise increases. The optimal shaping time varies with the LHC luminosity, L.
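The figure of up to 24 overlapping events follows directly from the pulse duration and the 25 ns bunch spacing, as a short Python calculation shows (the names are ours):

```python
# Count the LHC bunch crossings that overlap with one shaped pulse.
BUNCH_CROSSING_RATE_HZ = 40e6                  # LHC bunch-crossing rate
SPACING_NS = 1e9 / BUNCH_CROSSING_RATE_HZ      # 25 ns between crossings


def crossings_in_pulse(pulse_ns):
    """Number of bunch crossings within a pulse of the given duration."""
    return pulse_ns / SPACING_NS


for duration in (300, 600):
    print(duration, "ns ->", crossings_in_pulse(duration), "crossings")
# 300 ns -> 12.0 crossings
# 600 ns -> 24.0 crossings
```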
After careful optimisation, a shaping time constant of 13 ns has been chosen, which leads to a peaking time between 30 and 45 ns in the different sections of the calorimeters [3]. This yields an optimum response when sampling at the peak for the nominal high luminosity of $10^{34}$ cm$^{-2}$s$^{-1}$. At lower luminosities, digital filtering performed by the ROD boards maintains optimal performance.
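The trade-off of Fig. 3 can be illustrated with a toy model. This model is our assumption, not the parametrization used in the ATLAS optimisation: electronic noise falls as $a/\sqrt{t}$, pile-up noise grows as $b\sqrt{L\,t}$, and both add in quadrature, with arbitrary constants a and b. Setting the derivative of $\sigma^2 = a^2/t + b^2 L t$ to zero gives $t_{opt} = a/(b\sqrt{L})$; the sketch checks this against a numerical search.

```python
import math

# Toy noise model (an assumption for illustration, arbitrary units):
# electronic noise ~ a / sqrt(t), pile-up noise ~ b * sqrt(L * t),
# where t is the shaping time and L the luminosity.
A, B = 1.0, 0.1


def total_noise(t, lumi):
    """Electronic and pile-up noise added in quadrature."""
    electronic = A / math.sqrt(t)
    pileup = B * math.sqrt(lumi * t)
    return math.hypot(electronic, pileup)


def optimal_t_analytic(lumi):
    """Minimum of a^2/t + b^2*L*t, from d(sigma^2)/dt = 0."""
    return A / (B * math.sqrt(lumi))


def optimal_t_numeric(lumi, lo=1e-3, hi=1e3, iterations=200):
    """Ternary search in log(t); sigma^2 is convex in log(t), hence unimodal."""
    x_lo, x_hi = math.log(lo), math.log(hi)
    for _ in range(iterations):
        m1 = x_lo + (x_hi - x_lo) / 3
        m2 = x_hi - (x_hi - x_lo) / 3
        if total_noise(math.exp(m1), lumi) < total_noise(math.exp(m2), lumi):
            x_hi = m2
        else:
            x_lo = m1
    return math.exp((x_lo + x_hi) / 2)


for lumi in (1.0, 10.0, 100.0):   # relative luminosity, arbitrary units
    print(lumi, optimal_t_analytic(lumi), optimal_t_numeric(lumi))
```

In this toy model the optimum shaping time shrinks as $L^{-1/2}$. That is the qualitative trend of Fig. 3: a time constant tuned for high luminosity is shorter than optimal at lower luminosity.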
IV. READ-OUT DRIVER SYSTEM

A. System Aspects
Like the FEBs, the Read-Out Driver boards are uniformly designed and the same RODs are used for all LAr detectors. Each of the 192 RODs receives the digitized raw data from 8 FEBs, which correspond to 1024 detector cells.
A modular design is chosen for the back-end system. Each ROD mother-board is equipped with 4 Processing Units (PUs) on separate daughter-boards, each housing 2 Digital Signal Processors (DSPs). High-performance DSPs, described in more detail below, are chosen to achieve a processing time per event below 10 μs, as required by the level-1 trigger rate of 75 kHz, which corresponds to a mean interval of about 13.3 μs between triggers.
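The timing requirement can be expressed as a cycle budget. The sketch below assumes that each DSP handles the 128 cells of one FEB in normal mode (8 DSPs for the 8 FEBs of a ROD) and uses the 720 MHz DSP clock described in Section IV-D; the constant names are ours.

```python
# Per-event time budget of one ROD DSP (illustrative arithmetic).
L1_RATE_HZ = 75e3          # maximum level-1 trigger rate
TARGET_US = 10.0           # required processing time per event
DSP_CLOCK_HZ = 720e6       # DSP clock frequency
CHANNELS_PER_DSP = 128     # assumption: one FEB (128 cells) per DSP

mean_interval_us = 1e6 / L1_RATE_HZ                  # mean time between triggers
margin = 1 - TARGET_US / mean_interval_us            # spare fraction of that interval
cycles_per_event = DSP_CLOCK_HZ * TARGET_US / 1e6    # clock cycles within 10 us
cycles_per_channel = cycles_per_event / CHANNELS_PER_DSP

print(f"{mean_interval_us:.2f} us between triggers, {margin:.0%} margin, "
      f"{cycles_per_event:.0f} cycles/event, {cycles_per_channel:.2f} cycles/channel")
# 13.33 us between triggers, 25% margin, 7200 cycles/event, 56.25 cycles/channel
```

Under these assumptions each channel gets only about 56 clock cycles for filtering, monitoring and bookkeeping, which is why the parallel functional units of the DSP matter.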
In order to decouple the ROD system from the DAQ, a Transition Module (TM) handles the data transfer to the DAQ system. The TMs are installed at the back of the ROD VME crates. The VME bus and a custom-designed P3 backplane transfer the data between the two modules.
The ROD sends about four 16-bit words per detector cell to the DAQ at a level-1 trigger rate of 75 kHz. This corresponds to a total data rate of about 100 GB per second for the whole calorimeter read-out system.
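A quick estimate reproduces the order of magnitude of the quoted data rate. It assumes 196608 read-out channels (1536 FEBs times 128 cells) and decimal megabytes and gigabytes:

```python
# Rough check of the total output data rate of the ROD system.
N_CELLS = 196_608            # assumption: 1536 FEBs x 128 cells (text: ~196000)
WORDS_PER_CELL = 4           # about four 16-bit words per cell
BYTES_PER_WORD = 2
L1_RATE_HZ = 75e3

bytes_per_event = N_CELLS * WORDS_PER_CELL * BYTES_PER_WORD
rate_gb_per_s = bytes_per_event * L1_RATE_HZ / 1e9

print(f"{bytes_per_event / 1e6:.2f} MB/event, {rate_gb_per_s:.0f} GB/s")
# 1.57 MB/event, 118 GB/s
```

The result, roughly 118 GB/s, is consistent with the "about 100 GB per second" quoted above.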
B. ROD Functionality
During a physics run, the main task of the ROD is the digital filtering of the cell pulse and the calculation of the energy, time and quality of the signal. If the buffers of the read-out chain are not emptied in time, the ROD sends a Busy signal to the trigger system, so that no new triggers are issued. In parallel, the ROD monitors the data in selected read-out channels, and the resulting histograms are read out by a PC-based online monitoring system.
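The Busy mechanism can be sketched as a buffer-occupancy monitor. The watermarks and the hysteresis between them are assumptions made for illustration (hysteresis keeps Busy from toggling on every event), and the class `BusyMonitor` is hypothetical; it does not describe the actual ROD firmware.

```python
class BusyMonitor:
    """Hypothetical Busy logic with high/low watermarks on a buffer."""

    def __init__(self, capacity, high=0.9, low=0.5):
        self.high = high * capacity   # assert Busy at or above this occupancy
        self.low = low * capacity     # release Busy at or below this occupancy
        self.busy = False

    def update(self, occupancy):
        """Update and return the Busy state for the current buffer occupancy."""
        if not self.busy and occupancy >= self.high:
            self.busy = True
        elif self.busy and occupancy <= self.low:
            self.busy = False
        return self.busy


monitor = BusyMonitor(capacity=100)
states = [monitor.update(n) for n in (10, 95, 70, 40)]
print(states)  # [False, True, True, False]
```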
During detector calibration runs, signals of known intensity are injected into the LAr detector cells by a dedicated pulser system. In these runs, the RODs compute, for every sample, the first and second moments averaged over many events, typically a hundred, as well as the averages of the cross-products between samples. The latter are used to compute the noise autocorrelation function.
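The calibration quantities can be accumulated event by event and converted into means, covariances and a normalized autocorrelation at the end of the run. The sketch below is a pure-Python illustration. The class name and the definition of the autocorrelation are our assumptions; the definition averages all sample pairs with the same separation, which presumes stationary noise.

```python
class CalibAccumulator:
    """Accumulate per-sample sums and cross-product sums over events."""

    def __init__(self, n_samples):
        self.n = n_samples
        self.events = 0
        self.sum = [0] * n_samples
        # sum_xx[i][j] accumulates x_i * x_j; the diagonal holds the squares.
        self.sum_xx = [[0] * n_samples for _ in range(n_samples)]

    def add(self, samples):
        self.events += 1
        for i, xi in enumerate(samples):
            self.sum[i] += xi
            for j, xj in enumerate(samples):
                self.sum_xx[i][j] += xi * xj

    def mean(self):
        """First moments: mean of each sample over all events."""
        return [s / self.events for s in self.sum]

    def covariance(self):
        """Second moments minus products of means."""
        m = self.mean()
        return [[self.sum_xx[i][j] / self.events - m[i] * m[j]
                 for j in range(self.n)] for i in range(self.n)]

    def autocorrelation(self):
        """Normalized noise autocorrelation R[k] for sample separation k."""
        cov = self.covariance()
        var = sum(cov[i][i] for i in range(self.n)) / self.n
        return [sum(cov[i][i + k] for i in range(self.n - k)) / ((self.n - k) * var)
                for k in range(self.n)]


acc = CalibAccumulator(3)
for event in ([1, 2, 1], [3, 2, 3]):
    acc.add(event)
print(acc.mean())  # [2.0, 2.0, 2.0]
```

Accumulating only integer sums during the run and dividing once at the end keeps the per-event work to additions and multiplications.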
C. Optimal Filtering
In physics mode, a fixed number of samples, usually five, is received per signal pulse. The energy and time of the signal are calculated with an Optimal Filtering algorithm [8], which is computationally efficient. The energy, $E$, and time, $\tau$, of the pulse are determined from weighted sums of the samples $s_i$,

$$E = \sum_{i=1}^{N} a_i \,(s_i - \mathrm{ped}) \tag{1}$$

$$E\,\tau = \sum_{i=1}^{N} b_i \,(s_i - \mathrm{ped}) \tag{2}$$

where $N$ is the number of samples and ped is the pedestal value of the corresponding read-out channel. The Optimal Filtering weights, $a_i$ and $b_i$, are determined in an optimization procedure that minimizes the noise contribution. In this way, non-optimal noise reduction on the detector and in the FEBs, e.g. due to a non-optimal shaping time, can be corrected, and the filtering can be optimized for different levels of pile-up noise.
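The normalized pulse shape without noise, $g(t)$, and its derivative, $g'(t)$, at the sampling times $t_i$, i.e. $g_i = g(t_i)$ and $g'_i = g'(t_i)$, can be calculated from the measured calibration pulse and the read-out design parameters. For small time shifts, the pedestal-subtracted samples can be approximated as $s_i - \mathrm{ped} \approx E\,(g_i - \tau\, g'_i)$, and requiring the weighted sums to return exactly $E$ and $E\tau$ yields the following constraints on the Optimal Filtering weights [8]:

$$\sum_{i=1}^{N} a_i\, g_i = 1, \qquad \sum_{i=1}^{N} a_i\, g'_i = 0 \tag{3}$$

$$\sum_{i=1}^{N} b_i\, g_i = 0, \qquad \sum_{i=1}^{N} b_i\, g'_i = -1 \tag{4}$$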
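To fully determine $a_i$ and $b_i$, the noise autocorrelation function $R_{ij}$ is also used and assumed to be known precisely: the weights minimize the noise variances $\sum_{ij} a_i R_{ij} a_j$ and $\sum_{ij} b_i R_{ij} b_j$. A Lagrange multiplier method is applied to take all constraints into account in the minimization procedure.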
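The constrained minimization has a closed-form solution. Writing $x = R^{-1}g$ and $y = R^{-1}g'$, the Lagrange conditions make both weight vectors linear combinations of x and y. The pure-Python sketch below implements that standard solution and checks the constraints. The pulse values, the autocorrelation model $R_{ij} = 0.3^{|i-j|}$ and all function names are assumptions for illustration, not ATLAS calibration data.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def of_weights(g, dg, R):
    """Weights minimizing a^T R a and b^T R b subject to
    sum a_i g_i = 1, sum a_i g'_i = 0, sum b_i g_i = 0, sum b_i g'_i = -1."""
    x = solve(R, g)     # R^-1 g
    y = solve(R, dg)    # R^-1 g'
    q1, q2, q3 = dot(g, x), dot(dg, y), dot(dg, x)
    delta = q1 * q2 - q3 * q3
    a = [(q2 * xi - q3 * yi) / delta for xi, yi in zip(x, y)]
    b = [(q3 * xi - q1 * yi) / delta for xi, yi in zip(x, y)]
    return a, b


# Illustrative pulse samples g_i, derivatives g'_i and noise autocorrelation
# R_ij = rho^|i-j| (all values made up for this example).
g = [0.0, 0.6, 1.0, 0.7, 0.3]
dg = [0.6, 0.5, 0.0, -0.4, -0.4]
R = [[0.3 ** abs(i - j) for j in range(5)] for i in range(5)]
a, b = of_weights(g, dg, R)

# A noiseless pulse with amplitude 250 and time shift 0.4 on a pedestal of 1000.
ped, amp, tau = 1000.0, 250.0, 0.4
samples = [ped + amp * (gi - tau * dgi) for gi, dgi in zip(g, dg)]
energy = dot(a, [s - ped for s in samples])
energy_tau = dot(b, [s - ped for s in samples])
print(round(energy, 6), round(energy_tau / energy, 6))  # 250.0 0.4
```

Because the samples follow the linearized pulse model exactly, the weighted sums recover the amplitude and time shift; with real noise they would scatter around these values with the minimized variance.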
Since $\tau$ is inversely proportional to $E$, it makes sense to calculate $\tau$ only for channels above a given energy threshold; for small $E$ the ratio is dominated by noise. This also reduces computing time, because calculating $\tau$ requires a division, which on the DSP chosen for the ROD needs more computing cycles than a multiplication.
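The threshold logic can be sketched as follows. The weights and the threshold in the example are toy values chosen so that the arithmetic is easy to follow; they are not real Optimal Filtering coefficients, and `energy_and_time` is a hypothetical name.

```python
def energy_and_time(samples, ped, a, b, e_threshold):
    """Return (E, tau). E is always computed; tau, which needs a division,
    is only computed when E exceeds e_threshold (otherwise None)."""
    diffs = [s - ped for s in samples]
    energy = sum(ai * d for ai, d in zip(a, diffs))
    if energy <= e_threshold:
        return energy, None            # low signal: skip the costly division
    energy_tau = sum(bi * d for bi, d in zip(b, diffs))
    return energy, energy_tau / energy


# Toy weights and a hypothetical threshold of 50 (arbitrary units).
a = [0.0, 1.0, 0.0]
b = [1.0, 0.0, -1.0]
print(energy_and_time([1010, 1100, 1030], 1000, a, b, 50))  # (100.0, -0.2)
print(energy_and_time([1001, 1005, 1002], 1000, a, b, 50))  # (5.0, None)
```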
Finally, a quality parameter is calculated for the cell signal. A simplified parameter, ignoring correlations between the sampling points, compares the measured pulse shape to the ideal pulse:

$$Q = \sum_{i=1}^{N} \left[ (s_i - \mathrm{ped}) - E\,(g_i - \tau\, g'_i) \right]^2 \tag{5}$$
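The simplified quality factor is a sum of squared residuals. The minimal sketch below assumes that the ideal pulse above the pedestal is $E\,(g_i - \tau\, g'_i)$; the pulse shape and sample values are invented for illustration.

```python
def quality_factor(samples, ped, energy, tau, g, dg):
    """Sum of squared residuals between the pedestal-subtracted samples and
    the ideal pulse E*(g_i - tau*g'_i); sample correlations are ignored."""
    return sum((s - ped - energy * (gi - tau * dgi)) ** 2
               for s, gi, dgi in zip(samples, g, dg))


g = [0.0, 0.6, 1.0, 0.7, 0.3]        # illustrative normalized pulse shape
dg = [0.6, 0.5, 0.0, -0.4, -0.4]     # illustrative derivative
ideal = [1000.0 + 200.0 * gi for gi in g]               # E = 200, tau = 0, ped = 1000
distorted = ideal[:2] + [ideal[2] + 10.0] + ideal[3:]   # one sample off by 10

print(quality_factor(ideal, 1000.0, 200.0, 0.0, g, dg))      # close to 0
print(quality_factor(distorted, 1000.0, 200.0, 0.0, g, dg))  # close to 100
```

A pulse that matches the expected shape gives a value near zero, while a distorted pulse (for example from noise bursts or pile-up) gives a large value.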
D. ROD Module
The ROD module is a 9U VME64x board installed in a 21-slot VME crate. A final ROD is shown in Fig. 4 and its functional layout in Fig. 5.
Serial data (16 bits at 80 MHz) from the FEBs are received by the ROD mother-board through eight optical receivers and are de-serialized by G-link chips [9]. Since the bit error rate is expected to increase with chip temperature, a water-cooled radiator plate is mounted on the G-links.
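The link parameters can be cross-checked against the 1.6 Gbit/s quoted in Section III. The sketch assumes that the G-link transmits each 16-bit word in a 20-bit frame (16 data bits plus 4 coding bits); this framing is an assumption here, not a statement from the text.

```python
# Payload versus line rate of the FEB-to-ROD optical link.
WORD_BITS = 16          # data bits per word
WORD_RATE_HZ = 80e6     # words per second
FRAME_BITS = 20         # assumption: 16 data bits + 4 coding bits per frame

payload_gbps = WORD_BITS * WORD_RATE_HZ / 1e9   # useful data rate
line_gbps = FRAME_BITS * WORD_RATE_HZ / 1e9     # serial rate on the fibre
print(payload_gbps, line_gbps)  # 1.28 1.6
```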
Four Field-Programmable Gate Array (FPGA) chips, called staging-FPGAs [10] , route the data from the G-link chips to the PU boards. They also monitor the G-link temperature. Two DSPs are mounted on each PU to perform the Optimal Filtering calculations, as shown in Figs. 6 and 7.
The DSP chosen for the PUs is the 720 MHz TMS320C6414GLZ from Texas Instruments [11]. Its Central Processing Unit (CPU) executes up to eight 32-bit instructions per cycle and has 64 32-bit general-purpose registers. The eight functional units, two multipliers and six Arithmetic-Logic Units (ALUs), all work in parallel. Only fixed-point arithmetic is currently used, for performance reasons; floating-point operations are possible, but on this fixed-point DSP they have to be emulated in software.
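Since only fixed-point arithmetic is used, the weights have to be stored as scaled integers. The Python sketch below shows the principle with a hypothetical scaling of 12 fractional bits and round-to-nearest on the final shift; the word lengths and rounding used in the actual DSP code may differ, and the weights and samples are invented.

```python
FRAC_BITS = 12   # hypothetical number of fractional bits of the stored weights


def to_fixed(weights, frac_bits=FRAC_BITS):
    """Scale floating-point weights by 2**frac_bits and round to integers."""
    return [round(w * (1 << frac_bits)) for w in weights]


def fixed_point_energy(samples, ped, a_fixed, frac_bits=FRAC_BITS):
    """Integer-only E = sum a_i (s_i - ped), rounded to the nearest integer."""
    acc = 0
    for s, ai in zip(samples, a_fixed):
        acc += ai * (s - ped)                            # integer multiply-accumulate
    return (acc + (1 << (frac_bits - 1))) >> frac_bits   # add one half, then shift


a = [-0.18, 0.27, 0.52, 0.31, 0.08]       # illustrative energy weights
samples = [1002, 1090, 1150, 1105, 1040]  # illustrative ADC samples
ped = 1000

exact = sum(ai * (s - ped) for ai, s in zip(a, samples))
print(fixed_point_energy(samples, ped, to_fixed(a)), round(exact, 2))  # 138 137.69
```

The integer result agrees with the floating-point value to within the final rounding, while the inner loop uses only integer multiplications, additions and one shift.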
The DSP has a 64-bit input bus connected to the input-FPGA [12] and a 16-bit output bus. The memory zones of the DSP can be accessed through the so-called Host Port Interface (HPI) via the output FPGA; this is used to read status flags, counters and histograms. Trigger information, DSP commands and interrupt vectors are transferred through Multichannel Buffered Serial Ports (McBSP). The DSP has 1 MB of memory but relatively small cache memories: a 16 kB program cache and a 16 kB data cache. The data cache has to keep the Optimal Filtering coefficients permanently, which requires careful programming of the DSP.
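A rough estimate shows why keeping the Optimal Filtering coefficients resident in the 16 kB data cache is feasible but tight. It assumes one FEB (128 channels) per DSP, separate coefficient sets for each of the three gains, five samples, 16-bit coefficients and one 16-bit pedestal per channel and gain; all of these sizes are assumptions for illustration.

```python
# Rough data-cache footprint of the Optimal Filtering constants of one DSP.
CHANNELS = 128        # assumption: one FEB per DSP
GAINS = 3             # assumption: separate coefficient sets per gain scale
SAMPLES = 5           # samples per pulse in physics mode
WEIGHT_SETS = 2       # a_i (energy) and b_i (time)
BYTES_PER_COEFF = 2   # assumption: 16-bit fixed-point coefficients
PED_BYTES = 2         # assumption: one 16-bit pedestal per channel and gain

coeff_bytes = CHANNELS * GAINS * (SAMPLES * WEIGHT_SETS * BYTES_PER_COEFF + PED_BYTES)
print(coeff_bytes, coeff_bytes / (16 * 1024))  # 8448 0.515625
```

Under these assumptions the constants alone would fill about half of the data cache, leaving the rest for everything else the DSP touches per event.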
The input-FPGA of the PU converts the serial data to parallel, checks for data transmission errors and re-arranges the data to optimize the input to the DSP. The output FPGA [12] interfaces the DSP to the Trigger, Timing and Control (TTC) FPGA [10], which receives the trigger type, event number and trigger number, all sent by the ATLAS central trigger processor. The output FPGA also serves as the interface to the VME-FPGA [10] of the mother-board. It boots the DSP and configures the input-FPGA. Monitoring histograms prepared in the DSP are also accessible through the output FPGA.
The output data of the DSP calculations are stored in two FIFOs in the PU. The four Output Controller FPGAs [10] on the ROD mother-board read the data from the FIFOs and send them to the Synchronous Dynamic Random Access Memory (SDRAM) for monitoring purposes and to the serializer chips [13] for the DAQ system. The serializers send the data as LVDS signals at 280 MHz to the TM. The VME-FPGA interfaces the ROD with the VME controller and handles the Busy and Interrupt signals.
The input- and output-FPGAs on the ROD mother-board are interconnected by 32-bit buses, which also allows the data to be routed through only 2 PUs instead of 4. This so-called staging mode requires special data handling in the PU and DSP, because each PU then processes the data from 4 FEBs. In this mode, the maximum event rate is reduced to 50 kHz. Originally foreseen as a contingency measure, staging mode will allow the ROD system to be adapted to a reduced number of input optical links to the DAQ system in the start-up phase of ATLAS data-taking.
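The difference between normal and staging mode can be expressed as a routing table from FEB inputs to PUs. In the sketch below, the choice of which PUs remain active in staging mode and the contiguous assignment of FEBs are hypothetical; only the counts (2 FEBs per PU normally, 4 in staging mode) follow from the text.

```python
N_FEB, N_PU = 8, 4   # FEB inputs and PU daughter-boards per ROD


def route(staging):
    """Map each FEB input to a PU index (hypothetical contiguous assignment).

    Normal mode: 2 FEBs per PU (one per DSP), all 4 PUs used.
    Staging mode: 4 FEBs per PU, only PUs 0 and 1 used.
    """
    febs_per_pu = 4 if staging else 2
    mapping = {feb: feb // febs_per_pu for feb in range(N_FEB)}
    assert max(mapping.values()) < N_PU
    return mapping


def febs_per_pu_count(mapping):
    """Count how many FEBs each active PU receives."""
    counts = {}
    for pu in mapping.values():
        counts[pu] = counts.get(pu, 0) + 1
    return counts


print(febs_per_pu_count(route(staging=False)))  # {0: 2, 1: 2, 2: 2, 3: 2}
print(febs_per_pu_count(route(staging=True)))   # {0: 4, 1: 4}
```

Doubling the number of FEBs per PU doubles the per-event work of each DSP, which is why staging mode supports a lower event rate.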
V. ROD PERFORMANCE AND SYSTEM TESTS
Several prototypes of the different read-out modules were built to test as many functionalities of the calorimeter read-out as possible before the start of the final module production. A key point is the computing performance of the DSP, the central component of the ROD. The processing time per event increases with the number of channels that are monitored in parallel. If 30% of the channels are above a given energy threshold and fed into the monitoring histograms, the processing time per event is 5.5 μs. This value increases to 7.8 μs when 80% of the channels are histogrammed, which is still well below the required 10 μs. The test was performed on a prototype board using DSP code written in assembly language, which allows explicit optimization of the DSP calculations by exploiting the eight parallel functional units of the CPU. The latest software package, which includes all DSP tasks such as calibration and interrupt handling, is written in C. However, a C implementation of the Optimal Filtering calculation showed reduced performance, even after automatic optimization by the C compiler, so the optimized core calculation needs to be re-integrated into the latest framework.
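The two measured points allow a rough extrapolation. Assuming, as a simplification, that the processing time grows linearly with the fraction of histogrammed channels, even histogramming every channel would stay below 10 μs:

```python
# Linear model through the two measured points (prototype board, assembly code).
# Assumption: processing time grows linearly with the histogrammed fraction.
F1, T1 = 0.30, 5.5   # fraction of channels histogrammed, processing time in us
F2, T2 = 0.80, 7.8

slope = (T2 - T1) / (F2 - F1)   # us per unit fraction


def time_us(fraction):
    """Interpolated processing time per event for a histogrammed fraction."""
    return T1 + slope * (fraction - F1)


print(f"{slope:.1f} us per 100%, t(0) = {time_us(0.0):.2f} us, t(1) = {time_us(1.0):.2f} us")
# 4.6 us per 100%, t(0) = 4.12 us, t(1) = 8.72 us
```

This is only an extrapolation from two measurements; it ignores any other load on the DSP.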
A challenging and comprehensive test of the LAr calorimeter read-out was performed at the ATLAS combined test-beam in 2004, where a complete slice of the ATLAS detector combining several sub-detectors was operated. A module of the LAr barrel calorimeter was read out with prototype FEBs and two RODs. The read-out was operated successfully both during normal data-taking and during detector calibration. An example of the output data taken with the complete read-out chain is shown in Fig. 8.
More complex tests are currently being performed with final production ROD boards. A dedicated test stand has been set up, as shown in Fig. 9. Injector boards send data through optical links into the RODs, which allows special test events to be injected to verify correct data transfer.
The data are received by many ROD boards simultaneously, processed, and sent via the SDRAM of the mother-board and the VME bus to a read-out PC. This is not the default transfer mechanism, but a useful extension of the read-out, which can then also be operated without being connected to the usual DAQ system.
The bandwidth of the VME-based read-out is, however, limited, and an event rate of the order of 10-100 Hz is achieved. The system tests show that the data are processed correctly and that the read-out is synchronized with the trigger system.
VI. PRODUCTION AND INSTALLATION ASPECTS
The series production of the LAr read-out components is currently ongoing and will be completed in July 2005. Each board is tested individually and the production quality is controlled. The ROD and PU boards, for example, are tested in a set-up similar to the one shown in Fig. 9. All test results are stored in a production database so that possible future component failures can be traced back. For the back-end electronics, 250 ROD and TM boards and 932 PUs are being produced.
The installation and commissioning of the first part of the back-end system in the ATLAS underground area is planned to be finished in August 2005. This first part will read out the LAr barrel calorimeter and will be used to commission the front-end electronics and the detector. The remaining read-out will be installed shortly afterwards.
VII. CONCLUSION AND OUTLOOK
The read-out system of the ATLAS Liquid Argon calorimeters consists of a large number of read-out channels operated at high speed with a large dynamic range. Many dedicated tests as well as the operation of the LAr read-out at the ATLAS combined test-beam have shown that the system requirements can be fulfilled.
The production of the read-out components will be completed by mid-2005, and the installation of the system in the ATLAS underground area is in preparation.
