An ATCA-based High Performance Compute Node for Trigger and Data Acquisition in Large Experiments  by Xu, H. et al.
 Physics Procedia  37 ( 2012 )  1849 – 1854 
1875-3892 © 2012 Published by Elsevier B.V. Selection and/or peer review under responsibility of the organizing committee for TIPP 11. 
doi: 10.1016/j.phpro.2012.02.509 
TIPP 2011 - Technology and Instrumentation in Particle Physics 2011 
An ATCA-based High Performance Compute Node for 
Trigger and Data Acquisition in Large Experiments 
H. Xu^a,*, Z.-A. Liu^a, Q. Wang^a, J. Zhao^a, D. Jin^a, W. Kuhn^b, S. Lang^b, M. Liu^b
^a Institute of High Energy Physics, Chinese Academy of Sciences, 100049, Beijing, China 
^b II. Physikalisches Institut, Justus-Liebig-Universitaet, 35392, Giessen, Germany 
Abstract 
Many next-generation particle and nuclear physics experiments will run at high luminosity to collect large 
amounts of data and will integrate millions of electronics channels. They therefore require trigger and data 
acquisition systems with high performance and high bandwidth. An Advanced Telecom Computing Architecture (ATCA) 
compliant, Field Programmable Gate Array (FPGA) based compute node has been developed for this purpose. This 
compute node has powerful FPGA chips and large memory for data buffering. Event selection based on real-time feature 
extraction, filtering and high-level correlations can be implemented on it. The latest prototype is being used to 
develop trigger and data acquisition systems for the PANDA and Belle II experiments.  
Keywords: compute node, DAQ,  ATCA,  Trigger, PANDA, Belle II 
1. Introduction 
Next-generation experiments in hadron and collider physics will be built to run at high luminosity to 
collect unprecedented amounts of data to reach their physics goals. For an experiment such as the PANDA 
experiment [1], the trigger and data acquisition system needs to handle interaction rates of the order of 
10^7/s and data rates of 200 GB/s [2]. High-performance real-time event selection is imperative to 
retain the interesting events while rejecting the huge amount of background. Thus, a trigger and data 
acquisition system with high bandwidth and high performance is necessary for future experiments. In 
this paper, we introduce an Advanced Telecom Computing Architecture (ATCA) based high performance 
compute node (CN), which is a generic concept that significantly advances the state of the art for trigger 
and data acquisition (TDAQ) systems in large experiments. 
* E-mail address: xuhao@ihep.ac.cn. 
Open access under CC BY-NC-ND license.
2. Hardware Implementation and Performance Test 
2.1. ATCA features 
ATCA is an industry-standard modular electronics platform. It was developed by the PCI Industrial Computer 
Manufacturers Group (PICMG), and the specification is described in the PICMG 3.0 document [3]. Its most 
prominent features are: 
• High availability, up to 99.999 percent. 
• High throughput capacity (up to 2.5 Terabits/s). 
• Highly scalable, switched fabric architecture, with a full mesh backplane providing 2.5 Gb/s serial 
connections across modules (Fig.1).   
• 14 module slots in one shelf, each slot supporting up to 200 W of power. 
• Redundant -48 V power supplies and fans.  
• The shelf managers, controllers, application modules, power supplies and fans are all hot-swappable. 
                     
Fig.1. Full-mesh network topology of compute nodes over full mesh backplane (only 8 nodes shown). 
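As a rough illustration of the figures in the list above, the following sketch counts the point-to-point links of a full mesh and converts the quoted availability into downtime per year. It assumes a single 2.5 Gb/s connection per pair of slots, which is a simplification made here for illustration; the function names are hypothetical.

```python
def full_mesh_links(n_slots: int) -> int:
    """Number of point-to-point links when every slot connects to every other slot."""
    return n_slots * (n_slots - 1) // 2


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per 365-day year for a given availability fraction."""
    minutes_per_year = 365 * 24 * 60
    return (1.0 - availability) * minutes_per_year


LINK_RATE_GBPS = 2.5  # per-connection rate quoted in the feature list

for n in (8, 14):  # 8 nodes as drawn in Fig.1, 14 slots in a full shelf
    per_slot = n - 1  # each slot has one link to every other slot
    print(f"{n} slots: {full_mesh_links(n)} links in total, "
          f"{per_slot} links per slot, "
          f"{per_slot * LINK_RATE_GBPS:.1f} Gb/s outgoing per slot (assumed one link per pair)")

print(f"99.999% availability: about "
      f"{downtime_minutes_per_year(0.99999):.2f} minutes of downtime per year")
```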
2.2. High Performance Compute Node 
A compute node has been designed as a generic solution for trigger and data acquisition 
in various HEP experiments. The CN provides flexible connectivity to the front-end 
electronics via optical links, and to conventional PC farms via Gigabit Ethernet. Inter-board 
connectivity within one shelf is provided by the full mesh backplane. Each CN module is equipped with five 
Virtex-4 FX60 FPGA chips: four for trigger algorithms and one for data switching. The main 
features of the CN are as follows: 
• 5 Virtex-4 FX60 FPGA chips, with 2 embedded PowerPC cores in each FPGA 
• 10 GB DDR2 RAM in total (2 GB per FPGA) and 512 Mb FLASH memory for each FPGA 
• 8 optical fiber channels, each with a bandwidth of up to 6.5 Gbps 
• Gbit Ethernet for each FPGA 
Fig.2 shows the block diagram and the prototype of the CN [4]. FPGA0 handles data switching of the high 
speed serial connections to the full mesh backplane. It also has two Gbit Ethernet ports for data output to 
the PC farm: one is connected to the backplane and the other is provided on the front panel. FPGA1-4 are 
used for data input and trigger algorithms. Each of the four algorithm FPGAs has two optical links for data 
input and one Gbit Ethernet port for data output. Parallel data buses and high speed serial connections are 
provided between the algorithm FPGAs. A parallel data bus is provided between the algorithm FPGAs and 
the switching FPGA. Each of the five FPGAs has 2 GB of DDR2 memory for data buffering and one 512 Mb 
Flash memory for storing the FPGA bitstream and the operating system kernel. A customized intelligent 
platform management controller (IPMC) fulfills the TDAQ system requirements on power negotiation, 
voltage monitoring, temperature sensing, and FPGA configuration checking. The IPMC communicates with the 
ATCA shelf manager via two I2C buses [5]. 
[Fig.2 labels: IPMC; Gbit Ethernet (x5); Optical link (x8); Ethernet PHY (x5); FLASH (x10); Algorithm FPGA (x4); Switch FPGA; Power Supply; Full mesh; Neighbor link; UART; JTAG; Ver.2.0]
Fig.2. a) Block diagram of the CN.  b) Photo of the prototype. Gbit Ethernet, optical transceivers and UART connectors are mounted 
on the front panel. DDR2 memory slots are placed on the back side. 
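The headline numbers of the board can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, 1 GB is taken as 10^9 bytes, and the optical rate is the quoted maximum raw line rate, so any encoding overhead is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    """Headline resources of one CN board, taken from the feature list."""
    n_fpgas: int = 5                     # 4 algorithm FPGAs + 1 switch FPGA
    ddr2_gb_per_fpga: float = 2.0
    optical_links: int = 8               # 2 per algorithm FPGA
    optical_gbps_per_link: float = 6.5   # quoted maximum line rate

    def total_ddr2_gb(self) -> float:
        return self.n_fpgas * self.ddr2_gb_per_fpga

    def total_optical_gbps(self) -> float:
        return self.optical_links * self.optical_gbps_per_link

    def buffer_fill_seconds(self, links_per_fpga: int = 2) -> float:
        """Time for one FPGA's DDR2 to fill at the full raw input rate.

        Assumes 1 GB = 1e9 bytes and no encoding overhead (illustrative only).
        """
        input_gbps = links_per_fpga * self.optical_gbps_per_link
        return self.ddr2_gb_per_fpga * 8.0 / input_gbps


cn = ComputeNode()
print(f"Total DDR2: {cn.total_ddr2_gb():.0f} GB")
print(f"Aggregate optical input: {cn.total_optical_gbps():.0f} Gb/s")
print(f"Buffer depth at full input rate: {cn.buffer_fill_seconds():.2f} s")
```

The last figure is an upper bound on input pressure rather than a measured buffering time, since it assumes both links of an algorithm FPGA run continuously at the maximum rate.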
2.3. Performance Test 
A small series of boards has been produced for performance tests and firmware development since 
last year. Tests of FLASH memory access, optical link stability, DDR2 SDRAM throughput 
and Ethernet function have been carried out. The results for the optical links and the DDR2 SDRAM are reported here.  
2.3.1. Optical link test 
Pseudo-random data were transmitted between two CN modules to test the stability of the optical link. 
At the transmitter side, a pseudo-random data generator implemented in the FPGA produced 16-bit 
parallel data following a pseudo-random bit sequence. The data were serialized into one data stream and sent out at 
6.25 Gbps. At the receiver side, an equivalent generator in the FPGA reproduced the same 
pseudo-random bit sequence as the transmitter. The serial data stream was deserialized to 
parallel data and compared with the output of the local generator, which used the last received data as 
its seed. Mismatches were accumulated in a register that could be read out when the test 
finished. Fig. 3 shows the eye diagram of the optical signal. The test ran for one day and no bit 
errors were recorded. This indicates that the design can provide reliable data transmission for the trigger and 
DAQ system. 
Fig.3. Eye-diagram of the optical link signal. The wide eye opening shows good signal integrity of the received data. 
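The test principle described above can be sketched in software. The paper does not state which pseudo-random sequence was used, so the example below assumes a common 16-bit maximal-length LFSR (x^16 + x^14 + x^13 + x^11 + 1) and takes each 16-bit parallel word to be the LFSR state after 16 shifts; all function names are hypothetical. It also estimates the bit error rate limit implied by an error-free run, assuming the link ran continuously at 6.25 Gbps for 24 hours and every transmitted bit was checked.

```python
import math


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (x^16 + x^14 + x^13 + x^11 + 1) by one bit."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def next_word(word: int) -> int:
    """Next 16-bit parallel word: the LFSR state after 16 single-bit steps."""
    for _ in range(16):
        word = lfsr16_step(word)
    return word


def transmit(seed: int, n_words: int) -> list[int]:
    """Transmitter side: generate n_words pseudo-random 16-bit words."""
    words, state = [], seed
    for _ in range(n_words):
        state = next_word(state)
        words.append(state)
    return words


def count_bit_errors(received: list[int]) -> int:
    """Receiver side: predict each word from the last received word (used as
    the seed) and accumulate the number of mismatching bits."""
    errors = 0
    for previous, current in zip(received, received[1:]):
        errors += bin(next_word(previous) ^ current).count("1")
    return errors


def ber_upper_limit(line_rate_bps: float, seconds: float, cl: float = 0.95) -> float:
    """Upper limit on the bit error rate after an error-free run (Poisson, zero errors)."""
    n_bits = line_rate_bps * seconds
    return -math.log(1.0 - cl) / n_bits


clean = transmit(seed=0xACE1, n_words=1000)
corrupted = list(clean)
corrupted[500] ^= 0x0004  # flip one bit in one word

print("errors in clean stream:", count_bit_errors(clean))
print("corruption detected:", count_bit_errors(corrupted) > 0)
print(f"BER limit (95% CL) after one error-free day: {ber_upper_limit(6.25e9, 24 * 3600):.1e}")
```

Because the receiver reseeds from the last received word, a single corrupted word produces mismatches in two consecutive comparisons, so the accumulated count is an indicator of link errors rather than an exact count of flipped bits. Under the stated assumptions, one error-free day corresponds to a bit error rate below roughly 5.5e-15 at 95% confidence.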
2.3.2. Throughput test of the DDR2 memory 
Memory write and read speed is the bottleneck for high bandwidth data processing. The Virtex-4 Multi-Port 
Memory Controller (MPMC) [6] is interfaced directly with the DDR2 SDRAM to improve the 
memory access capability. The throughput of the memory with the MPMC has been tested. First, the test data 
were generated in the FPGA. Second, the test data were written to the memory under the control of the MPMC. 
Then, the test data were read back from memory and transmitted to a PC. By counting the clock cycles needed for 
writing and reading, we calculated the write and read speeds. As shown in Fig. 4, the 
throughput reaches about 550 MB/s with a 128-bit data bus between the MPMC and the DDR2 memory. 
Fig.4. Throughput test of the DDR2 SDRAM. The write and read speed reaches about 550 MB/s with a 128-bit data bus (red 
circles) and 470 MB/s with a 64-bit data bus (blue boxes).
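The conversion from counted clock cycles to throughput used in this test can be written out as a short sketch. The clock frequency, transfer size and cycle count below are hypothetical values chosen only to illustrate the formula; the paper does not quote them.

```python
def throughput_mbytes_per_s(n_bytes: int, n_cycles: int, clock_hz: float) -> float:
    """Throughput in MB/s (1 MB = 1e6 bytes) from a transfer size and a cycle count."""
    seconds = n_cycles / clock_hz
    return n_bytes / seconds / 1e6


# Hypothetical example: 1.1 MB moved in 200,000 cycles of an assumed 100 MHz counter clock.
example = throughput_mbytes_per_s(n_bytes=1_100_000, n_cycles=200_000, clock_hz=100e6)
print(f"{example:.0f} MB/s")
```

The same function applies to both the write and the read phase, each with its own cycle count.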
3. Applications in Large Experiments 
The first application of the high performance compute node is the trigger and data acquisition system 
of the PANDA experiment at FAIR. The system does not employ fixed hardware-based triggers but 
features a continuously sampling system in which the various subsystems are synchronized by a precision 
time stamp distribution system [7]. Optical links receive data from the front-end electronics, 
and the full mesh backplane is used to transmit data between CN boards. Event building, feature 
extraction and high level triggering will be performed on the processor FPGAs [8]. Fig.5 shows the prototype 
test setup for the electromagnetic calorimeter (EMC). The test data generated on a data source PC were 
written to an interface card and then transmitted to the CN over two optical links. Channel 
combining, cluster finding and other algorithms were implemented in an FPGA on the CN. Finally, the 
processed data were sent to a PC farm via Gbit Ethernet for analysis.  
[Fig.5 labels. Top: NFS (data source); Interface card; Optical links; Compute Node; Ethernet. Bottom: MGT Transceiver, Frame Check, FIFO (x2); Channel Combiner; FIFO; PLB Master IF; MPMC; DDR2; Physical Layer; Link Layer; Network Layer]
Fig.5. The prototype test setup for the EMC. Top: the prototype setup. Bottom: data flow in the processing FPGA. The input data from 
different optical links were combined and written to DDR2 via the MPMC. 
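To make the idea of cluster finding concrete, here is a minimal software sketch that groups neighbouring calorimeter cells above a threshold and sums their energies. It is not the algorithm implemented in the CN firmware, which the paper does not describe; the grid layout, the threshold and the choice of 4-neighbour connectivity are assumptions made for illustration.

```python
from collections import deque


def find_clusters(energies, threshold):
    """Group 4-connected cells with energy >= threshold into clusters.

    energies: 2D list of cell energies (rows x columns).
    Returns a list of dicts with the member cells and the summed energy.
    """
    rows, cols = len(energies), len(energies[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or energies[r][c] < threshold:
                continue
            # Breadth-first flood fill starting from this seed cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and energies[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            total = sum(energies[y][x] for y, x in cells)
            clusters.append({"cells": cells, "energy": total})
    return clusters


grid = [
    [0, 5, 0, 0],
    [0, 7, 0, 0],
    [0, 0, 0, 9],
]
for cluster in find_clusters(grid, threshold=1):
    print(cluster["cells"], cluster["energy"])
```

An FPGA implementation would typically process cells in a streaming fashion rather than with a queue; the sketch only shows the grouping logic.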
The Pixel Detector (PXD), based on the depleted field effect transistor (DEPFET) technology [9], is a 
vertex detector for the BELLE II experiment. The estimated total data rate is up to 58 Gbit/s for the 
complete PXD. The hardware platform for the PXD data reduction is the CN. The structure of the PXD 
readout is shown in Fig. 6 [10]. The drain current digitizers (DCD) immediately digitize 
the current difference switched from a pixel row and send the data serially to the data handling processor 
(DHP), which buffers and analyzes the digital data stream and performs zero suppression. The 
remaining data are then sent to the off-module data handling hybrid (DHH), from which the data stream is fed 
via optical fiber to the CNs. Track finding and reconstruction will be executed on the CNs using hit 
signals from the PXD and other detectors. The CNs send only the PXD hits associated with tracks to the 
event builder over Gbit Ethernet. 
[Fig.6 labels: PXD; DCD; DHP; Flex Kapton Cable; REPEATER; DHH; PWR Supply; DATA OUT; Compute Nodes]
Fig.6. Architecture of PXD readout. Detector signals will be routed via shielded twisted pair cables to the DHH, then sent to  CN via 
optical fibers.  
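The two reduction steps in this readout chain, zero suppression and the selection of hits associated with tracks, can be illustrated with a small sketch. It is a simplified model, not the DHP or CN implementation: the ADC threshold, the square search window and the list of extrapolated track positions are hypothetical inputs.

```python
def zero_suppress(frame, threshold):
    """Keep only pixels whose ADC value exceeds the threshold.

    frame: 2D list of ADC values. Returns a list of (row, col, adc) hits.
    """
    return [(row, col, adc)
            for row, line in enumerate(frame)
            for col, adc in enumerate(line)
            if adc > threshold]


def select_hits_near_tracks(hits, track_points, half_window):
    """Keep hits within a square window around any extrapolated track position.

    track_points: list of (row, col) positions where tracks cross the sensor.
    """
    kept = []
    for row, col, adc in hits:
        if any(abs(row - t_row) <= half_window and abs(col - t_col) <= half_window
               for t_row, t_col in track_points):
            kept.append((row, col, adc))
    return kept


frame = [
    [0, 0, 12, 0],
    [0, 30, 0, 0],
    [0, 0, 0, 25],
]
hits = zero_suppress(frame, threshold=10)
selected = select_hits_near_tracks(hits, track_points=[(1, 1)], half_window=1)
print(hits)
print(selected)
```

In this toy frame, zero suppression keeps three of twelve pixels, and the track window then keeps the two hits adjacent to the assumed track position.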
We also propose to use the CN as a data concentrator for the super silicon vertex detector (SuperSVD) of 
the BELLE II experiment [10]. A huge amount of data is read out from the pixel detector (PXD). It is 
planned to use the SuperSVD data (which, in contrast to the PXD data, have a very fine time granularity but lower 
position sensitivity) to discard the majority of off-time hits in the PXD. A data concentrator is 
intended to serve this purpose. Fig. 7 shows a possible approach. After 
processing and sparsification, the SuperSVD data will be sent via optical links to the common BELLE II 
data acquisition system and to the data concentrator. In the data concentrator, the data will be combined and 
transmitted to the PXD online data reduction. 
[Fig.7 labels: SVD/Copper; SVD data (x2); Switch; BELLE II DAQ; Data Concentrator; DHH; PXD Data Reduction]
Fig.7. A data concentrator in the SuperSVD data acquisition. The data concentrator collects the SVD data and sends them to the PXD data reduction.
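The combining step of a data concentrator can be sketched as grouping fragments from several input links by event number. This is a conceptual illustration only: the fragment format (an event number plus an opaque payload), the function name and the completeness check are assumptions, not details taken from the BELLE II design.

```python
from collections import defaultdict


def concentrate(streams):
    """Group (event_id, payload) fragments from several input links by event.

    Returns (complete, incomplete): events seen on every link, and the rest.
    Each event maps link index -> payload.
    """
    fragments = defaultdict(dict)
    for link_id, stream in enumerate(streams):
        for event_id, payload in stream:
            fragments[event_id][link_id] = payload

    complete, incomplete = {}, {}
    for event_id in sorted(fragments):
        has_all_links = len(fragments[event_id]) == len(streams)
        target = complete if has_all_links else incomplete
        target[event_id] = fragments[event_id]
    return complete, incomplete


# Two hypothetical SVD input links; event 2 is missing on the second link.
link0 = [(1, b"svd-a1"), (2, b"svd-a2"), (3, b"svd-a3")]
link1 = [(1, b"svd-b1"), (3, b"svd-b3")]
complete, incomplete = concentrate([link0, link1])
print("complete events:", sorted(complete))
print("incomplete events:", sorted(incomplete))
```

A real concentrator would work on continuous streams with bounded buffers and timeouts; the sketch keeps everything in memory to show only the grouping logic.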
4. Conclusion 
A generic proposal that significantly advances the state of the art for trigger and data acquisition 
systems in future high energy physics experiments using ATCA technology has been presented. It has been used 
in the development of large particle physics experiments such as PANDA and BELLE II. Owing 
to its modular and scalable approach, it is also suitable for other applications inside and outside the 
field of high energy physics. 
References 
[1] PANDA Letter of Intent, PANDA Technical Design Report, <http://www-panda.gsi.de> 
[2] W. Kuehn, FPGA-based compute nodes for the PANDA experiment at FAIR, Proceedings of the IEEE NPSS 14th Real Time Conference, Batavia 
[3] <http://www.picmg.org> 
[4] H. Xu, et al., WS-3 paper, Proceedings of the IEEE NPSS 15th Real Time Conference, Beijing 
[5] J. Lang, et al., TCA-5 paper, Proceedings of the IEEE NPSS 15th Real Time Conference, Beijing 
[6] <http://www.xilinx.com> 
[7] W. Kuehn, N57-1, Proceedings of the 2008 IEEE NSS Conference, Dresden 
[8] H. Xu, et al., Introduction to PANDA Data Acquisition System, Proceedings of the TIPP11 Conference, Chicago 
[9] <http://twiki.hll.mpg.de/twiki/bin/view/DEPFET/WebHome> 
[10] Belle II Collaboration, BELLE II Technical Design Report, <http://belle2.kek.jp>
