Real-time Data Processing
Andreas Kopmann (KIT)




Real-time Data Processing: Overview
Programmable hardware:
    ● Selected commercial hardware
    ● Standardized custom hardware
Parallel computing:
    ● Hardware-independent programming (e.g. OpenCL)
    ● Library of common algorithms
Task 1: Dedicated Hardware
Actions:
● Survey of available solutions for programmable signal-processing hardware
● Adapt existing solutions to the needs of PNI, or join an ongoing HGF or other development
● Introduce the solutions in PNI
Flexible high-throughput FPGA platform 
- 64-bit Linux drivers 
PCIe DMA IP-core




Collection of IP-Cores
● Information on commonly available resources is collected on the wiki page.
● The sources are not published on the wiki page; they can be obtained from the contact person.
● Each database entry has the following structure:
o Name of the project
o Type (IP core, FPGA project, FMC card, …)
o Description
o Platform (e.g. Virtex-4)
o Status
o Contact person
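The entry structure listed above maps naturally onto a record type. The following Python sketch is hypothetical: the class name `ResourceEntry`, the helper `find_by_type`, the field names and the example values are illustrative assumptions, not the schema of the actual wiki database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceEntry:
    """One entry of the shared resource collection (hypothetical schema)."""
    name: str            # name of the project
    resource_type: str   # e.g. "IP core", "FPGA project", "FMC card"
    description: str
    platform: str        # e.g. "Virtex-4"
    status: str
    contact_person: str  # the sources are obtained from this person, not the wiki


def find_by_type(entries, resource_type):
    """Return all entries of the given type, preserving their order."""
    return [e for e in entries if e.resource_type == resource_type]


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    catalog = [
        ResourceEntry("Example DMA core", "IP core", "placeholder",
                      "Virtex-4", "placeholder", "N.N."),
        ResourceEntry("Example FMC card", "FMC card", "placeholder",
                      "Virtex-4", "placeholder", "N.N."),
    ]
    for entry in find_by_type(catalog, "IP core"):
        print(entry.name, entry.platform, entry.contact_person)
```

A frozen dataclass keeps entries immutable, which suits a catalogue that is only read and filtered.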




● North-West DMA → 20 k€ (netlist only) and 35 k€ (source code)
● PLDA EZDMA/QuickPCIe → 10 k€ (netlist) and 60 k€ (source code)
● Limited to a single FPGA family (e.g. Virtex-6, speed grade -2)
Portfolio Detector Technology starts a project database





Application: Grating-based phase-contrast imaging
HZG + KIT
Application: Hot-electron bolometer (HEB)
Concept
- Signal splitter → 4 signals
- 4 fast ADCs + picosecond delay
Characterization of one branch with 500 MHz pulses
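One possible reading of this concept is that each of the four branches samples the same pulse at a slightly different time, so that several points per pulse are acquired. The sketch below illustrates that reading under stated assumptions: a 2 ns repetition period (500 MHz), one sample per ADC per pulse, a Gaussian pulse shape, and hypothetical delay values. None of these parameters, nor the helper names `sample_times`, `pulse` and `acquire`, are taken from the actual hardware.

```python
import math

PERIOD_PS = 2000.0  # assumed 500 MHz repetition rate -> 2 ns between pulses


def sample_times(n_pulses, delays_ps):
    """Sampling instants in ps: one row per pulse, one column per ADC branch."""
    return [[n * PERIOD_PS + d for d in delays_ps] for n in range(n_pulses)]


def pulse(t_ps, n, amplitude, center_ps, sigma_ps):
    """Assumed Gaussian shape of pulse n, centred center_ps after n * PERIOD_PS."""
    dt = t_ps - (n * PERIOD_PS + center_ps)
    return amplitude * math.exp(-0.5 * (dt / sigma_ps) ** 2)


def acquire(amplitudes, delays_ps, center_ps, sigma_ps):
    """Return one sample per delayed branch for every pulse."""
    rows = sample_times(len(amplitudes), delays_ps)
    return [[pulse(t, n, a, center_ps, sigma_ps) for t in row]
            for n, (a, row) in enumerate(zip(amplitudes, rows))]


if __name__ == "__main__":
    delays = [0.0, 10.0, 20.0, 30.0]  # hypothetical picosecond delays
    data = acquire([1.0, 0.8, 1.2], delays, center_ps=15.0, sigma_ps=12.0)
    for n, samples in enumerate(data):
        print(n, [round(s, 3) for s in samples], "max:", round(max(samples), 3))
```

With the delays spread across the pulse width, the largest of the four samples approximates the pulse amplitude, while a single branch (as in the one-branch characterization) only sees one point per pulse.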
Application: HEB – ANKA long-term bunch behaviour
Results with the YBCO terahertz detector:
New tool for machine analysis
Contributes to the ARD portfolio in the program Matter and Technology
Flexible high-throughput FPGA platform
Summary:
● Data rates up to 2 GB/s
Requirements for the next generation:
● Camera: 5 GB/s
● Hot-electron bolometer: 3 GB/s
● Involved IP cores:
o Serializers/deserializers
o DDR memory
o PCI Express with DMA
● Preliminary version of the next-generation DMA IP core: ~3 GB/s
● 64-bit Linux drivers
Everybody is invited to join the development of the ultrafast DAQ readout system.
Task 2: Real-time Data Assessment 
Actions:
● Computation system based on GPU co-processors
● Online tomographic reconstruction
● Prototype adoption of a complete crystallography data flow using the HDRI standard format
● General environment for parallel image processing
a) Independent of the available hardware (e.g. OpenCL)
b) Library of standard algorithms
c) Easy adaptation to new problems
● Further applications
GPUs for Ultrasound computer tomography
Scaling performance with multiple GPUs
8-core GPU box: Optimization of air flow
Scaling performance with multiple GPUs
Hardware selection is crucial for high GPU performance
More on computing hardware → Talk: S. Chilingaryan
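A common way to use several GPUs for tomographic reconstruction is to give each device a contiguous block of independent slices. The sketch below shows only this partitioning step; it assumes that slices can be reconstructed independently and that all GPUs are equally fast, and `split_slices` is a hypothetical helper. As noted above, the scaling actually achieved also depends on the hardware selection.

```python
def split_slices(n_slices, n_gpus):
    """Split slice indices 0..n_slices-1 into n_gpus contiguous [start, stop) blocks.

    Block sizes differ by at most one, so equally fast devices get equal work.
    """
    if n_gpus <= 0:
        raise ValueError("need at least one GPU")
    base, extra = divmod(n_slices, n_gpus)
    blocks, start = [], 0
    for gpu in range(n_gpus):
        size = base + (1 if gpu < extra else 0)  # first `extra` GPUs take one more
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for gpu, (start, stop) in enumerate(split_slices(2048, 3)):
        print(f"GPU {gpu}: slices {start}..{stop - 1}")
```

Contiguous blocks keep each GPU's input and output in one piece, which simplifies the host-to-device transfers compared with an interleaved assignment.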
Computing and power efficiency
Energy footprint of tomographic reconstruction with PyHST:
● CPU: 22 Wh, 455 s
● GPU: 5 Wh, 58 s
A. Anzt et al., ENA-HPC, 12-14 Sep. 2012
Cooperation with the Exa2Green project
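The two measurements can be compared directly. The following arithmetic uses only the numbers quoted above; the average power values are derived under the assumption that the quoted energy was consumed over the quoted run time, and `compare` is a hypothetical helper.

```python
def compare(cpu_wh, cpu_s, gpu_wh, gpu_s):
    """Derive speedup, energy ratio and average power from energy/time pairs."""
    speedup = cpu_s / gpu_s              # how much faster the GPU run finishes
    energy_ratio = cpu_wh / gpu_wh       # how much less energy the GPU run needs
    cpu_watts = cpu_wh * 3600.0 / cpu_s  # Wh -> J (x3600), divided by seconds
    gpu_watts = gpu_wh * 3600.0 / gpu_s
    return speedup, energy_ratio, cpu_watts, gpu_watts


if __name__ == "__main__":
    s, e, pc, pg = compare(22.0, 455.0, 5.0, 58.0)
    print(f"speedup {s:.1f}x, energy {e:.1f}x lower, "
          f"avg power CPU {pc:.0f} W, GPU {pg:.0f} W")
```

This prints a speedup of 7.8x and a 4.4x lower energy use; the GPU run draws more average power (about 310 W vs. 174 W) but finishes so much sooner that its total energy is lower.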
Parallel Computing Framework
Core functions, hardware access
(data transfer, file storage, camera buffering, ROI, …)
Primitives for image processing
(2D FFT, filters, Radon transform, image conversion, …)
Load balancing + management
(CPU, GPU, frame grabber / single or double precision)
Applications
(e.g. tomographic reconstruction, …)
Framework: Basics
Image processing = composition of filter nodes
● One thread per node
● Mapping to CPU or GPU
● Encourages reuse of tested components
Core system is built on top of OpenCL and GLib/GObject
→ Talk: M. Vogelgesang
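The idea of one thread per filter node can be illustrated with plain Python threads and queues. This is a minimal CPU-only sketch under assumptions: `FilterNode` and `run_pipeline` are hypothetical names, not the framework's API, and neither the GPU mapping nor the OpenCL and GLib/GObject layers are modelled.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


class FilterNode(threading.Thread):
    """Hypothetical filter node: applies func to every item from its inbox."""

    def __init__(self, func, inbox, outbox):
        super().__init__(daemon=True)
        self.func, self.inbox, self.outbox = func, inbox, outbox

    def run(self):
        while True:
            item = self.inbox.get()
            if item is _END:
                self.outbox.put(_END)  # propagate end-of-stream downstream
                return
            self.outbox.put(self.func(item))


def run_pipeline(items, funcs, capacity=8):
    """Chain funcs as nodes (one thread each) connected by bounded queues."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(funcs) + 1)]
    nodes = [FilterNode(f, queues[i], queues[i + 1]) for i, f in enumerate(funcs)]
    for node in nodes:
        node.start()

    def feed():  # separate feeder thread, so bounded queues cannot deadlock
        for item in items:
            queues[0].put(item)
        queues[0].put(_END)

    threading.Thread(target=feed, daemon=True).start()
    results = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for node in nodes:
        node.join()
    return results


if __name__ == "__main__":
    # Two trivial stand-in "filters" applied to a stream of numbers.
    print(run_pipeline(range(5), [lambda x: x * 2, lambda x: x + 1]))
```

The bounded queues provide back-pressure: a slow node fills its inbox and thereby throttles the nodes upstream, much like a camera buffer in front of a slower processing stage.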
Framework: Algorithms
● Filtered back projection
● Precise calculation of the center of rotation
● Laminography
● De-noising filters
● Simulation of test data
Under development:
● Algebraic reconstruction (better results)
● Precise forward transform model and compressive sampling 
● DFI-based reconstruction (faster processing)
→ Talk: Xiaoli
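Filtered back projection, the first algorithm in the list, can be sketched compactly for the parallel-beam case. This textbook version (Ram-Lak ramp filter applied in the spatial domain, linear interpolation during back projection) is written in plain Python for readability. It is not the framework's GPU implementation, and it assumes unit detector and pixel spacing with the rotation axis at the detector centre; the function names are hypothetical.

```python
import math


def ramp_kernel(half_width):
    """Spatial-domain Ram-Lak kernel for unit detector spacing."""
    kernel = {}
    for n in range(-half_width, half_width + 1):
        if n == 0:
            kernel[n] = 0.25
        elif n % 2:  # odd offsets
            kernel[n] = -1.0 / (math.pi ** 2 * n ** 2)
        else:        # even, non-zero offsets
            kernel[n] = 0.0
    return kernel


def filter_projection(proj):
    """Convolve one projection with the ramp kernel (zero outside the detector)."""
    n = len(proj)
    kernel = ramp_kernel(n - 1)
    return [sum(proj[j] * kernel[i - j] for j in range(n)) for i in range(n)]


def fbp(sinogram, angles, size):
    """Reconstruct a size x size image; sinogram[k] is the projection at angles[k]."""
    n_det = len(sinogram[0])
    det_center = (n_det - 1) / 2.0
    img_center = (size - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for proj, theta in zip((filter_projection(p) for p in sinogram), angles):
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = img_center - row
            for col in range(size):
                x = col - img_center
                t = x * c + y * s + det_center  # detector coordinate of the pixel
                i = int(math.floor(t))
                if 0 <= i < n_det - 1:
                    w = t - i  # linear interpolation weight
                    image[row][col] += (1 - w) * proj[i] + w * proj[i + 1]
    scale = math.pi / len(angles)  # angular integration step over [0, pi)
    return [[v * scale for v in r] for r in image]


if __name__ == "__main__":
    # Point object at the rotation axis: every projection is a centred spike.
    angles = [k * math.pi / 36 for k in range(36)]
    sinogram = [[1.0 if i == 4 else 0.0 for i in range(9)] for _ in angles]
    image = fbp(sinogram, angles, 9)
    print([round(v, 3) for v in image[4]])  # centre row peaks at the middle
```

The nested Python loops are far too slow for real data; they only make the per-pixel work explicit, and that per-pixel independence is what makes the back projection step a good fit for GPUs.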
Application: ROFEX – Rossendorf Fast Electron Beam X-ray Tomograph
DAQ frame rate: up to 10,000 fps




● Result: 256 × 256
● 600 frames → 0.35 s
● Frame rate: 1.7 kfps (1 GPU)
Further task: post-processing
● Bubble detection
● Velocity measurement
U. Hampel et al., Flow Meas. Instrum. 16, pp. 85-90, 2005
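The quoted numbers are consistent with each other, and they also show how far a single GPU is from the maximum DAQ rate. The GPU count below is an estimate that assumes perfectly linear multi-GPU scaling, which is an idealization; it is not a figure from the slides.

```python
import math

frames, seconds = 600, 0.35  # reconstruction benchmark quoted above
daq_fps = 10_000             # upper bound of the ROFEX DAQ frame rate

fps_one_gpu = frames / seconds                  # ~1714 frames/s, the quoted 1.7 kfps
ms_per_frame = 1000.0 * seconds / frames        # reconstruction time per frame
gpus_needed = math.ceil(daq_fps / fps_one_gpu)  # assumes linear scaling

print(f"{fps_one_gpu:.0f} fps per GPU, {ms_per_frame:.2f} ms per frame, "
      f"{gpus_needed} GPUs to match {daq_fps} fps")
```

This prints `1714 fps per GPU, 0.58 ms per frame, 6 GPUs to match 10000 fps`.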





● Generalized access to streaming cameras (C-API)
● 64-bit Linux support for PCO cameras








● Linux drivers for:
o High-throughput DAQ platform (KIT)
o PCO cameras (libuca by KIT)
● Discussion with LImA developers (Soleil)
● Development + optimization of GPU platforms
o Cooling, power consumption
● Parallel processing framework for streamed data (KIT, HZG, Soleil)
o Simplifies GPU programming
o Possible standard interfaces for GPU algorithms?





- Matter and Technology 
   ARD, Detectors, LSDMA
