Ultra-fast streaming camera platform for scientific applications by Caselle, M. et al.
The main features, implemented and tested, include:
- Fully configurable camera: adapts the pixel response to any experiment condition.
- Full streaming data acquisition architecture: continuous data acquisition without readout dead time.
- On-line image-based self-event trigger (fast reject) and Region-of-Interest readout architecture: intelligent selection of the Region-of-Interest of the frame, reducing the amount of output data and significantly increasing the camera frame rate.
- Easily extendable to most available CMOS image sensors.
Readout chain performance  
(PC memory system – DMA – DDR3 device) 
[Figure: PCI Express - DMA block diagram. Software layers: DMA driver, data decoding and consistency check, GUI, user applications. FPGA side: custom PCIe-DMA interface with GTX transceivers (x4 @ 5 Gb/s, optical/electrical link, x4 lanes @ 5 Gb/s), Xilinx integrated block for PCI Express Gen 2, DMA engines (North-West IPCore), user bank register, FPGA temperature and voltage control. I/O interface logic with FIFOs on both ports: IN port (Data in [0..63], WR_EN, Clock_in; WR control packet FSMs, back-pressure) and OUT port (Data out [0..63], Data valid, Clock_out; RD control packet FSMs, Busy_logic).]
Three Intellectual Property (IP) logic cores have been developed:
- A PCI Express core with a theoretical bandwidth of 20 Gb/s, combined with a Bus Master DMA architecture to benefit fully from the PCI Express link.
- A DDR3 memory interface operating at 800 MHz. The architecture is able to work in both half-duplex and full-duplex modes, with a bandwidth of 51 Gb/s and 25 Gb/s, respectively.
- A SerDes input stage operating at up to 800 MHz with a reconfigurable parallel data width of up to 16 bit.
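
The bandwidth figures quoted for the PCI Express and DDR3 cores can be reproduced with simple arithmetic. The sketch below is only a consistency check, not part of the camera firmware: the function and constant names are illustrative, and it assumes that the full-duplex DDR3 figure is the per-direction share of the same 64-bit bus.

```python
# Figures quoted on the poster; all names here are illustrative assumptions.
PCIE_LANES = 4                 # "x4 lanes"
PCIE_LANE_RATE_GBPS = 5.0      # "@ 5 Gb/s" per lane
DDR3_CLOCK_MHZ = 800           # "800MHz @ 64bit"
DDR3_BUS_WIDTH_BITS = 64


def pcie_raw_bandwidth_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Aggregate raw line rate of a multi-lane link, in Gb/s."""
    return lanes * lane_rate_gbps


def ddr3_bandwidth_gbps(clock_mhz: float, bus_width_bits: int,
                        full_duplex: bool) -> float:
    """Clock x bus width, in Gb/s.

    Assumption: in full-duplex mode reads and writes share the bus
    equally, so each direction gets half of the total.
    """
    total = clock_mhz * 1e6 * bus_width_bits / 1e9
    return total / 2 if full_duplex else total


if __name__ == "__main__":
    print("PCIe raw:", pcie_raw_bandwidth_gbps(PCIE_LANES, PCIE_LANE_RATE_GBPS), "Gb/s")
    print("DDR3 half-duplex:",
          ddr3_bandwidth_gbps(DDR3_CLOCK_MHZ, DDR3_BUS_WIDTH_BITS, False), "Gb/s")
    print("DDR3 full-duplex:",
          ddr3_bandwidth_gbps(DDR3_CLOCK_MHZ, DDR3_BUS_WIDTH_BITS, True), "Gb/s")
```

Under these assumptions the results are 20 Gb/s for the link and 51.2 Gb/s or 25.6 Gb/s for the memory, in line with the rounded values above.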
[Figure: payload data throughput, mean value (Gbps, axis 0 to 14), versus packet size (64 to 32768 bytes), for TX (PC -> DMA -> DDR3) and RX (DDR3 -> DMA -> PC).]
A bandwidth of 16 Gb/s in both directions is achieved with the GPU server, limited only by the current FPGA speed grade.
Simultaneous write and read operations
2. First UFO camera prototype
[Figure: SerDes input stage. Inputs: LVDS data [n] and LVDS data clock (differential, IBUF, I/O buffers). Blocks: IODELAY for clock-to-data time tuning (80 ps step), alignment FSM with training pattern, bit-slip and data lock, custom SerDes logic, I/O clock buffer, regional clock buffer, clock division. Output: parallel data output with FPGA user-defined parallel data width, data clock.]
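
The SerDes input stage is labelled with a training pattern, a bit-slip control and a data-lock flag. A minimal software model of word alignment by bit-slip is sketched below. It is an illustration only, not the FPGA logic: it assumes a repeating training word, so that a misaligned word boundary shows up as a rotation of that word, and the function names are hypothetical.

```python
from typing import Optional


def rotate_left(word: int, width: int) -> int:
    """Rotate a width-bit word left by one bit (one bit-slip step in this model)."""
    mask = (1 << width) - 1
    return ((word << 1) | (word >> (width - 1))) & mask


def align_by_bitslip(received: int, training: int, width: int) -> Optional[int]:
    """Return how many bit-slips map `received` onto `training`.

    Returns None when no rotation matches, i.e. no data lock is reached.
    """
    word = received & ((1 << width) - 1)
    for slips in range(width):
        if word == training:
            return slips
        word = rotate_left(word, width)
    return None


if __name__ == "__main__":
    WIDTH = 10                      # parallel data width (up to 16 bit on the poster)
    TRAINING = 0b1010011100         # hypothetical training word
    # Build a misaligned word: 7 left rotations equal 3 right rotations here.
    misaligned = TRAINING
    for _ in range(7):
        misaligned = rotate_left(misaligned, WIDTH)
    print("bit-slips needed:", align_by_bitslip(misaligned, TRAINING, WIDTH))
```

The training word is chosen without rotational symmetry so that exactly one alignment matches; a symmetric pattern would lock at the first of several equivalent positions.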
Performance achieved with a standard PC
[Figure: FPGA internal architecture. Camera side: CMOS image sensor (16x lanes @ 480 Mbit/s; frame request, external exposure time, SPI chip configuration), CMOS control, SerDes input stage (KIT_IPCore), on-line parallel data processing, interleaving FSM, FIFOs, FSM master control, DDR3 interface (KIT_IPCore) to DDR3 memory (800 MHz @ 64 bit), user bank register, FPGA control, PCI Express + DMA (KIT_IPCore). Link: PCI Express cable, optical/electrical, x4 lanes @ 5 Gb/s, length ~ 30 m. PC side: PCI Express connection, chipset root port, CPU, memory, PC camera control.]
[Figure: camera prototype hardware. Mother readout and data processing card with Virtex 6 FPGA and DDR3 memory; daughter card with CMOSIS sensor and optical lens; cooling system based on a Peltier cell; heat sink + fan.]
[Figure: DDR3 memory device interface. IN port: Data_in [0..N], WR_EN, Clock_in. OUT port: Data_out [0..M], Data_valid, Clock_out, Clock_usr, Read_en. Blocks: WR FIFO and RD FIFO (frequency domain change), WR DDR FSM, RD DDR FSM, DDR busy arbiter FSM, PHY-DDR Xilinx IPCore (256 bit @ 200 MHz in each direction), DDR3 memory (800 MHz @ 64 bit). Memory layout: data frame segment and on-line data process segment, addressed by block address _wr pointer and block address _rd pointer, with loop back.]
UFO – Ultra Fast Streaming Camera Platform for Scientific Applications 
M. Caselle, S. Chilingaryan, A. Herth, A. Kopmann, U. Stevanovic, M. Vogelgesang,  M. Balzer, M. Weber 
IPE  (Institute for Data Processing and Electronics) - author email: michele.caselle@kit.edu 
Abstract: Synchrotron-based X-ray computed tomography (CT) is a method for non-destructive investigation of materials. A prototype of a high data throughput visible-light camera, based on a commercial CMOS sensor with embedded processing implemented in an FPGA, has been developed. The camera has achieved a frame rate of 340 frames/s with 2.2 Mpixel @ 10 bits and a data rate of up to 1 GB/s. A novel architecture for a self-event trigger signal has been implemented to increase the original frame rate to several kilohertz and to reduce the transmitted data volume. Applications from life and materials science underline the high potential of this high-speed camera in hard X-ray micro-imaging.
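
The quoted data rate follows from the frame rate, resolution and bit depth. The short sketch below checks that arithmetic and compares it with the sensor link capacity labelled in the FPGA architecture figure (16 lanes @ 480 Mbit/s). The names are illustrative, and the calculation ignores any protocol or blanking overhead, which is an assumption.

```python
# Values taken from the poster; names are illustrative assumptions.
FRAME_RATE_FPS = 340
PIXELS_PER_FRAME = 2.2e6       # "2.2 Mpixel"
BITS_PER_PIXEL = 10
SENSOR_LANES = 16
LANE_RATE_MBPS = 480


def pixel_data_rate_gbps(fps: float, pixels: float, bits: int) -> float:
    """Raw pixel payload rate in Gb/s (no overhead assumed)."""
    return fps * pixels * bits / 1e9


def sensor_link_rate_gbps(lanes: int, lane_rate_mbps: float) -> float:
    """Aggregate serial link capacity of the sensor interface in Gb/s."""
    return lanes * lane_rate_mbps / 1e3


def gbps_to_gbytes_per_s(rate_gbps: float) -> float:
    """Convert Gb/s to GB/s (8 bits per byte)."""
    return rate_gbps / 8


if __name__ == "__main__":
    payload = pixel_data_rate_gbps(FRAME_RATE_FPS, PIXELS_PER_FRAME, BITS_PER_PIXEL)
    link = sensor_link_rate_gbps(SENSOR_LANES, LANE_RATE_MBPS)
    print(f"pixel payload: {payload:.2f} Gb/s = {gbps_to_gbytes_per_s(payload):.3f} GB/s")
    print(f"sensor link:   {link:.2f} Gb/s = {gbps_to_gbytes_per_s(link):.3f} GB/s")
```

Both numbers come out slightly below 1 GB/s (about 0.94 GB/s of pixel payload on a 0.96 GB/s link), which is consistent with the "up to 1 GB/s" figure.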
IEEE - 18th REAL-TIME 2012 conference, June 11-15, Lawrence Berkeley National Laboratory   
3. Self-event trigger (fast reject) and an intelligent Region-Of-Interest readout
[Figure: FPGA self-event trigger architecture and ROI readout. CMOS image sensor and CMOS control, SerDes input stage (KIT_IPCore), interleaving FSM (interleaving mechanism for event trigger and ROI readout), DDR interface (KIT_IPCore) to DDR3 memory holding the data frame and the reference frame, row comparison, event-trigger FSMs, user bank register, FIFOs, ROI readout signal, output to DMA. Timing sketch: data frames ... Frame N, Frame N+1 ..., with roll-over in the interleaving mechanism.]
Fast processes that cannot be controlled by external signals require data recording at a high frame rate. Unpredictable physical events could be lost or only partially acquired during the limited observation time. An intelligent image-based self-trigger for applications with unpredictable occurrence of events has therefore been integrated in the current camera.
[Figure: frame rate estimation with self-event trigger and ROI (self-event trigger and Region-Of-Interest readout architecture). Frame rate (frames/s, axis 0 to 4000) versus number of rows skipped (0 to 160), for small region detection (20 rows) and large region detection (100 rows).]
Performance: The architecture allows us to keep a high spatial and temporal resolution and the full field of view of the scene. This method increases the original CMOS image sensor frame rate by up to a factor of 10.
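
The idea of an image-based self-trigger with ROI readout can be illustrated in software. The sketch below compares each row of a data frame with the corresponding row of a reference frame, rejects the frame if nothing changed, and otherwise returns the row range to read out. The comparison metric (mean absolute difference), the threshold and the margin are assumptions for illustration; the poster does not specify how the FPGA row comparison is implemented.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # a frame as a list of rows of pixel values


def changed_rows(frame: Frame, reference: Frame, threshold: float) -> List[int]:
    """Indices of rows whose mean absolute difference to the reference
    exceeds `threshold`. Rows are assumed to be non-empty."""
    hits = []
    for idx, (row, ref_row) in enumerate(zip(frame, reference)):
        diff = sum(abs(a - b) for a, b in zip(row, ref_row)) / len(row)
        if diff > threshold:
            hits.append(idx)
    return hits


def region_of_interest(frame: Frame, reference: Frame, threshold: float,
                       margin: int = 0) -> Optional[Tuple[int, int]]:
    """Return (first_row, last_row) to read out, or None to reject the frame."""
    hits = changed_rows(frame, reference, threshold)
    if not hits:
        return None  # fast reject: no event detected in this frame
    first = max(hits[0] - margin, 0)
    last = min(hits[-1] + margin, len(frame) - 1)
    return first, last


if __name__ == "__main__":
    reference = [[0] * 4 for _ in range(6)]
    frame = [row[:] for row in reference]
    frame[2] = [50] * 4   # simulated event covering rows 2 and 3
    frame[3] = [50] * 4
    print(region_of_interest(reference, reference, threshold=10.0))
    print(region_of_interest(frame, reference, threshold=10.0, margin=1))
```

In this toy example the unchanged frame is rejected and the frame with the event yields rows 1 to 4, so only a fraction of the rows would need to be transferred.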
1. Ultra Fast X-ray imaging of scientific processes with On-line assessment and data-driven process control
A novel ultra-fast & high-resolution X-ray imaging experimental station will 
be installed at the ANKA synchrotron machine. The project 'Ultra Fast X-ray 
imaging of scientific processes with On-line assessment and data-driven 
process control' (UFO) aims to develop the next generation of X-ray 
computed tomography stations optimized to perform 3D and 4D X-ray 
imaging.  
The novel concepts employed in the UFO experiment station are:
(1) A smart high data throughput camera.
(2) A fast optical data link based on PCI Express.
(3) A GPU server with on-line data processing and evaluation to accelerate the 3D data reconstruction.
(4) An on-line feedback loop for sample manipulation and the optical system. The speed-up will, for the first time, enable real-time image processing that uses 2D and 3D image reconstruction to drive this loop.

[Figure: UFO experimental station. X-ray beam line, sample and detector manipulator, scintillator and optic lens, (1) CMOS sensor with FPGA on-line processing and self-trigger (fast reject), (2) fast data link (optical link), (3) GPU server with memory, fast data reconstruction, data evaluation and storage, (4) sample set-up. Control paths: fast HW loop, SW control loops, 2D and 3D image-based control loops.]

Thousands of radiographs have been acquired at full speed (340 frames/s) in streaming mode. A satisfactory SNR level has been achieved with a spatial resolution in the micrometer range.

[Figure panels: Radioscopy with spatio-temporal micro-resolution; Time-resolved micro-tomography.]

Smart high data throughput camera: the camera prototype was successfully tested at ANKA with a moderate X-ray flux density.

[Figure: experimental set-up. X-ray beam pipe, sample on a goniometer (X, Y, Z, φ) with sample manipulation motors, fast and high-resolution crystal converter screen (scintillator), optical system, CMOSIS sensor, camera prototype.]
4. Camera characterization & adaptation to experiment conditions
The limited photon flux density in synchrotron light source applications sets the fundamental limit on image sensor performance in high frame rate acquisition (short integration time). The temporal noise components are dominant under these conditions. A fully programmable camera is key to adapting the camera settings to the different X-ray experiment conditions.

[Figure: Efficiency characterization. e-/h+ produced versus ADC mean (cluster counts) for Al, Fe, Mo and Sn.]
Excellent linearity of the CMOS sensor; measured conversion factor: 21 e-/ADC count.

[Figure: Noise characterization. Total dark noise versus integration time (sec), 10 bit mode. Legend: camera setting for long integration time @ 14°C; camera default setting @ 37°C; camera setting for short integration time @ 14°C.]
Low-noise camera; estimated total noise contribution: 87 e-/s @ 14°C.

[Figure: Mean & STD pixel ADC value (counts). Legend: default settings @ 14°C; fast frame rate setting (red) @ 14°C.]
Frames are acquired at full speed with 100 µs integration time at a lower illumination level. Two different camera settings have been found to optimize the SNR for high frame rate (red) and high frame resolution (blue), respectively.
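
The two characterization numbers can be combined in a small worked example. The sketch below converts ADC counts to electrons with the measured conversion factor and estimates the dark contribution accumulated during one integration period. It assumes that the 87 e-/s figure scales linearly with integration time, which the poster does not state explicitly; all names are illustrative.

```python
# Values quoted on the poster; names and the linear model are assumptions.
CONVERSION_E_PER_ADC = 21.0    # e-/ADC count
DARK_RATE_E_PER_S = 87.0       # e-/s at 14 degC


def adc_to_electrons(adc_counts: float) -> float:
    """Convert an ADC value (counts) to a signal in electrons."""
    return adc_counts * CONVERSION_E_PER_ADC


def dark_electrons(integration_time_s: float) -> float:
    """Dark contribution in electrons, assuming it grows linearly with time."""
    return DARK_RATE_E_PER_S * integration_time_s


def dark_adc_counts(integration_time_s: float) -> float:
    """The same dark contribution expressed in ADC counts."""
    return dark_electrons(integration_time_s) / CONVERSION_E_PER_ADC


if __name__ == "__main__":
    t_int = 100e-6  # 100 µs integration time, as used at full speed
    print(f"100 ADC counts = {adc_to_electrons(100):.0f} e-")
    print(f"dark contribution in {t_int * 1e6:.0f} µs: {dark_electrons(t_int):.4f} e-")
```

Under this linear assumption the dark contribution at 100 µs is far below one electron, so it would be negligible compared with one ADC count at short integration times.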
