High-Level Data Flow Description of FPGA Firmware Components for Online Data Preprocessing by Engel, H. et al.
High-Level Data Flow Description of FPGA Firmware Components for Online
Data Preprocessing∗
H. Engel1, F. Grüll1, and U. Kebschull1
1IRI, Institut für Informatik, Johann Wolfgang Goethe-Universität Frankfurt, Frankfurt, Germany
FPGA firmware for detector read-out is commonly
described with VHDL or Verilog. Data processing on
the algorithmic level is a complex task in these
languages and creates code that is hard to maintain. There
are high-level description frameworks available that
simplify the implementation of processing algorithms.
A sample implementation of an existing algorithm and
the comparison with its VHDL equivalent show
promising results for future online preprocessing systems.
Field Programmable Gate Arrays (FPGAs) are widely
used in high-energy physics detector read-out chains due
to their flexibility. The protocols and interfaces are
usually implemented with hardware description languages like
VHDL or Verilog. As FPGAs get bigger and faster,
they become more and more suitable for performing
complex data processing tasks. This can reduce the data volume
and significantly ease demands on later software-based
processing steps. The drawback of the commonly used
hardware description languages is that they mostly operate
at the register transfer level. This is well suited to
high-performance protocol and low-level interface
implementations. However, using these languages to implement data
processing on an algorithmic level requires experienced
developers and usually involves customized IP cores and
latency matching of components. The result is a rather
complex and static design. Several high-level hardware
description frameworks are available that provide their
own languages to describe data processing steps on an
algorithmic or data flow level. Some of them also come with
their own framework, including building blocks for PCIe or
DRAM interfaces. This significantly speeds up
development compared to a description in plain VHDL or
Verilog.
This work uses the framework provided by Maxeler
Technologies. The platform generates a pipelined
version of the algorithm after its data flow graph has been
described in a Java-like programming language [1]. The
compiler manages the scheduling of the design, inserts latencies
in the generated pipelines wherever needed to keep the data
in sync, and instantiates interfaces to PCIe or DRAM if
required. A software environment with a device driver and
a C API provides easy-to-use stream interfaces to the
hardware. The compiler translates the data flow description into
VHDL code, which is then run through the vendor tools.
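The latency matching mentioned above can be illustrated with a small Python sketch. It is not the Maxeler scheduler, whose internals are not described here; the function name `balance_latencies`, the node names, and the cycle counts are hypothetical. The sketch only shows the idea that, for every node with several inputs, the inputs that arrive early are delayed until the slowest one is valid.

```python
def balance_latencies(latency, edges):
    """Return the number of delay cycles to insert on each edge.

    latency: dict mapping node name -> pipeline depth in clock cycles
    edges:   list of (producer, consumer) pairs forming an acyclic graph
    """
    # Collect the producers that feed each consumer.
    inputs = {}
    for src, dst in edges:
        inputs.setdefault(dst, []).append(src)

    ready = {}  # cycle at which each node's output becomes valid

    def output_time(node):
        # A node can start once its slowest input has arrived.
        if node not in ready:
            arrivals = [output_time(src) for src in inputs.get(node, ())]
            ready[node] = max(arrivals, default=0) + latency[node]
        return ready[node]

    delays = {}
    for dst, srcs in inputs.items():
        latest = max(output_time(src) for src in srcs)
        for src in srcs:
            # Pad every faster path up to the slowest one.
            delays[(src, dst)] = latest - output_time(src)
    return delays


# Hypothetical three-node graph: "in" feeds a 3-cycle multiplier and an
# adder that also consumes the multiplier result.
example = balance_latencies(
    {"in": 0, "mul": 3, "add": 1},
    [("in", "mul"), ("in", "add"), ("mul", "add")],
)
```

In this example the direct path from `in` to `add` receives three delay cycles, so that both adder inputs belong to the same input sample. In hand-written VHDL this bookkeeping has to be repeated whenever the latency of a component changes.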
∗Work supported by HGS-HIRe, HIC4FAIR
Figure 1: Schematic picture of the ALICE TPC
FastClusterFinder algorithm. Block labels: Channel Decoder,
Channel Mapping, Channel Processor, Gain Correction,
Channel Merger, Merger FIFO, Floating Point Division.
The algorithm described in this way is the
FastClusterFinder, which was used as a VHDL core in the readout of
the ALICE Time Projection Chamber during LHC run
period 1 [2]. A simplified overview of the algorithm is shown
in Fig. 1. The incoming raw data is decoded into a data
stream with time and location information. The center of
gravity and the deviation of peaks are calculated in the time
direction. In a second step, neighboring cluster candidates
are merged to get the center of gravity and the deviation of
the full cluster in the pad direction. The last step is a
floating point division. The VHDL implementation is a rather
complex design due to its data flow control structures and the
number of fixed-point and floating-point arithmetic
operations.
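The three processing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FastClusterFinder core itself: the names `Candidate`, `process_channel`, `merge_candidates`, and `finalize` are hypothetical, gain correction and the criteria for deciding which candidates are neighbors are left out, and the spread is expressed as a variance because the text does not specify the exact quantity the core outputs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Integer weighted sums for one peak on a single pad (assumed layout)."""
    pad: int
    q: int    # total charge, sum of q_i
    qt: int   # sum of q_i * t_i
    qt2: int  # sum of q_i * t_i**2


def process_channel(pad, samples):
    """Step 1: accumulate sums over one peak in the time direction.

    samples: iterable of (time_bin, charge) pairs belonging to one peak.
    """
    q = qt = qt2 = 0
    for t, charge in samples:
        q += charge
        qt += charge * t
        qt2 += charge * t * t
    return Candidate(pad, q, qt, qt2)


def merge_candidates(candidates):
    """Step 2: merge candidates on neighboring pads into one set of sums."""
    q = sum(c.q for c in candidates)
    qt = sum(c.qt for c in candidates)
    qt2 = sum(c.qt2 for c in candidates)
    qp = sum(c.q * c.pad for c in candidates)
    qp2 = sum(c.q * c.pad * c.pad for c in candidates)
    return q, qt, qt2, qp, qp2


def finalize(q, qt, qt2, qp, qp2):
    """Step 3: the floating point divisions, done once per merged cluster."""
    mean_t = qt / q
    mean_p = qp / q
    return {
        "charge": q,
        "time": mean_t,
        "pad": mean_p,
        "var_time": qt2 / q - mean_t ** 2,
        "var_pad": qp2 / q - mean_p ** 2,
    }
```

Keeping steps 1 and 2 as pure integer accumulations and deferring all divisions to step 3 mirrors the structure described in the text, where the floating point division is the last stage.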
A functionally identical version of the FastClusterFinder
algorithm has been described with the Maxeler data flow
description language. The behavior of this implementation
has been verified in simulation using recorded detector data
from the ALICE TPC. The output of the data flow simulation
is directly compared to the output of a ModelSim simulation
of the original VHDL code. Compared to the VHDL
implementation, the data flow description needs significantly
fewer lines of code. The compute-intensive parts of the
design in particular are easy to understand and to
maintain. The resource usage of the generated
design in its current state differs in detail but is
overall of the same order of magnitude as that of the VHDL
implementation.
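The comparison of the two simulation outputs can be sketched as a record-by-record check. The text does not describe the output format or the tooling used, so the function name `first_mismatch` and the representation of the outputs as Python sequences are assumptions, as is the requirement that both streams match exactly.

```python
from itertools import zip_longest


def first_mismatch(reference, candidate):
    """Return (index, ref_item, cand_item) of the first difference, or None.

    A stream that ends early is reported with None in place of the
    missing item.
    """
    missing = object()  # sentinel that cannot collide with real records
    pairs = zip_longest(reference, candidate, fillvalue=missing)
    for index, (ref, cand) in enumerate(pairs):
        if ref is missing or cand is missing or ref != cand:
            return (
                index,
                None if ref is missing else ref,
                None if cand is missing else cand,
            )
    return None
```

Reporting the index of the first differing record, rather than only a pass or fail flag, makes it easier to locate the corresponding point in the waveform of the reference simulation.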
This implementation shows that there are tools available
to describe processing algorithms on an algorithmic or data
flow level that are able to generate hardware with
comparable resource usage but significantly reduced code volume.
This greatly improves maintainability of the code. A next
step will be to implement and test the code in actual
hardware. Furthermore, the generation of VHDL code from
the data flow description allows the processing elements to
be extracted from the vendor framework and integrated as
an IP core into a custom firmware environment.
References
[1] Maxeler Technologies, Programming MPC Systems, White
Paper, June 2013
[2] T. Alt and V. Lindenstruth, Status of the HLT-RORC and the
Fast Cluster Finder, GSI Scientific Report 2009
IT-14 GSI SCIENTIFIC REPORT 2013
292 doi:10.15120/GR-2014-1-IT-14
