A HYBRID MULTI-CORE ARCHITECTURE FOR REAL-TIME VIDEO TRACKING by Enno Lübbers
Markus Happe 
International Graduate School
University of Paderborn
33098 Paderborn - Germany
email: markus.happe@uni-paderborn.de
Enno Lübbers
EADS Innovation Works
Technical Capability Center 5
81663 Munich - Germany
email: Enno.Luebbers@eads.net
ABSTRACT
In this paper, we present an implementation of real-time video tracking on a novel reconfigurable multi-core architecture capable of reacting to changing workload while minimizing the number of active cores. The system comprises multiple processor cores executing sequential software threads and hardware cores, implemented in an FPGA, executing dynamically reconfigurable hardware threads. SW and HW threads interact using a unified multithreaded programming model, which allows on-the-fly reconfiguration to shift workload between hardware and software components. Our self-adaptation technique effectively re-partitions threads across hardware and software cores to keep the performance of a video object tracking application within a predefined budget while minimizing the number of used processing elements and, thus, reducing power consumption.
1. INTRODUCTION
Modeling system components as threads that interact using strictly defined system services is a popular approach in software development and is increasingly gaining popularity in the domain of hybrid HW/SW systems [1]. Here, software functions and hardware modules can both be thought of as threads that use identical system services, such as semaphores or message queues, for communication. As a result, the complex problem of adaptively changing the HW/SW partitioning of a system at run-time reduces to instantiating or terminating HW and SW threads.
Thread re-mapping can improve the power efficiency of multi-core systems in which only as many cores are active as are required for a defined performance; inactive cores are assumed to consume less power than active ones. Our novel self-adaptation technique re-partitions threads onto different cores, and even across the HW/SW boundary, to keep the performance of a video object tracking application [2] within a user-defined performance budget despite changing input data, while minimizing the number of active cores.

The research leading to these results has received funding from the European Union Seventh Framework Programme under grant agreement no. 257906.
2. HYBRID MULTI-CORE ARCHITECTURE
Our architecture and algorithm depend on a unified programming model for both SW and HW components. The operating system ReconOS [1] extends the multithreaded programming model to the reconfigurable hardware domain. ReconOS promotes HW coprocessors to independent hardware threads and treats them as equals of the SW threads running on the system. In particular, ReconOS allows HW threads to use the same operating system services for communication and synchronization as SW threads, providing a transparent programming model across the HW/SW boundary.
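As a minimal sketch of what a unified thread model buys, the example below uses Python's `threading` and `queue` modules as stand-ins for OS threads and message boxes; it does not use any real ReconOS API, and the names `stage_thread` and `run_stage` are hypothetical. The thread body depends only on get/put services, so adding processing capacity amounts to starting more instances of the same body on the same queues. On a ReconOS system, each instance could equally be a SW thread or a HW thread represented by a delegate.

```python
import queue
import threading


def stage_thread(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Thread body that relies only on message-box style get/put services,
    # so it does not need to know which core executes it.
    while True:
        item = inbox.get()
        if item is None:          # sentinel: this instance terminates
            return
        outbox.put(item * item)   # placeholder for real per-particle work


def run_stage(values: list[int], instances: int) -> list[int]:
    inbox: queue.Queue = queue.Queue()    # unbounded, so the producer never blocks
    outbox: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=stage_thread, args=(inbox, outbox))
               for _ in range(instances)]
    for w in workers:
        w.start()                         # "instantiate" thread instances
    for v in values:
        inbox.put(v)
    for _ in workers:
        inbox.put(None)                   # "terminate" each instance
    for w in workers:
        w.join()
    # Completion order depends on scheduling, so sort for a stable result.
    return sorted(outbox.get() for _ in range(len(values)))
```

The producer code in `run_stage` is identical whether one or three instances consume the queue, which mirrors how re-partitioning reduces to creating or terminating threads.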
[Figure: block diagram with labeled elements FPGA; master CPU with OS kernel, SW threads, and delegate threads; hardware slots with OSIFs running sampling, importance, and observation HW threads; worker CPU with OSIF adapter running a CPU-HW thread; stage labels observation, importance, and resampling.]
Fig. 1. Hardware architecture of the video object tracking system based on ReconOS
ReconOS takes advantage of the dynamic partial reconfiguration capabilities of Xilinx FPGAs to reconfigure hardware threads during run-time. This allows multiple hardware threads to transparently share the reconfigurable resources. Figure 1 shows the hardware architecture of a typical ReconOS system. The reconfigurable area is divided into multiple slots holding the individual hardware threads. A dedicated hardware OS interface (OSIF) handles the hardware thread’s OS requests and forwards them to its corresponding delegate thread running on the CPU. ReconOS treats a worker CPU executing a software thread (which we call a CPU-HW thread) in the same way as a hardware slot executing a hardware thread. CPU-HW threads are thus also represented by delegate threads in ReconOS.
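The request forwarding between an OSIF and its delegate thread can be sketched as follows. This is a simplified software model, not ReconOS code: the class `OSIF`, the functions `delegate_thread` and `hw_thread`, and the request names `mbox_get`, `mbox_put`, and `exit` are all hypothetical. The "hardware" side can only post requests to its OSIF; the delegate, running on the CPU, performs the actual OS call and returns the result.

```python
import queue
import threading


class OSIF:
    """Hypothetical OS interface: one request channel, one reply channel."""

    def __init__(self) -> None:
        self.requests: queue.Queue = queue.Queue()
        self.replies: queue.Queue = queue.Queue()

    def call(self, op: str, *args):
        # Used by the "hardware" side: post a request, block until answered.
        self.requests.put((op, args))
        return self.replies.get()


def delegate_thread(osif: OSIF, mboxes: dict) -> None:
    # Runs on the CPU and performs OS calls on behalf of one HW thread.
    while True:
        op, args = osif.requests.get()
        if op == "mbox_get":
            osif.replies.put(mboxes[args[0]].get())
        elif op == "mbox_put":
            mboxes[args[0]].put(args[1])
            osif.replies.put(None)
        elif op == "exit":
            osif.replies.put(None)
            return
        else:
            osif.replies.put(None)  # unblock the caller, then report the error
            raise ValueError(f"unsupported OS request: {op}")


def hw_thread(osif: OSIF) -> None:
    # Stand-in for a hardware thread: it never touches OS objects directly.
    x = osif.call("mbox_get", "in")
    osif.call("mbox_put", "out", 2 * x)
    osif.call("exit")


def demo(value: int) -> int:
    mboxes = {"in": queue.Queue(), "out": queue.Queue()}
    osif = OSIF()
    delegate = threading.Thread(target=delegate_thread, args=(osif, mboxes))
    hw = threading.Thread(target=hw_thread, args=(osif,))
    delegate.start()
    hw.start()
    mboxes["in"].put(value)  # a SW thread feeding the HW thread
    hw.join()
    delegate.join()
    return mboxes["out"].get()
```

Because a CPU-HW thread is also represented by a delegate, the same model would apply if `hw_thread` ran on the worker CPU instead of in a slot.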
3. REAL-TIME VIDEO OBJECT TRACKING
For video object tracking, we use a histogram-based particle filter that can be divided into four stages: sampling, observation, importance, and resampling. The observation stage calculates the histograms of the particles, and the importance stage compares them to the object’s histogram. Each of the filter stages can be executed by an arbitrary number of software and hardware threads. In histogram-based video object tracking systems, the object size strongly influences the computational complexity, since each particle’s histogram is computed over all pixels in the object’s region. Thus, many real-time video tracking systems track fixed-size objects. In self-adaptive hybrid multi-core systems, however, we can allow changing object sizes by activating or deactivating cores.
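The four filter stages can be sketched for a grayscale frame as below. Several details are assumptions, not taken from the text: the random-walk motion model in sampling, the 8-bin intensity histogram, the Bhattacharyya coefficient as the comparison in the importance stage, and multinomial resampling. All function names are hypothetical. The observation step makes the size dependence explicit: every particle touches all `w * h` pixels of its window.

```python
import math
import random

Frame = list[list[int]]     # grayscale image, values 0..255, indexed frame[y][x]
Particle = tuple[int, int]  # top-left corner (x, y) of the object window


def histogram(frame: Frame, x: int, y: int, w: int, h: int, bins: int = 8) -> list[float]:
    # Observation work for one particle: visits all w * h pixels of the window.
    counts = [0] * bins
    for row in frame[y:y + h]:
        for v in row[x:x + w]:
            counts[v * bins // 256] += 1
    total = sum(counts)
    return [c / total for c in counts]


def similarity(p: list[float], q: list[float]) -> float:
    # Bhattacharyya coefficient: 1.0 for identical normalized histograms.
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def filter_step(frame: Frame, particles: list[Particle], ref_hist: list[float],
                w: int, h: int, sigma: float = 2.0, rng=random) -> list[Particle]:
    height, width = len(frame), len(frame[0])
    # Sampling: random-walk motion model, clamped so the window stays in the frame.
    moved = [(min(max(round(x + rng.gauss(0, sigma)), 0), width - w),
              min(max(round(y + rng.gauss(0, sigma)), 0), height - h))
             for x, y in particles]
    # Observation: one histogram per particle (cost ~ len(particles) * w * h).
    hists = [histogram(frame, x, y, w, h) for x, y in moved]
    # Importance: weight by similarity to the object histogram; the tiny
    # epsilon keeps the total weight positive if nothing matches.
    weights = [similarity(hh, ref_hist) + 1e-12 for hh in hists]
    # Resampling: draw particles proportionally to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))
```

Doubling both window dimensions quadruples the observation work per particle, which is why a larger object can push the tracker below its frame-rate target unless more cores are activated.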
We have implemented the video object tracker prototype on a Virtex-4 XC4VFX100 FPGA. The system follows the architecture shown in Figure 1 and includes one master processor (running the OS kernel and the housekeeping tasks of the particle filter framework [2]), one worker processor, and two hardware slots, each of which can execute one thread at a time. Both processors (PowerPC 405 CPUs) run at 300 MHz, while the hardware slots run at 100 MHz. In our experiments, we use 100 particles and measure the raw particle processing time.
For self-adaptation, we apply an add/remove strategy. Initially, the application executes entirely on the master processor, which also measures the total application performance at user-defined time intervals. If the performance drops below a lower threshold, the master creates an additional thread instance on the core that promises to meet the desired performance budget, if possible, or otherwise the largest increase in performance. If the performance exceeds an upper threshold, the master terminates the thread instance whose removal brings the performance as close as possible to the desired performance budget, effectively reducing the dynamic power consumption of the system by suspending execution on the respective core.
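The add/remove decision can be sketched as a single function, assuming that a performance model (not described here) supplies a predicted application FPS for every candidate change. The class `Option`, the function `adapt`, and the core labels are hypothetical. Two points are interpretations rather than statements from the text: when several additions would meet the budget, the sketch picks the one closest to the target, and "as close to the budget as possible" is measured as distance to the target average.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Option:
    core: str             # illustrative label, e.g. "worker" or "hw slot 1"
    stage: str            # filter stage of the thread instance, e.g. "importance"
    predicted_fps: float  # assumed model output: application FPS after the change


def adapt(fps: float, target: float, tolerance: float,
          add_options: Sequence[Option],
          remove_options: Sequence[Option]) -> Optional[Tuple[str, Option]]:
    low, high = target * (1 - tolerance), target * (1 + tolerance)
    if fps < low and add_options:
        meeting = [o for o in add_options if low <= o.predicted_fps <= high]
        if meeting:
            # Assumption: among options that meet the budget, take the one
            # whose predicted FPS is closest to the target.
            return "add", min(meeting, key=lambda o: abs(o.predicted_fps - target))
        # Otherwise: the option promising the largest performance increase.
        return "add", max(add_options, key=lambda o: o.predicted_fps)
    if fps > high and remove_options:
        # Terminate the instance whose removal lands closest to the target.
        return "remove", min(remove_options, key=lambda o: abs(o.predicted_fps - target))
    return None  # within budget (or nothing to change): keep the partitioning
```

For example, at 4 FPS with an 8 FPS target and a 0.33 tolerance, an addition predicted to reach 7 FPS is preferred over one predicted to reach 12 FPS, because only the former lands inside the budget.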
Figure 2 shows an exemplary run of our self-adaptive video object tracking system on a sample video. The application’s performance is measured in frames per second (FPS). The desired average performance is set to 8 FPS, and the budget allows the performance to be up to 33% faster or slower than this average. In this example, we execute the self-adaptation algorithm every 20 frames with an initial offset of 8 frames. This interval is chosen to keep the overhead incurred by partial reconfiguration reasonably low. Using our proposed self-adaptation algorithm, the power consumption can be reduced by deactivating up to 3 of the 4 cores.
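Reading the 33% budget as applying directly to the FPS value (an interpretation, since the text does not say whether the percentage refers to frame rate or frame time), and assuming the first adaptation occurs at frame 8, the thresholds and adaptation points work out as:

```latex
\begin{aligned}
f_{\mathrm{low}}  &= (1 - 0.33)\cdot 8\ \mathrm{FPS} \approx 5.4\ \mathrm{FPS},\\
f_{\mathrm{high}} &= (1 + 0.33)\cdot 8\ \mathrm{FPS} \approx 10.6\ \mathrm{FPS},\\
k_i &= 8 + 20\,i,\quad i = 0, 1, 2, \ldots \quad (k = 8, 28, 48, \ldots)
\end{aligned}
```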
[Figure: upper panel plots frames per second (0 to 25) against the frame number (0 to 1000) for the curves labeled "self-adaptive" and "sw"; lower panel shows the thread assignment over time for the cores master, worker, hw slot 1, and hw slot 2 (stage labels: sampling, observation, importance, resampling), annotated with 62%, 52%, and 24%.]
Fig. 2. Self-adaptation exemplary run: Resulting perfor-
mance in FPS (upper part) and thread assignment (lower
part). Re-partitioning points are represented by vertical
dashed lines. The performance target is highlighted by a
horizontal bar. [3]
4. CONCLUSION
In this paper, we present a novel thread-based self-adaptive task partitioning technique based on a reconfigurable hybrid multi-core architecture. By adaptively changing the HW/SW partitioning in reaction to data-dependent variations in application performance, our video object tracking system is able to maintain a predefined performance envelope while minimizing the number of required processing resources and, thus, lowering power consumption.
5. REFERENCES
[1] E. Lübbers and M. Platzner, “ReconOS: Multithreaded Programming for Reconfigurable Computers,” ACM TECS Special Issue (CAPA), 2009.
[2] M. Happe, E. Lübbers, and M. Platzner, “An Adaptive Sequential Monte Carlo Framework with Runtime HW/SW Partitioning,” IEEE International Conference on Field Programmable Technology (FPT), 2009.
[3] M. Happe, E. Lübbers, and M. Platzner, “A Self-adaptive Heterogeneous Multi-core Architecture for Embedded Real-time Video Object Tracking,” Journal on Real-Time Image Processing (JRTIP), 2011, to appear.