Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights
Data fusion plays an important role in many technical applications that
require efficient processing of multimodal sensory observations. A prominent
example is audiovisual signal processing, which has gained increasing attention
in automatic speech recognition, speaker localization and related tasks. If
appropriately combined with acoustic information, additional visual cues can
help to improve the performance in these applications, especially under adverse
acoustic conditions. A dynamic weighting of acoustic and visual streams based
on instantaneous sensor reliability measures is an efficient approach to data
fusion in this context. This paper presents a framework that extends the
well-established theory of nonlinear dynamical systems with the notion of
dynamic stream weights for an arbitrary number of sensory observations. It
comprises a recursive state estimator based on the Gaussian filtering paradigm,
which incorporates dynamic stream weights into a framework closely related to
the extended Kalman filter. Additionally, a convex optimization approach to
estimate oracle dynamic stream weights in fully observed dynamical systems
utilizing a Dirichlet prior is presented. This serves as a basis for a generic
parameter learning framework for dynamic stream weight estimators. The proposed
system is application-independent and can be easily adapted to specific tasks
and requirements. Audiovisual speaker tracking is considered as an exemplary
application in this work. The experiments demonstrate improved tracking
performance of the dynamic stream weight-based estimation framework over
state-of-the-art methods.
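
To illustrate the idea of weighting sensory streams inside a Gaussian filter update, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a scalar state (for example a speaker azimuth), direct linear observations of that state, and a weighting scheme in which each stream's Gaussian likelihood is raised to the power of its stream weight. That scheme is equivalent to dividing the stream's noise variance by the weight. The function name `weighted_update` and all parameter names are hypothetical.

```python
def weighted_update(mean, var, observations, noise_vars, weights):
    """Fuse scalar observations of one state with per-stream weights.

    Assumption: each stream m observes the state directly, y_m = x + noise,
    with noise variance r_m, and its likelihood is raised to the power of
    its stream weight. For Gaussians this scales the stream's precision
    (1 / r_m) by the weight.
    """
    if min(weights) < 0.0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stream weights must be non-negative and sum to one")

    # Work in information form: start from the prior precision and
    # the prior precision-weighted mean.
    precision = 1.0 / var
    info = mean / var

    # Each stream contributes its weighted precision and weighted evidence.
    for y, r, lam in zip(observations, noise_vars, weights):
        precision += lam / r
        info += lam * y / r

    post_var = 1.0 / precision
    post_mean = info * post_var
    return post_mean, post_var


# Hypothetical usage: an acoustic and a visual observation of the same state.
audio_only = weighted_update(0.0, 1.0, [1.0, 3.0], [1.0, 1.0], [1.0, 0.0])
balanced = weighted_update(0.0, 1.0, [1.0, 3.0], [1.0, 1.0], [0.5, 0.5])
```

A stream with weight zero has no influence on the posterior, so shifting weight away from an unreliable sensor at a given time step moves the estimate toward the remaining streams. Extending this sketch to nonlinear observation models would require linearization, as in the extended Kalman filter.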