Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights
Data fusion plays an important role in many technical applications that
require efficient processing of multimodal sensory observations. A prominent
example is audiovisual signal processing, which has gained increasing attention
in automatic speech recognition, speaker localization and related tasks. If
appropriately combined with acoustic information, additional visual cues can
help to improve the performance in these applications, especially under adverse
acoustic conditions. A dynamic weighting of acoustic and visual streams based
on instantaneous sensor reliability measures is an efficient approach to data
fusion in this context. This paper presents a framework that extends the
well-established theory of nonlinear dynamical systems with the notion of
dynamic stream weights for an arbitrary number of sensory observations. It
comprises a recursive state estimator based on the Gaussian filtering paradigm,
which incorporates dynamic stream weights into a framework closely related to
the extended Kalman filter. Additionally, a convex optimization approach to
estimate oracle dynamic stream weights in fully observed dynamical systems
utilizing a Dirichlet prior is presented. This serves as a basis for a generic
parameter learning framework of dynamic stream weight estimators. The proposed
system is application-independent and can be easily adapted to specific tasks
and requirements. A study using audiovisual speaker tracking tasks is
considered as an exemplary application in this work. An improved tracking
performance of the dynamic stream weight-based estimation framework over
state-of-the-art methods is demonstrated in the experiments
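The core idea of dynamic stream weights can be illustrated with a deliberately simplified sketch: a one-dimensional random-walk state (e.g. a speaker's azimuth) is updated sequentially from an acoustic and a visual observation, and the per-frame weights scale each stream's Gaussian log-likelihood, which is equivalent to inflating that stream's measurement noise. This is a minimal linear stand-in for the paper's extended-Kalman-filter framework, not its actual algorithm; all function names, noise values, and the random-walk motion model are illustrative assumptions.

```python
import numpy as np

def fuse_step(x, P, z_a, z_v, lam_a, q=0.01, r_a=0.5, r_v=0.2):
    """One weighted Gaussian-filter step for a scalar state (illustrative).

    x, P      : prior state mean and variance
    z_a, z_v  : acoustic and visual observations of the state
    lam_a     : acoustic stream weight in [0, 1]; visual weight is 1 - lam_a
    q         : process noise of the random-walk motion model (assumed)
    r_a, r_v  : per-stream measurement noise variances (assumed)
    """
    # Predict: random-walk motion model.
    x_pred, P_pred = x, P + q
    # A stream weight lam scales the Gaussian log-likelihood of that
    # stream, which for the update is equivalent to using R_eff = R / lam.
    lam_v = 1.0 - lam_a
    for z, r, lam in ((z_a, r_a, lam_a), (z_v, r_v, lam_v)):
        if lam <= 1e-9:
            continue  # stream effectively switched off this frame
        r_eff = r / lam
        k = P_pred / (P_pred + r_eff)      # Kalman gain
        x_pred = x_pred + k * (z - x_pred)  # weighted innovation update
        P_pred = (1.0 - k) * P_pred
    return x_pred, P_pred

# With lam_a = 0 the acoustic observation is ignored entirely; with
# lam_a = 1 only the acoustic stream drives the update.
x_vis, _ = fuse_step(0.0, 1.0, z_a=10.0, z_v=0.0, lam_a=0.0)
x_ac, _ = fuse_step(0.0, 1.0, z_a=10.0, z_v=0.0, lam_a=1.0)
```

In a reliability-driven system, `lam_a` would be re-estimated every frame (e.g. lowered under adverse acoustic conditions), which is precisely what the dynamic stream weight estimators in the paper learn to do.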
Audio Surveillance: a Systematic Review
Although surveillance systems are becoming increasingly ubiquitous in our
living environment, automated surveillance, currently based on the video
sensory modality and machine intelligence, often lacks the robustness and
reliability required in many real applications. To tackle this issue, audio
sensory devices have been taken into account, either alone or in combination
with video, giving birth, over the last decade, to a considerable amount of
research.
In this paper, audio-based automated surveillance methods are organized into
a comprehensive survey: a general taxonomy, inspired by the more widespread
video surveillance field, is proposed in order to systematically describe
methods covering background subtraction, event classification, object
tracking and situation analysis. For each of these tasks, all the
significant works are reviewed, detailing their pros and cons and the
context for which they have been proposed. Moreover, a specific section is
devoted to audio features, discussing their expressiveness and their
employment in the tasks described above. Unlike other surveys on audio
processing and analysis, the present one is specifically targeted at
automated surveillance, highlighting the target applications of each
described method and providing the reader with tables and schemes useful
for retrieving the most suitable algorithms for a specific requirement.
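To make the audio analogue of background subtraction concrete, the sketch below flags frames whose energy deviates strongly from an adaptively tracked background level. This is a generic energy-thresholding scheme, not a method from the surveyed literature; the adaptation rate, threshold, and the assumption that per-frame energies are already computed are all illustrative.

```python
import numpy as np

def detect_events(frame_energies, alpha=0.95, k=3.0):
    """Flag frames whose energy exceeds an adaptive background estimate.

    A minimal audio analogue of video background subtraction: the
    background energy mean and variance are tracked with exponential
    moving averages, and frames deviating by more than k standard
    deviations are flagged as foreground events. Flagged frames do not
    update the background model, so short events cannot pollute it.
    """
    bg_mean = float(frame_energies[0])
    bg_var = 1.0  # assumed initial spread of background energy
    events = []
    for i, e in enumerate(frame_energies):
        std = np.sqrt(bg_var) + 1e-12
        if e > bg_mean + k * std:
            events.append(i)  # foreground: background model is frozen
        else:
            diff = e - bg_mean
            bg_mean += (1.0 - alpha) * diff          # slow mean adaptation
            bg_var = alpha * bg_var + (1.0 - alpha) * diff * diff
    return events

# Quiet background with one loud burst at frame 50.
energies = np.array([1.0] * 50 + [50.0] + [1.0] * 10)
events = detect_events(energies)
```

A full pipeline from the survey's taxonomy would then pass the flagged segments on to an event classifier; this sketch covers only the detection stage.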