2,660 research outputs found
Audio-Visual Speaker Tracking with Importance Particle Filters
We present a probabilistic methodology for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for the asymmetrical integration of AV information in a way that efficiently exploits the complementary features of each modality. Audio localization information is used to generate an importance sampling (IS) function, which guides the random search process of a particle filter towards regions of the configuration space likely to contain the true configuration (a speaker). The measurement process integrates contour-based and audio observations, which results in reliable head tracking in realistic scenarios. We show that imperfect single modalities can be combined into an algorithm that automatically initializes and tracks a speaker, switches between multiple speakers, tolerates visual clutter, and recovers from total AV object occlusion, in the context of a multimodal meeting room
Multimodal methods for blind source separation of audio sources
The enhancement of the performance of frequency domain convolutive
blind source separation (FDCBSS) techniques when applied to the
problem of separating audio sources recorded in a room environment
is the focus of this thesis. This challenging application is termed the
cocktail party problem and the ultimate aim would be to build a machine
which matches the ability of a human being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of a FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.
A multimodal approach to blind source separation of moving sources
A novel multimodal approach is proposed to solve the
problem of blind source separation (BSS) of moving sources. The
challenge of BSS for moving sources is that the mixing filters are
time varying; thus, the unmixing filters should also be time varying,
which are difficult to calculate in real time. In the proposed approach,
the visual modality is utilized to facilitate the separation for
both stationary and moving sources. The movement of the sources
is detected by a 3-D tracker based on video cameras. Positions
and velocities of the sources are obtained from the 3-D tracker
based on a Markov Chain Monte Carlo particle filter (MCMC-PF),
which results in high sampling efficiency. The full BSS solution
is formed by integrating a frequency domain blind source separation
algorithm and beamforming: if the sources are identified
as stationary for a certain minimum period, a frequency domain
BSS algorithm is implemented with an initialization derived from
the positions of the source signals. Once the sources are moving, a
beamforming algorithm which requires no prior statistical knowledge
is used to perform real time speech enhancement and provide
separation of the sources. Experimental results confirm that
by utilizing the visual modality, the proposed algorithm not only
improves the performance of the BSS algorithm and mitigates the
permutation problem for stationary sources, but also provides a
good BSS performance for moving sources in a low reverberant
environment
AudioāVisual Speaker Tracking
Target motion tracking found its application in interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation system, driving assistance, monitoring prohibited area, medical science, robotics, action and expression recognition, individual speaker discrimination in multiāspeaker environments and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to the widespread advances of devices and technologies and the necessity for seamless solutions in realātime tracking and localization of speakers. However, speaker tracking is a challenging task in realālife scenarios as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcome these issues is to use multiāmodal information, as it conveys complementary information about the state of the speakers compared to singleāmodal tracking. To use multiāmodal information, several approaches have been proposed which can be classified into two categories, namely deterministic and stochastic. This chapter aims at providing multimedia researchers with a stateāofātheāart overview of tracking methods, which are used for combining multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field
Bayesian-based techniques for tracking multiple humans in an enclosed environment
This thesis deals with the problem of online visual tracking of multiple humans in an enclosed environment. The focus is to develop techniques to deal with the challenges of varying number of targets, inter-target occlusions and interactions when every target gives rise to multiple measurements (pixels) in every video frame. This thesis contains three different contributions to the research in multi-target tracking.
Firstly, a multiple target tracking algorithm is proposed which focuses on mitigating the inter-target occlusion problem during complex interactions. This is achieved with the help of a particle filter, multiple video cues and a new interaction model. A Markov chain Monte Carlo particle filter (MCMC-PF) is used along with a new interaction model which helps in modeling interactions of multiple targets. This helps to overcome tracking failures due to occlusions. A new weighted Markov chain Monte Carlo (WMCMC) sampling technique is also proposed which assists in achieving a reduced tracking error.
Although effective, to accommodate multiple measurements (pixels) produced by every target, this technique aggregates measurements into features which results in information loss.
In the second contribution, a novel variational Bayesian clustering-based multi-target tracking framework is proposed which can associate multiple measurements to every target without aggregating them into features. It copes with complex inter-target occlusions by maintaining the identity of targets during their close physical interactions and handles efficiently a time-varying number of targets. The proposed multi-target tracking framework consists of background subtraction, clustering, data association and particle filtering. A variational Bayesian clustering technique groups the extracted foreground measurements while an improved feature based joint probabilistic data association filter (JPDAF) is developed to associate clusters of measurements to every target. The data association information is used within the particle filter to track multiple targets. The clustering results are further utilised to estimate the number of targets. The proposed technique improves the tracking accuracy. However, the proposed features based JPDAF technique results in an exponential growth of computational complexity of the overall framework with increase in number of targets.
In the final work, a novel data association technique for multi-target tracking is proposed which more efficiently assigns multiple measurements to every target, with a reduced computational complexity. A belief propagation (BP) based cluster to target association method is proposed which exploits the inter-cluster dependency information. Both location and features of clusters are used to re-identify the targets when they emerge from occlusions.
The proposed techniques are evaluated on benchmark data sets and their performance is compared with state-of-the-art techniques by using, quantitative and global performance measures
- ā¦