2,660 research outputs found

    Audio-Visual Speaker Tracking with Importance Particle Filters

    We present a probabilistic methodology for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for the asymmetrical integration of AV information in a way that efficiently exploits the complementary features of each modality. Audio localization information is used to generate an importance sampling (IS) function, which guides the random search process of a particle filter towards regions of the configuration space likely to contain the true configuration (a speaker). The measurement process integrates contour-based and audio observations, which results in reliable head tracking in realistic scenarios. We show that imperfect single modalities can be combined into an algorithm that automatically initializes and tracks a speaker, switches between multiple speakers, tolerates visual clutter, and recovers from total AV object occlusion, in the context of a multimodal meeting room.

    Multimodal methods for blind source separation of audio sources

    The enhancement of the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when applied to the problem of separating audio sources recorded in a room environment is the focus of this thesis. This challenging application is termed the cocktail party problem and the ultimate aim would be to build a machine which matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task and hence they adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of an FDCBSS algorithm. The positions of the human speakers are monitored by video cameras and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function which measures separation performance. [Continues.]

    A multimodal approach to blind source separation of moving sources

    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying; thus, the unmixing filters should also be time varying, which makes them difficult to calculate in real time. In the proposed approach, the visual modality is utilized to facilitate the separation for both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on video cameras. Positions and velocities of the sources are obtained from the 3-D tracker based on a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. The full BSS solution is formed by integrating a frequency domain blind source separation algorithm and beamforming: if the sources are identified as stationary for a certain minimum period, a frequency domain BSS algorithm is implemented with an initialization derived from the positions of the source signals. Once the sources are moving, a beamforming algorithm which requires no prior statistical knowledge is used to perform real time speech enhancement and provide separation of the sources. Experimental results confirm that by utilizing the visual modality, the proposed algorithm not only improves the performance of the BSS algorithm and mitigates the permutation problem for stationary sources, but also provides a good BSS performance for moving sources in a low reverberant environment.

    Audioā€Visual Speaker Tracking

    Target motion tracking found its application in interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation system, driving assistance, monitoring prohibited area, medical science, robotics, action and expression recognition, individual speaker discrimination in multi-speaker environments and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to the widespread advances of devices and technologies and the necessity for seamless solutions in real-time tracking and localization of speakers. However, speaker tracking is a challenging task in real-life scenarios as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcome these issues is to use multi-modal information, as it conveys complementary information about the state of the speakers compared to single-modal tracking. To use multi-modal information, several approaches have been proposed which can be classified into two categories, namely deterministic and stochastic. This chapter aims at providing multimedia researchers with a state-of-the-art overview of tracking methods, which are used for combining multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field.

    Bayesian-based techniques for tracking multiple humans in an enclosed environment

    This thesis deals with the problem of online visual tracking of multiple humans in an enclosed environment. The focus is to develop techniques to deal with the challenges of varying number of targets, inter-target occlusions and interactions when every target gives rise to multiple measurements (pixels) in every video frame. This thesis contains three different contributions to the research in multi-target tracking. Firstly, a multiple target tracking algorithm is proposed which focuses on mitigating the inter-target occlusion problem during complex interactions. This is achieved with the help of a particle filter, multiple video cues and a new interaction model. A Markov chain Monte Carlo particle filter (MCMC-PF) is used along with a new interaction model which helps in modeling interactions of multiple targets. This helps to overcome tracking failures due to occlusions. A new weighted Markov chain Monte Carlo (WMCMC) sampling technique is also proposed which assists in achieving a reduced tracking error. Although effective, to accommodate multiple measurements (pixels) produced by every target, this technique aggregates measurements into features which results in information loss. In the second contribution, a novel variational Bayesian clustering-based multi-target tracking framework is proposed which can associate multiple measurements to every target without aggregating them into features. It copes with complex inter-target occlusions by maintaining the identity of targets during their close physical interactions and handles efficiently a time-varying number of targets. The proposed multi-target tracking framework consists of background subtraction, clustering, data association and particle filtering. A variational Bayesian clustering technique groups the extracted foreground measurements while an improved feature based joint probabilistic data association filter (JPDAF) is developed to associate clusters of measurements to every target. The data association information is used within the particle filter to track multiple targets. The clustering results are further utilised to estimate the number of targets. The proposed technique improves the tracking accuracy. However, the proposed feature based JPDAF technique results in an exponential growth of computational complexity of the overall framework with increase in number of targets. In the final work, a novel data association technique for multi-target tracking is proposed which more efficiently assigns multiple measurements to every target, with a reduced computational complexity. A belief propagation (BP) based cluster to target association method is proposed which exploits the inter-cluster dependency information. Both location and features of clusters are used to re-identify the targets when they emerge from occlusions. The proposed techniques are evaluated on benchmark data sets and their performance is compared with state-of-the-art techniques by using quantitative and global performance measures.