Particle Filter Design Using Importance Sampling for Acoustic Source Localisation and Tracking in Reverberant Environments
Sequential Monte Carlo methods have recently been proposed to deal with the problem of acoustic source localisation and tracking using an array of microphones. Previous implementations make use of the basic bootstrap particle filter, whereas a more general approach involves the concept of importance sampling. In this paper, we develop a new particle filter for acoustic source localisation using importance sampling, and compare its tracking ability with that of a bootstrap algorithm proposed previously in the literature. Experimental results obtained with simulated reverberant samples and real audio recordings demonstrate that the new algorithm is more suitable for practical applications due to its reinitialisation capabilities, despite showing a slightly lower average tracking accuracy. A real-time implementation also shows that the proposed particle filter can reliably track a person talking in real reverberant rooms.
This work was performed while Eric A. Lehmann was with National ICT Australia, which is funded by the Australian Government's Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Centre of Excellence programs.
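For readers unfamiliar with the baseline method, a bootstrap particle filter reduces to a predict-weight-resample loop. The following is a minimal, hypothetical one-dimensional sketch (the toy motion and observation models, noise levels and particle count are all assumptions, not taken from the paper); an importance-sampling design of the kind the paper develops would replace the motion-model proposal used in the prediction step:

```python
import math
import random

random.seed(0)

def gauss_pdf(x, mu, sigma):
    """Gaussian likelihood used to weight particles."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bootstrap_pf_step(particles, observation, motion_std=0.1, obs_std=0.5):
    """One predict-weight-resample cycle of a bootstrap particle filter."""
    # Predict: propagate each particle through a random-walk motion model.
    predicted = [p + random.gauss(0.0, motion_std) for p in particles]
    # Weight: score each particle against the (hypothetical) acoustic observation.
    weights = [gauss_pdf(observation, p, obs_std) for p in predicted]
    total = sum(weights) or 1e-300
    weights = [w / total for w in weights]
    # Resample: draw a new particle set proportional to the weights. An
    # importance-sampling variant would draw from a proposal other than the
    # motion model here, which is what enables reinitialisation after a loss.
    return random.choices(predicted, weights=weights, k=len(predicted))

# Toy run: a source drifting from 0.0 toward 1.0 metres.
particles = [random.uniform(-2.0, 2.0) for _ in range(500)]
for true_pos in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    obs = true_pos + random.gauss(0.0, 0.1)   # simulated noisy localisation cue
    particles = bootstrap_pf_step(particles, obs)
estimate = sum(particles) / len(particles)
print(round(estimate, 2))  # should end up near the final position, 1.0
```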
A multimodal approach to blind source separation of moving sources
A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying; thus, the unmixing filters should also be time varying, and these are difficult to calculate in real time. In the proposed approach, the visual modality is utilized to facilitate the separation of both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on video cameras. Positions and velocities of the sources are obtained from the 3-D tracker based on a Markov chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. The full BSS solution is formed by integrating a frequency-domain blind source separation algorithm and beamforming: if the sources are identified as stationary for a certain minimum period, a frequency-domain BSS algorithm is implemented with an initialization derived from the positions of the source signals. Once the sources are moving, a beamforming algorithm which requires no prior statistical knowledge is used to perform real-time speech enhancement and provide separation of the sources. Experimental results confirm that by utilizing the visual modality, the proposed algorithm not only improves the performance of the BSS algorithm and mitigates the permutation problem for stationary sources, but also provides good BSS performance for moving sources in a low-reverberation environment.
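The stationary/moving switching described above can be pictured as a small state machine driven by the tracker's velocity estimates. The sketch below is purely illustrative: the `ModalitySwitcher` class, the speed threshold and the frame count are invented for the example, and the two separation back-ends are reduced to string labels:

```python
# Hypothetical sketch of the stationary/moving switching logic; thresholds,
# class name and the two stub back-ends are assumptions, not from the paper.

class ModalitySwitcher:
    """Choose FDCBSS for sources confirmed stationary, beamforming otherwise."""

    def __init__(self, speed_threshold=0.05, min_stationary_frames=10):
        self.speed_threshold = speed_threshold            # m/s, assumed
        self.min_stationary_frames = min_stationary_frames
        self.stationary_count = 0

    def update(self, source_speeds):
        """source_speeds: per-source speeds from the video-based 3-D tracker."""
        if all(s < self.speed_threshold for s in source_speeds):
            self.stationary_count += 1
        else:
            self.stationary_count = 0   # any movement resets the stationarity timer
        if self.stationary_count >= self.min_stationary_frames:
            return "fdcbss"        # frequency-domain BSS, initialised from positions
        return "beamforming"       # real-time enhancement, no prior statistics

switcher = ModalitySwitcher(min_stationary_frames=3)
decisions = [switcher.update(speeds)
             for speeds in ([0.01, 0.02], [0.02, 0.01], [0.0, 0.0], [0.3, 0.0])]
print(decisions)
```

With three consecutive still frames the switcher hands control to the frequency-domain BSS stage, and a single moving frame sends it straight back to beamforming.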
Robust and Efficient Inference of Scene and Object Motion in Multi-Camera Systems
Multi-camera systems have the ability to overcome some of the fundamental limitations of single-camera systems. Having multiple viewpoints of a scene goes a long way toward limiting the influence of field of view, occlusion, blur and poor resolution of an individual camera. This dissertation addresses robust and efficient inference of scene and object motion in multi-camera and multi-sensor systems.
The first part of the dissertation discusses the role of constraints introduced by projective imaging in robust inference of multi-camera/sensor-based object motion. We discuss the role of the homography and epipolar constraints in fusing object motion perceived by individual cameras. For planar scenes, the homography constraint provides a natural mechanism for data association. For scenes that are not planar, the epipolar constraint provides a weaker multi-view relationship. We use the epipolar constraint for tracking in multi-camera and multi-sensor networks. In particular, we show that the epipolar constraint reduces the dimensionality of the state space of the problem by introducing a "shared" state space for the joint tracking problem. This allows for robust tracking even when one of the sensors fails due to poor SNR or occlusion.
The second part of the dissertation deals with challenges in the computational aspects of tracking algorithms that are common to such systems. Much of the inference in multi-camera and multi-sensor networks deals with complex non-linear models corrupted by non-Gaussian noise. Particle filters provide approximate Bayesian inference in such settings. We analyze the computational drawbacks of traditional particle filtering algorithms, and present a method for implementing the particle filter using the Independent Metropolis-Hastings sampler, which is highly amenable to pipelined implementations and parallelization. We analyze implementations of the proposed algorithm, concentrating in particular on those with minimum processing times.
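For readers unfamiliar with it, an Independent Metropolis-Hastings sampler draws every candidate from a fixed proposal that ignores the current state and accepts with the ratio of importance weights; that independence between candidates is what makes pipelining and parallelization natural. A minimal one-dimensional sketch, with an invented Gaussian target and proposal (not the dissertation's models), might look like:

```python
import math
import random

random.seed(1)

def target(x, obs=1.0, sigma=0.5):
    """Unnormalised posterior weight (toy Gaussian likelihood around obs)."""
    return math.exp(-0.5 * ((x - obs) / sigma) ** 2)

def proposal_sample():
    """Independent proposal: does not depend on the current state."""
    return random.gauss(0.0, 2.0)

def proposal_pdf(x):
    return math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2 * math.pi))

def imh_chain(n_samples, burn_in=200):
    x = proposal_sample()
    w_x = target(x) / proposal_pdf(x)          # importance weight of current state
    samples = []
    for i in range(n_samples + burn_in):
        y = proposal_sample()                  # candidate drawn independently
        w_y = target(y) / proposal_pdf(y)
        if random.random() < min(1.0, w_y / w_x):   # accept with weight ratio
            x, w_x = y, w_y
        if i >= burn_in:
            samples.append(x)
    return samples

samples = imh_chain(5000)
mean = sum(samples) / len(samples)
print(round(mean, 1))  # the chain's mean should sit near the observation, 1.0
```

Because every candidate and its weight can be computed before the accept/reject decision, the proposal and weighting stages can run ahead of the chain, which is the property the dissertation exploits for pipelined hardware.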
The last part of the dissertation deals with the efficient sensing paradigm of compressive sensing (CS) applied to signals in imaging, such as natural images and reflectance fields. We propose a hybrid signal model on the assumption that most real-world signals exhibit subspace compressibility as well as sparse representations. We show that several real-world visual signals, such as images, reflectance fields and videos, are better approximated by this hybrid of two models. We derive optimal hybrid linear projections of the signal and show that theoretical guarantees and algorithms designed for CS can be easily extended to hybrid subspace-compressive sensing. Such methods reduce the amount of information sensed by a camera, and help in reducing the so-called data deluge problem in large multi-camera systems.
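To make the hybrid-model intuition concrete, the following toy sketch (entirely invented, not from the dissertation) compares three approximations of a signal built as a smooth subspace component plus a few spikes: a subspace-only least-squares fit, a sparse-only approximation, and the hybrid of the two:

```python
# Toy illustration of why a hybrid subspace + sparse model can beat either
# model alone; the signal, bases and coefficient budgets are all invented.

n = 16
smooth = [3.0 + 0.5 * i for i in range(n)]      # lies in span{constant, ramp}
signal = smooth[:]
signal[4] += 5.0                                # sparse "spike" residuals
signal[11] -= 4.0

def err(approx):
    """Euclidean approximation error against the full signal."""
    return sum((a - s) ** 2 for a, s in zip(approx, signal)) ** 0.5

# Subspace-only: least-squares fit of constant + ramp (2 coefficients).
xs = list(range(n))
mx = sum(xs) / n
my = sum(signal) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, signal))
         / sum((x - mx) ** 2 for x in xs))
subspace = [my + slope * (x - mx) for x in xs]

# Sparse-only: keep the 4 largest-magnitude samples (4 coefficients).
top4 = sorted(range(n), key=lambda i: abs(signal[i]), reverse=True)[:4]
sparse = [signal[i] if i in top4 else 0.0 for i in range(n)]

# Hybrid: 2 subspace coefficients plus the 2 largest residual entries.
residual = [s - a for s, a in zip(signal, subspace)]
top2 = sorted(range(n), key=lambda i: abs(residual[i]), reverse=True)[:2]
hybrid = [subspace[i] + (residual[i] if i in top2 else 0.0) for i in range(n)]

print(round(err(subspace), 2), round(err(sparse), 2), round(err(hybrid), 2))
```

With a comparable coefficient budget, the hybrid approximation attains the lowest error: the subspace part absorbs the smooth trend the sparse model wastes coefficients on, and the sparse part captures the spikes the subspace model cannot.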
Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network
This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN). We use a CRNN previously proposed for the localization and detection of stationary sources, and show that the recurrent layers enable the spatial tracking of moving sources when trained with dynamic scenes. The tracking performance of the CRNN is compared with a stand-alone tracking method that combines a multi-source direction-of-arrival (DOA) estimator and a particle filter. Their respective performance is evaluated in various acoustic conditions such as anechoic and reverberant scenarios, stationary and moving sources at several angular velocities, and with a varying number of overlapping sources. The results show that the CRNN manages to track multiple sources more consistently than the parametric method across acoustic scenarios, but at the cost of higher localization error.
Multimodal methods for blind source separation of audio sources
The focus of this thesis is the enhancement of the performance of frequency-domain convolutive blind source separation (FDCBSS) techniques when applied to the problem of separating audio sources recorded in a room environment. This challenging application is termed the cocktail party problem, and the ultimate aim would be to build a machine which matches the ability of a human being to solve it. Human beings exploit both their eyes and their ears in solving this task, and hence adopt a multimodal approach combining the audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve the separation performance of an FDCBSS algorithm. The positions of the human speakers are monitored by video cameras, and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function which measures separation performance. [Continues.]