517 research outputs found
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem through modeling the
acoustic of the reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustic from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting joint
sparsity model formulated upon spatio-spectral sparsity of concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.Comment: 31 page
Studies on binaural and monaural signal analysis methods and applications
Sound signals can contain a lot of information about the environment and the sound sources present in it. This thesis presents novel contributions to the analysis of binaural and monaural sound signals. Some new applications are introduced in this work, but the emphasis is on analysis methods. The three main topics of the thesis are computational estimation of sound source distance, analysis of binaural room impulse responses, and applications intended for augmented reality audio.
A novel method for binaural sound source distance estimation is proposed. The method is based on learning the coherence between the sounds entering the left and right ears. Comparisons to an earlier approach are also made. It is shown that these kinds of learning methods can correctly recognize the distance of a speech sound source in most cases.
Methods for analyzing binaural room impulse responses are investigated. These methods are able to locate the early reflections in time and also to estimate their directions of arrival. This challenging problem could not be tackled completely, but this part of the work is an important step towards accurate estimation of the individual early reflections from a binaural room impulse response.
As the third part of the thesis, applications of sound signal analysis are studied. The most notable contributions are a novel eyes-free user interface controlled by finger snaps, and an investigation on the importance of features in audio surveillance.
The results of this thesis are steps towards building machines that can obtain information on the surrounding environment based on sound. In particular, the research into sound source distance estimation functions as important basic research in this area. The applications presented could be valuable in future telecommunications scenarios, such as augmented reality audio
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments.Comment: IEEE Journal of Selected Topics in Signal Processing, 201
Self-Localization of Ad-Hoc Arrays Using Time Difference of Arrivals
This work was supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/K007491/1
Robust sound source mapping using three-layered selective audio rays for mobile robots
© 2016 IEEE. This paper investigates sound source mapping in a real environment using a mobile robot. Our approach is based on audio ray tracing which integrates occupancy grids and sound source localization using a laser range finder and a microphone array. Previous audio ray tracing approaches rely on all observed rays and grids. As such observation errors caused by sound reflection, sound occlusion, wall occlusion, sounds at misdetected grids, etc. can significantly degrade the ability to locate sound sources in a map. A three-layered selective audio ray tracing mechanism is proposed in this work. The first layer conducts frame-based unreliable ray rejection (sensory rejection) considering sound reflection and wall occlusion. The second layer introduces triangulation and audio tracing to detect falsely detected sound sources, rejecting audio rays associated to these misdetected sounds sources (short-term rejection). A third layer is tasked with rejecting rays using the whole history (long-term rejection) to disambiguate sound occlusion. Experimental results under various situations are presented, which proves the effectiveness of our method
- …
