Semi-Supervised Sound Source Localization Based on Manifold Regularization
Conventional speaker localization algorithms, based merely on the received
microphone signals, are often sensitive to adverse conditions such as high
reverberation or a low signal-to-noise ratio (SNR). In some scenarios, e.g. in
meeting rooms or cars, it can be assumed that the source position is confined
to a predefined area, and the acoustic parameters of the environment are
approximately fixed. Such scenarios give rise to the assumption that the
acoustic samples from the region of interest have a distinct geometrical
structure. In this paper, we show that the high dimensional acoustic samples
indeed lie on a low dimensional manifold and can be embedded into a low
dimensional space. Motivated by this result, we propose a semi-supervised
source localization algorithm which recovers the inverse mapping between the
acoustic samples and their corresponding locations. The idea is to use an
optimization framework based on manifold regularization, that involves
smoothness constraints of possible solutions with respect to the manifold. The
proposed algorithm, termed Manifold Regularization for Localization (MRL), is
implemented in an adaptive manner. The initialization is conducted with only a
few labelled samples attached with their respective source locations, and then
the system is gradually adapted as new unlabelled samples (with unknown source
locations) are received. Experimental results show superior localization
performance when compared with a recently presented algorithm based on a
manifold learning approach and with the generalized cross-correlation (GCC)
algorithm as a baseline.
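The manifold-regularization idea above can be illustrated with Laplacian-regularized least squares (LapRLS), a standard solver of this type. The RBF kernel/affinity choice, the hyperparameters, and the 1-D toy setup below are illustrative assumptions, not the paper's MRL algorithm:

```python
import numpy as np

def rbf(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between row-vector sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def laprls_fit(X, y, n_labeled, lam_a, lam_i, gamma):
    """Laplacian-regularized least squares: the first n_labeled rows of X
    carry labels y; the remaining rows are unlabelled samples that enter
    only through the manifold-smoothness term."""
    n = len(X)
    K = rbf(X, X, gamma)               # kernel over all samples
    W = rbf(X, X, gamma)               # affinity graph (illustrative choice)
    L = np.diag(W.sum(axis=1)) - W     # unnormalized graph Laplacian
    J = np.zeros((n, n))
    J[:n_labeled, :n_labeled] = np.eye(n_labeled)   # selects labelled rows
    y_full = np.zeros(n)
    y_full[:n_labeled] = y
    A = J @ K + lam_a * n_labeled * np.eye(n) + lam_i * (L @ K)
    alpha = np.linalg.solve(A, y_full)
    return lambda Xq: rbf(Xq, X, gamma) @ alpha     # out-of-sample predictor
```

The labelled samples pin down the regression while the unlabelled ones only shape the Laplacian smoothness penalty, mirroring the labelled-initialization / unlabelled-adaptation split described above.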
Multilevel B-Splines-Based Learning Approach for Sound Source Localization
In this paper, a new learning approach for sound source localization is presented for ad hoc synchronous or asynchronous distributed microphone networks, based on time-difference-of-arrival (TDOA) estimation. We first propose a new concept in which the coordinates of a sound source location are defined as functions of the TDOAs, computed for each pair of microphone signals in the network. Then, given a set of pre-recorded sound measurements and their corresponding source locations, a multilevel B-splines-based learning model is trained with the known TDOAs as input and the known coordinates of the sound source locations as output. For a new acoustic source, once its sound signals are recorded, the corresponding TDOAs can be fed into the learned model to predict the location of the new source. The advantages of the proposed method are that it incorporates the acoustic characteristics of the targeted environment, and even the remaining uncertainty of the TDOA estimates, into the learning model before prediction, and that it is applicable to both synchronous and asynchronous distributed microphone sensor networks. The effectiveness of the proposed algorithm, in terms of localization accuracy and computational cost, was extensively validated against state-of-the-art methods in both synthetic simulation experiments and three real-life environments.
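The overall pipeline (TDOA vector in, source coordinates out) can be sketched as follows. A simple inverse-distance-weighted interpolator stands in for the paper's multilevel B-splines, and the function names and microphone/source geometry are invented for illustration:

```python
import numpy as np

def tdoas(src, mics, c=343.0):
    """TDOA of each microphone relative to the first one (seconds)."""
    d = np.linalg.norm(mics - src, axis=1)
    return (d[1:] - d[0]) / c

def fit_locator(train_srcs, mics):
    """Learn location = f(TDOA vector) from pre-recorded (TDOA, location)
    pairs; inverse-distance weighting is a stand-in for the paper's
    multilevel B-splines interpolant."""
    T = np.array([tdoas(s, mics) for s in train_srcs])

    def predict(tau, eps=1e-12):
        w = 1.0 / (np.linalg.norm(T - tau, axis=1) + eps)
        w /= w.sum()
        return w @ train_srcs          # weighted average of known locations

    return predict
```

Because the mapping is learned from measurements taken in the target room, environment-specific propagation effects are absorbed into the training pairs rather than modeled explicitly, which is the key point of the approach.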
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and the town center. iTWIST'14 gathered about
70 international participants and featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problems; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.
Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
Perspectives
Source separation and speech enhancement research has made dramatic progress in the last 30 years. It is now a mainstream topic in speech and audio processing, with hundreds of papers published every year. Separation and enhancement performance have greatly improved, and successful commercial applications are increasingly being deployed. This chapter provides an overview of research and development perspectives in the field. We do not attempt to cover all perspectives currently under discussion in the community. Instead, we focus on five directions in which we believe major progress is still possible: getting the most out of deep learning, exploiting phase relationships across time-frequency bins, improving the estimation accuracy of multichannel parameters, addressing scenarios involving multiple microphone arrays or other sensors, and accelerating industry transfer. These five directions are covered in Sections 19.1, 19.2, 19.3, 19.4, and 19.5, respectively.
Sound Source Localization and Modeling: Spherical Harmonics Domain Approaches
Sound source localization has been an important research topic in the acoustic signal processing community because of its wide use in many acoustic applications, including speech separation, speech enhancement, sound event detection, automatic speech recognition, automated camera steering, and virtual reality. In the recent decade, there has been growing interest in sound source localization using higher-order microphone arrays, which are capable of recording and analyzing the soundfield over a target spatial area. This thesis studies a novel source feature called the relative harmonic coefficient, which is easily estimated from the higher-order microphone measurements. This source feature has direct applications for sound source localization due to its sole dependence on the source position.
This thesis proposes two novel sound source localization algorithms using the relative harmonic coefficients: (i) a low-complexity single source localization approach that localizes the source's elevation and azimuth separately. This approach is also applicable to acoustic enhancement of the higher-order microphone array recordings; (ii) a semi-supervised multi-source localization algorithm for noisy and reverberant environments. Although this approach uses a learning scheme, it still has strong potential for practical implementation because only a limited number of labeled measurements are required. However, this algorithm has an inherent limitation: it requires the availability of single-source components. Thus, it is unusable in scenarios where the original recordings have limited single-source components (e.g., multiple sources simultaneously active). To address this issue, we develop a novel MUSIC-framework-based approach that directly uses simultaneous multi-source recordings. This developed MUSIC approach uses robust measurements of relative sound pressure from the higher-order microphone and is shown to be more suitable in noisy environments than the traditional MUSIC method.
While the proposed approaches address the source localization problems, in practice the broader localization problem raises some more common challenges that have received less attention. One such challenge is the common assumption that sound sources are omnidirectional, which is hardly the case for a typical commercial loudspeaker. Therefore, in this thesis, we analyze the directional characteristics of commercial loudspeakers by deriving equivalent theoretical acoustic models. Several acoustic models are investigated, including plane-wave decomposition, point source decomposition, and mixed source decomposition. We finally conduct extensive experimental examinations to determine which acoustic model best matches the characteristics of commercial loudspeakers.
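For reference, the traditional MUSIC method that the thesis compares against can be sketched in its textbook narrowband, far-field, linear-array form. This is not the thesis's relative-harmonic-coefficient variant, and the array geometry and frequency below are arbitrary choices:

```python
import numpy as np

def music_spectrum(X, n_src, mic_pos, angles, c=343.0, f=1000.0):
    """Textbook narrowband MUSIC for far-field sources and a linear
    array; X holds complex snapshots (mics x frames)."""
    R = X @ X.conj().T / X.shape[1]        # sample covariance matrix
    w, V = np.linalg.eigh(R)               # eigenvalues in ascending order
    En = V[:, : X.shape[0] - n_src]        # noise-subspace eigenvectors
    k = 2 * np.pi * f / c                  # wavenumber
    P = []
    for th in angles:
        a = np.exp(1j * k * mic_pos * np.sin(th))        # steering vector
        P.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(P)                     # pseudospectrum; peaks give DOAs
```

The pseudospectrum peaks where the steering vector is orthogonal to the noise subspace, i.e. at the source directions; the thesis's variant replaces the raw-pressure covariance with features derived from higher-order (spherical harmonic) measurements.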
Inferring Room Geometries
Determining the geometry of an acoustic enclosure using microphone arrays
has become an active area of research. Knowledge gained about the acoustic
environment, such as the location of reflectors, can be advantageous for
applications such as sound source localization, dereverberation and adaptive
echo cancellation by assisting in tracking environment changes and helping
the initialization of such algorithms.
A methodology to blindly infer the geometry of an acoustic enclosure by estimating
the location of reflective surfaces based on acoustic measurements
using an arbitrary array geometry is developed and analyzed. The starting
point of this work considers a geometric constraint, valid in both two
and three dimensions, that converts time-of-arrival and time-difference-of-arrival information into elliptical constraints on the location of reflectors.
Multiple constraints are combined to yield the line or plane parameters of
the reflectors by minimizing a specific cost function in the least-squares
sense. An iterative constrained least-squares estimator, along with a closed-form
estimator that performs optimally in a noise-free scenario, solves the
associated common-tangent estimation problem that arises from the geometric
constraint. Additionally, a Hough-transform-based data fusion and estimation
technique, which considers acquisitions from multiple source positions,
refines the reflector localization even in adverse conditions.
An extension to the geometric inference framework, which includes the estimation
of the actual speed of sound to improve accuracy under temperature
variations, is also presented; it reduces the prior information required,
such that only the relative microphone positions in the array are needed
for the localization of acoustic reflectors. Simulated and real-world
experiments demonstrate the feasibility of the proposed method.
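The image-source view of the common-tangent problem can be sketched as follows: a first-order reflection TOA fixes an ellipse with the source and a microphone as foci, and a line is tangent to all such ellipses exactly when the source's mirror image across that line lies at the measured path length from every microphone. The brute-force 2-D grid search below exploits this equivalence; the search ranges and geometry are illustrative stand-ins, not the least-squares or Hough estimators described above:

```python
import numpy as np

def image_source(src, theta, p):
    """Mirror src across the line n.x = p with normal n = (cos t, sin t)."""
    n = np.array([np.cos(theta), np.sin(theta)])
    return src - 2.0 * (n @ src - p) * n

def fit_reflector(src, mics, toas, c=343.0):
    """Brute-force search for the reflector line whose image-source path
    lengths match the measured first-reflection TOAs (grid ranges chosen
    for this toy example)."""
    best, best_err = None, np.inf
    toas = np.asarray(toas)
    for theta in np.linspace(0.0, np.pi, 181):
        for p in np.linspace(0.0, 5.0, 251):
            d = np.linalg.norm(mics - image_source(src, theta, p), axis=1)
            err = np.sum((d - c * toas) ** 2)   # residual vs. measured paths
            if err < best_err:
                best, best_err = (theta, p), err
    return best
```

Each TOA contributes one term to the residual, so multiple microphones (or multiple source positions, as in the Hough-based refinement) progressively disambiguate the reflector line.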