5,085 research outputs found
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large CHiME-4 datasets and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to the short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for a block length of 250
ms. Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion
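As a rough illustration of how a relative transfer function (RTF) can steer a beamformer toward the speaker, the sketch below builds distortionless weights for a single time-frequency bin. It assumes spatially white noise, in which case the minimum-variance solution reduces to w = h / (h^H h); it is not the system evaluated in the abstract, which also relies on DNN-based VAD and post-filtering. The RTF values, the source value, and the function names are hypothetical.

```python
def rtf_beamformer_weights(rtf):
    """Return w = h / (h^H h), which satisfies the constraint w^H h = 1."""
    norm_sq = sum(abs(h) ** 2 for h in rtf)
    return [h / norm_sq for h in rtf]


def apply_beamformer(weights, mic_values):
    """Compute y = w^H x for one time-frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, mic_values))


# Hypothetical RTF for three microphones, relative to microphone 0 (so h[0] = 1).
rtf = [1 + 0j, 0.8 - 0.3j, 0.5 + 0.6j]
weights = rtf_beamformer_weights(rtf)

# Noise-free observation: each microphone sees the source scaled by its RTF entry.
source = 0.7 + 0.2j
mic_values = [h * source for h in rtf]

# With no noise, the beamformer output reproduces the source at the reference mic.
output = apply_beamformer(weights, mic_values)
```

In block-online processing the RTF would be re-estimated for every block, so shorter blocks give noisier weights; this sketch leaves that estimation step out.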
Sparse component separation for accurate CMB map estimation
The Cosmic Microwave Background (CMB) is of prime importance to
cosmologists studying the birth of our universe. Unfortunately, most CMB
experiments such as COBE, WMAP or Planck do not provide a direct measure of the
cosmological signal; CMB is mixed up with galactic foregrounds and point
sources. For the sake of scientific exploitation, measuring the CMB requires
extracting several different astrophysical components (CMB, Sunyaev-Zel'dovich
clusters, galactic dust) from multi-wavelength observations. Mathematically
speaking, the problem of disentangling the CMB map from the galactic
foregrounds amounts to a component or source separation problem. In the field
of CMB studies, a wide range of source separation methods has been
applied, differing in how they model the data and in the
criteria they rely on to separate components. Two main difficulties are i) the
instrument's beam varies across frequencies and ii) the emission laws of most
astrophysical components vary across pixels. This paper aims at introducing a
very accurate modeling of CMB data, based on sparsity, accounting for beam
variability across frequencies as well as spatial variations of the components'
spectral characteristics. Based on this new sparse modeling of the data, a
sparsity-based component separation method coined Local-Generalized
Morphological Component Analysis (L-GMCA) is described. Extensive numerical
experiments have been carried out with simulated Planck data. These experiments
show the high efficiency of the proposed component separation methods to
estimate a clean CMB map with a very low foreground contamination, which makes
L-GMCA of prime interest for CMB studies. Comment: submitted to A&
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
A Geometric Approach to Sound Source Localization from Time-Delay Estimates
This paper addresses the problem of sound-source localization from time-delay
estimates using arbitrarily-shaped non-coplanar microphone arrays. A novel
geometric formulation is proposed, together with a thorough algebraic analysis
and a global optimization solver. The proposed model is thoroughly described
and evaluated. The geometric analysis, stemming from the direct acoustic
propagation model, leads to necessary and sufficient conditions for a set of
time delays to correspond to a unique position in the source space. Such sets
of time delays are referred to as feasible sets. We formally prove that every
feasible set corresponds to exactly one position in the source space, whose
value can be recovered using a closed-form localization mapping. Therefore, we
seek the optimal feasible set of time delays given, as input, the received
microphone signals. This time delay estimation problem is naturally cast into a
programming task, constrained by the feasibility conditions derived from the
geometric analysis. A global branch-and-bound optimization technique is
proposed to solve the problem at hand, hence estimating the best set of
feasible time delays and, subsequently, localizing the sound source. Extensive
experiments with both simulated and real data are reported; we compare our
methodology to four state-of-the-art techniques. This comparison clearly shows
that the proposed method combined with the branch-and-bound algorithm
outperforms existing methods. This in-depth geometric understanding, the practical
algorithms, and the encouraging results open several opportunities for future
work. Comment: 13 pages, 2 figures, 3 tables, journa
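To make the notion of a consistent set of time delays more concrete, the sketch below generates pairwise delays from the direct-path propagation model and checks one simple necessary condition: delays must add up around any triple of microphones. This is only an illustration; the full necessary and sufficient feasibility conditions and the closed-form localization mapping are those derived in the paper and are not reproduced here. The microphone layout, source position, speed of sound, and function names are assumptions.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def time_delays(source, mics, c=SPEED_OF_SOUND):
    """Return {(i, j): tau_ij} with tau_ij = (|s - m_i| - |s - m_j|) / c."""
    dists = [math.dist(source, m) for m in mics]
    return {(i, j): (dists[i] - dists[j]) / c
            for i, j in combinations(range(len(mics)), 2)}


def is_cycle_consistent(delays, n_mics, tol=1e-9):
    """Check tau_ij + tau_jk = tau_ik for all i < j < k (necessary only)."""
    return all(abs(delays[(i, j)] + delays[(j, k)] - delays[(i, k)]) <= tol
               for i, j, k in combinations(range(n_mics), 3))


# Hypothetical non-coplanar array of four microphones (coordinates in metres).
mics = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, 0.2)]
source = (1.0, 2.0, 0.5)

delays = time_delays(source, mics)
consistent = is_cycle_consistent(delays, len(mics))

# Corrupting a single delay by 1 ms breaks the consistency check.
corrupted = dict(delays)
corrupted[(0, 1)] += 1e-3
corrupted_ok = is_cycle_consistent(corrupted, len(mics))
```

Delays estimated independently for each microphone pair from noisy signals generally fail such checks, which is one motivation for estimating them jointly under feasibility constraints.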
Imaging With Nature: Compressive Imaging Using a Multiply Scattering Medium
The recent theory of compressive sensing leverages the structure of
signals to acquire them with far fewer measurements than was previously
thought necessary, and certainly well below the traditional Nyquist-Shannon
sampling rate. However, most implementations developed to take advantage of
this framework revolve around controlling the measurements with carefully
engineered material or acquisition sequences. Instead, we use the natural
randomness of wave propagation through multiply scattering media as an optimal
and instantaneous compressive imaging mechanism. Waves reflected from an object
are detected after propagation through a well-characterized complex medium.
Each local measurement thus contains global information about the object,
yielding a purely analog compressive sensing method. We experimentally
demonstrate the effectiveness of the proposed approach for optical imaging by
using a 300-micrometer thick layer of white paint as the compressive imaging
device. Scattering media are thus promising candidates for designing efficient
and compact compressive imagers. Comment: 17 pages, 8 figure
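The principle of recovering a sparse object from a small number of random global measurements can be sketched numerically. In the example below a Gaussian random matrix stands in for the characterized scattering medium, and the iterative shrinkage-thresholding algorithm (ISTA) is used for reconstruction. Both are assumptions made for illustration; the abstract does not state which reconstruction algorithm the authors use, and the sizes, seed, and regularization weight are hypothetical.

```python
import random


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)


def objective(A, x, y, lam):
    """Lasso objective: 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    r = [yi - pi for yi, pi in zip(y, matvec(A, x))]
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(v) for v in x)


def ista(A, y, lam=0.05, n_iter=500):
    """Minimize the lasso objective by iterative shrinkage-thresholding."""
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so this step size is conservative but safe.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [yi - pi for yi, pi in zip(y, matvec(A, x))]
        grad_step = rmatvec(A, residual)
        x = [soft_threshold(xj + step * gj, step * lam)
             for xj, gj in zip(x, grad_step)]
    return x


rng = random.Random(0)
m, n = 12, 20  # 12 measurements of a length-20 signal
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

x_true = [0.0] * n
x_true[3], x_true[7] = 1.0, -1.0  # sparse object with two nonzero entries
y = matvec(A, x_true)             # each measurement mixes all entries

x_hat = ista(A, y)
```

Each row of A mixes every entry of the object, mirroring the statement that each local measurement contains global information about it.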