Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large CHiME-4 datasets and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to the short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for a block length of 250
ms.
Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They also conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems
Compressive Imaging via Approximate Message Passing with Image Denoising
We consider compressive imaging problems, where images are reconstructed from
a reduced number of linear measurements. Our objective is to improve over
existing compressive imaging algorithms in terms of both reconstruction error
and runtime. To pursue our objective, we propose compressive imaging algorithms
that employ the approximate message passing (AMP) framework. AMP is an
iterative signal reconstruction algorithm that performs scalar denoising at
each iteration; in order for AMP to reconstruct the original input signal well,
a good denoiser must be used. We apply two wavelet-based image denoisers within
AMP. The first denoiser is the "amplitude-scale-invariant Bayes estimator"
(ABE), and the second is an adaptive Wiener filter; we call our AMP-based
algorithms for compressive imaging AMP-ABE and AMP-Wiener. Numerical results
show that both AMP-ABE and AMP-Wiener significantly improve over the state of
the art in terms of runtime. In terms of reconstruction quality, AMP-Wiener
offers lower mean square error (MSE) than existing compressive imaging
algorithms. In contrast, AMP-ABE has higher MSE, because ABE does not denoise
as well as the adaptive Wiener filter.
Comment: 15 pages; 2 tables; 7 figures; to appear in IEEE Trans. Signal
Proces
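The AMP iteration described in the abstract above (denoise a pseudo-data vector, then update a residual) can be sketched as follows. This is only an illustrative sketch, not the authors' AMP-ABE or AMP-Wiener: it assumes a generic sparse vector rather than wavelet coefficients of an image, and it swaps in a plain soft-thresholding denoiser for the wavelet-based denoisers. The function names `soft_threshold` and `amp`, the threshold multiplier `alpha`, and the iteration count are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, tau):
    # Scalar denoiser applied elementwise (stand-in for ABE / Wiener denoisers).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def amp(y, A, n_iter=50, alpha=1.5):
    """Sketch of AMP for y = A @ x with a sparse x.

    Assumes A has i.i.d. entries with variance 1/m (m = number of rows).
    """
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()  # residual
    for _ in range(n_iter):
        # Estimate the effective noise level in the pseudo-data from the residual.
        sigma = np.linalg.norm(z) / np.sqrt(m)
        pseudo_data = x + A.T @ z
        x = soft_threshold(pseudo_data, alpha * sigma)
        # Onsager correction: the derivative of soft thresholding is 1 on the
        # nonzero outputs, so the correction factor is count_nonzero(x) / m.
        z = y - A @ x + z * (np.count_nonzero(x) / m)
    return x
```

The Onsager term in the residual update is what distinguishes AMP from plain iterative thresholding; replacing `soft_threshold` with a stronger denoiser is the design choice the abstract is about.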
A Robust Noise Spectral Estimation Algorithm for Speech Enhancement in Voice Devices
In this thesis, a new robust noise spectral estimation algorithm is proposed for single-microphone speech enhancement. The algorithm generates noise spectral estimates that are optimal in the Minimum Mean Square Error (MMSE) sense, based on the speech statistics in noisy environments. Compared to the widely adopted conventional noise spectral estimation method using single-pole recursion, our proposed scheme is more reliable since the recursion coefficients are adaptable and optimal in the MMSE sense. We also propose a new, accurate Resulting Signal-to-Noise Ratio (R-SNR) estimator as a quality measure to benchmark existing noise spectral estimation techniques. This new R-SNR estimator can quantify not only the residual noise but also the speech distortion, and can therefore serve as an overall speech quality measure after noise suppression. We conduct experiments to evaluate the performance of noise suppression using our robust noise spectral estimation algorithm and compare it with that of two major existing noise spectral estimation methods. Through numerous simulations, we show that our noise suppression technique significantly outperforms the conventional methods in both stationary and nonstationary noise environments
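For orientation, the conventional single-pole recursion that the abstract above uses as its baseline can be sketched as below. This shows only the fixed-coefficient baseline, not the thesis's adaptive MMSE-optimal scheme. The function name `single_pole_noise_psd`, the smoothing coefficient `alpha`, and the `(frames, bins)` layout of the input are assumptions made for the example.

```python
import numpy as np

def single_pole_noise_psd(power_spec, alpha=0.9):
    """Recursively smooth a power spectrogram with a fixed coefficient.

    power_spec: array of shape (frames, bins) holding |Y|^2 per frame and bin.
    Returns an array of the same shape with the running noise estimate.
    """
    power_spec = np.asarray(power_spec, dtype=float)
    noise = np.empty_like(power_spec)
    noise[0] = power_spec[0]  # initialise with the first frame
    for t in range(1, len(power_spec)):
        # One-pole (first-order IIR) update with a fixed coefficient.
        noise[t] = alpha * noise[t - 1] + (1.0 - alpha) * power_spec[t]
    return noise
```

A fixed `alpha` trades tracking speed against estimate variance, which is the limitation that motivates making the recursion coefficients adaptive.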
Fast non-negative deconvolution for spike train inference from population calcium imaging
Calcium imaging for observing spiking activity from large populations of
neurons is quickly gaining popularity. While the raw data are fluorescence
movies, the underlying spike trains are of interest. This work presents a fast
non-negative deconvolution filter to infer the approximately most likely spike
train for each neuron, given the fluorescence observations. This algorithm
outperforms optimal linear deconvolution (Wiener filtering) on both simulated
and biological data. The performance gains come from restricting the inferred
spike trains to be positive (using an interior-point method), unlike the Wiener
filter. The algorithm is fast enough that even when imaging over 100 neurons,
inference can be performed on the set of all observed traces faster than
real-time. Performing optimal spatial filtering on the images further refines
the estimates. Importantly, all the parameters required to perform the
inference can be estimated using only the fluorescence data, obviating the need
to perform joint electrophysiological and imaging calibration experiments.
Comment: 22 pages, 10 figure
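The idea of constraining the inferred spike train to be non-negative can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a noiseless first-order calcium model with a known decay `gamma`, and it swaps the interior-point method for simple projected gradient descent on a least-squares objective. The names `calcium_kernel` and `nonneg_deconvolve`, and all parameter values, are assumptions made for the example.

```python
import numpy as np

def calcium_kernel(T, gamma):
    # Lower-triangular matrix with K[t, s] = gamma**(t - s) for t >= s,
    # i.e. each spike produces an exponentially decaying calcium transient.
    idx = np.arange(T)
    lag = idx[:, None] - idx[None, :]
    return np.where(lag >= 0, gamma ** np.maximum(lag, 0), 0.0)

def nonneg_deconvolve(F, gamma=0.9, n_iter=5000):
    """Minimise ||F - K n||^2 subject to n >= 0 by projected gradient descent."""
    F = np.asarray(F, dtype=float)
    T = len(F)
    K = calcium_kernel(T, gamma)
    # Step size 1/L, where L is the largest eigenvalue of K.T @ K.
    step = 1.0 / np.linalg.norm(K, 2) ** 2
    n = np.zeros(T)
    for _ in range(n_iter):
        grad = K.T @ (K @ n - F)
        n = np.maximum(n - step * grad, 0.0)  # project onto n >= 0
    return n
```

A linear (Wiener-style) deconvolution can return negative spike counts; the projection step here is the simplest way to rule those out, at the cost of slower convergence than an interior-point solver.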
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks