2,788 research outputs found

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    Get PDF
    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime when recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on large datasets of CHiME-4 and on another dataset featuring moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust with respect to short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for the block length of 250 ms.Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion

    Probabilistic Modeling Paradigms for Audio Source Separation

    Get PDF
    This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They also,conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems

    Compressive Imaging via Approximate Message Passing with Image Denoising

    Full text link
    We consider compressive imaging problems, where images are reconstructed from a reduced number of linear measurements. Our objective is to improve over existing compressive imaging algorithms in terms of both reconstruction error and runtime. To pursue our objective, we propose compressive imaging algorithms that employ the approximate message passing (AMP) framework. AMP is an iterative signal reconstruction algorithm that performs scalar denoising at each iteration; in order for AMP to reconstruct the original input signal well, a good denoiser must be used. We apply two wavelet based image denoisers within AMP. The first denoiser is the "amplitude-scaleinvariant Bayes estimator" (ABE), and the second is an adaptive Wiener filter; we call our AMP based algorithms for compressive imaging AMP-ABE and AMP-Wiener. Numerical results show that both AMP-ABE and AMP-Wiener significantly improve over the state of the art in terms of runtime. In terms of reconstruction quality, AMP-Wiener offers lower mean square error (MSE) than existing compressive imaging algorithms. In contrast, AMP-ABE has higher MSE, because ABE does not denoise as well as the adaptive Wiener filter.Comment: 15 pages; 2 tables; 7 figures; to appear in IEEE Trans. Signal Proces

    A Robust Noise Spectral Estimation Algorithm for Speech Enhancement in Voice Devices

    Get PDF
    In this thesis, a new robust noise spectral estimation algorithm is proposed for the purpose of single-microphone speech enhancement. This algorithm can generate the optimal noise spectral estimates in the Minimum Mean Square Error (MMSE) sense based on the speech statistics in the noisy environments. Compared to the well-adopted conventional noise spectral estimation method using the single-pole recursion, our proposed scheme is more reliable since the recursion coefficients are adaptable and optimal in the MMSE therein. We also propose a new accurate Resulting Signal-to-Noise Ratio (R-SNR) estimator as a quality measure to benchmark the existing noise spectral estimation techniques. This new R-SNR estimator can be applied to quantify not only the residual noise but also the speech distortion and therefore it can well serve as the overall speech quality measure after the noise suppression. We conduct the experiments to evaluate the performance of the noise suppression using our robust noise spectral estimation algorithm and compare it with those of two major existing noise spectral estimation methods. Through numerous simulations, we have shown that our noise suppression technique significantly outperforms the conventional methods in both stationary and nonstationary noise environments

    Fast non-negative deconvolution for spike train inference from population calcium imaging

    Full text link
    Calcium imaging for observing spiking activity from large populations of neurons are quickly gaining popularity. While the raw data are fluorescence movies, the underlying spike trains are of interest. This work presents a fast non-negative deconvolution filter to infer the approximately most likely spike train for each neuron, given the fluorescence observations. This algorithm outperforms optimal linear deconvolution (Wiener filtering) on both simulated and biological data. The performance gains come from restricting the inferred spike trains to be positive (using an interior-point method), unlike the Wiener filter. The algorithm is fast enough that even when imaging over 100 neurons, inference can be performed on the set of all observed traces faster than real-time. Performing optimal spatial filtering on the images further refines the estimates. Importantly, all the parameters required to perform the inference can be estimated using only the fluorescence data, obviating the need to perform joint electrophysiological and imaging calibration experiments.Comment: 22 pages, 10 figure

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Get PDF
    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that stills remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks
    • …
    corecore