28 research outputs found

    Online Parametric NMF for Speech Enhancement

    Get PDF

    Probabilistic Modeling Paradigms for Audio Source Separation

    Get PDF
    This is the author's final version of the article, first published as: E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems, Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007.
    Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of two general paradigms: linear modeling or variance modeling. They compare the merits of each paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
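    A minimal sketch of the variance-modeling paradigm mentioned in this abstract: non-negative matrix factorization (NMF) of a power spectrogram, where the product of spectral templates and activations models the time-varying variance of each source. The Euclidean cost, multiplicative updates and random initialization are illustrative assumptions, not the chapter's exact algorithm.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-12):
    """Euclidean-cost NMF of a nonnegative matrix V (freq x time)
    using standard multiplicative updates."""
    rng = np.random.default_rng(0)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps   # spectral templates
    H = rng.random((n_components, n_frames)) + eps  # activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: with V = |STFT|**2 of a mixture, each column of W is a spectral
# template and each row of H its activation over time; W @ H approximates
# the time-varying variance of the modeled sources.
```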

    Speech enhancement based on hidden Markov model using sparse code shrinkage

    Get PDF
    This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework built on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models with the Baum-Welch re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplacian-Gaussian combination (for clean speech and noise, respectively) in the HMM framework, namely sparse code shrinkage-HMM (SCS-HMM). The proposed method is evaluated on the TIMIT database in the presence of three noise types at three SNR levels, in terms of PESQ and SNR, and compared with the auto-regressive HMM (AR-HMM) and with HMM-based speech enhancement using discrete cosine transform (DCT) coefficients with Laplace and Gaussian distributions (LaGa-HMMDCT). The results confirm the superiority of the SCS-HMM method over LaGa-HMMDCT in the presence of non-stationary noises, and show better performance than AR-HMM in the presence of white noise in terms of the PESQ measure.
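    A minimal sketch of the Laplacian-Gaussian MAP shrinkage at the core of sparse code shrinkage: with a Laplacian prior on the clean coefficient and additive Gaussian noise, the MAP estimate reduces to soft thresholding. The HMM machinery (state-dependent parameters, Baum-Welch training) is omitted; sigma_n and b are assumed known here.

```python
import numpy as np

def scs_shrink(y, sigma_n, b):
    """MAP estimate of s from y = s + n, with n ~ N(0, sigma_n**2) and
    s ~ Laplace(0, b): soft thresholding at sigma_n**2 / b."""
    threshold = sigma_n**2 / b
    return np.sign(y) * np.maximum(np.abs(y) - threshold, 0.0)
```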

    A decision-directed adaptive gain equalizer for assistive hearing instruments

    Get PDF
    Assistive hearing instruments have a significant impact on speech enhancement when the signal-to-noise ratio is low. These instruments are usually built around the conventional adaptive gain equalizer (AGE), which offers low computational complexity and low distortion in real-time speech enhancement. The conventional AGE, however, only boosts the speech segments of a noisy signal and is incapable of suppressing its noise segments, so the overall speech quality of the hearing instrument may be reduced, as the noise segments still cannot be filtered out. In this paper, a decision-directed AGE is proposed for assistive hearing instruments to overcome this limitation: it simultaneously boosts the speech segments and suppresses the noise segments of noisy speech. Experimental results with different types of real-world noise indicate that the proposed method achieves better speech quality than the conventional AGE, providing improved functionality for assistive hearing instruments.
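    A minimal sketch of the adaptive gain equalizer idea: per subband, a gain is formed from the ratio of a fast (short-term) envelope to a slow (long-term) noise-floor estimate, boosting frames where speech dominates. Allowing the gain to drop below unity (g_min) hints at the noise-suppressing extension; the smoothing constants and limits are illustrative assumptions, not the paper's values.

```python
import numpy as np

def age_gains(subband, fs, tau_fast=0.01, tau_slow=1.0,
              g_max=4.0, g_min=0.3):
    """Per-sample gains for one subband signal (float array) at rate fs."""
    a_f = np.exp(-1.0 / (tau_fast * fs))  # fast envelope smoothing
    a_s = np.exp(-1.0 / (tau_slow * fs))  # slow noise-floor smoothing
    fast = slow = 1e-8
    gains = np.empty(len(subband))
    for i, x in enumerate(np.abs(subband)):
        fast = a_f * fast + (1 - a_f) * x
        slow = a_s * slow + (1 - a_s) * x
        # ratio >> 1 in speech segments, near (or below) 1 in noise segments
        gains[i] = np.clip(fast / slow, g_min, g_max)
    return gains
```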

    Generating intelligible audio speech from visual speech

    Get PDF
    This work is concerned with generating intelligible audio speech from a video of a person talking. Regression and classification methods are proposed first to estimate static spectral envelope features from active appearance model (AAM) visual features. Two further methods are then developed to incorporate temporal information into the prediction: a feature-level method using multiple frames and a model-level method based on recurrent neural networks. Speech excitation information is not available from the visual signal, so methods to artificially generate aperiodicity and fundamental frequency are developed. These are combined within the STRAIGHT vocoder to produce a speech signal. The various systems are optimised through objective tests before subjective intelligibility tests, in which human listeners achieve a word accuracy of 85% on the GRID audio-visual speech database. This compares favourably with a previous regression-based baseline system, which achieved a word accuracy of 33%.
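    A minimal sketch of the feature-level temporal method described here: AAM feature vectors from several consecutive video frames are stacked into one input, and a regressor maps them to a spectral-envelope vector for the centre frame. A closed-form ridge regression stands in for the paper's models; the context size and regularizer are illustrative assumptions.

```python
import numpy as np

def stack_frames(aam, context=2):
    """Stack 2*context+1 consecutive AAM frames (T x D) into (T-2c) x (2c+1)D."""
    T = aam.shape[0]
    return np.hstack([aam[i:T - 2 * context + i]
                      for i in range(2 * context + 1)])

def fit_ridge(X, Y, lam=1e-2):
    """Closed-form ridge regression from stacked visual to audio features."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Usage: X = stack_frames(aam_train); W = fit_ridge(X, env_train), where
# env_train is trimmed to the centre frames. Predicted envelopes
# stack_frames(aam_test) @ W would feed STRAIGHT together with the
# artificially generated F0 and aperiodicity.
```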

    Model-based speech enhancement for hearing aids

    Get PDF