28 research outputs found

    Online Parametric NMF for Speech Enhancement

    Get PDF

    Probabilistic Modeling Paradigms for Audio Source Separation

    Get PDF
    This is the author's final version of the article, first published as: E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems, Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007.
    Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of two general paradigms: linear modeling or variance modeling. They compare the merits of each paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
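    A minimal sketch of the variance-modeling paradigm mentioned in this abstract: non-negative matrix factorization (NMF) of a power spectrogram, where the product of spectral templates and activations models the time-varying variance of each source. The Euclidean cost, multiplicative updates and random initialization are illustrative assumptions, not the chapter's exact algorithm.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-12):
    """Euclidean-cost NMF of a nonnegative matrix V (freq x time)
    using standard multiplicative updates."""
    rng = np.random.default_rng(0)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps   # spectral templates
    H = rng.random((n_components, n_frames)) + eps  # activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: with V = |STFT|**2 of a mixture, each column of W is a spectral
# template and each row of H its activation over time; W @ H approximates
# the time-varying variance of the modeled sources.
```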

    Speech enhancement based on hidden Markov model using sparse code shrinkage

    Get PDF
    This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework built on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models with the Baum-Welch re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplacian-Gaussian combination (for clean speech and noise, respectively) in the HMM framework, namely sparse code shrinkage-HMM (SCS-HMM). The proposed method is evaluated on the TIMIT database in the presence of three noise types at three SNR levels, in terms of PESQ and SNR, and compared with the auto-regressive HMM (AR-HMM) and with HMM-based speech enhancement using discrete cosine transform (DCT) coefficients with Laplace and Gaussian distributions (LaGa-HMMDCT). The results confirm the superiority of the SCS-HMM method over LaGa-HMMDCT in the presence of non-stationary noises, and show better performance than AR-HMM in the presence of white noise in terms of the PESQ measure.
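    A minimal sketch of the Laplacian-Gaussian MAP shrinkage at the core of sparse code shrinkage: with a Laplacian prior on the clean coefficient and additive Gaussian noise, the MAP estimate reduces to soft thresholding. The HMM machinery (state-dependent parameters, Baum-Welch training) is omitted; sigma_n and b are assumed known here.

```python
import numpy as np

def scs_shrink(y, sigma_n, b):
    """MAP estimate of s from y = s + n, with n ~ N(0, sigma_n**2) and
    s ~ Laplace(0, b): soft thresholding at sigma_n**2 / b."""
    threshold = sigma_n**2 / b
    return np.sign(y) * np.maximum(np.abs(y) - threshold, 0.0)
```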

    A decision-directed adaptive gain equalizer for assistive hearing instruments

    Get PDF
    Assistive hearing instruments have a significant impact on speech enhancement when the signal-to-noise ratio is low. These instruments are usually built around the conventional adaptive gain equalizer (AGE), which offers low computational complexity and low distortion in real-time speech enhancement. The conventional AGE, however, only boosts the speech segments of a noisy signal and is incapable of suppressing its noise segments, so the overall speech quality of the hearing instrument may be reduced, as the noise segments still cannot be filtered out. In this paper, a decision-directed AGE is proposed for assistive hearing instruments to overcome this limitation: it simultaneously boosts the speech segments and suppresses the noise segments of noisy speech. Experimental results with different types of real-world noise indicate that the proposed method achieves better speech quality than the conventional AGE, providing improved functionality for assistive hearing instruments.
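    A minimal sketch of the adaptive gain equalizer idea: per subband, a gain is formed from the ratio of a fast (short-term) envelope to a slow (long-term) noise-floor estimate, boosting frames where speech dominates. Allowing the gain to drop below unity (g_min) hints at the noise-suppressing extension; the smoothing constants and limits are illustrative assumptions, not the paper's values.

```python
import numpy as np

def age_gains(subband, fs, tau_fast=0.01, tau_slow=1.0,
              g_max=4.0, g_min=0.3):
    """Per-sample gains for one subband signal (float array) at rate fs."""
    a_f = np.exp(-1.0 / (tau_fast * fs))  # fast envelope smoothing
    a_s = np.exp(-1.0 / (tau_slow * fs))  # slow noise-floor smoothing
    fast = slow = 1e-8
    gains = np.empty(len(subband))
    for i, x in enumerate(np.abs(subband)):
        fast = a_f * fast + (1 - a_f) * x
        slow = a_s * slow + (1 - a_s) * x
        # ratio >> 1 in speech segments, near (or below) 1 in noise segments
        gains[i] = np.clip(fast / slow, g_min, g_max)
    return gains
```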

    Generating intelligible audio speech from visual speech

    Get PDF
    This work is concerned with generating intelligible audio speech from a video of a person talking. Regression and classification methods are proposed first to estimate static spectral envelope features from active appearance model (AAM) visual features. Two further methods are then developed to incorporate temporal information into the prediction: a feature-level method using multiple frames and a model-level method based on recurrent neural networks. Speech excitation information is not available from the visual signal, so methods to artificially generate aperiodicity and fundamental frequency are developed. These are combined within the STRAIGHT vocoder to produce a speech signal. The various systems are optimised through objective tests before subjective intelligibility tests, in which human listeners achieve a word accuracy of 85% on the GRID audio-visual speech database. This compares favourably with a previous regression-based baseline system, which achieved a word accuracy of 33%.
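    A minimal sketch of the feature-level temporal method described here: AAM feature vectors from several consecutive video frames are stacked into one input, and a regressor maps them to a spectral-envelope vector for the centre frame. A closed-form ridge regression stands in for the paper's models; the context size and regularizer are illustrative assumptions.

```python
import numpy as np

def stack_frames(aam, context=2):
    """Stack 2*context+1 consecutive AAM frames (T x D) into (T-2c) x (2c+1)D."""
    T = aam.shape[0]
    return np.hstack([aam[i:T - 2 * context + i]
                      for i in range(2 * context + 1)])

def fit_ridge(X, Y, lam=1e-2):
    """Closed-form ridge regression from stacked visual to audio features."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Usage: X = stack_frames(aam_train); W = fit_ridge(X, env_train), where
# env_train is trimmed to the centre frames. Predicted envelopes
# stack_frames(aam_test) @ W would feed STRAIGHT together with the
# artificially generated F0 and aperiodicity.
```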

    Model-based speech enhancement for hearing aids

    Get PDF