
463 research outputs found

Multi-channel Feature Enhancement for Robust Speech Recognition

Author: Emanuele Principi
Francesco Piazza
Rudy Rotili
Simone Cifani
Stefano Squartini
Publication venue: IntechOpen
Publication date: 01/01/2011

IntechOpen

IRIS Università Politecnica delle Marche

Recommended from our members

Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement

Author: Godsill SJ
Wolfe PJ
Publication venue: EURASIP Journal on Applied Signal Processing
Publication date: 12/12/2011

Audio signal enhancement often involves the application of a time-varying filter, or suppression rule, to the frequency-domain transform of a corrupted signal. Here we address suppression rules derived under a Gaussian model and interpret them as spectral estimators in a Bayesian statistical framework. With regard to the optimal spectral amplitude estimator of Ephraim and Malah, we show that under the same modelling assumptions, alternative methods of Bayesian estimation lead to much simpler suppression rules exhibiting similarly effective behaviour. We derive three such rules and demonstrate that, in addition to permitting a more straightforward implementation, they yield a more intuitive interpretation of the Ephraim and Malah solution.

Apollo (Cambridge)

Model-Based Speech Enhancement in the Modulation Domain

Author: Brookes DM
Wang Y
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/12/2017

This paper presents an algorithm for modulation-domain speech enhancement using a Kalman filter. The proposed estimator jointly models the estimated dynamics of the spectral amplitudes of speech and noise to obtain an MMSE estimation of the speech amplitude spectrum with the assumption that the speech and noise are additive in the complex domain. In order to include the dynamics of noise amplitudes with those of speech amplitudes, we propose a statistical “Gaussring” model that comprises a mixture of Gaussians whose centres lie in a circle on the complex plane. The performance of the proposed algorithm is evaluated using the perceptual evaluation of speech quality (PESQ) measure, segmental SNR (segSNR) measure and short-time objective intelligibility (STOI) measure. For speech quality measures, the proposed algorithm is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms. Speech recognition experiments also show that the Gaussring model based algorithm performs well for two types of noise.

Spiral - Imperial College Digital Repository

A semisoft thresholding method based on Teager energy operation on wavelet packet coefficients for enhancing noisy speech

Publication venue: Springer
Publication date: 19/11/2013

Springer - Publisher Connector

A semisoft thresholding method based on Teager energy operation on wavelet packet coefficients for enhancing noisy speech

Author: A Dimitriadis
A Papoulis
A Varga
B Chen
Celia Shahnaz
D Donoho
D O’Shaughnessy
DL Donoho
F Jabloun
H Gustafsson
H Sameti
H Sheikhzadeh
I Almajai
ITU
J Hansen
J Kaiser
J Rouat
K Yamashita
M Bahoura
MT Johnson
P Loizou
P Maragos
Q Fu
R Coifman
S Ayat
S Ben Jebara
S Boll
S Chang
S Kamath
S Mallat
S Tabibian
SH Chen
Tahsina Farah Sanam
WH Abdulla
Y Ephraim
Y Ghanbari
Y Hu
Y Lu
Y Shao
Publication venue: Springer Science and Business Media LLC

Crossref

Single-Microphone Speech Enhancement and Separation Using Deep Learning

Author: Kolbæk Morten
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2018

VBN

Single-Microphone Speech Enhancement and Separation Using Deep Learning

Author: Kolbæk Morten
Publication date: 01/01/2018

The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the speech intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable, especially for applications involving mobile communication devices and hearing assistive devices. Due to the re-emergence of machine learning techniques, known today as deep learning, the challenges involved with such algorithms might be overcome. In this PhD thesis, we study and develop deep learning-based techniques for two sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalizability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data. Furthermore, we propose uPIT, a deep learning-based algorithm for single-microphone speech separation, and we report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or number of speakers. Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals which are essentially optimal in terms of STOI, a state-of-the-art speech intelligibility estimator. Comment: PhD Thesis. 233 pages

arXiv.org e-Print Archive

VBN