From Multi-Band Full Combination to Multi-Stream Full Combination Processing in Robust ASR
The multi-band processing paradigm for noise robust ASR was originally motivated by the observation that human recognition appears to be based on independent processing of separate frequency sub-bands, and also by "missing data" results which have shown that ASR can be made significantly more robust to band-limited noise if noisy sub-bands can be detected and then ignored. Of the different multi-band models which have been proposed, only the "Full Combination" or "all-wise" multi-band HMM/ANN hybrid approach allows us to consistently overcome the difficult problem of deciding which sub-bands are noisy, by integrating over all possible positions of noisy sub-bands. While this system has performed better than any other multi-band system which we have tested, we have also found that it only shows significantly improved robustness to noise when the noise is strongly band-limited. In real noise environments this is rarely the case. An alternative paradigm for noise robust ASR is multi-stream, as opposed to multi-band, ASR. In multi-stream processing the aim is to combine evidence from a number of different representations of the full speech signal, rather than from a number of frequency sub-bands. Several models for multi-stream ASR have recently reported significant performance improvements for speech with real noise. In this article we first present evidence to show how multi-band ASR has a strong advantage over the baseline system with band-limited noise, but no clear advantage with wide-band noise. We then show how the principled theoretical basis for Full Combination multi-band ASR can be directly transferred to multi-stream combination, and we show how this model can be used to combine data streams comprising three commonly used types of acoustic features. Preliminary results show significantly improved recognition with clean speech.
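The Full Combination rule described above amounts to a weighted sum of class posteriors from experts trained on every subset of sub-bands, each weight being the estimated probability that exactly that subset is clean. A minimal numerical sketch (the expert posteriors and uniform weights below are made-up illustrative values, not taken from the paper):

```python
import numpy as np

def full_combination(experts, weights):
    """Full Combination multi-band rule: combine the class posteriors of
    experts trained on every subset of frequency sub-bands, weighting each
    expert by the probability that exactly its sub-bands are clean."""
    combined = sum(weights[s] * np.asarray(p) for s, p in experts.items())
    return combined / combined.sum()  # renormalise for numerical safety

# toy example: 2 sub-bands -> 4 subset experts, uniform subset weights
experts = {
    frozenset(): [0.5, 0.5],        # prior-only expert (no clean band)
    frozenset({0}): [0.8, 0.2],
    frozenset({1}): [0.3, 0.7],
    frozenset({0, 1}): [0.6, 0.4],
}
weights = {s: 0.25 for s in experts}
posterior = full_combination(experts, weights)  # -> [0.55, 0.45]
```

Because the combination integrates over all subset hypotheses, no hard decision about which sub-bands are noisy is ever made; a poor expert is simply down-weighted.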
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband-based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering for signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front-ends across the full range of noise levels.
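The subband-ensemble idea can be sketched with scikit-learn (an assumption for illustration; the paper's kernels, waveform features, and ensemble rule are more elaborate): train one SVM per subband slice of the feature vector, then average the per-class posterior estimates. The synthetic data and the RBF kernel below are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn assumed available

rng = np.random.default_rng(0)
# synthetic stand-in for subband features: 4 subbands of 10 dims each;
# only subband 0 carries class information in this toy example
X = rng.normal(size=(200, 40))
y = (X[:, :10].sum(axis=1) > 0).astype(int)
bands = [slice(10 * i, 10 * (i + 1)) for i in range(4)]

# one probabilistic SVM per subband; per-subband kernel selection is one of
# the paper's key issues -- RBF here is just a placeholder choice
experts = [SVC(kernel="rbf", probability=True, random_state=0).fit(X[:, b], y)
           for b in bands]

def ensemble_predict(X_new):
    """Average the per-subband posterior estimates, then take the argmax."""
    probs = np.mean([e.predict_proba(X_new[:, b])
                     for e, b in zip(experts, bands)], axis=0)
    return probs.argmax(axis=1)
```

Averaging posteriors is only one of several ensemble rules; the attraction of the subband split is that a band corrupted by band-limited noise degrades only its own expert, not the whole classifier.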
EMD-based filtering (EMDF) of low-frequency noise for speech enhancement
An Empirical Mode Decomposition-based filtering (EMDF) approach is presented as a post-processing stage for speech enhancement. This method is particularly effective in low frequency noise environments. Unlike previous EMD based denoising methods, this approach does not make the assumption that the contaminating noise signal is fractional Gaussian Noise. An adaptive method is developed to select the IMF index for separating the noise components from the speech based on the second-order IMF statistics. The low frequency noise components are then separated by a partial reconstruction from the IMFs. It is shown that the proposed EMDF technique is able to suppress residual noise from speech signals that were enhanced by the conventional optimally modified log-spectral amplitude approach which uses a minimum statistics based noise estimate. A comparative performance study is included that demonstrates the effectiveness of the EMDF system in various noise environments, such as car interior noise, military vehicle noise and babble noise. In particular, improvements up to 10 dB are obtained in car noise environments. Listening tests were performed that confirm the results.
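The partial-reconstruction step can be sketched as follows, assuming the IMFs have already been extracted by some EMD routine (e.g. a third-party package such as PyEMD, which is an assumption, not the paper's code). The minimum-variance rule used to pick the boundary index is a simplified stand-in for the paper's second-order-statistics criterion; the hand-built "IMFs" are illustrative only.

```python
import numpy as np

def emdf_reconstruct(imfs):
    """EMDF partial reconstruction: drop the high-index (low-frequency) IMFs
    that are dominated by the noise and keep the rest.

    imfs: array of shape (n_imfs, n_samples), ordered from highest to lowest
    frequency, as EMD routines conventionally produce them. The boundary
    index is chosen as the IMF of minimum variance -- a simple proxy for the
    paper's second-order-statistics criterion, exploiting the typical energy
    dip between speech-dominated and noise-dominated IMFs."""
    k = int(np.argmin(np.var(imfs, axis=1)))  # boundary IMF index
    return imfs[:k + 1].sum(axis=0)           # partial reconstruction

# toy check with hand-built "IMFs" (highest to lowest frequency)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
speech_hi = np.sin(2 * np.pi * 200 * t)       # speech-dominated IMF
speech_lo = 0.1 * np.sin(2 * np.pi * 50 * t)  # weak boundary IMF
noise = 3.0 * np.sin(2 * np.pi * 2 * t)       # strong low-frequency noise
enhanced = emdf_reconstruct(np.stack([speech_hi, speech_lo, noise]))
```

In this toy case the low-frequency component is excluded exactly, which is the intended behaviour when the noise energy is concentrated in the highest-index IMFs.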
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.