Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and, with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.
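The non-stationary additive and convolutional degradation this survey addresses can be simulated with a short sketch. The function name and parameters below are illustrative assumptions, not taken from the survey:

```python
import numpy as np

def degrade(clean, channel_ir, noise, snr_db):
    # Convolutional distortion: filter the clean speech with a channel/room
    # impulse response, truncated to the original length.
    reverberant = np.convolve(clean, channel_ir)[: len(clean)]
    # Additive distortion: scale the noise so the result has the target SNR
    # relative to the reverberant signal.
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise[: len(clean)] ** 2)
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + scale * noise[: len(clean)]
```

Front-end enhancement methods try to invert this mapping, while back-end methods train the recogniser directly on such degraded speech.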
On Separating Environmental and Speaker Adaptation
This paper presents a maximum likelihood (ML) approach to background model estimation in noisy, non-stationary acoustic environments. The external noise source is characterised by a time-constant convolutional component and a time-varying additive component. The HMM composition technique provides a mechanism for integrating parametric models of the acoustic background with the signal model, so that noise compensation is tightly coupled with background model estimation. However, existing continuous adaptation algorithms usually do not take advantage of this approach, being essentially based on the MLLR algorithm. Consequently, no model for environmental mismatch is available and, even under constrained conditions, a significant number of model parameters have to be updated. From a theoretical point of view, only the noise model parameters need to be updated, since the clean speech parameters are unchanged by the environment. It can therefore be advantageous to have a model for environmental mismatch. Additionally, separating the additive and convolutional components means separating environmental mismatch from speaker mismatch when the channel does not change for long periods. This approach was followed in the development of the algorithm proposed in this paper.
One drawback sometimes attributed to the continuous adaptation approach is that recognition failures produce poor background estimates. This paper also proposes a MAP-like method to deal with this situation.
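The additive-plus-convolutional model described in this abstract corresponds, in the log-spectral domain, to a standard mismatch function commonly used in HMM composition. The sketch below shows that standard function, not necessarily the exact formulation of this paper:

```python
import numpy as np

def log_spectral_mismatch(x, h, n):
    # x: clean speech log-spectrum, h: time-constant channel log-response,
    # n: time-varying noise log-spectrum. Under the magnitude-domain model
    # |Y|^2 = |X|^2 |H|^2 + |N|^2, the noisy log-spectrum is
    # y = log(exp(x + h) + exp(n)):
    return x + h + np.log1p(np.exp(n - x - h))
```

Because the channel term h is constant over long periods while n varies, estimating the two separately is what lets the method distinguish environmental mismatch from speaker mismatch.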
Noise invariant frame selection: a simple method to address the background noise problem for text-independent speaker verification
The performance of speaker-related systems usually degrades heavily in practical applications, largely due to background noise. To improve the robustness of such systems in unknown noisy environments, this paper proposes a simple pre-processing method called Noise Invariant Frame Selection (NIFS). Based on several noise-related constraints, it selects noise-invariant frames from utterances to represent speakers. Experiments conducted on the TIMIT database showed that NIFS can significantly improve the performance of Vector Quantization (VQ), Gaussian Mixture Model-Universal Background Model (GMM-UBM) and i-vector-based speaker verification systems in different unknown noisy environments with different SNRs, in comparison to their baselines. Meanwhile, the proposed NIFS-based speaker systems achieve similar performance when the constraints (hyper-parameters) or features are changed, which indicates that the method is easy to reproduce. Since NIFS is designed as a general algorithm, it could be further applied to other similar tasks.
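The abstract does not spell out the NIFS constraints, but the general idea of selecting frames least affected by additive noise can be sketched with a hypothetical energy-based criterion. Everything below (the function name, the percentile noise-floor estimate, the margin) is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

def select_frames(frames, floor_pct=20, margin_db=6.0):
    # Hypothetical frame selector: estimate a noise floor as a low percentile
    # of the per-frame log-energies, then keep only frames that exceed it by
    # a margin, on the idea that high-energy speech frames are the least
    # perturbed by additive background noise.
    log_e = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    floor = np.percentile(log_e, floor_pct)
    return frames[log_e > floor + margin_db]
```

A selector of this kind is model-agnostic, which is consistent with the paper's observation that the same pre-processing helps VQ, GMM-UBM and i-vector back-ends alike.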