126 research outputs found

    Statistical models for noise-robust speech recognition

    A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech distribution with a diagonal-covariance Gaussian distribution. This thesis looks into improving on this approximation in two ways: firstly, by estimating full-covariance Gaussian distributions; secondly, by approximating corrupted-speech likelihoods without any parameterised distribution.

    The first part of this work is about compensating for within-component feature correlations under noise. For this, the covariance matrices of the computed Gaussians should be full instead of diagonal. The estimation of off-diagonal covariance elements turns out to be sensitive to approximations. A popular approximation is the one that state-of-the-art compensation schemes, like VTS compensation, use for dynamic coefficients: the continuous-time approximation. Standard speech recognisers contain both per-time-slice, static, coefficients and dynamic coefficients, which represent signal changes over time and are normally computed from a window of static coefficients. To remove the need for the continuous-time approximation, this thesis introduces a new technique. It first compensates a distribution over the window of statics, and then applies the same linear projection that extracts dynamic coefficients. Within this framework, it introduces a number of methods that address the correlation changes that occur in noise. The next problem is decoding speed with full covariances. This thesis re-analyses the previously-introduced predictive linear transformations, and shows how they can model feature correlations at low and tunable computational cost.

    The second part of this work removes the Gaussian assumption completely. It introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to use for recognition, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component. The KL divergence proves to predict the word error rate well. This technique also makes it possible to evaluate the impact of approximations that standard compensation schemes make.

    This work was supported by Toshiba Research Europe Ltd., Cambridge Research Laboratory.
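    The core projection step in the first part can be sketched compactly: if the static-plus-dynamic feature vector is a linear function y = Aw of a window of static coefficients w, then a compensated Gaussian (mu_w, Sigma_w) over the window maps exactly to a Gaussian (A mu_w, A Sigma_w A^T) over the features. Below is a minimal numpy sketch of this idea, assuming a five-frame window and the standard delta regression weights; the helper name and the assumption that the window distribution has already been compensated frame by frame (e.g. by VTS) are illustrative, not the thesis's exact formulation.

    ```python
    import numpy as np

    def delta_projection(num_frames, dim, K=2):
        """Projection A mapping a stacked window of num_frames static
        vectors to [static; delta] for the centre frame, using standard
        delta regression weights (illustrative helper)."""
        denom = 2 * sum(k * k for k in range(1, K + 1))
        centre = num_frames // 2
        A = np.zeros((2 * dim, num_frames * dim))
        A[:dim, centre * dim:(centre + 1) * dim] = np.eye(dim)  # statics
        for k in range(1, K + 1):                               # deltas
            A[dim:, (centre + k) * dim:(centre + k + 1) * dim] += (k / denom) * np.eye(dim)
            A[dim:, (centre - k) * dim:(centre - k + 1) * dim] -= (k / denom) * np.eye(dim)
        return A

    # (mu_w, Sigma_w): Gaussian over the window of statics, assumed already
    # compensated for noise frame by frame; placeholders for illustration.
    num_frames, dim = 5, 13
    mu_w = np.zeros(num_frames * dim)
    Sigma_w = np.eye(num_frames * dim)

    A = delta_projection(num_frames, dim)
    mu_y = A @ mu_w              # compensated static+delta mean
    Sigma_y = A @ Sigma_w @ A.T  # full covariance over static+delta
    ```

    Because Sigma_y is full by construction, the static/dynamic correlations that noise introduces are carried through the projection directly, with no continuous-time approximation.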

    A TAXONOMY-ORIENTED OVERVIEW OF NOISE COMPENSATION TECHNIQUES FOR SPEECH RECOGNITION

    ABSTRACT Designing a machine capable of understanding human speech and responding properly to spoken utterances has intrigued the speech research community for centuries. One of the fundamental problems in building a speech recognition system is acoustic noise. The performance of a speech recognition system degrades significantly in the presence of ambient noise. Background noise not only causes a high level of mismatch between training and testing conditions due to unseen environments, but also decreases the acoustic model's ability to discriminate between speech utterances by increasing the uncertainty associated with the speech. This paper presents a brief survey of different approaches to robust speech recognition. The objective of this review paper is to analyze the effect of noise on speech recognition, provide a quantitative analysis of well-known noise compensation techniques used in the various approaches to robust speech recognition, and present a taxonomy-oriented overview of noise compensation techniques.
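    The training/testing mismatch described above is commonly modelled in the log-spectral domain with the additive-noise relation y = log(exp(x) + exp(n)). The following small numpy sketch (values invented for illustration) shows both effects the abstract mentions at once: the corrupted feature shifts, and the gap between two clean values shrinks as the noise level rises, i.e. the acoustic model loses discriminability.

    ```python
    import numpy as np

    def corrupt_log_spectrum(x, n):
        """Additive noise in the log-spectral domain:
        y = log(exp(x) + exp(n)) = x + log1p(exp(n - x))."""
        return x + np.log1p(np.exp(n - x))

    x1, x2 = 0.0, 10.0  # two clean log-energies, 10 units apart
    for n in (-20.0, 5.0, 15.0):
        gap = corrupt_log_spectrum(x2, n) - corrupt_log_spectrum(x1, n)
        print(f"noise level {n:6.1f} -> corrupted gap {gap:.3f}")
    # the 10-unit gap collapses towards 0 as the noise level rises
    ```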

    New Strategies for Single-channel Speech Separation


    Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models

    Abstract—Joint uncertainty decoding (JUD) is a model-based noise compensation technique for conventional Gaussian mixture model (GMM) based speech recognition systems. Unlike vector Taylor series (VTS) compensation, which operates on the individual Gaussian components in an acoustic model, JUD clusters the Gaussian components into a smaller number of classes, sharing the compensation parameters for the set of Gaussians in a given class. This significantly reduces the computational cost. In this paper, we investigate noise compensation for subspace Gaussian mixture model (SGMM) based speech recognition systems using JUD. The total number of Gaussian components in an SGMM is typically very large. Therefore direct compensation of the individual Gaussian components, as performed by VTS, is computationally expensive. In this paper we show that JUD-based noise compensation can be successfully applied to SGMMs in a computationally efficient way. We evaluate the JUD/SGMM technique on the standard Aurora 4 corpus. Our experimental results indicate that the JUD/SGMM system results in lower word error rates compared with a conventional GMM system with either VTS-based or JUD-based noise compensation. Index Terms—subspace Gaussian mixture model, vector Taylor series, joint uncertainty decoding, noise robust ASR, Aurora
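    As a rough sketch of the decoding rule described above (parameter names are illustrative, not the paper's notation): JUD scores a noisy observation y against each clean-trained component m via an affine transform (A_c, b_c) and an uncertainty bias Sigma_bias_c, all shared by the components in m's regression class c.

    ```python
    import numpy as np
    from scipy.stats import multivariate_normal

    def jud_log_likelihood(y, mu_m, Sigma_m, A_c, b_c, Sigma_bias_c):
        """JUD-compensated log-likelihood of noisy feature y under clean
        Gaussian component m, using its class's shared parameters:
        p(y|m) ~= |A_c| N(A_c y + b_c; mu_m, Sigma_m + Sigma_bias_c)."""
        x_hat = A_c @ y + b_c               # map y back towards clean space
        _, logdet = np.linalg.slogdet(A_c)  # Jacobian of the transform
        return logdet + multivariate_normal.logpdf(
            x_hat, mean=mu_m, cov=Sigma_m + Sigma_bias_c)
    ```

    The saving is that (A_c, b_c, Sigma_bias_c) are estimated once per class rather than once per Gaussian, which is what makes the approach tractable for SGMMs with very large numbers of components.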

    Indirect model-based speech enhancement


    Approximate Bayesian inference for robust speech processing

    Speech processing applications such as speech enhancement and speaker identification rely on the estimation of relevant parameters from the speech signal. These parameters must often be estimated from noisy observations, since speech signals are rarely obtained in ‘clean’ acoustic environments in the real world. As a result, the parameter estimation algorithms we employ must be robust to environmental factors such as additive noise and reverberation. In this work we derive and evaluate approximate Bayesian algorithms for the following speech processing tasks: 1) speech enhancement, 2) speaker identification, 3) speaker verification, and 4) voice activity detection.

    Building on previous work in the field of statistical model-based speech enhancement, we derive speech enhancement algorithms that rely on speaker-dependent priors over linear prediction parameters. These speaker-dependent priors allow us to handle speech enhancement and speaker identification in a joint framework. Furthermore, we show how these priors allow voice activity detection to be performed in a robust manner.

    We also develop algorithms in the log spectral domain with applications in robust speaker verification. The use of speaker-dependent priors in the log spectral domain is shown to improve equal error rates in noisy environments and to compensate for mismatch between training and testing conditions.

    Ph.D., Electrical Engineering -- Drexel University, 201
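    As one concrete illustration of how a speaker-dependent prior over linear prediction parameters can enter the estimation, the sketch below computes a MAP (ridge-like) variant of standard LPC under an assumed Gaussian prior; this is a generic sketch of the idea, not the dissertation's exact algorithm, and all names are hypothetical.

    ```python
    import numpy as np

    def map_lpc(signal, order, mu_prior, Sigma_prior, noise_var):
        """MAP estimate of linear-prediction coefficients a under a
        speaker-dependent Gaussian prior N(mu_prior, Sigma_prior).
        Model: s[t] = sum_k a[k] * s[t-k-1] + e[t], e ~ N(0, noise_var).
        Posterior mean: (X'X + v P^-1)^-1 (X'x + v P^-1 mu)."""
        # design matrix of lagged samples, one column per lag
        X = np.column_stack([signal[order - k - 1:len(signal) - k - 1]
                             for k in range(order)])
        x = signal[order:]
        P_inv = np.linalg.inv(Sigma_prior)
        lhs = X.T @ X + noise_var * P_inv
        rhs = X.T @ x + noise_var * (P_inv @ mu_prior)
        return np.linalg.solve(lhs, rhs)
    ```

    As noise_var goes to zero this reduces to ordinary least-squares LPC; with noisy observations the prior pulls the estimate towards the speaker's typical coefficients, which is the kind of robustness the abstract describes.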

    Noise-Robust Speech Recognition Using Deep Neural Network

    Ph.D., Doctor of Philosophy

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, such as mobile communication services and smart homes.