SVMs for Automatic Speech Recognition: a Survey
Hidden Markov Models (HMMs) are undoubtedly the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. Despite some achievements, however, Markov models remain predominant today.
During the last decade, however, a new tool has appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques for producing the fixed-dimension vectors needed by the original SVMs. Afterwards we explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which revives an old speech recognition technique: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches that use SVMs to tackle more complex tasks such as connected digit recognition or continuous speech recognition. Finally, we draw some conclusions and outline several ongoing lines of research.
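To illustrate how DTW compares sequences of different lengths, here is a minimal sketch of the classic dynamic-programming recursion. It is plain DTW on scalar sequences, not the DTAK kernel itself; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Accumulated DTW cost between two sequences that may differ in length.

    Illustrative sketch only: `dist` is an assumed local cost, and real
    speech features would be vectors rather than scalars.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

A sequence and a time-stretched copy of it align at zero cost, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])`, which is what makes the alignment useful when utterances vary in duration.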
Hidden Markov models and neural networks for speech recognition
The Hidden Markov Model (HMM) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first-order dependencies in the observed data sequences. This is due to the first-order state process and the assumption of state-conditional independence between observations. Artificial Neural Networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ..
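The two assumptions named in the abstract above (a first-order state process and observations that depend only on the current state) are what make the standard forward recursion possible. The sketch below shows that recursion for a discrete-observation HMM; it is a generic textbook illustration, not the hybrid model of this work, and the function and parameter names are assumptions.

```python
def forward_likelihood(init, trans, emit, observations):
    """Likelihood of an observation sequence under a discrete HMM.

    init[s]      : probability of starting in state s
    trans[r][s]  : probability of moving from state r to state s
    emit[s][o]   : probability of emitting symbol o in state s
    All names are illustrative assumptions.
    """
    n_states = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        # First-order assumption: the next state depends only on the current one.
        # Conditional independence: the emission depends only on the new state.
        alpha = [
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)
```

Because each step only looks one state back, the cost is linear in the sequence length, but for the same reason the model cannot directly represent longer-range dependencies between observations.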
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications that are able to operate in real-world environments, such as mobile communication services and smart homes.
Adaptive Hidden Markov Noise Modelling for Speech Enhancement
A robust and reliable noise estimation algorithm is required in many speech enhancement
systems. The aim of this thesis is to propose and evaluate a robust noise estimation
algorithm for highly non-stationary noisy environments. In this work, we model the
non-stationary noise using a set of discrete states with each state representing a distinct
noise power spectrum. In this approach, the state sequence over time is conveniently
represented by a Hidden Markov Model (HMM).
In this thesis, we first present an online HMM re-estimation framework that models
time-varying noise using a Hidden Markov Model and tracks changes in the noise
characteristics through a sequential model update procedure applied during the
absence of speech. In addition, the algorithm will, when necessary, create new
model states to represent novel noise spectra and will merge existing states that have similar
characteristics. We then extend our work to robust noise estimation during speech
activity by incorporating a speech model into our existing noise model. The noise characteristics
within each state are updated based on a speech presence probability, which
is derived from a modified minima-controlled recursive averaging (MCRA) method.
We have demonstrated the effectiveness of our noise HMM in tracking both stationary
and highly non-stationary noise, and shown that it gives improved performance over
other conventional noise estimation methods when it is incorporated into a standard
speech enhancement algorithm.
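As a rough illustration of how a speech presence probability can gate a noise update, the sketch below applies a generic recursive-averaging step to a single noise power spectrum. It is not the thesis's modified method or its HMM state handling; the function name, the smoothing constant `alpha=0.9`, and the per-bin list representation are all assumptions.

```python
def update_noise_spectrum(noise, frame_power, speech_prob, alpha=0.9):
    """One recursive-averaging update of a noise power spectrum.

    noise[k]       : current noise power estimate in frequency bin k
    frame_power[k] : power of the current noisy frame in bin k
    speech_prob[k] : estimated probability that speech is present in bin k
    alpha          : base smoothing constant (assumed value)
    """
    updated = []
    for n, y, p in zip(noise, frame_power, speech_prob):
        # The effective smoothing factor grows toward 1 as speech becomes
        # more likely, so bins dominated by speech barely change the estimate.
        a = alpha + (1.0 - alpha) * p
        updated.append(a * n + (1.0 - a) * y)
    return updated
```

With `speech_prob` equal to 1 in a bin the estimate is left unchanged, and with 0 it moves toward the observed frame power at the rate set by `alpha`.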