10,696 research outputs found
SVMs for Automatic Speech Recognition: a Survey
Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, nowadays, the preponderance of Markov Models is a fact.
During the last decade, however, a new tool appeared in the field of machine learning that has proved to be able to cope with hard classification problems in several fields of application: the Support Vector Machines (SVMs). The SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is that with maximum margin; they are capable to deal with samples of a very higher dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weakness in the ASR context and make a review of the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable to deal with sequences of different length. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research
Robust ASR using Support Vector Machines
The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units.
In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.Publicad
Speaker segmentation and clustering
This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation
We present state-of-the-art automatic speech recognition (ASR) systems
employing a standard hybrid DNN/HMM architecture compared to an attention-based
encoder-decoder design for the LibriSpeech task. Detailed descriptions of the
system development, including model design, pretraining schemes, training
schedules, and optimization approaches are provided for both system
architectures. Both hybrid DNN/HMM and attention-based systems employ
bi-directional LSTMs for acoustic modeling/encoding. For language modeling, we
employ both LSTM and Transformer based architectures. All our systems are built
using RWTHs open-source toolkits RASR and RETURNN. To the best knowledge of the
authors, the results obtained when training on the full LibriSpeech training
set, are the best published currently, both for the hybrid DNN/HMM and the
attention-based systems. Our single hybrid system even outperforms previous
results obtained from combining eight single systems. Our comparison shows that
on the LibriSpeech 960h task, the hybrid DNN/HMM system outperforms the
attention-based system by 15% relative on the clean and 40% relative on the
other test sets in terms of word error rate. Moreover, experiments on a reduced
100h-subset of the LibriSpeech training corpus even show a more pronounced
margin between the hybrid DNN/HMM and attention-based architectures.Comment: Proceedings of INTERSPEECH 201
Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
The article presents an application of Hidden Markov Models (HMMs) for
pattern recognition on genome sequences. We apply HMM for identifying genes
encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma
brucei (T. brucei) and other African trypanosomes. These are parasitic protozoa
causative agents of sleeping sickness and several diseases in domestic and wild
animals. These parasites have a peculiar strategy to evade the host's immune
system that consists in periodically changing their predominant cellular
surface protein (VSG). The motivation for using patterns recognition methods to
identify these genes, instead of traditional homology based ones, is that the
levels of sequence identity (amino acid and DNA sequence) amongst these genes
is often below of what is considered reliable in these methods. Among pattern
recognition approaches, HMM are particularly suitable to tackle this problem
because they can handle more naturally the determination of gene edges. We
evaluate the performance of the model using different number of states in the
Markov model, as well as several performance metrics. The model is applied
using public genomic data. Our empirical results show that the VSG genes on T.
brucei can be safely identified (high sensitivity and low rate of false
positives) using HMM.Comment: Accepted article in July, 2015 in Pattern Analysis and Applications,
Springer. The article contains 23 pages, 4 figures, 8 tables and 51
reference
- …