Adjusted Viterbi training for hidden Markov models
To estimate the emission parameters in hidden Markov models one commonly uses
the EM algorithm or one of its variants. Our primary motivation, however, is the
Philips speech recognition system wherein the EM algorithm is replaced by the
Viterbi training algorithm. Viterbi training is faster and computationally less
involved than EM, but it is also biased and need not even be consistent. We
propose an alternative to the Viterbi training -- adjusted Viterbi training --
that has the same order of computational complexity as Viterbi training but
gives more accurate estimators. Elsewhere, we studied the adjusted Viterbi
training for a special case of mixtures, supporting the theory by simulations.
This paper proves that adjusted Viterbi training is also possible for more
general hidden Markov models.
Comment: 45 pages, 2 figures
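For orientation, the baseline that the abstract contrasts with EM can be sketched as follows. This is plain Viterbi training (decode the most likely state path, then re-estimate the emission parameters from that alignment), not the adjusted variant the paper proposes. The function names, the Gaussian emissions with a shared fixed variance, and the fixed transition matrix are all assumptions made for illustration.

```python
def viterbi(obs, log_init, log_trans, log_emit):
    """Return the most likely state path; log_emit(s, x) is a log density."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit(s, obs[0]) for s in range(n_states)]
    back = []
    for x in obs[1:]:
        prev, ptr, score = score, [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit(s, x))
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def viterbi_training(obs, means, log_init, log_trans, sigma=1.0, n_iter=10):
    """Re-estimate Gaussian emission means from the Viterbi alignment.

    Assumption: only the means are learned; sigma and the transition
    matrix stay fixed, so the Gaussian normalising constant is dropped.
    """
    means = list(means)
    path = []
    for _ in range(n_iter):
        def log_emit(s, x, m=means):
            return -0.5 * ((x - m[s]) / sigma) ** 2
        path = viterbi(obs, log_init, log_trans, log_emit)
        for s in range(len(means)):
            assigned = [x for x, z in zip(obs, path) if z == s]
            if assigned:  # keep the old mean if no observation was aligned
                means[s] = sum(assigned) / len(assigned)
    return means, path
```

Because each observation contributes only to the state it was aligned with, the estimates are computed from truncated samples, which is one way to see where the bias mentioned in the abstract comes from.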
A minimax search algorithm for robust continuous speech recognition
In this paper, we propose a novel implementation of a minimax decision rule for continuous density hidden Markov-model-based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm, where the minimax decision rule is repeatedly applied to determine the partial paths during the search procedure. Because of the intrinsic nature of a recursive search, the proposed method can be easily extended to perform continuous speech recognition. Experimental results on Japanese isolated digits and TIDIGITS, where the mismatch between training and testing conditions is caused by additive white Gaussian noise, show the viability and efficiency of the proposed minimax search algorithm.
Accuracy of MAP segmentation with hidden Potts and Markov mesh prior models via Path Constrained Viterbi Training, Iterated Conditional Modes and Graph Cut based algorithms
In this paper, we study statistical classification accuracy of two different
Markov field environments for pixelwise image segmentation, considering the
labels of the image as hidden states and solving the estimation of such labels
as a solution of the MAP equation. The emission distribution is assumed to be
the same in all models, and the difference lies in the Markovian prior hypothesis
made over the labeling random field. The a priori labeling knowledge will be
modeled with a) a second order anisotropic Markov Mesh and b) a classical
isotropic Potts model. Under such models, we will consider three different
segmentation procedures, 2D Path Constrained Viterbi training for the Hidden
Markov Mesh, a Graph Cut based segmentation for the first order isotropic Potts
model, and ICM (Iterated Conditional Modes) for the second order isotropic
Potts model.
We provide a unified view of all three methods, and investigate goodness of
fit for classification, studying the influence of parameter estimation,
computational gain, and extent of automation in the statistical measures
Overall Accuracy, Relative Improvement and Kappa coefficient, allowing robust
and accurate statistical analysis on synthetic and real-life experimental data
coming from the field of Dental Diagnostic Radiography. All algorithms, using
the learned parameters, generate good segmentations with little interaction
when the images have a clear multimodal histogram. Suboptimal learning proves
to be frail in the case of non-distinctive modes, which limits the complexity
of usable models, and hence the achievable error rate as well.
All Matlab code is provided in a toolbox available for download from
our website, following the Reproducible Research Paradigm.
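Of the three procedures named in the abstract, ICM is the simplest to illustrate. The sketch below is a minimal version under stated assumptions: Gaussian emissions with unit variance and known class means, an isotropic Potts prior with a single hypothetical strength parameter `beta`, and "second order" read as the 8-pixel neighbourhood. It is not the authors' Matlab toolbox, and the function name `icm_potts` is invented for this example.

```python
def icm_potts(image, means, beta=1.0, n_sweeps=5):
    """Iterated Conditional Modes with an isotropic Potts prior.

    image: 2D list of pixel values; means: one emission mean per label.
    Each pixel takes the label minimising data cost plus beta times the
    number of disagreeing neighbours (8-neighbourhood, an assumption).
    """
    rows, cols = len(image), len(image[0])
    n_labels = len(means)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    # Initialise with the nearest class mean, ignoring the prior.
    labels = [[min(range(n_labels), key=lambda k: (image[i][j] - means[k]) ** 2)
               for j in range(cols)] for i in range(rows)]
    for _ in range(n_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                neighbours = [labels[i + di][j + dj] for di, dj in offsets
                              if 0 <= i + di < rows and 0 <= j + dj < cols]

                def energy(k):
                    data = 0.5 * (image[i][j] - means[k]) ** 2
                    prior = beta * sum(1 for n in neighbours if n != k)
                    return data + prior

                best = min(range(n_labels), key=energy)
                if best != labels[i][j]:
                    labels[i][j] = best
                    changed = True
        if not changed:  # a full sweep without updates: local optimum reached
            break
    return labels
```

ICM only reaches a local optimum of the posterior, so the result depends on the initial labelling; with `beta=0` the prior is switched off and each pixel keeps its nearest-mean label.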
Whole Word Phonetic Displays for Speech Articulation Training
The main objective of this dissertation is to investigate and develop speech recognition technologies for speech training for people with hearing impairments. During the course of this work, a computer-aided speech training system for articulation training was also designed and implemented. The speech training system places emphasis on displays to improve children's pronunciation of isolated Consonant-Vowel-Consonant (CVC) words, with displays at both the phonetic level and whole word level. This dissertation presents two hybrid methods for combining Hidden Markov Models (HMMs) and Neural Networks (NNs) for speech recognition. The first method uses NN outputs as posterior probability estimators for HMMs. The second method uses NNs to transform the original speech features to normalized features with reduced correlation. Based on experimental testing, both of the hybrid methods give higher accuracy than standard HMM methods. The second method, using the NN to create normalized features, outperforms the first method in terms of accuracy. Several graphical displays were developed to provide real-time visual feedback to users, to help them improve and correct their pronunciation.
Hidden Markov models with kernel density estimation of emission probabilities and their use in activity recognition
In this paper, we present a modified hidden Markov model with emission probabilities modelled by kernel density estimation and its use for activity recognition in videos. In the proposed approach, kernel density estimation of the emission probabilities is performed simultaneously with the estimation of all the other model parameters by an adapted Baum-Welch algorithm. This allows us to retain maximum-likelihood estimation while overcoming the known limitations of mixtures of Gaussians in modelling certain probability distributions. Experiments on activity recognition have been performed on groundtruthed data from the CAVIAR video surveillance database and are reported in the paper. The error on the training and validation sets with kernel density estimation remains around 14-16%, while for the conventional Gaussian mixture approach it varies between 15 and 24%, strongly depending on the initial values chosen for the parameters. Overall, kernel density estimation proves capable of providing more flexible modelling of the emission probabilities and, unlike Gaussian mixtures, is neither highly parametric nor difficult to initialise. © 2007 IEEE
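One way to picture a kernel density emission model is as a weighted sum of kernels centred on the training samples. The sketch below assumes a one-dimensional Gaussian kernel with a fixed bandwidth and takes the per-sample weights for a state to be that state's occupation probabilities from the forward-backward pass; the abstract does not spell out these details, so the form, the function name `kde_emission`, and its parameters are assumptions rather than the paper's exact estimator.

```python
import math


def kde_emission(x, samples, weights, bandwidth):
    """Weighted Gaussian kernel density estimate of p(x | state).

    samples: training observations; weights: non-negative per-sample
    weights for this state (assumed here to be state occupation
    probabilities); bandwidth: kernel standard deviation (fixed).
    """
    total = sum(weights)
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    kernel_sum = sum(w * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                     for xi, w in zip(samples, weights))
    return kernel_sum / (total * norm)
```

Unlike a Gaussian mixture, this density has no component means or variances to initialise; the only free choice in this sketch is the bandwidth.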