464 research outputs found
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.Comment: 27 pages, 8 figure
Towards a Maximum Entropy Method for Estimating HMM Parameters
Training a Hidden Markov Model (HMM) to maximise the probability of a given sequence can result in over-fitting. That is, the model represents the training sequence well, but fails to generalise. In this paper, we present a possible solution to this problem, which is to maximise a linear combination of the likelihood of the training data, and the entropy of the model. We derive the necessary equations for gradient based maximisation of this combined term. The performance of the system is then evaluated in comparison with three other algorithms, on a classification task using synthetic data. The results indicate that the method is potentially useful. The main problem with the method is the computational intractability of the entropy calculation
Hidden Markov Models for Spatio-Temporal Pattern Recognition and Image Segmentation
Time and again hidden Markov models have been demonstrated to be highly effective in one-dimensional pattern recognition and classification problems such as speech recognition. A great deal of attention is now focussed on 2-D and possibly 3-D applications arising from problems encountered in computer vision in domains such as gesture, face, and handwriting recognition. Despite their widespread usage and numerous successful applications, there are few analytical results which can explain their remarkably good performance and guide researchers in selecting topologies and parameters to improve classification performance
Recommended from our members
Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction
Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to estimate the transcription of the adaptation data. This paper firstly presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models, via a two-pass decision tree construction process. Secondly, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data. Thirdly, this paper demonstrates how this technique lends itself to the task of unsupervised cross-lingual adaptation of HMM-based speech synthesis models, and explains the advantages of such an approach. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation
- …