
    Applying SPHINX-II to the DARPA Wall Street Journal CSR Task

    This paper reports recent efforts to apply the speaker-independent SPHINX-II system to the DARPA Wall Street Journal continuous speech recognition task. In SPHINX-II, we incorporated additional dynamic and speaker-normalized features, replaced discrete models with sex-dependent semi-continuous hidden Markov models, augmented within-word triphones with between-word triphones, and extended generalized triphone models to shared-distribution models. The configuration of SPHINX-II being used for this task includes sex-dependent, semi-continuous, shared-distribution hidden Markov models and left-context-dependent between-word triphones. In applying our technology to this task we addressed issues that were not previously of concern owing to the (relatively) small size of the Resource Management task.
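
    As a rough illustration of the semi-continuous (tied-mixture) output densities mentioned above, the sketch below evaluates a state's emission probability against a single Gaussian codebook shared by all states, with only the mixture weights being state-specific; the names, dimensions, and values are illustrative assumptions, not details taken from SPHINX-II.

    import numpy as np
    from scipy.stats import multivariate_normal

    def semi_continuous_output_prob(x, codebook_means, codebook_covs, state_weights):
        # P(x | state) = sum_k w_k * N(x; mu_k, Sigma_k), where the Gaussian
        # codebook (means and covariances) is shared by every state and only
        # the mixture weights w_k are state-specific.
        densities = np.array([
            multivariate_normal.pdf(x, mean=mu, cov=cov)
            for mu, cov in zip(codebook_means, codebook_covs)
        ])
        return float(state_weights @ densities)

    # Illustrative shared codebook: 4 Gaussians over 3-dimensional feature vectors.
    rng = np.random.default_rng(0)
    codebook_means = rng.normal(size=(4, 3))
    codebook_covs = [np.eye(3) for _ in range(4)]
    state_weights = np.array([0.4, 0.3, 0.2, 0.1])  # state-specific mixture weights
    print(semi_continuous_output_prob(rng.normal(size=3), codebook_means,
                                      codebook_covs, state_weights))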

    Towards an automatic speech recognition system for use by deaf students in lectures

    According to the Royal National Institute for Deaf People there are nearly 7.5 million hearing-impaired people in Great Britain. Human-operated machine transcription systems, such as Palantype, achieve low word error rates in real time. The disadvantage is that they are very expensive to use because of the difficulty of training operators, making them impractical for everyday use in higher education. Existing automatic speech recognition systems also achieve low word error rates, the disadvantages being that they work only for read speech and in a restricted domain. Moving a system to a new domain requires a large amount of relevant data for training acoustic and language models. The adopted solution makes use of an existing continuous speech phoneme recognition system as a front-end to a word recognition sub-system. The sub-system generates a lattice of word hypotheses using dynamic programming, with robust parameter estimation obtained using evolutionary programming. Sentence hypotheses are obtained by parsing the word lattice using a beam search and contributing knowledge consisting of anti-grammar rules, which check the syntactic incorrectness of word sequences, and word frequency information. On an unseen spontaneous lecture taken from the Lund Corpus and using a dictionary containing 2637 words, the system achieved 81.5% words correct with 15% simulated phoneme error, and 73.1% words correct with 25% simulated phoneme error. The system was also evaluated on 113 Wall Street Journal sentences. The achievements of the work are a domain-independent method, using the anti-grammar, to reduce the word lattice search space whilst allowing normal spontaneous English to be spoken; a system designed to allow integration with new sources of knowledge, such as semantics or prosody, providing a test-bench for determining the impact of different knowledge upon word lattice parsing without the need for the underlying speech recognition hardware; and the robustness of the word lattice generation using parameters that withstand changes in vocabulary and domain.
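
    The lattice-parsing step described above can be pictured as a small beam search that scores partial sentence hypotheses with word-frequency information and penalises word pairs matching an anti-grammar rule; the lattice, scores, rules, and beam width below are invented for illustration and do not reproduce the thesis's actual data structures or parameters.

    # Word lattice: node -> list of (word, next node, acoustic log-score);
    # node 3 is the final node. All values are illustrative.
    lattice = {
        0: [("the", 1, -1.0), ("a", 1, -1.5)],
        1: [("cat", 2, -2.0), ("cap", 2, -2.5)],
        2: [("sat", 3, -1.0)],
        3: [],
    }
    word_logfreq = {"the": -2.7, "a": -3.0, "cat": -6.9, "cap": -7.6, "sat": -6.2}
    anti_grammar = {("a", "cat")}  # hypothetical "syntactically incorrect" bigram
    BEAM_WIDTH = 3

    def beam_parse(lattice, final_node=3):
        hypotheses = [(0.0, 0, [])]  # (score, current node, word sequence)
        best_complete = None
        while hypotheses:
            extended = []
            for score, node, words in hypotheses:
                if node == final_node:  # complete sentence hypothesis
                    if best_complete is None or score > best_complete[0]:
                        best_complete = (score, words)
                    continue
                for word, nxt, acoustic in lattice[node]:
                    s = score + acoustic + word_logfreq.get(word, -15.0)
                    if words and (words[-1], word) in anti_grammar:
                        s -= 10.0  # penalty for matching an anti-grammar rule
                    extended.append((s, nxt, words + [word]))
            # prune to the best BEAM_WIDTH partial hypotheses before extending again
            hypotheses = sorted(extended, key=lambda h: h[0], reverse=True)[:BEAM_WIDTH]
        return best_complete

    print(beam_parse(lattice))  # best-scoring word sequence through the toy lattice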