
    Spontaneous speech recognition using HMMs

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2003. Includes bibliographical references (leaf 63). This thesis describes a speech recognition system that was built to support spontaneous speech understanding. The system is composed of (1) a front-end acoustic analyzer which computes Mel-frequency cepstral coefficients, (2) acoustic models of context-dependent phonemes (triphones), (3) a back-off bigram statistical language model, and (4) a beam search decoder based on the Viterbi algorithm. The context-dependent acoustic models resulted in 67.9% phoneme recognition accuracy on the standard TIMIT speech database. Spontaneous speech was collected using a "Wizard of Oz" simulation of a simple spatial manipulation game. Naive subjects were instructed to manipulate blocks on a computer screen in order to solve a series of geometric puzzles using only spoken commands. A hidden human operator performed actions in response to each spoken command. The speech from thirteen subjects formed the corpus for the speech recognition results reported here. Using a task-specific bigram statistical language model and context-dependent acoustic models, the system achieved a word recognition accuracy of 67.6%. The recognizer operated with a vocabulary of 523 words, and the recognition task had a word perplexity of 36. By Benjamin W. Yoder, M.Eng.
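The decoder in (4) combines Viterbi dynamic programming with beam pruning: at each frame, only the highest-scoring hypotheses are extended. A minimal sketch, assuming log-probability emission and transition tables (the state and observation indices are illustrative, not the thesis's triphone models):

```python
import math

def viterbi_beam(obs_logprobs, trans_logprobs, init_logprobs, beam_width=3):
    """Viterbi decoding with beam pruning.

    obs_logprobs[t][s]    : log P(observation at frame t | state s)
    trans_logprobs[s][s2] : log P(s2 | s)
    init_logprobs[s]      : log P(state s at t = 0)
    Only the beam_width best hypotheses survive each frame.
    """
    n_states = len(init_logprobs)
    # scores maps state -> (log score, state path so far)
    scores = {s: (init_logprobs[s] + obs_logprobs[0][s], [s])
              for s in range(n_states)}
    for t in range(1, len(obs_logprobs)):
        # Beam pruning: keep only the beam_width highest-scoring hypotheses.
        survivors = sorted(scores.items(), key=lambda kv: kv[1][0],
                           reverse=True)[:beam_width]
        new_scores = {}
        for s_next in range(n_states):
            best_score, best_path = -math.inf, None
            for s_prev, (score, path) in survivors:
                cand = score + trans_logprobs[s_prev][s_next]
                if cand > best_score:
                    best_score, best_path = cand, path
            if best_path is not None:
                new_scores[s_next] = (best_score + obs_logprobs[t][s_next],
                                      best_path + [s_next])
        scores = new_scores
    best_state = max(scores, key=lambda s: scores[s][0])
    return scores[best_state][1], scores[best_state][0]
```

With beam_width equal to the number of states this reduces to exact Viterbi; a narrower beam trades accuracy for speed, which is the decoder's practical point.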

    Adapting Machine Translation Models toward Misrecognized Speech with Text-to-Speech Pronunciation Rules and Acoustic Confusability

    http://interspeech2015.org In the spoken language translation pipeline, machine translation systems that are trained solely on written bitexts are often unable to recover from speech recognition errors due to the mismatch in training data. We propose a novel technique to simulate the errors generated by an ASR system, using the ASR system’s pronunciation dictionary and language model. Lexical entries in the pronunciation dictionary are converted into phoneme sequences using a text-to-speech (TTS) analyzer and stored in a phoneme-to-word translation model. The translation model and ASR language model are combined into a phoneme-to-word MT system that “damages” clean texts to look like ASR outputs based on acoustic confusions. Training texts are TTS-converted and damaged into synthetic ASR data for use as adaptation data for training a speech translation system. Our proposed technique yields consistent improvements in translation quality on English-French lectures. Ruiz, Nicholas; Gao, Qin; Lewis, Will; Federico, Marcello
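The "damaging" step can be sketched as phoneme substitution followed by phoneme-to-word decoding. Everything below (the pronunciation dictionary, the confusion pairs, and the edit-distance decoder) is a hypothetical toy stand-in for the paper's TTS analyzer, confusion model, and phoneme-to-word translation model:

```python
import random

# Toy pronunciation dictionary (hypothetical entries for illustration).
PRON_DICT = {
    "ship": ["sh", "ih", "p"],
    "sheep": ["sh", "iy", "p"],
    "chip": ["ch", "ih", "p"],
}

# Hypothetical acoustic confusion pairs: phonemes an ASR system often swaps.
CONFUSIONS = {"ih": ["iy"], "iy": ["ih"], "sh": ["ch"], "ch": ["sh"]}

def damage(word, rng, p_confuse=0.5):
    """Map a clean word to a plausibly misrecognized word.

    1. Look up the word's phoneme sequence (the TTS analysis step).
    2. Randomly substitute acoustically confusable phonemes.
    3. Map the damaged phoneme string back to the closest dictionary word
       (a crude stand-in for the phoneme-to-word translation step).
    """
    phones = list(PRON_DICT[word])
    for i, ph in enumerate(phones):
        if ph in CONFUSIONS and rng.random() < p_confuse:
            phones[i] = rng.choice(CONFUSIONS[ph])
    # Decode: pick the dictionary word whose pronunciation differs least.
    def dist(a, b):
        return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return min(PRON_DICT, key=lambda w: dist(PRON_DICT[w], phones))

rng = random.Random(0)
noisy = [damage("ship", rng) for _ in range(5)]  # synthetic "ASR-like" words
```

Running clean training text through such a channel yields synthetic noisy data whose error distribution mimics the recognizer's, which is what makes it usable for MT adaptation.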

    SKOPE: A connectionist/symbolic architecture of spoken Korean processing

    Spoken language processing requires speech and natural language integration. Moreover, spoken Korean calls for a unique processing methodology due to its linguistic characteristics. This paper presents SKOPE, a connectionist/symbolic spoken Korean processing engine, which emphasizes that: 1) connectionist and symbolic techniques must be selectively applied according to their relative strengths and weaknesses, and 2) the linguistic characteristics of Korean must be fully considered for phoneme recognition, speech and language integration, and morphological/syntactic processing. The design and implementation of SKOPE demonstrate how connectionist/symbolic hybrid architectures can be constructed for spoken agglutinative language processing. SKOPE also presents many novel ideas for speech and language processing. The phoneme recognition, morphological analysis, and syntactic analysis experiments show that SKOPE is a viable approach for spoken Korean processing. Comment: 8 pages, latex, use aaai.sty & aaai.bst, bibfile: nlpsp.bib, to be presented at IJCAI95 workshops on new approaches to learning for natural language processing

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the {\em word level}, which are obviously inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on {\em morpheme-level} speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous {\em eojeol} (Korean word) recognition and integrated morphological analysis can be achieved with a success rate of over 80.6% directly from speech inputs for middle-level vocabularies. Comment: latex source with a4 style, 15 pages, to be published in computer processing of oriental language journal
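The morpheme-level lexical decoding step can be illustrated by segmenting a recognized phone string into dictionary morphemes with dynamic programming. The morpheme inventory below is a hypothetical toy, not the paper's Korean lexicon:

```python
# Hypothetical morpheme inventory (romanized, for illustration only).
MORPHEMES = {"ka", "ss", "ta", "kass", "na"}

def segment(phones):
    """Return all segmentations of `phones` into known morphemes.

    results[i] holds every morpheme sequence covering phones[:i];
    each position extends shorter prefixes, like lattice decoding.
    """
    n = len(phones)
    results = [[] for _ in range(n + 1)]
    results[0] = [[]]  # the empty prefix has one (empty) segmentation
    for i in range(1, n + 1):
        for j in range(i):
            piece = phones[j:i]
            if piece in MORPHEMES:
                for prefix in results[j]:
                    results[i].append(prefix + [piece])
    return results[n]
```

Ambiguity is the point: a string like "kassta" segments both as "kass + ta" and "ka + ss + ta", and it is exactly this kind of ambiguity that the paper's phonological/morphological co-analysis must resolve.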

    Chart-driven Connectionist Categorial Parsing of Spoken Korean

    While most speech and natural language systems developed for English and other Indo-European languages neglect morphological processing and integrate speech and natural language at the word level, for agglutinative languages such as Korean and Japanese morphological processing plays a major role in language processing, since these languages have very complex morphological phenomena and relatively simple syntactic functionality. Degenerate morphological processing obviously limits the usable vocabulary size of a system, and a word-level dictionary results in an exponential explosion in the number of dictionary entries. For agglutinative languages, we need sub-word-level integration, which leaves room for general morphological processing. In this paper, we develop a phoneme-level integration model of speech and linguistic processing through general morphological analysis for agglutinative languages, and an efficient parsing scheme for that integration. Korean is modeled lexically based on the categorial grammar formalism with unordered argument and suppressed category extensions, and a chart-driven connectionist parsing method is introduced. Comment: 6 pages, Postscript file, Proceedings of ICCPOL'9
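Chart parsing over categorial-grammar categories can be illustrated with a toy CKY-style parser implementing forward application (X/Y + Y → X) and backward application (Y + X\Y → X). The lexicon is a hypothetical English fragment, not the paper's Korean grammar with its extensions:

```python
# Toy categorial lexicon (hypothetical entries for illustration).
LEXICON = {
    "mary": {"NP"},
    "john": {"NP"},
    "sees": {r"(S\NP)/NP"},  # transitive verb: takes NP right, then NP left
}

def combine(left, right):
    """Forward application X/Y + Y -> X; backward application Y + X\\Y -> X."""
    results = set()
    for a in left:
        for b in right:
            if a.endswith("/" + b):        # forward: a == X/Y, b == Y
                results.add(a[:-(len(b) + 1)].strip("()"))
            if b.endswith("\\" + a):       # backward: a == Y, b == X\Y
                results.add(b[:-(len(a) + 1)].strip("()"))
    return results

def parse(words):
    """CKY chart: cell (i, j) holds all categories spanning words[i:j]."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):
                cell |= combine(chart[(i, k)], chart[(k, j)])
            chart[(i, j)] = cell
    return chart[(0, n)]  # a parse succeeds if "S" is derived for the span
```

The chart's sharing of sub-spans is what keeps ambiguous agglutinative input tractable; the paper's connectionist component would additionally score competing chart edges.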

    Fast and Accurate OOV Decoder on High-Level Features

    This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search (KWS) task. The proposed approach is based on using high-level features from an automatic speech recognition (ASR) system, so-called phoneme posterior based (PPB) features, for decoding. These features are obtained by calculating time-dependent phoneme posterior probabilities from word lattices, followed by their smoothing. For the PPB features we developed a novel, very fast, simple, and efficient OOV decoder. Experimental results are presented on the Georgian language from the IARPA Babel Program, which was the test language in the OpenKWS 2016 evaluation campaign. The results show that, in terms of the maximum term weighted value (MTWV) metric and computational speed, for single ASR systems the proposed approach significantly outperforms the state-of-the-art approach based on using in-vocabulary proxies for OOV keywords in the indexed database. A comparison of the two OOV KWS approaches on the fusion results of nine different ASR systems demonstrates that the proposed OOV decoder outperforms the proxy-based approach in terms of the MTWV metric at comparable processing speed. Other important advantages of the OOV decoder include extremely low memory consumption and simplicity of implementation and parameter optimization. Comment: Interspeech 2017, August 2017, Stockholm, Sweden. 2017
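The PPB feature extraction can be sketched as accumulating lattice-arc posteriors per frame and then smoothing over time. The arc tuple format and the moving-average smoothing window below are simplifying assumptions, not the paper's exact lattice processing:

```python
import numpy as np

def ppb_features(arcs, n_frames, n_phones, win=3):
    """Phoneme posterior based (PPB) features from lattice arcs.

    arcs: list of (phone_id, start_frame, end_frame, posterior) tuples,
          e.g. obtained from a word lattice by forward-backward scoring
          (this flat arc format is a simplifying assumption).
    Returns an (n_frames, n_phones) matrix of smoothed, per-frame
    phoneme posteriors.
    """
    post = np.zeros((n_frames, n_phones))
    for phone, start, end, p in arcs:
        post[start:end, phone] += p  # overlapping arcs accumulate mass
    # Temporal smoothing: simple moving average over a short window.
    kernel = np.ones(win) / win
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, post)
    # Renormalize each frame to a proper posterior distribution.
    sums = smoothed.sum(axis=1, keepdims=True)
    return np.where(sums > 0, smoothed / np.maximum(sums, 1e-12), smoothed)
```

An OOV keyword, converted to its phoneme sequence, can then be matched directly against this posterior matrix, which avoids the indexed in-vocabulary proxy step entirely.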