10 research outputs found

    Automatic syllable detection for vowel landmarks

    Get PDF
    Supervised by Kenneth N. Stevens.Also issued as Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.Includes bibliographical references (p. 192-200).by Andrew Wilson Howitt

    Loanword adaptation as first-language phonological perception

    Get PDF
    We show that loanword adaptation can be understood entirely in terms of phonological and phonetic comprehension and production mechanisms in the first language. We provide explicit accounts of several loanword adaptation phenomena (in Korean) in terms of an Optimality-Theoretic grammar model with the same three levels of representation that are needed to describe L1 phonology: the underlying form, the phonological surface form, and the auditory-phonetic form. The model is bidirectional, i.e., the same constraints and rankings are used by the listener and by the speaker. These constraints and rankings are the same for L1 processing and loanword adaptation

    Syllable-based constraints on properties of English sounds

    Get PDF
    Also issued as Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1989.Includes bibliographical references (p. 169-174).Work sponsored in part by the Office of Naval Research. N00014-82-K-0727Mark A. Randolph

    Towards multi-domain speech understanding with flexible and dynamic vocabulary

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 201-208).In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes each of their pronunciation, spelling and meaning. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis will describe our work towards realizing such a vision, using a multi-stage architecture. Our work is focused on organizing the application of linguistic constraints in order to accommodate multiple domain topics and dynamic vocabulary at the spoken input. The philosophy is to exclusively apply below word-level linguistic knowledge at the initial stage. Such knowledge is domain-independent and general to all of the English language. Hence, this is broad enough to support any unknown words that may appear at the input, as well as input from several topic domains. At the same time, the initial pass narrows the search space for the next stage, where domain-specific knowledge that resides at the word-level or above is applied. In the second stage, we envision several parallel recognizers, each with higher order language models tailored specifically to its domain. A final decision algorithm selects a final hypothesis from the set of parallel recognizers.(cont.) Part of our contribution is the development of a novel first stage which attempts to maximize linguistic constraints, using only below word-level information. The goals are to prevent sequences of unknown words from being pruned away prematurely while maintaining performance on in-vocabulary items, as well as reducing the search space for later stages. Our solution coordinates the application of various subword level knowledge sources. The recognizer lexicon is implemented with an inventory of linguistically motivated units called morphs, which are syllables augmented with spelling and word position. This first stage is designed to output a phonetic network so that we are not committed to the initial hypotheses. This adds robustness, as later stages can propose words directly from phones. To maximize performance on the first stage, much of our focus has centered on the integration of a set of hierarchical sublexical models into this first pass. To do this, we utilize the ANGIE framework which supports a trainable context-free grammar, and is designed to acquire subword-level and phonological information statistically. Its models can generalize knowledge about word structure, learned from in-vocabulary data, to previously unseen words. We explore methods for collapsing the ANGIE models into a finite-state transducer (FST) representation which enables these complex models to be efficiently integrated into recognition. The ANGIE-FST needs to encapsulate the hierarchical knowledge of ANGIE and replicate ANGIE's ability to support previously unobserved phonetic sequences ...by Grace Chung.Ph.D

    Acoustic-phonetic constraints in continuous speech recognition: a case study using the digit vocabulary.

    Get PDF
    Thesis (Ph.D.)—Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985.Includes bibliographical references (leaves 155-159).This electronic version was scanned from a copy of the thesis on file at the Speech Communication Group. The certified thesis is available in the Institute Archives and Special Collections.Vinton-Hayes Fellowship. DARPA, monitored through the Office of Naval Research. System Development Foundation.Ph.D

    Loan Phonology

    Get PDF
    For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by a great interest among phonologists in the issue of how the nativization of loanwords occurs. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system as well as for studying L1 phonological processes in action and thus to the true synchronic phonology of L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language’s sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena

    Methods for more efficient, effective and robust speech recognition Michael Galler.

    No full text
    In the last few years the field of computer speech recognition has come into its own as a practical technology. These advances have been due primarily to a steady, step-wise refinement of existing techniques, and the availability of ever more powerful computers on which to implement them. However, many problems remain partially unsolved or entirely open. For example, the current methods continue to fail in cases of small, highly confusable vocabularies.New acoustic modeling techniques and search methods are proposed in the area of robust, speaker-independent continuous speech recognition. Randomized search techniques are used to refine and improve the topologies and clustered training of allophone hidden Markov models. A new approach for word hypothesization and pronunciation modeling is proposed, based on pseudo-syllabic units. Algorithms are described for transforming a lattice of syllables into words, and learning the phonotactics of syllables automatically in a statistical framework. Next, a multi-grammar method for generating alternative hypotheses is presented, together with the method's experimental evaluation in a telephone-based spelled word recognition system. A blueprint is given for building time and memory-optimized speech decoders. Finally, analysis and recognition techniques are applied to the problem of enhancing the intelligibility of speech produced by alaryngeal speakers
    corecore