
    Temiar Reduplication in One-Level Prosodic Morphology

    Temiar reduplication is a difficult piece of prosodic morphology. This paper presents the first computational analysis of Temiar reduplication, using the novel finite-state approach of One-Level Prosodic Morphology originally developed by Walther (1999b, 2000). After reviewing both the data and the basic tenets of One-Level Prosodic Morphology, the analysis is laid out in some detail, using the notation of the FSA Utilities finite-state toolkit (van Noord 1997). One important discovery is that in this approach one can easily define a regular expression operator which ambiguously scans a string in the left- or rightward direction for a certain prosodic property. This yields an elegant account of base-length-dependent triggering of reduplication as found in Temiar.
    Comment: 9 pages, 2 figures. Finite-State Phonology: SIGPHON-2000, Proceedings of the Fifth Workshop of the ACL Special Interest Group in Computational Phonology, pp. 13-21. Aug. 6, 2000. Luxembour

    Computational Perspectives on Phonological Constituency and Recursion

    Whether or not phonology has recursion is often conflated with whether or not phonology has strings or trees as data structures. Taking a computational perspective from formal language theory and focusing on how phonological strings and trees are built, we disentangle these issues. We show that even considering the boundedness of words and utterances in physical realization and the lack of observable examples of potential recursive embedding of phonological constituents beyond a few layers, recursion is a natural consequence of expressing generalization in phonological grammars for strings and trees. While prosodically-conditioned phonological patterns can be represented using grammars for strings, e.g., with bracketed string representations, we show how grammars for trees provide a natural way to express these patterns and provide insight into the kinds of analyses that phonologists have proposed for them.

    Towards multi-domain speech understanding with flexible and dynamic vocabulary

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (p. 201-208).
    In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes the pronunciation, spelling and meaning of each. These can be confirmed with the user, and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis will describe our work towards realizing such a vision, using a multi-stage architecture. Our work is focused on organizing the application of linguistic constraints in order to accommodate multiple domain topics and dynamic vocabulary at the spoken input. The philosophy is to exclusively apply below word-level linguistic knowledge at the initial stage. Such knowledge is domain-independent and general to all of the English language. Hence, this is broad enough to support any unknown words that may appear at the input, as well as input from several topic domains. At the same time, the initial pass narrows the search space for the next stage, where domain-specific knowledge that resides at the word-level or above is applied. In the second stage, we envision several parallel recognizers, each with higher order language models tailored specifically to its domain. A final decision algorithm selects a final hypothesis from the set of parallel recognizers. Part of our contribution is the development of a novel first stage which attempts to maximize linguistic constraints, using only below word-level information. The goals are to prevent sequences of unknown words from being pruned away prematurely while maintaining performance on in-vocabulary items, as well as reducing the search space for later stages. Our solution coordinates the application of various subword level knowledge sources. The recognizer lexicon is implemented with an inventory of linguistically motivated units called morphs, which are syllables augmented with spelling and word position. This first stage is designed to output a phonetic network so that we are not committed to the initial hypotheses. This adds robustness, as later stages can propose words directly from phones. To maximize performance on the first stage, much of our focus has centered on the integration of a set of hierarchical sublexical models into this first pass. To do this, we utilize the ANGIE framework, which supports a trainable context-free grammar and is designed to acquire subword-level and phonological information statistically. Its models can generalize knowledge about word structure, learned from in-vocabulary data, to previously unseen words. We explore methods for collapsing the ANGIE models into a finite-state transducer (FST) representation which enables these complex models to be efficiently integrated into recognition. The ANGIE-FST needs to encapsulate the hierarchical knowledge of ANGIE and replicate ANGIE's ability to support previously unobserved phonetic sequences ... by Grace Chung. Ph.D

    Coding Partitions of Regular Sets

    A coding partition of a set of words partitions this set into classes such that whenever a sequence of minimal length has two distinct factorizations, the words of these factorizations belong to the same class. The canonical coding partition is the finest coding partition that partitions the set of words into at most one unambiguous class and other classes that localize the ambiguities in the factorizations of finite sequences. We prove that the canonical coding partition of a regular set contains a finite number of regular classes and we give an algorithm for computing this partition. From this we derive a canonical decomposition of a regular monoid into a free product of finitely many regular monoids.

    Computational Locality in Morphological Maps


    Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2). 29 November 2012, Lisbon, Portugal

    Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), held in Lisbon, Portugal on 29 November 2012