3,151 research outputs found

    Joint Morphological and Syntactic Disambiguation

    Get PDF
    In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models. State-of-the-art performance on Hebrew Treebank parsing is demonstrated using the new method. The benefits of joint inference are modest with the current component models, but appear to increase as components themselves improve

    Improving Statistical MT through Morphological Analysis

    Get PDF

    Statistical parsing of morphologically rich languages (SPMRL): what, how and whither

    Get PDF
    The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state-of-affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations

    One-Shot Neural Cross-Lingual Transfer for Paradigm Completion

    Full text link
    We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge.Comment: Accepted at ACL 201

    Theoretical issues in the interpretation of Cappadocian, a not-so-dead Greek contact language

    Get PDF
    Cappadocian is a mixed Greek-Turkish dialect continuum spoken in the Turkish Central Anatolia Region until the population exchange between Greece and Turkey in the 1920s. Only a few Cappadocian dialects are still spoken in present-day Greece. Since the publication of Thomason and Kaufman’s Language Contact, Creolization, and Genetic Linguistics in 1988, Cappadocian has attracted the attention of historical and contact linguists, because of its unique mixed character. In this paper, I will discuss a number of theoretical issues in the interpretation of the linguistic structure of Cappadocian, focusing on the following topics: (1) the status of loan phonemes and loan morphemes in contact languages, (2) the distinction between code switching and code mixing in relation to Poplack’s Free Morpheme Constraint, (3) the schizoid typology of contact languages
    corecore