Joint Morphological and Syntactic Disambiguation
In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models. State-of-the-art performance on Hebrew Treebank parsing is demonstrated using the new method. The benefits of joint inference are modest with the current component models, but appear to increase as components themselves improve.
Statistical parsing of morphologically rich languages (SPMRL): what, how and whither
The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state of affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations.
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge. Comment: Accepted at ACL 201
Theoretical issues in the interpretation of Cappadocian, a not-so-dead Greek contact language
Cappadocian is a mixed Greek-Turkish dialect continuum spoken in the Turkish Central Anatolia Region until the population exchange between Greece and Turkey in the 1920s. Only a few Cappadocian dialects are still spoken in present-day Greece. Since the publication of Thomason and Kaufman’s Language Contact, Creolization, and Genetic Linguistics in 1988, Cappadocian has attracted the attention of historical and contact linguists, because of its unique mixed character. In this paper, I will discuss a number of theoretical issues in the interpretation of the linguistic structure of Cappadocian, focusing on the following topics: (1) the status of loan phonemes and loan morphemes in contact languages, (2) the distinction between code switching and code mixing in relation to Poplack’s Free Morpheme Constraint, (3) the schizoid typology of contact languages.