2,002 research outputs found

    Tracking Typological Traits of Uralic Languages in Distributed Language Representations

    Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development is to learn distributed representations of language, such that typologically similar languages are spatially close to one another. Although empirical successes have been shown for such language representations, they have not been subjected to much typological probing. In this paper, we first look at whether this type of language representation is empirically useful for model transfer between Uralic languages in deep neural networks. We then investigate which typological features are encoded in these representations by attempting to predict features in the World Atlas of Language Structures, at various stages of fine-tuning of the representations. We focus on Uralic languages, and find that some typological traits can be automatically inferred with accuracies well above a strong baseline. Comment: Finnish abstract included in the paper.

    Verb-Framed Motion Events in Uralic (with special attention to Mari)

    Uralic languages have been described as “satellite-framed” in general linguistic publications, meaning that the path of a motion event is typically not expressed by the verb of motion, but by an independent element – a particle, an affix, etc. – that accompanies the verb or verbal stem. While this assertion holds true for the critical mass of Uralic languages, it seems to be too broad – especially with respect to languages influenced by “verb-framed” Turkic languages, in which the verb of motion typically denotes the path. This paper aims to give a comprehensive overview of the expression of motion events in Mari, presumably the most heavily verb-framed extant Uralic language, and a brief overview of verb-framed motion events in other Uralic languages.

    The Inverse Agreement Constraint in Uralic languages

    The paper aims to answer the question of why object–verb agreement is blocked in Hungarian, Tundra Nenets, Selkup, and Nganasan if the object is a first or second person pronoun. Based on Dalrymple & Nikolaeva (2011), it is argued that object–verb agreement serves (or served historically) to mark the secondary topic status of the object. The gaps in object–verb agreement can be derived from the Inverse Agreement Constraint, a formal, semantically unmotivated constraint observed by Comrie (1980) in Chukchee, Koryak and Kamchadal, forbidding object–verb agreement if the object is more ‘animate’ than the subject. The paper claims that the Inverse Agreement Constraint is a constraint on information structure. What it requires is that a secondary topic be less topical than the primary topic. An object more topical than the primary topic can only figure as a focus. A version of the constraint can also explain why Hungarian first and second person objects have no accusative suffix, and why accusative marking is optional in the case of objects having a first or second person possessor.

    The 11th International Congress for Finno-Ugric Studies: Finno-Ugric Peoples and Languages in the 21st Century

    The 11th International Congress for Finno-Ugric Studies was one of the biggest Finno-Ugric conferences of recent years. Finno-Ugric Peoples and Languages in the 21st Century dealt mainly with the linguistic and political situation of the Finno-Ugric languages in Russia. Recent research on descriptive linguistics and new approaches to theoretical and typological issues were also presented at the Congress.

    Data-Driven Morphological Analysis for Uralic Languages

    This paper describes an initial set of experiments in data-driven morphological analysis of Uralic languages. The paper differs from previous work in that our work covers both lemmatization and generating ambiguous analyses. While hand-crafted finite-state transducers represent the state of the art in morphological analysis for most Uralic languages, we believe that there is a place for data-driven approaches, especially with respect to making up for lack of completeness in the lexicon. We present results for nine Uralic languages that show that, at least for basic nominal morphology for six out of the nine languages, data-driven methods can achieve an F-score of over 90%, providing results that approach those of finite-state techniques. We also compare our system to an earlier approach to Finnish data-driven morphological analysis (Silfverberg and Hulden, 2018) and show that our system outperforms this baseline. Peer reviewed.

    On the historical background of habitive and izafet constructions in Hungarian

    This paper deals with two ways of expressing possessive relationships, their morphological make-up and the possible circumstances of their emergence. One of these is the habitive construction (‘X has Y’), whereas the other is the attributive possessive construction (‘X's Y, the Y of X’). The former is a clause, whereas the latter is a phrase. It will be argued that both types of constructions may have emerged in the Uralic languages without any foreign influence, but that foreign influence may have played a role in the retention of the latter in Uralic languages that were engaged in intensive Uralic–Turkic linguistic contacts.

    Indo-Uralic and Altaic

    Elsewhere I have argued that the Indo-European verbal system can be understood in terms of its Indo-Uralic origins because the reconstructed Indo-European endings can be derived from combinations of Indo-Uralic morphemes by a series of well-motivated phonetic and analogic developments (2002). Moreover, I have claimed (2004b) that the Proto-Uralic consonant gradation accounts for the peculiar correlations between Indo-European root structure and accentuation discovered by Lubotsky (1988).

    The Indo-Uralic verb

    C.C. Uhlenbeck made a distinction between two components of Proto-Indo-European, which he called A and B (1935a: 133ff.). The first component comprises pronouns, verbal roots, and derivational suffixes, and may be compared with Uralic, whereas the second component contains isolated words, such as numerals and most underived nouns, which have a different source. The wide attestation of the Indo-European numerals must be attributed to the development of trade resulting from the increased mobility which was the primary cause of the Indo-European expansions. Numerals do not belong to the basic vocabulary of a neolithic culture, as is clear from their absence in Proto-Uralic (cf. also Collinder 1965: 112) and from the spread of Chinese numerals throughout East Asia. Though Uhlenbeck objects to the term “substratum” for his B complex, I think that it is a perfectly appropriate denomination.