3,950 research outputs found

    Modeling Global Syntactic Variation in English Using Dialect Classification

    This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

    The Oxford Handbook of Evidentiality

    This volume offers a thorough, systematic, and crosslinguistic account of evidentiality, the linguistic encoding of the source of information on which a statement is based. In some languages, the speaker always has to specify this source, for example whether they saw the event, heard it, inferred it based on visual evidence or common sense, or were told about it by someone else. While not all languages have obligatory marking of this type, every language has ways of referring to information source and associated epistemological meanings. The continuum of epistemological expressions covers a range of devices from the lexical means in familiar European languages and in many languages of Aboriginal Australia to the highly grammaticalized systems in Amazonia or North America. In this handbook, experts from a variety of fields explore topics such as the relationship between evidentials and epistemic modality, contact-induced changes in evidential systems, the acquisition of evidentials, and formal semantic theories of evidentiality. The book also contains detailed case studies of evidentiality in language families across the world, including Algonquian, Korean, Nakh-Dagestanian, Nambikwara, Turkic, Uralic, and Uto-Aztecan.

    UniMorph 4.0: Universal Morphology


    The relationship between first language acquisition and dialect variation: Linking resources from distinct disciplines in a CLARIN-NL project

    It is remarkable that first language acquisition and historical dialectology should have remained strange bedfellows for so long considering the common assumption in historical linguistics that language change is due to the process of non-target transmission of linguistic features, forms and structures between generations, and thus between parents or adults and children. Both disciplines have remained isolated from each other due to, among other things, different research questions, methods of data collection and types of empirical resources. The aim of this paper is to demonstrate that the common assumption in historical linguistics mentioned above can be examined with the help of Digital Humanities projects like CLARIN. CLARIN infrastructure makes it possible to carry out e-Humanities type research by combining datasets from distinct disciplines through tools for data processing. The outcome of the CLARIN-NL COAVA project (an acronym of Cognition, Acquisition and Variation tool) allows researchers to access two datasets from two different subdisciplines simultaneously, namely Dutch first child language acquisition files located in CHILDES (MacWhinney, 2000) and historical Dutch Dialect Dictionaries, through the development of a tool for easy exploration of nouns.

    SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

    This year’s iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems’ predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems’ performance on previously unseen lemmas.
