160 research outputs found

    Building an endangered language resource in the classroom: Universal dependencies for Kakataibo

    In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates. Then, we describe the general details of the treebank and the language-specific considerations implemented for the proposed annotation. We finally conduct some experiments on part-of-speech tagging and syntactic dependency parsing. We focus on monolingual and transfer learning settings, where we study the impact of a Shipibo-Konibo treebank, another Panoan language resourc

    ‘Jaysus, keep talking like that and you’ll fit right in’- an investigation of oral Irish English in contemporary Irish fiction

    This project is an interdisciplinary and comparative investigation of the reproduction of linguistic features of Irish English (IrE) present in contemporary IrE fiction. To do this, a corpus of over 1 million words comprising 16 works of fiction published in the Republic of Ireland by 8 authors was compiled: the Corpus of Contemporary Fictionalized Irish English (CoFIrE). The goal of this thesis, therefore, is to determine 1) which are the most frequently reproduced features of IrE orality in contemporary IrE fiction, 1a) how realistic is their fictional portrayal when contrasted against real spoken uses, 2) what does the use of the most frequently reproduced features in the corpus encode with regard to speaker identity, and 3) in what manner may modern Irishness be encoded through the reproduction of pragmatic items in fiction. Utilizing a variety of interdisciplinary methodologies, including Corpus Stylistics, Corpus Linguistics, Sociolinguistic, and Pragmatic techniques, the thesis identifies signature linguistic features that are thought to be representative of IrE in the corpus via quantitative and qualitative, comparative corpus analysis. To evaluate the level of realism inherent in the fictional rendition, the findings are contrasted against the Limerick Corpus of Irish English and the BNC2014. A second corpus comprising books by one of the CoFIrE authors, i.e. Paul Howard, was also compiled. Thus, the Ross O’Carroll-Kelly Corpus (CoROCK) was created given this series’ reputation for being a chronicler of modern Ireland and because of the high frequency of IrE orality reproduction these books were found to contribute to CoFIrE. Two case studies on non-standard, non-traditionally IrE high frequency intensifiers are conducted on CoROCK to better answer the research questions regarding the potential indexation of modern Irishness through speech reproduction in fiction. 
    Finally, by evaluating the type of speaker identity these features may index when used in contemporary fiction, this thesis determines the type of modern Irishness that appears to be encoded through fictional speech representations.

    Language and Linguistics in a Complex World: Data, Interdisciplinarity, Transfer, and the Next Generation. ICAME41 Extended Book of Abstracts

    This is a collection of papers, work-in-progress reports, and other contributions that were part of the ICAME41 digital conference.

    Particles, word order, and intonation

    Synopsis: This study explores information structure (IS) within the framework of corpus linguistics and functional linguistics. As a case study, it investigates IS phenomena in spoken Japanese: particles including so-called topic particles, case particles, and zero particles; word order; and intonation. The study discusses how these phenomena are related to cognitive and communicative mechanisms of humans.