58 research outputs found

    On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism

    Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012

    Information-theoretic causal inference of lexical flow

    This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible to enhance the framework, e.g. with confidence values for each directionality decision.

    Sequence Comparison in Historical Linguistics


    The construction of Erasmus student identity: a discourse historic approach

    This thesis examines the construction of a student mobility programme and mobile students’ identities in discourses of Erasmus exchange students (bottom-up discourses) and political speeches and institutional texts (top-down discourses). By adopting a post-modern perspective on identity and its construction in discourse, this study intends to fill the gap in the field of student mobility research, which has been predominantly concerned with North American, rather than European, or even less so with the Latvian context and has been mainly quantitative in nature, looking at large-scale statistical data, while overlooking the complexities and variation among individual experiences. The study applies the Discourse Historical Approach (DHA) to three sets of data: individual interviews with incoming Erasmus exchange students in Latvia, political speeches by the former EU Minister of Education, A. Vassiliou, and online texts published on the web page of the Latvian State Education Agency. The results indicate that mobile European exchange students’ identities are constructed differently in institutional as opposed to experiential contexts. It seems that on the one hand, Latvian institutional texts focus on building a positive representation of Latvia, characterised by openness and its affiliations with Europe and the world as the outcome of the Erasmus programme; the EU political discourse promotes the triumph of Erasmus as a European project, pointing to the vitality of the student mobility programme leading to an increase in the number of people with European identity as the actual proof of the programme’s success. Contrary to the institutional online texts and the Commissioner’s speeches, on the other hand, the Erasmus students indicate their awareness of the complex, multiple and changing nature of mobile students’ identities and their construction in discourse when faced with new contexts and diverse individuals.

    CLARIN. The infrastructure for language resources

    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU).

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.