
    My many selves are still me: Motivation and multilingualism

    Two concepts of multilingualism that relate to the selves aspect of Dörnyei’s (2009) L2 motivational self system (L2MSS) are highlighted in this article: Thompson’s concept of perceived positive language interaction (PPLI) and Henry’s notion of the ideal multilingual self. With the dynamic model of multilingualism (DMM) informing both concepts (Herdina & Jessner, 2002; Jessner, 2006, 2008), the discussion clearly articulates the intangible advantage that multilingual speakers have over monolingual speakers. The interconnectivity of language systems is an inherent aspect of the DMM; as such, both Thompson with PPLI and Henry with the ideal multilingual self use the DMM as a framework to indicate the fluid nature of these constructs as additional language-learning experiences are added to the system over time. This article further explores the dynamicity of multilingual learners’ language systems and the influences that induce change. Specifically, data from Thompson’s (2017b) study of LOTE learners are re-examined to explore this question. Additionally, excerpts from Natasha Lvovich’s (1997) The Multilingual Self, the autobiography of an L1 Russian speaker, are analyzed to present different possible models of incorporating the multilingual self and PPLI. The article ends with a discussion of an inherently multilingual context, as well as thoughts regarding the possibility of different types of future selves.

    Integrated content presentation for multilingual and multimedia information access

    For multilingual and multimedia information retrieval from multiple, potentially distributed collections, generating output in the form of standard ranked lists may often mean that a user has to explore the contents of many lists before finding sufficient relevant or linguistically accessible material to satisfy their information need. In some situations, delivering an integrated multilingual multimedia presentation could enable the user to explore a topic, allowing them to select from among a range of available content based on suitably chosen displayed metadata. A presentation of this type has similarities with the outputs of existing adaptive hypermedia systems. However, such systems are generated from “closed” content with sophisticated user and domain models, and extending them to “open”-domain information retrieval applications would raise many issues. We present an outline exploration of what will form a challenging new direction for research in multilingual information access.

    Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives

    The automated identification of national implementing measures (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous work has proposed and evaluated unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on only a small multilingual corpus of directives and NIMs. In this paper, we use word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems that identify transpositions. We evaluate these models and compare their results with those of previous unsupervised methods on a multilingual test corpus of 43 directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance across different feature sets.
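The unsupervised side of the approach above can be sketched as follows: embed each document, then rank candidate national measures by cosine similarity to the directive. This is a minimal illustration with random stand-in word vectors; the paper trains its embeddings on a legal corpus with shallow neural networks, and every name and vocabulary item below is hypothetical.

```python
# Sketch of unsupervised transposition detection: average word vectors
# into document embeddings, then score candidates by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary embeddings (in practice: learned from directives + NIMs).
vocab = {w: rng.normal(size=50) for w in
         ["directive", "member", "state", "transpose", "law", "energy"]}

def embed(doc):
    """Average the word vectors of the in-vocabulary tokens of a document."""
    vecs = [vocab[t] for t in doc.lower().split() if t in vocab]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

directive = embed("member state transpose energy directive")
candidate = embed("state law energy directive")

# Candidates scoring above a tuned threshold would be flagged as
# likely transpositions of the directive.
score = cosine(directive, candidate)
print(round(score, 3))
```

Paragraph-level models (doc2vec-style) replace the averaging step with a learned document vector, but the ranking-by-similarity skeleton stays the same.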

    Multilingual Dynamic Topic Model

    Dynamic topic models (DTMs) capture the evolution of topics and trends in time-series data. Current DTMs are applicable only to monolingual datasets. In this paper we present the multilingual dynamic topic model (ML-DTM), a novel topic model that combines the DTM with an existing multilingual topic modeling method to capture cross-lingual topics that evolve over time. We present results of this model on a parallel German–English corpus of news articles and a comparable corpus of Finnish and Swedish news articles. We demonstrate the capability of ML-DTM to track significant events related to a topic, and show that it finds distinct topics and performs as well as existing multilingual topic models in aligning cross-lingual topics.
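One ingredient of any dynamic topic model is linking topics across time slices so a topic can be tracked as it drifts. The sketch below is not the paper's ML-DTM inference; it only illustrates the linking idea by matching each topic at time t to its nearest neighbour at time t+1 via cosine similarity of topic-word distributions, with hand-made toy distributions.

```python
# Toy topic tracking across time slices via nearest-neighbour matching
# of topic-word distributions (cosine similarity). A real DTM infers
# these distributions; here they are hand-crafted for illustration.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two time slices, two topics each, over a 4-word vocabulary.
slice_t0 = np.array([[0.70, 0.20, 0.05, 0.05],   # topic A at t0
                     [0.05, 0.05, 0.60, 0.30]])  # topic B at t0
slice_t1 = np.array([[0.10, 0.10, 0.50, 0.30],   # topic B, drifted
                     [0.60, 0.30, 0.05, 0.05]])  # topic A, drifted

# Chain each t0 topic to its most similar t1 topic.
chains = {i: max(range(len(slice_t1)),
                 key=lambda j: cosine(slice_t0[i], slice_t1[j]))
          for i in range(len(slice_t0))}
print(chains)  # {0: 1, 1: 0}: A matched to index 1, B to index 0
```

In the cross-lingual setting the same matching must also hold across languages, which is what coupling the DTM with a multilingual topic model provides.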

    SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects

    Despite the progress recorded in the last few years in multilingual natural language processing, evaluation is typically limited to the small set of languages with available datasets, which excludes a large number of low-resource languages. In this paper, we create SIB-200, a large-scale, open-sourced benchmark dataset for topic classification in 200 languages and dialects, to address the lack of evaluation datasets for natural language understanding (NLU). For many of the languages covered in SIB-200, this is the first publicly available evaluation dataset for NLU. The dataset is based on the Flores-200 machine translation corpus. We annotated the English portion of the dataset and extended the sentence-level annotation to the remaining 203 languages covered in the corpus. Despite the simplicity of this task, our evaluation in the fully supervised setting, the cross-lingual transfer setting and the large language model prompting setting shows that there is still a large gap between the performance of high-resource and low-resource languages when multilingual evaluation is scaled to numerous world languages. We found that languages unseen during the pre-training of multilingual language models, under-represented language families (such as Nilotic and Atlantic-Congo), and languages from Africa, the Americas, Oceania and South East Asia often have the lowest performance on our topic classification dataset. We hope our dataset will encourage more inclusive evaluation of multilingual language models on a more diverse set of languages. https://github.com/dadelani/sib-200
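The core of the evaluation described above is per-language scoring plus a high-/low-resource gap. A minimal sketch of such a harness, with fabricated predictions and illustrative language codes (not real SIB-200 results):

```python
# Per-language accuracy and high-/low-resource gap over fabricated
# (language, gold_topic, predicted_topic) triples, as an evaluation
# harness would produce after running one classifier per test split.
from collections import defaultdict

results = [
    ("eng", "science", "science"), ("eng", "sports", "sports"),
    ("fra", "science", "science"), ("fra", "sports", "science"),
    ("luo", "science", "sports"), ("luo", "sports", "science"),
]

per_lang = defaultdict(lambda: [0, 0])  # lang -> [correct, total]
for lang, gold, pred in results:
    per_lang[lang][0] += int(gold == pred)
    per_lang[lang][1] += 1

accuracy = {lang: c / n for lang, (c, n) in per_lang.items()}
gap = max(accuracy.values()) - min(accuracy.values())
print(accuracy, round(gap, 2))
```

Running the same tally over 200+ language splits is what surfaces the performance gap the abstract reports between high- and low-resource languages.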