My many selves are still me: Motivation and multilingualism
Two concepts of multilingualism that relate to the selves aspect of Dörnyei's (2009) L2 motivational self system (L2MSS) are highlighted in this article: Thompson's concept of perceived positive language interaction (PPLI) and Henry's notion of the ideal multilingual self. With the dynamic model of multilingualism (DMM) informing both concepts (Herdina & Jessner, 2002; Jessner, 2006, 2008), the intangible advantage that multilingual speakers have over monolingual speakers is clearly articulated in the discussion of this topic. The interconnectivity of language systems is an inherent aspect of the DMM; as such, both Thompson with PPLI and Henry with the ideal multilingual self incorporate the DMM as a framework to indicate the fluid nature of these constructs as additional language learning experiences are added to the system over time. This article further explores the dynamicity of multilingual learners' language systems and the influences that induce change. Specifically, data from Thompson's (2017b) study on LOTE learners are re-examined to explore this question. Additionally, excerpts from Natasha Lvovich's (1997) The Multilingual Self, an autobiography of an L1 Russian speaker, are analyzed to present different possible models of incorporating the multilingual self and PPLI. The article ends with a discussion of an inherently multilingual context, as well as thoughts regarding the possibility of different types of future selves.
Integrated content presentation for multilingual and multimedia information access
For multilingual and multimedia information retrieval from multiple, potentially distributed collections, generating output in the form of standard ranked lists may often mean that a user has to explore the contents of many lists before finding sufficient relevant or linguistically accessible material to satisfy their information need. In some situations, delivering an integrated multilingual multimedia presentation could enable the user to explore a topic, allowing them to select from among a range of available content based on suitably chosen displayed metadata. A presentation of this type has similarities with the outputs of existing adaptive hypermedia systems. However, such systems are generated from "closed" content with sophisticated user and domain models; extending them to "open" domain information retrieval applications would raise many issues. We present an outline exploration of what will form a challenging new direction for research in multilingual information access.
Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives
The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets
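The unsupervised approach described above can be illustrated with a toy sketch (not the authors' implementation): represent a directive provision and a candidate national measure as averaged word vectors and score the pair by cosine similarity. The vocabulary and vector values below are invented for illustration; a real system would learn embeddings from a legal corpus.

```python
import math

# Hypothetical word vectors standing in for embeddings learned from a
# corpus of directives and national legislation (invented values).
WORD_VECS = {
    "data":       [0.9, 0.1, 0.0],
    "protection": [0.8, 0.2, 0.1],
    "privacy":    [0.7, 0.3, 0.1],
    "fisheries":  [0.0, 0.1, 0.9],
    "quota":      [0.1, 0.0, 0.8],
}

def doc_vector(tokens):
    """Average the vectors of known tokens (zero vector if none are known)."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

directive = "data protection privacy".split()
nim_a = "privacy data".split()      # plausible transposition
nim_b = "fisheries quota".split()   # unrelated provision

score_a = cosine(doc_vector(directive), doc_vector(nim_a))
score_b = cosine(doc_vector(directive), doc_vector(nim_b))
print(score_a > score_b)  # the related provision scores higher
```

In practice the paper's setting would replace the toy lookup table with word2vec/doc2vec-style models trained on the multilingual legal corpus, but the scoring step is the same shape.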
Multilingual Dynamic Topic Model
Dynamic topic models (DTMs) capture the evolution of topics and trends in time series data. Current DTMs are applicable only to monolingual datasets. In this paper we present the multilingual dynamic topic model (ML-DTM), a novel topic model that combines DTM with an existing multilingual topic modeling method to capture cross-lingual topics that evolve across time. We present results of this model on a parallel German-English corpus of news articles and a comparable corpus of Finnish and Swedish news articles. We demonstrate the capability of ML-DTM to track significant events related to a topic and show that it finds distinct topics and performs as well as existing multilingual topic models in aligning cross-lingual topics.
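One ingredient of aligning cross-lingual topics can be sketched in a deliberately simplified form (this is not the ML-DTM algorithm itself): given topic-word distributions in two languages and a bilingual dictionary, map one language's topics into the other's vocabulary and pair each topic with its most similar counterpart by cosine similarity. All distributions and dictionary entries below are invented toy data.

```python
import math

# Hypothetical topic-word distributions (topic -> word -> probability).
topics_en = {
    "T_en_0": {"election": 0.5, "vote": 0.4, "music": 0.1},
    "T_en_1": {"music": 0.6, "concert": 0.3, "vote": 0.1},
}
topics_fi = {
    "T_fi_0": {"musiikki": 0.55, "konsertti": 0.35, "vaali": 0.10},
    "T_fi_1": {"vaali": 0.50, "aani": 0.40, "musiikki": 0.10},
}
# Toy Finnish -> English dictionary (invented for illustration).
fi_to_en = {"musiikki": "music", "konsertti": "concert",
            "vaali": "election", "aani": "vote"}

def translate(dist):
    """Project a Finnish topic-word distribution onto English vocabulary."""
    out = {}
    for word, prob in dist.items():
        key = fi_to_en.get(word, word)
        out[key] = out.get(key, 0.0) + prob
    return out

def cosine(d1, d2):
    keys = set(d1) | set(d2)
    dot = sum(d1.get(k, 0.0) * d2.get(k, 0.0) for k in keys)
    n1 = math.sqrt(sum(v * v for v in d1.values()))
    n2 = math.sqrt(sum(v * v for v in d2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Align each Finnish topic with its most similar English topic.
alignment = {
    fi: max(topics_en, key=lambda en: cosine(translate(dist), topics_en[en]))
    for fi, dist in topics_fi.items()
}
print(alignment)
```

A multilingual topic model avoids the explicit dictionary step by tying aligned documents (or shared topic proportions) across languages during inference; the sketch above only illustrates the post-hoc alignment comparison the abstract evaluates.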
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Despite the progress recorded in recent years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets, excluding a large number of low-resource languages. In this paper, we create SIB-200, a large-scale open-sourced benchmark dataset for topic classification in 200 languages and dialects, to address the lack of evaluation datasets for Natural Language Understanding (NLU). For many of the languages covered in SIB-200, this is the first publicly available evaluation dataset for NLU. The dataset is based on the Flores-200 machine translation corpus. We annotated the English portion of the dataset and extended the sentence-level annotation to the remaining 203 languages covered in the corpus. Despite the simplicity of this task, our evaluation in the fully supervised setting, the cross-lingual transfer setting, and the large language model prompting setting shows that there is still a large gap between the performance of high-resource and low-resource languages when multilingual evaluation is scaled to numerous world languages. We found that languages unseen during the pre-training of multilingual language models, under-represented language families (like Nilotic and Atlantic-Congo), and languages from the regions of Africa, the Americas, Oceania, and Southeast Asia often have the lowest performance on our topic classification dataset. We hope our dataset will encourage a more inclusive evaluation of multilingual language models on a more diverse set of languages. https://github.com/dadelani/sib-200