My many selves are still me: Motivation and multilingualism
Two concepts of multilingualism that relate to the selves aspect of Dörnyei's (2009) L2 motivational self system (L2MSS) are highlighted in this article: Thompson's concept of perceived positive language interaction (PPLI) and Henry's notion of the ideal multilingual self. With the dynamic model of multilingualism (DMM) informing both concepts (Herdina & Jessner, 2002; Jessner, 2006, 2008), the intangible advantage that multilingual speakers have over monolingual speakers is clearly articulated in the discussion of this topic. The interconnectivity of language systems is an inherent aspect of the DMM; as such, both Thompson with PPLI and Henry with the ideal multilingual self incorporate the DMM as a framework to indicate the fluid nature of these constructs as additional language learning experiences are added to the system over time. This article further explores the dynamicity of multilingual learners' language systems and the influences that induce change. Specifically, data from Thompson's (2017b) study on LOTE learners are re-examined to explore this question. Additionally, excerpts from Natasha Lvovich's (1997) The Multilingual Self, an autobiography of an L1 Russian speaker, are analyzed to present different possible models of incorporating the multilingual self and PPLI. The article ends with a discussion of an inherently multilingual context, as well as thoughts regarding the possibility of different types of future selves.
Integrated content presentation for multilingual and multimedia information access
For multilingual and multimedia information retrieval from multiple, potentially distributed collections, generating output in the form of standard ranked lists may often mean that a user has to explore the contents of many lists before finding sufficient relevant or linguistically accessible material to satisfy their information need. In some situations, delivering an integrated multilingual multimedia presentation could enable the user to explore a topic, allowing them to select from among a range of available content based on suitably chosen displayed metadata. A presentation of this type has similarities with the outputs of existing adaptive hypermedia systems. However, such systems are generated from "closed" content with sophisticated user and domain models; extending them to "open" domain information retrieval applications would raise many issues. We present an outline exploration of what will form a challenging new direction for research in multilingual information access.
Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives
The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets
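The unsupervised approach described above can be illustrated with a toy sketch (not the authors' implementation): represent a directive provision and a candidate national measure as averaged word vectors and score the pair by cosine similarity. The vocabulary and vector values below are invented for illustration; a real system would learn embeddings from a legal corpus.

```python
import math

# Hypothetical word vectors standing in for embeddings learned from a
# corpus of directives and national legislation (invented values).
WORD_VECS = {
    "data":       [0.9, 0.1, 0.0],
    "protection": [0.8, 0.2, 0.1],
    "privacy":    [0.7, 0.3, 0.1],
    "fisheries":  [0.0, 0.1, 0.9],
    "quota":      [0.1, 0.0, 0.8],
}

def doc_vector(tokens):
    """Average the vectors of known tokens (zero vector if none are known)."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

directive = "data protection privacy".split()
nim_a = "privacy data".split()      # plausible transposition
nim_b = "fisheries quota".split()   # unrelated provision

score_a = cosine(doc_vector(directive), doc_vector(nim_a))
score_b = cosine(doc_vector(directive), doc_vector(nim_b))
print(score_a > score_b)  # the related provision scores higher
```

In practice the paper's setting would replace the toy lookup table with word2vec/doc2vec-style models trained on the multilingual legal corpus, but the scoring step is the same shape.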
Multilingual Dynamic Topic Model
Dynamic topic models (DTMs) capture the evolution of topics and trends in time series data. Current DTMs are applicable only to monolingual datasets. In this paper we present the multilingual dynamic topic model (ML-DTM), a novel topic model that combines DTM with an existing multilingual topic modeling method to capture cross-lingual topics that evolve across time. We present results of this model on a parallel German-English corpus of news articles and a comparable corpus of Finnish and Swedish news articles. We demonstrate the capability of ML-DTM to track significant events related to a topic and show that it finds distinct topics and performs as well as existing multilingual topic models in aligning cross-lingual topics.
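One ingredient of aligning cross-lingual topics can be sketched in a deliberately simplified form (this is not the ML-DTM algorithm itself): given topic-word distributions in two languages and a bilingual dictionary, map one language's topics into the other's vocabulary and pair each topic with its most similar counterpart by cosine similarity. All distributions and dictionary entries below are invented toy data.

```python
import math

# Hypothetical topic-word distributions (topic -> word -> probability).
topics_en = {
    "T_en_0": {"election": 0.5, "vote": 0.4, "music": 0.1},
    "T_en_1": {"music": 0.6, "concert": 0.3, "vote": 0.1},
}
topics_fi = {
    "T_fi_0": {"musiikki": 0.55, "konsertti": 0.35, "vaali": 0.10},
    "T_fi_1": {"vaali": 0.50, "aani": 0.40, "musiikki": 0.10},
}
# Toy Finnish -> English dictionary (invented for illustration).
fi_to_en = {"musiikki": "music", "konsertti": "concert",
            "vaali": "election", "aani": "vote"}

def translate(dist):
    """Project a Finnish topic-word distribution onto English vocabulary."""
    out = {}
    for word, prob in dist.items():
        key = fi_to_en.get(word, word)
        out[key] = out.get(key, 0.0) + prob
    return out

def cosine(d1, d2):
    keys = set(d1) | set(d2)
    dot = sum(d1.get(k, 0.0) * d2.get(k, 0.0) for k in keys)
    n1 = math.sqrt(sum(v * v for v in d1.values()))
    n2 = math.sqrt(sum(v * v for v in d2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Align each Finnish topic with its most similar English topic.
alignment = {
    fi: max(topics_en, key=lambda en: cosine(translate(dist), topics_en[en]))
    for fi, dist in topics_fi.items()
}
print(alignment)
```

A multilingual topic model avoids the explicit dictionary step by tying aligned documents (or shared topic proportions) across languages during inference; the sketch above only illustrates the post-hoc alignment comparison the abstract evaluates.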
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Despite the progress recorded in recent years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets, excluding a large number of low-resource languages. In this paper, we create SIB-200, a large-scale open-sourced benchmark dataset for topic classification in 200 languages and dialects, to address the lack of evaluation datasets for Natural Language Understanding (NLU). For many of the languages covered in SIB-200, this is the first publicly available evaluation dataset for NLU. The dataset is based on the Flores-200 machine translation corpus. We annotated the English portion of the dataset and extended the sentence-level annotation to the remaining 203 languages covered in the corpus. Despite the simplicity of this task, our evaluation in the fully supervised setting, the cross-lingual transfer setting, and the large language model prompting setting shows that there is still a large gap between the performance of high-resource and low-resource languages when multilingual evaluation is scaled to numerous world languages. We found that languages unseen during the pre-training of multilingual language models, under-represented language families (like Nilotic and Atlantic-Congo), and languages from the regions of Africa, the Americas, Oceania, and Southeast Asia often have the lowest performance on our topic classification dataset. We hope our dataset will encourage a more inclusive evaluation of multilingual language models on a more diverse set of languages. https://github.com/dadelani/sib-200