14 research outputs found

    Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks

    Recent work has proposed explicitly inducing language-wise modularity in multilingual LMs via sparse fine-tuning (SFT) on per-language subnetworks as a means of better guiding cross-lingual sharing. In this work, we investigate (1) the degree to which language-wise modularity naturally arises within models with no special modularity interventions, and (2) how cross-lingual sharing and interference differ between such models and those with explicit SFT-guided subnetwork modularity. To quantify language specialization and cross-lingual interaction, we use a Training Data Attribution method that estimates the degree to which a model's predictions are influenced by in-language or cross-language training examples. Our results show that language-specialized subnetworks do naturally arise, and that SFT, rather than always increasing modularity, can decrease language specialization of subnetworks in favor of more cross-lingual sharing.
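
    As a rough illustration of how an attribution-based specialization measure of this kind can work, here is a minimal Python sketch, assuming per-example influence scores (e.g., from a TracIn-style attribution method) have already been computed; the function name, score-matrix layout, and toy numbers are illustrative assumptions, not the study's implementation.

        # Sketch: summarize language specialization from training-data-attribution scores.
        # influence[i][j] is the estimated influence of training example j on the model's
        # prediction for test example i (e.g., from a TracIn-style attribution method).
        import numpy as np

        def in_language_influence_ratio(influence, test_langs, train_langs):
            """Fraction of total absolute influence coming from same-language training
            examples, averaged over test examples. Values near 1 suggest strong language
            specialization; lower values suggest more cross-lingual sharing."""
            influence = np.abs(np.asarray(influence, dtype=float))    # (n_test, n_train)
            same_lang = np.asarray(test_langs)[:, None] == np.asarray(train_langs)[None, :]
            per_test = (influence * same_lang).sum(axis=1) / influence.sum(axis=1)
            return per_test.mean()

        # Toy example: two test predictions, four training examples, two languages.
        scores = [[0.90, 0.10, 0.20, 0.05],
                  [0.10, 0.80, 0.05, 0.30]]
        print(in_language_influence_ratio(scores, ["de", "fr"], ["de", "fr", "de", "fr"]))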

    Probing LLMs for Joint Encoding of Linguistic Categories

    Large Language Models (LLMs) exhibit impressive performance on a range of NLP tasks, due to the general-purpose linguistic knowledge acquired during pretraining. Existing model interpretability research (Tenney et al., 2019) suggests that a linguistic hierarchy emerges in the LLM layers, with lower layers better suited to solving syntactic tasks and higher layers employed for semantic processing. Yet, little is known about how encodings of different linguistic phenomena interact within the models and to what extent processing of linguistically related categories relies on the same, shared model representations. In this paper, we propose a framework for testing the joint encoding of linguistic categories in LLMs. Focusing on syntax, we find evidence of joint encoding both at the same (related part-of-speech (POS) classes) and different (POS classes and related syntactic dependency relations) levels of the linguistic hierarchy. Our cross-lingual experiments show that the same patterns hold across languages in multilingual LLMs.
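
    The following minimal sketch shows the general kind of probing setup such analyses build on: a linear probe trained on frozen hidden states of a multilingual encoder to predict a token-level category such as POS. The checkpoint, layer choice, and toy labelled data are assumptions for illustration only; this is not the paper's joint-encoding framework.

        # Sketch: linear probe on frozen multilingual-encoder states for a POS-style task.
        import torch
        from transformers import AutoModel, AutoTokenizer
        from sklearn.linear_model import LogisticRegression

        MODEL = "bert-base-multilingual-cased"   # placeholder multilingual encoder
        tokenizer = AutoTokenizer.from_pretrained(MODEL)
        encoder = AutoModel.from_pretrained(MODEL).eval()

        def word_states(words, layer=8):
            """Hidden state of the first sub-token of each word at a chosen layer."""
            enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
            with torch.no_grad():
                hidden = encoder(**enc, output_hidden_states=True).hidden_states[layer][0]
            feats, seen = [], set()
            for pos, wid in enumerate(enc.word_ids(0)):
                if wid is not None and wid not in seen:  # skip specials and later sub-tokens
                    feats.append(hidden[pos].numpy())
                    seen.add(wid)
            return feats

        # Toy probing data; a real study would use annotated treebank sentences.
        sents = [(["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
                 (["a", "dog", "barks"], ["DET", "NOUN", "VERB"])]
        X = [f for words, _ in sents for f in word_states(words)]
        y = [t for _, tags in sents for t in tags]
        probe = LogisticRegression(max_iter=1000).fit(X, y)
        print(probe.score(X, y))   # probing accuracy (here on the tiny training set itself)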

    Do large language models solve verbal analogies like children do?

    Analogy-making lies at the heart of human cognition. Adults solve analogies such as "Horse belongs to stable like chicken belongs to ...?" by mapping relations ("kept in") and answering "chicken coop". In contrast, children often use association, e.g., answering "egg". This paper investigates whether large language models (LLMs) solve verbal analogies in A:B::C:? form using associations, similar to what children do. We use verbal analogies extracted from an online adaptive learning environment, where 14,002 7-12-year-olds from the Netherlands solved 622 analogies in Dutch. The six Dutch monolingual and multilingual LLMs we tested performed at around the same level as children, with MGPT performing worst, around the 7-year-old level, and XLM-V and GPT-3 best, slightly above the 11-year-old level. However, when we control for associative processes, this picture changes and each model's performance level drops by 1-2 years. Further experiments demonstrate that associative processes often underlie correctly solved analogies. We conclude that the LLMs we tested do indeed tend to solve verbal analogies by association with C, as children do.
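
    One way to make such an association control concrete is to score candidate completions with a causal LM twice: once given the full A:B::C analogy and once given only the C term. The sketch below, using a generic English checkpoint and made-up prompts, is only an illustration of that idea, not the paper's evaluation protocol.

        # Sketch: contrast relational and purely associative scoring of analogy candidates.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        MODEL = "gpt2"   # generic stand-in; the paper evaluates Dutch mono-/multilingual LLMs
        tok = AutoTokenizer.from_pretrained(MODEL)
        lm = AutoModelForCausalLM.from_pretrained(MODEL).eval()

        def avg_logprob(context, answer):
            """Average log-probability of the answer tokens given the context."""
            n_ctx = tok(context, return_tensors="pt").input_ids.shape[1]
            full = tok(context + " " + answer, return_tensors="pt").input_ids
            with torch.no_grad():
                logprobs = lm(full).logits.log_softmax(-1)
            answer_ids = full[0, n_ctx:]
            # the logits at position t predict the token at position t + 1
            token_lp = logprobs[0, n_ctx - 1:-1].gather(1, answer_ids[:, None]).squeeze(1)
            return token_lp.mean().item()

        analogy = "Horse belongs to stable like chicken belongs to"
        association_only = "Chicken belongs to"   # the C term without the A:B relation
        for candidate in ["chicken coop", "egg"]:
            print(candidate,
                  avg_logprob(analogy, candidate),           # relational context
                  avg_logprob(association_only, candidate))  # associative baseline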

    Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology

    Multilingual sentence encoders have seen much success in cross-lingual model transfer for downstream NLP tasks. The success of this transfer is, however, dependent on the model’s ability to encode the patterns of cross-lingual similarity and variation. Yet, we know relatively little about the properties of individual languages or the general patterns of linguistic variation that the models encode. In this article, we investigate these questions by leveraging knowledge from the field of linguistic typology, which studies and documents structural and semantic variation across languages. We propose methods for separating language-specific subspaces within state-of-the-art multilingual sentence encoders (LASER, M-BERT, XLM, and XLM-R) with respect to a range of typological properties pertaining to lexical, morphological, and syntactic structure. Moreover, we investigate how typological information about languages is distributed across all layers of the models. Our results show interesting differences in encoding linguistic variation associated with different pretraining strategies. In addition, we propose a simple method to study how shared typological properties of languages are encoded in two state-of-the-art multilingual models, M-BERT and XLM-R. The results provide insight into their information-sharing mechanisms and suggest that these linguistic properties are encoded jointly across typologically similar languages in these models.
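
    A minimal sketch of layer-wise typological probing in this spirit, assuming per-language, per-layer mean-pooled sentence embeddings and a binary typological label per language (e.g., a WALS-style word-order feature) are already available; the data layout and the simple linear probe are assumptions for illustration, not the article's method.

        # Sketch: layer-wise probing for a binary typological property of languages.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def layerwise_probe(embeddings, labels, n_layers, cv=3):
            """embeddings[lang][layer] -> 1-D vector; labels[lang] -> 0 or 1.
            Returns the cross-validated probing accuracy for every layer, showing
            where in the model the property is most easily recoverable.
            Requires at least `cv` languages per label value."""
            langs = sorted(embeddings)
            y = np.array([labels[lang] for lang in langs])
            accuracies = []
            for layer in range(n_layers):
                X = np.stack([embeddings[lang][layer] for lang in langs])
                probe = LogisticRegression(max_iter=1000)
                accuracies.append(cross_val_score(probe, X, y, cv=cv).mean())
            return accuracies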

    Semantic drift in multilingual representations

    Multilingual representations have mostly been evaluated based on their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on their organization of semantic concepts. We propose to conduct an adapted version of representational similarity analysis of a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations that are trained only on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: they can indicate unwanted characteristics of the computational models, and they provide a quantitative means to study linguistic phenomena across languages.
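
    The core pipeline can be sketched as follows: build a concept-by-concept dissimilarity matrix per language, compare matrices across languages with a rank correlation, and cluster the resulting distances into a tree. The helper names and the generic average-linkage clustering below are illustrative assumptions, not the article's exact procedure.

        # Sketch: compare languages through representational similarity of a shared
        # concept set, then cluster the pairwise distances into a tree.
        import numpy as np
        from scipy.cluster.hierarchy import linkage
        from scipy.spatial.distance import pdist, squareform
        from scipy.stats import spearmanr

        def concept_rdm(vectors):
            """Concept-by-concept dissimilarity matrix (cosine distance) for one language."""
            return squareform(pdist(np.stack(vectors), metric="cosine"))

        def language_distance(rdm_a, rdm_b):
            """1 minus the Spearman correlation of the upper triangles of two RDMs."""
            upper = np.triu_indices_from(rdm_a, k=1)
            rho, _ = spearmanr(rdm_a[upper], rdm_b[upper])
            return 1.0 - rho

        def language_tree(lang_vectors):
            """lang_vectors: {language: [concept vectors in a shared, fixed order]}."""
            langs = sorted(lang_vectors)
            rdms = {lang: concept_rdm(lang_vectors[lang]) for lang in langs}
            condensed = [language_distance(rdms[a], rdms[b])
                         for i, a in enumerate(langs) for b in langs[i + 1:]]
            # average-linkage clustering over the condensed distance vector;
            # plot with scipy.cluster.hierarchy.dendrogram(tree, labels=langs)
            tree = linkage(condensed, method="average")
            return langs, tree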

    On the Usability of Big (Social) Data

    Due to the growing availability of huge amounts of data of different types and the growing capabilities to analyze these data, the expectations of big data applications are high. In this paper, we argue that the usability of big data in the social domain is far from trivial. If the outcomes of big data are wrongly interpreted, this may shape the development of our society in the wrong direction. Therefore, care should be taken to interpret big data outcomes properly and to apply them appropriately in real life. To support such an interpretation, we distinguish three major building blocks of big data: the data as input for analyses, the algorithms to analyze the data, and the models as output of the analyses. We show that each of these building blocks entails different complications for a proper interpretation of big data outcomes in practice. Therefore, well-thought-through strategies are required for using big data outcomes in a responsible way. We discuss a framework for such strategies.

    Challenges of Big Data from a philosophical perspective

    Due to the many potential applications of Big Data, the expectations are high. However, there are some fundamental objections to the straightforward use of Big Data outcomes. In this paper, we take a philosophical view on the Big Data approach and discuss these objections. Formally, Big Data induces models from very large data sets, which are nevertheless incomplete. In many cases these data sets might be skewed as well. This raises the question of the extent to which the induced models represent the real world adequately, and are therefore sufficiently grounded to base new policies on. We argue that caution is needed in interpreting these models and that well-thought-through strategies are required for using the models in practice in a responsible way. We discuss two strategies that may be used.

    Robust Evaluation of Language–Brain Encoding Experiments

    Language–brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli. The evaluation scenarios for this task have not yet been standardized, which makes it difficult to compare and interpret results. We perform a series of evaluation experiments with a consistent encoding setup and compute the results for multiple fMRI datasets. In addition, we test the sensitivity of the evaluation measures to randomized data and analyze the effect of voxel selection methods. Our experimental framework is publicly available to make modelling decisions more transparent and to support reproducibility for future comparisons.
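
    A minimal sketch of a standard encoding evaluation of this kind: fit a ridge regression from stimulus representations to voxel responses and score each voxel by the correlation between predicted and held-out responses. The data shapes, the fixed regularization strength, and the absence of voxel selection are simplifying assumptions, not the paper's setup.

        # Sketch: cross-validated ridge-regression encoding model, scored per voxel by
        # the correlation between predicted and observed held-out responses.
        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import KFold

        def encoding_scores(stim_reps, voxels, n_splits=5, alpha=1.0):
            """stim_reps: (n_stimuli, n_features); voxels: (n_stimuli, n_voxels).
            Returns the mean per-voxel Pearson correlation across folds."""
            corrs = np.zeros(voxels.shape[1])
            folds = KFold(n_splits=n_splits, shuffle=False)   # keep stimulus order intact
            for train, test in folds.split(stim_reps):
                model = Ridge(alpha=alpha).fit(stim_reps[train], voxels[train])
                pred = model.predict(stim_reps[test])
                true = voxels[test]
                # per-voxel correlation between predicted and observed responses
                pz = (pred - pred.mean(0)) / (pred.std(0) + 1e-8)
                tz = (true - true.mean(0)) / (true.std(0) + 1e-8)
                corrs += (pz * tz).mean(0)
            return corrs / n_splits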