Morphological Inflection with Phonological Features
Recent years have brought great advances in solving morphological tasks,
mostly due to powerful neural models applied to tasks such as (re)inflection
and analysis. Yet, such morphological tasks cannot be considered solved,
especially when little training data is available or when generalizing to
previously unseen lemmas. This work explores effects on performance obtained
through various ways in which morphological models get access to subcharacter
phonological features that are the targets of morphological processes. We
design two methods to achieve this goal: one that leaves models as is but
manipulates the data to include features instead of characters, and another
that manipulates models to take phonological features into account when
building representations for phonemes. We elicit phonemic data from standard
graphemic data using language-specific grammars for languages with shallow
grapheme-to-phoneme mapping, and we experiment with two reinflection models
over eight languages. Our results show that our methods yield comparable
results to the grapheme-based baseline overall, with minor improvements in some
of the languages. All in all, we conclude that patterns in character
distributions are likely to allow models to infer the underlying phonological
characteristics, even when phonemes are not explicitly represented.
Comment: ACL 2023 main conference; 8 pages, 1 figure
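The first method described above can be sketched as follows. This is a toy illustration, not the paper's code: the feature inventory, the boundary marker, and the function name are invented for the example; the idea is only that the data is rewritten so a sequence model consumes subcharacter phonological features instead of characters.

```python
# Toy phonological feature inventory (invented for illustration).
PHON_FEATURES = {
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "a": ("open", "front", "vowel"),
}

def to_feature_sequence(word):
    """Flatten a word into a sequence of feature tokens plus segment
    boundaries, so the model sees features instead of characters."""
    seq = []
    for ch in word:
        seq.extend(PHON_FEATURES.get(ch, (ch,)))  # unknown chars pass through
        seq.append("#")  # segment boundary marker
    return seq
```

The model itself is left unchanged; only its input vocabulary shifts from characters to feature tokens, which is what lets the same architecture be reused across both settings.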
Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces
The ability to identify and control different kinds of linguistic information
encoded in vector representations of words has many use cases, especially for
explainability and bias removal. This is usually done via a set of simple
classification tasks, termed probes, to evaluate the information encoded in the
embedding space. However, the involvement of a trainable classifier leads to
entanglement between the probe's results and the classifier's nature. As a
result, contemporary works on probing include tasks that do not involve
training of auxiliary models. In this work we introduce the term indicator
tasks for non-trainable tasks that are used to query embedding spaces for the
existence of certain properties, and we claim that such tasks may point in a
direction opposite to probes, a contradiction that complicates the
decision on whether a property exists in an embedding space. We demonstrate our
claims with two test cases, one dealing with gender debiasing and another with
the erasure of morphological information from embedding spaces. We show that
the application of a suitable indicator provides a more accurate picture of the
information captured and removed compared to probes. We thus conclude that
indicator tasks should be implemented and taken into consideration when
eliciting information from embedded representations.
Comment: Findings of EMNLP 202
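A minimal sketch of what a non-trainable indicator looks like, using synthetic vectors rather than real embeddings: the indicator here is the scalar projection onto a presumed gender direction (the normalized difference of two seed vectors), queried directly with no classifier fitted on top. All names and the debiasing step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
he, she = rng.normal(size=dim), rng.normal(size=dim)
# Presumed "gender direction": normalized difference of two seed vectors.
gender_dir = (he - she) / np.linalg.norm(he - she)

def gender_indicator(vec):
    """Projection onto the gender direction; a magnitude near zero
    suggests the property is absent from (or removed from) the vector."""
    return float(vec @ gender_dir)

# Hard-debias a vector by removing its gender component, then re-query.
word = rng.normal(size=dim)
debiased = word - gender_indicator(word) * gender_dir
```

Because the indicator involves no trained parameters, its output reflects only the geometry of the space, which is the property that distinguishes it from a probe whose result is entangled with the classifier's own capacity.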
Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Data contamination has become prevalent and challenging with the rise of
models pretrained on large automatically-crawled corpora. For closed models,
the training data becomes a trade secret, and even for open models, it is not
trivial to detect contamination. Strategies such as leaderboards with hidden
answers, or using test data which is guaranteed to be unseen, are expensive and
become fragile with time. Assuming that all relevant actors value clean test
data and will cooperate to mitigate data contamination, what can be done? We
propose three strategies that can make a difference: (1) Test data made public
should be encrypted with a public key and licensed to disallow derivative
distribution; (2) demand training exclusion controls from closed API holders,
and protect your test data by refusing to evaluate without them; (3) avoid data
which appears with its solution on the internet, and release the web-page
context of internet-derived data along with the data. These strategies are
practical and can be effective in preventing data contamination.
Comment: Accepted to EMNLP 202
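Strategy (1) can be illustrated with a minimal sketch. A real release should use proper public-key tooling (e.g., GPG); the toy XOR one-time pad below is only a stand-in that shows the intended effect: a crawler scraping the published file sees ciphertext, never the answers in plain text.

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR the data with a random key of equal length (one-time pad)."""
    assert len(key) >= len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

decrypt = encrypt  # XOR is its own inverse

# A hypothetical test item; only its encrypted form would be uploaded.
test_item = b'{"question": "...", "answer": "42"}'
key = secrets.token_bytes(len(test_item))
ciphertext = encrypt(test_item, key)
```

The key (or, with real public-key encryption, the decryption passphrase) is distributed separately to evaluators, so automatic crawls of the hosting page cannot fold the labels into a pretraining corpus.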
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models
Large language models (LLMs) have been shown to possess impressive
capabilities, while also raising crucial concerns about the faithfulness of
their responses. A primary issue arising in this context is the management of
(un)answerable queries by LLMs, which often results in hallucinatory behavior
due to overconfidence. In this paper, we explore the behavior of LLMs when
presented with (un)answerable queries. We ask: do models represent the fact
that the question is (un)answerable when generating a hallucinatory answer? Our
results show strong indications that such models encode the answerability of an
input query, with the representation of the first decoded token often being a
strong indicator. These findings shed new light on the spatial organization
within the latent representations of LLMs, unveiling previously unexplored
facets of these models. Moreover, they pave the way for the development of
improved decoding techniques with better adherence to factual generation,
particularly in scenarios where query (un)answerability is a concern.
Comment: EMNLP 202
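The finding can be illustrated with fabricated data: if unanswerable queries shift the first decoded token's hidden state along some latent direction, even an untrained linear readout (projection onto the difference of class means, thresholded halfway between them) separates the two classes. The hidden states, dimensions, and offset below are synthetic assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n = 16, 200
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

labels = rng.integers(0, 2, size=n)  # 1 = unanswerable
# Fabricated "first-token hidden states": unanswerable queries are
# shifted along one latent direction.
states = rng.normal(size=(n, dim)) + 3.0 * labels[:, None] * direction

# Non-trained linear readout: difference of class means plus a midpoint
# threshold between the mean scores of the two classes.
w = states[labels == 1].mean(axis=0) - states[labels == 0].mean(axis=0)
scores = states @ w
thresh = (scores[labels == 1].mean() + scores[labels == 0].mean()) / 2
accuracy = ((scores > thresh).astype(int) == labels).mean()
```

In this synthetic setting the readout recovers answerability well above chance, mirroring the claim that the information is linearly accessible in the representation before the hallucinatory answer is generated.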
The MRL 2022 Shared Task on Multilingual Clause-level Morphology
The 2022 Multilingual Representation Learning (MRL) Shared Task was dedicated to clause-level morphology. As the first-ever benchmark that defines and evaluates morphology outside its traditional lexical boundaries, the shared task on multilingual clause-level morphology sets the scene for competition across different approaches to morphological modeling, with three clause-level sub-tasks: morphological inflection, reinflection and analysis, where systems are required to generate, manipulate or analyze simple sentences centered around a single content lexeme and a set of morphological features characterizing its syntactic clause. This year's tasks covered eight typologically distinct languages: English, French, German, Hebrew, Russian, Spanish, Swahili and Turkish. The task received submissions of four systems from three teams, which were compared to two baselines implementing prominent multilingual learning methods. The results show that modern NLP models are effective in solving morphological tasks even at the clause level. However, there is still room for improvement, especially in the task of morphological analysis.
One size does not fit all: local determinants of measles vaccination in four districts of Pakistan
Common factors are associated with vaccination. However, despite these common factors, the pattern of variables related to measles vaccination differs between and within districts. In this study, children were more likely to receive measles vaccination if their mother had any formal education, if she knew at least one vaccine-preventable disease, and if she had not heard of any bad effects of vaccination. In rural areas, living within 5 km of a vaccination facility or in a community visited by a vaccination team were factors associated with vaccination, as was the mother receiving information about vaccinations.
SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas.
Peer reviewed