A Lightweight Method to Generate Unanswerable Questions in English
If a question cannot be answered with the available information, robust
systems for question answering (QA) should know _not_ to answer. One way to
build QA models that do this is with additional training data composed of
unanswerable questions, created either by employing annotators or through
automated methods for unanswerable question generation. To show that the model
complexity of existing automated approaches is not justified, we examine a
simpler data augmentation method for unanswerable question generation in
English: performing antonym and entity swaps on answerable questions. Compared
to the prior state-of-the-art, data generated with our training-free and
lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data
with BERT-large), and has higher human-judged relatedness and readability. We
quantify the raw benefits of our approach compared to no augmentation across
multiple encoder models, using different amounts of generated data, and also on
TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish
swaps as a simple but strong baseline for future work.
Comment: Accepted to Findings of EMNLP 202
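The swap-based augmentation described above can be sketched in a few lines. This is an illustrative sketch only: the tiny antonym and entity lexicons below are hypothetical placeholders, not the resources used in the paper.

```python
# Sketch of swap-based unanswerable question generation: replace an
# antonymous word or a named entity in an answerable question so the
# source passage likely no longer contains the answer.
# The lexicons here are illustrative placeholders.

ANTONYMS = {"first": "last", "largest": "smallest", "highest": "lowest"}
ENTITY_SWAPS = {"France": "Brazil", "Einstein": "Newton"}

def swap_question(question: str) -> str:
    """Return a likely-unanswerable variant of `question`, or the
    original question if no swappable token is found."""
    tokens = question.split()
    for i, tok in enumerate(tokens):
        key = tok.strip("?,.")          # ignore trailing punctuation
        if key in ANTONYMS:
            tokens[i] = tok.replace(key, ANTONYMS[key])
            return " ".join(tokens)
        if key in ENTITY_SWAPS:
            tokens[i] = tok.replace(key, ENTITY_SWAPS[key])
            return " ".join(tokens)
    return question

# e.g. swap_question("What is the largest city in France?")
# -> "What is the smallest city in France?"
```

In practice, the swapped questions are paired with the original passage and a "no answer" label to augment QA training data.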
Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP
Personal names simultaneously differentiate individuals and categorize them
in ways that are important in a given society. While the natural language
processing community has thus associated personal names with sociodemographic
characteristics in a variety of tasks, researchers have engaged to varying
degrees with the established methodological problems in doing so. To guide
future work that uses names and sociodemographic characteristics, we provide an
overview of relevant research: first, we present an interdisciplinary
background on names and naming. We then survey the issues inherent to
associating names with sociodemographic attributes, covering problems of
validity (e.g., systematic error, construct validity), as well as ethical
concerns (e.g., harms, differential impact, cultural insensitivity). Finally,
we provide guiding questions along with normative recommendations to avoid
validity and ethical pitfalls when dealing with names and sociodemographic
characteristics in natural language processing.
Comment: Gender Bias in Natural Language Processing Workshop at ACL 202
Factoring the Matrix of Domination: A Critical Review and Reimagination of Intersectionality in AI Fairness
Intersectionality is a critical framework that, through inquiry and praxis,
allows us to examine how social inequalities persist through domains of
structure and discipline. Given AI fairness's raison d'être of "fairness", we
argue that adopting intersectionality as an analytical framework is pivotal to
effectively operationalizing fairness. Through a critical review of how
intersectionality is discussed in 30 papers from the AI fairness literature, we
deductively and inductively: 1) map how intersectionality tenets operate within
the AI fairness paradigm and 2) uncover gaps between the conceptualization and
operationalization of intersectionality. We find that researchers
overwhelmingly reduce intersectionality to optimizing for fairness metrics over
demographic subgroups. They also fail to discuss their social context, and when
mentioning power, they mostly situate it only within the AI pipeline. We: 3)
outline and assess the implications of these gaps for critical inquiry and
praxis, and 4) provide actionable recommendations for AI fairness researchers
to engage with intersectionality in their work by grounding it in AI
epistemology.
Comment: To appear at AIES 202
The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis
In-context learning is a popular inference strategy where large language
models solve a task using only a few labeled demonstrations without needing any
parameter updates. Although there have been extensive studies on English
in-context learning, multilingual in-context learning remains under-explored,
and we lack an in-depth understanding of the role of demonstrations in this
context. To address this gap, we conduct a multidimensional analysis of
multilingual in-context learning, experimenting with 5 models from different
model families, 9 datasets covering classification and generation tasks, and 56
typologically diverse languages. Our results reveal that the effectiveness of
demonstrations varies significantly across models, tasks, and languages. We
also find that strong instruction-following models including Llama 2-Chat,
GPT-3.5, and GPT-4 are largely insensitive to the quality of demonstrations.
Instead, a carefully crafted template often eliminates the benefits of
demonstrations for some tasks and languages altogether. These findings show
that the importance of demonstrations might be overestimated. Our work
highlights the need for granular evaluation across multiple axes towards a
better understanding of in-context learning.
Comment: ACL 2024 Findings
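For context, a few-shot in-context prompt of the kind studied here is typically assembled by concatenating labeled demonstrations ahead of the test input. The template below is a hypothetical illustration, not the paper's actual format:

```python
# Sketch of few-shot prompt assembly for in-context learning: each
# labeled demonstration is rendered with a template, then the test
# input is appended with its label left blank for the model to fill.
# The template wording is an assumed placeholder.

def build_prompt(demos, test_input, template="Input: {x}\nLabel: {y}"):
    """Join demonstrations and the unlabeled test input into one prompt."""
    parts = [template.format(x=x, y=y) for x, y in demos]
    # Render the test input with an empty label slot, trimming the
    # trailing space left by the empty label.
    parts.append(template.format(x=test_input, y="").rstrip())
    return "\n\n".join(parts)

# e.g. build_prompt([("good movie", "positive")], "bad movie")
# ends with "Input: bad movie\nLabel:"
```

The paper's finding that templates can matter more than demonstrations corresponds to varying the `template` string versus varying the `demos` list in a sketch like this one.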
Synthetic Data: Representation and/vs Representativeness
Synthetic data is increasingly used throughout the AI development pipeline to address three primary challenges surrounding data use: data scarcity, privacy concerns, and data representativeness or diversity. With the introduction of the AI Act, these three challenges take on new urgency. Creating synthetic data clearly addresses the data scarcity problem, and over a decade of research has interrogated the possibilities of differential privacy, yet little attention has been paid to whether and how data diversity is addressed in these systems. When applied to data, the term representation has multiple definitions, including both "representativeness," which describes quantitative metrics of how many instances of a particular kind or grouping are in a dataset, and "representation," which concerns the qualities that tend to be assigned to groups and individuals. In this workshop we will explore synthetic data with a view to this plurality of representation as essential to responsible AI development practices.
