33 research outputs found
Automatically Neutralizing Subjective Bias in Text
Texts like news, encyclopedias, and some social media strive for objectivity.
Yet bias in the form of inappropriate subjectivity - introducing attitudes via
framing, presupposing truth, and casting doubt - remains ubiquitous. This kind
of bias erodes our collective trust and fuels social conflict. To address this
issue, we introduce a novel testbed for natural language generation:
automatically bringing inappropriately subjective text into a neutral point of
view ("neutralizing" biased text). We also offer the first parallel corpus of
biased language. The corpus contains 180,000 sentence pairs and originates from
Wikipedia edits that removed various framings, presuppositions, and attitudes
from biased sentences. Last, we propose two strong encoder-decoder baselines
for the task. A straightforward yet opaque CONCURRENT system uses a BERT
encoder to identify subjective words as part of the generation process. An
interpretable and controllable MODULAR algorithm separates these steps, using
(1) a BERT-based classifier to identify problematic words and (2) a novel join
embedding through which the classifier can edit the hidden states of the
encoder. Large-scale human evaluation across four domains (encyclopedias, news
headlines, books, and political speeches) suggests that these algorithms are a
first step towards the automatic identification and reduction of bias.
Comment: To appear at AAAI 2020
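The MODULAR algorithm's hidden-state editing step can be sketched roughly as follows. This is an illustrative simplification with invented names and toy shapes, not the authors' implementation: in the real system the join embedding is learned jointly with a BERT encoder and classifier.

```python
def neutralize_hidden_states(hidden, subjectivity_probs, join_embedding):
    """Shift each token's encoder hidden state along a learned direction.

    Each token's hidden vector is nudged along a single "join embedding"
    vector, scaled by the classifier's probability that the token is
    subjective; objective tokens (prob ~ 0) are left essentially intact.
    """
    return [
        [h + p * j for h, j in zip(vec, join_embedding)]
        for vec, p in zip(hidden, subjectivity_probs)
    ]

# Toy example: 3 tokens with 4-dimensional hidden states; token 1 is flagged
# as subjective by the (hypothetical) classifier.
hidden = [[0.0] * 4 for _ in range(3)]
probs = [0.0, 0.9, 0.1]
join = [1.0, -1.0, 0.0, 0.5]
edited = neutralize_hidden_states(hidden, probs, join)
```

The decoder then generates from the edited states, which is what makes the pipeline interpretable: the classifier's per-token decisions are visible and controllable before generation.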
Towards Detection of Subjective Bias using Contextualized Word Embeddings
Subjective bias detection is critical for applications like propaganda
detection, content recommendation, sentiment analysis, and bias neutralization.
This bias is introduced in natural language via inflammatory words and phrases,
casting doubt over facts, and presupposing the truth. In this work, we perform
comprehensive experiments for detecting subjective bias using BERT-based models
on the Wiki Neutrality Corpus (WNC). The dataset consists of labeled
instances from Wikipedia edits that remove various instances of bias. We
further propose BERT-based ensembles that outperform state-of-the-art methods
in F1 score.
Comment: To appear in Companion Proceedings of the Web Conference 2020 (WWW
'20 Companion)
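Probability averaging is one common way to build such classifier ensembles; the sketch below is a generic illustration, not the paper's exact ensembling scheme.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average per-model subjectivity probabilities and threshold.

    prob_lists: one list of probabilities per model, aligned by example.
    Returns a 0/1 label per example (1 = subjectively biased).
    """
    n_models = len(prob_lists)
    averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]

# Two hypothetical BERT-based models scoring the same three sentences.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.1, 0.3]
labels = ensemble_predict([model_a, model_b])  # -> [1, 0, 0]
```

Averaging calibrated probabilities tends to be more robust than majority voting when the ensemble is small, since it preserves each model's confidence.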
Quantification of Gender-related Stereotypes in Psychotherapy Sessions
Gender-related stereotypes and biases can have severe consequences in the medical domain, especially in mental health therapy. In this study, we analyzed 91 psychotherapy transcripts from the Alexander Street database to investigate whether gender-related stereotypes differ in the treatment of patients by male versus female therapists, using natural language processing and statistical analyses. We built a lexicon of ten high-level categories that capture sentence-level attributes and represent gender-related stereotypes. Our results suggest statistically significant differences in categories such as active, negatives, and positives during the treatment of female patients by male therapists as compared to female therapists. We built logistic regression models using the ten high-level lexical categories to predict the gender of the therapist. We also provide recommendations on how our analytical methods can be used, along with other advanced deep-learning methods, to detect and reduce gender-related stereotypes in psychotherapy sessions.
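Lexicon-category counting of the kind described above can be sketched as follows; the category word lists here are invented placeholders, not the study's actual ten-category lexicon, and the counts are the sort of features that could feed the logistic regression models mentioned.

```python
import re

# Hypothetical mini-lexicon for illustration; the study's real lexicon
# covers ten high-level categories with curated word lists.
LEXICON = {
    "active":    {"do", "act", "start", "try"},
    "negatives": {"bad", "sad", "fail"},
    "positives": {"good", "glad", "hope"},
}

def category_features(utterance):
    """Count lexicon hits per category in one therapist utterance."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return {cat: sum(t in words for t in tokens)
            for cat, words in LEXICON.items()}

feats = category_features("I hope you can try to start, even on a bad day.")
```

Aggregating such per-utterance counts over a transcript yields a fixed-length feature vector per session, suitable for a standard classifier.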
Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles
Media has a substantial impact on the public perception of events. A
one-sided or polarizing perspective on any topic is usually described as media
bias. One way bias can be introduced into news articles is by altering word
choice. Biased word choices are not always obvious, nor do they exhibit high
context-dependency; hence, detecting bias is often difficult. We
propose a Transformer-based deep learning architecture trained via Multi-Task
Learning using six bias-related data sets to tackle the media bias detection
problem. Our best-performing implementation achieves a macro F1 score of
0.776, a performance boost of 3% compared to our baseline, outperforming
existing methods. Our results indicate that Multi-Task Learning is a promising
alternative for improving existing baseline models in identifying slanted
reporting.
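The macro-averaged F1 reported above weights every class equally regardless of class frequency. A minimal reference implementation in plain Python (the paper presumably uses a standard library such as scikit-learn):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores, averaged with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

score = macro_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Because each class contributes equally, macro F1 penalizes a model that only performs well on the majority class, which matters for imbalanced bias-detection datasets.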
Emotion and Sentiment Guided Paraphrasing
Paraphrase generation, a.k.a. paraphrasing, is a common and important task in
natural language processing. Emotional paraphrasing, which changes the emotion
embodied in a piece of text while preserving its meaning, has many potential
applications, including moderating online dialogues and preventing
cyberbullying. We introduce a new task of fine-grained emotional paraphrasing
along emotion gradients, that is, altering the emotional intensities of the
paraphrases in fine-grained settings following smooth variations in affective
dimensions while preserving the meaning of the original text. We reconstruct
several widely used paraphrasing datasets by augmenting the input and target
texts with their fine-grained emotion labels. Then, we propose a framework for
emotion and sentiment guided paraphrasing by leveraging pre-trained language
models for conditioned text generation. Extensive evaluation of the fine-tuned
models suggests that including fine-grained emotion labels in the paraphrase
task significantly improves the likelihood of obtaining high-quality
paraphrases that reflect the desired emotions while achieving consistently
better scores in paraphrase metrics such as BLEU, ROUGE, and METEOR.
Comment: 13th Workshop on Computational Approaches to Subjectivity, Sentiment
& Social Media Analysis (WASSA) 2023 at The 61st Annual Meeting of the
Association for Computational Linguistics (ACL) 2023. arXiv admin note:
substantial text overlap with arXiv:2212.0329
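Augmenting inputs with fine-grained emotion labels for conditioned generation can be sketched as follows; the control-token format and intensity bucketing are assumptions for illustration, not the paper's exact scheme.

```python
def build_conditioned_input(text, target_emotion, target_intensity):
    """Prepend emotion control tokens to a paraphrase-model input.

    A seq2seq model fine-tuned on such augmented pairs learns to produce
    paraphrases matching the requested emotion and intensity level.
    """
    assert 0.0 <= target_intensity <= 1.0, "intensity is a 0-1 gradient"
    if target_intensity < 0.34:
        bucket = "low"
    elif target_intensity < 0.67:
        bucket = "medium"
    else:
        bucket = "high"
    return f"<emotion={target_emotion}> <intensity={bucket}> paraphrase: {text}"

prompt = build_conditioned_input("This is unacceptable!", "anger", 0.2)
```

At inference time, varying only the intensity bucket while keeping the text fixed lets the model walk along an emotion gradient, as the task above describes.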
Automatic and Human-AI Interactive Text Generation
In this tutorial, we focus on text-to-text generation, a class of natural
language generation (NLG) tasks, that takes a piece of text as input and then
generates a revision that is improved according to some specific criteria
(e.g., readability or linguistic styles), while largely retaining the original
meaning and the length of the text. This includes many useful applications,
such as text simplification, paraphrase generation, style transfer, etc. In
contrast to text summarization and open-ended text completion (e.g., story
generation), the
the text-to-text generation tasks we discuss in this tutorial are more
constrained in terms of semantic consistency and targeted language styles. This
level of control makes these tasks ideal testbeds for studying the ability of
models to generate text that is both semantically adequate and stylistically
appropriate. Moreover, these tasks are interesting from a technical standpoint,
as they require complex combinations of lexical and syntactical
transformations, stylistic control, and adherence to factual knowledge -- all
at once. With a special focus on text simplification and revision, this
tutorial aims to provide an overview of the state-of-the-art natural language
generation research from four major aspects -- Data, Models, Human-AI
Collaboration, and Evaluation -- and to discuss and showcase a few significant
and recent advances: (1) the use of non-autoregressive approaches; (2) the
shift from fine-tuning to prompting with large language models; (3) the
development of new learnable metrics and fine-grained human evaluation
frameworks; (4) a growing body of studies and datasets on non-English
languages; (5) the rise of HCI+NLP+Accessibility interdisciplinary research to
create real-world writing assistant systems.
Comment: To appear at ACL 2024, Tutorial
Curated Datasets for Use in Automated Media Monitoring and Feedback System: “News Classification System” Dataset, “Government News Classification” Dataset
Online journalism in India, a growing field spanning news websites and digital media, interacts with the Press Information Bureau (PIB), a government agency dedicated to sharing accurate information about government policies and initiatives with journalists. While various news outlets publish diverse articles and opinions on these topics, the government seeks to leverage Artificial Intelligence and Machine Learning to gather feedback in multiple languages. A notable obstacle to developing such a system is the lack of a readily accessible standard dataset. To address this, two datasets, 'NCS' and 'GNC,' were developed, covering the period from 2020 to 2023 and collected through web-scraping tools like Parsehub as well as manual scraping. NCS stands for News Classification System and GNC for Government News Classification. The 'NCS' dataset includes Indian news in Hindi, Marathi, and English, with each item categorized as government-related or not. A Machine Learning model called the "Government News Classifier" was then trained on the 'NCS' dataset to sort news articles into government-related or non-government-related categories; the objective is to use this model to determine whether a news source is discussing government-related topics. Using this model, we created the 'GNC' dataset, which contains only news articles related to government schemes and policies in Hindi, Marathi, and English. In the GNC dataset, human experts manually classified each news source into three categories: "government favourable," "government non-favourable," or "neutral." In essence, this research emphasizes the importance of access to a large dataset, which can enable more advanced prediction models in this complex field.
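The two-stage pipeline described above could be skeletonized as follows; the keyword filter merely stands in for the trained Government News Classifier, and all names and terms here are illustrative placeholders.

```python
# Illustrative keyword filter only; the actual Government News Classifier
# is a trained ML model, and GNC favourability labels were assigned by
# human experts, not automatically.
GOV_TERMS = {"ministry", "scheme", "policy", "pib", "government"}

def is_government_news(article):
    """Stage 1: route an article as government-related or not."""
    words = set(article.lower().split())
    return bool(words & GOV_TERMS)

def build_gnc(articles):
    """Stage 2: keep only government-related articles, with the stance
    field left empty pending manual favourable/non-favourable/neutral
    annotation."""
    return [{"text": a, "stance": None} for a in articles
            if is_government_news(a)]

corpus = ["New housing scheme launched by the ministry",
          "Local team wins cricket final"]
gnc = build_gnc(corpus)
```

Separating routing (stage 1) from stance annotation (stage 2) keeps the expensive human labeling focused on only the government-related subset.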
Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance
Proprietary Large Language Models (LLMs), such as ChatGPT, have garnered
significant attention due to their exceptional capabilities in handling a
diverse range of tasks. Recent studies demonstrate that open-sourced smaller
foundational models, such as 7B-size LLaMA, can also display remarkable
proficiency in tackling diverse tasks when fine-tuned using instruction-driven
data. In this work, we investigate a practical problem setting where the
primary focus is on one or a few particular tasks rather than general-purpose
instruction following, and explore whether LLMs can be beneficial and further
improved for such targeted scenarios. We choose the writing-assistant scenario
as the testbed, which includes seven writing tasks. We collect training data
for these tasks, reframe them in an instruction-following format, and
subsequently refine the LLM, specifically LLaMA, via instruction tuning.
Experimental results show that fine-tuning LLaMA on writing instruction data
significantly improves its ability on writing tasks. We also conduct more
experiments and analyses to offer insights for future work on effectively
fine-tuning LLaMA for specific scenarios. Finally, we initiate a discussion
regarding the necessity of employing LLMs for only one targeted task, taking
into account the efforts required for tuning and the resources consumed during
deployment.
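Reframing a supervised writing example in an instruction-following format, as described above, can look roughly like this. The Alpaca-style template and the task-to-instruction mapping are assumptions for illustration, not the paper's exact data format.

```python
TEMPLATE = ("Below is an instruction that describes a writing task. "
            "Write a response that completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n"
            "### Input:\n{input}\n\n### Response:\n")

# Hypothetical mapping from task name to natural-language instruction.
INSTRUCTIONS = {
    "grammar": "Fix all grammatical errors in the text.",
    "simplify": "Rewrite the text so it is easier to read.",
    "paraphrase": "Paraphrase the text while keeping its meaning.",
}

def to_instruction_example(task, source_text, target_text):
    """Reframe one supervised writing pair as an instruction-tuning pair."""
    prompt = TEMPLATE.format(instruction=INSTRUCTIONS[task],
                             input=source_text)
    return {"prompt": prompt, "completion": target_text}

ex = to_instruction_example("grammar", "She go to school.",
                            "She goes to school.")
```

During instruction tuning, the loss is typically computed only on the completion tokens, so the model learns to respond to the instruction rather than to reproduce the prompt.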
XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates
Text editing is a crucial task that involves modifying text to better align
with user intents. However, existing text editing benchmark datasets have
limitations in providing only coarse-grained instructions. Consequently,
although the edited output may seem reasonable, it often deviates from the
intended changes outlined in the gold reference, resulting in low evaluation
scores. To comprehensively investigate the text editing capabilities of large
language models, this paper introduces XATU, the first benchmark specifically
designed for fine-grained instruction-based explainable text editing. XATU
covers a wide range of topics and text types, incorporating lexical, syntactic,
semantic, and knowledge-intensive edits. To enhance interpretability, we
leverage high-quality data sources and human annotation, resulting in a
benchmark that includes fine-grained instructions and gold-standard edit
explanations. By evaluating existing open and closed large language models
against our benchmark, we demonstrate the effectiveness of instruction tuning
and the impact of underlying architecture across various editing tasks.
Furthermore, extensive experimentation reveals the significant role of
explanations in fine-tuning language models for text editing tasks. The
benchmark will be open-sourced to support reproduction and facilitate future
research.
Comment: Work in progress
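One reason coarse-grained evaluation misleads, as noted above, is that surface overlap with the gold reference does not verify that the *instructed* change was made. A minimal surface-similarity proxy (illustrative only, not the benchmark's evaluation protocol) looks like:

```python
import difflib

def edit_similarity(model_output, gold_reference):
    """Character-level similarity between a model's edit and the gold edit.

    Returns a ratio in [0, 1]; 1.0 means the strings are identical. This
    measures only surface overlap, so a fine-grained benchmark must also
    check whether the specific instructed change was actually applied.
    """
    return difflib.SequenceMatcher(None, model_output,
                                   gold_reference).ratio()

sim_good = edit_similarity("The cat sat.", "The cat sat.")
sim_poor = edit_similarity("The cat sat.", "A dog ran away.")
```

An output can score high on such a metric while ignoring the instruction entirely, which is exactly the failure mode that fine-grained instructions and gold edit explanations are meant to expose.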
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
We survey 146 papers analyzing "bias" in NLP systems, finding that their
motivations are often vague, inconsistent, and lacking in normative reasoning,
despite the fact that analyzing "bias" is an inherently normative process. We
further find that these papers' proposed quantitative techniques for measuring
or mitigating "bias" are poorly matched to their motivations and do not engage
with the relevant literature outside of NLP. Based on these findings, we
describe the beginnings of a path forward by proposing three recommendations
that should guide work analyzing "bias" in NLP systems. These recommendations
rest on a greater recognition of the relationships between language and social
hierarchies, encouraging researchers and practitioners to articulate their
conceptualizations of "bias"---i.e., what kinds of system behaviors are
harmful, in what ways, to whom, and why, as well as the normative reasoning
underlying these statements---and to center work around the lived experiences
of members of communities affected by NLP systems, while interrogating and
reimagining the power relations between technologists and such communities.