33 research outputs found

    Automatically Neutralizing Subjective Bias in Text

    Texts like news, encyclopedias, and some social media strive for objectivity. Yet bias in the form of inappropriate subjectivity - introducing attitudes via framing, presupposing truth, and casting doubt - remains ubiquitous. This kind of bias erodes our collective trust and fuels social conflict. To address this issue, we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view ("neutralizing" biased text). We also offer the first parallel corpus of biased language. The corpus contains 180,000 sentence pairs and originates from Wikipedia edits that removed various framings, presuppositions, and attitudes from biased sentences. Last, we propose two strong encoder-decoder baselines for the task. A straightforward yet opaque CONCURRENT system uses a BERT encoder to identify subjective words as part of the generation process. An interpretable and controllable MODULAR algorithm separates these steps, using (1) a BERT-based classifier to identify problematic words and (2) a novel join embedding through which the classifier can edit the hidden states of the encoder. Large-scale human evaluation across four domains (encyclopedias, news headlines, books, and political speeches) suggests that these algorithms are a first step towards the automatic identification and reduction of bias. Comment: To appear at AAAI 202

    Towards Detection of Subjective Bias using Contextualized Word Embeddings

    Subjective bias detection is critical for applications like propaganda detection, content recommendation, sentiment analysis, and bias neutralization. This bias is introduced in natural language via inflammatory words and phrases, casting doubt over facts, and presupposing the truth. In this work, we perform comprehensive experiments for detecting subjective bias using BERT-based models on the Wiki Neutrality Corpus (WNC). The dataset consists of 360k labeled instances, from Wikipedia edits that remove various instances of the bias. We further propose BERT-based ensembles that outperform state-of-the-art methods like $BERT_{large}$ by a margin of 5.6 F1 score. Comment: To appear in Companion Proceedings of the Web Conference 2020 (WWW '20 Companion)
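
    The abstract above does not say how its BERT-based ensembles combine their members. As a minimal sketch of one common option, soft voting, the snippet below averages per-class probabilities from several classifiers and picks the highest-scoring class. The function name `soft_vote` and the probability values are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_rows):
    """Average per-class probabilities across models and return (label, averages).

    prob_rows: one list of class probabilities per model, all of equal length.
    This is plain soft voting, assumed here for illustration only.
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    averages = [
        sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)
    ]
    # Index of the class with the highest averaged probability.
    label = max(range(n_classes), key=averages.__getitem__)
    return label, averages


# Hypothetical outputs of three classifiers for one sentence,
# ordered as [P(neutral), P(biased)].
model_probs = [[0.40, 0.60], [0.70, 0.30], [0.55, 0.45]]
label, averages = soft_vote(model_probs)
print(label, averages)
```

    Soft voting lets a confident model outweigh two weakly disagreeing ones, which hard majority voting would not allow.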

    Quantification of Gender-related Stereotypes in Psychotherapy Sessions

    Gender-related stereotypes and biases can have severe consequences in the medical domain, especially in mental health therapy. In this study, we analyzed 91 psychotherapy transcripts from the Alexander Street database to investigate whether gender-related stereotypes differ in the treatment of patients by male versus female therapists using natural language processing and statistical analyses. We built a lexicon of ten high-level categories that capture sentence-level attributes and represent gender-related stereotypes. Our results suggest statistically significant differences in categories such as active, negatives, positives, etc., during the treatment of female patients by male therapists as compared to female therapists. We built logistic regression models using the ten high-level lexical categories to predict the gender of the therapist. We also provide recommendations on how our analytical methods can be used, along with other advanced deep-learning methods, to detect and reduce gender-related stereotypes in psychotherapy sessions.

    Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles

    Media has a substantial impact on the public perception of events. A one-sided or polarizing perspective on any topic is usually described as media bias. One way bias can be introduced into news articles is by altering word choice. Biased word choices are not always obvious, nor do they exhibit high context-dependency. Hence, detecting bias is often difficult. We propose a Transformer-based deep learning architecture trained via Multi-Task Learning using six bias-related data sets to tackle the media bias detection problem. Our best-performing implementation achieves a macro $F_{1}$ of 0.776, a performance boost of 3% compared to our baseline, outperforming existing methods. Our results indicate Multi-Task Learning as a promising alternative to improve existing baseline models in identifying slanted reporting.
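
    For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of per-class F1 scores, so a rare class counts as much as a frequent one. The sketch below computes it from scratch; the labels and predictions are made-up illustrations, not data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only.
y_true = ["biased", "biased", "neutral", "neutral"]
y_pred = ["biased", "neutral", "neutral", "neutral"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

    Here the "biased" class scores 2/3 and the "neutral" class scores 0.8, so the macro average is about 0.7333.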

    Emotion and Sentiment Guided Paraphrasing

    Paraphrase generation, a.k.a. paraphrasing, is a common and important task in natural language processing. Emotional paraphrasing, which changes the emotion embodied in a piece of text while preserving its meaning, has many potential applications, including moderating online dialogues and preventing cyberbullying. We introduce a new task of fine-grained emotional paraphrasing along emotion gradients, that is, altering the emotional intensities of the paraphrases in fine-grained settings following smooth variations in affective dimensions while preserving the meaning of the original text. We reconstruct several widely used paraphrasing datasets by augmenting the input and target texts with their fine-grained emotion labels. Then, we propose a framework for emotion and sentiment guided paraphrasing by leveraging pre-trained language models for conditioned text generation. Extensive evaluation of the fine-tuned models suggests that including fine-grained emotion labels in the paraphrase task significantly improves the likelihood of obtaining high-quality paraphrases that reflect the desired emotions while achieving consistently better scores in paraphrase metrics such as BLEU, ROUGE, and METEOR. Comment: 13th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2023 at The 61st Annual Meeting of the Association for Computational Linguistics (ACL) 2023. arXiv admin note: substantial text overlap with arXiv:2212.0329

    Automatic and Human-AI Interactive Text Generation

    In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks that takes a piece of text as input and then generates a revision that is improved according to some specific criteria (e.g., readability or linguistic styles), while largely retaining the original meaning and the length of the text. This includes many useful applications, such as text simplification, paraphrase generation, style transfer, etc. In contrast to text summarization and open-ended text completion (e.g., story), the text-to-text generation tasks we discuss in this tutorial are more constrained in terms of semantic consistency and targeted language styles. This level of control makes these tasks ideal testbeds for studying the ability of models to generate text that is both semantically adequate and stylistically appropriate. Moreover, these tasks are interesting from a technical standpoint, as they require complex combinations of lexical and syntactical transformations, stylistic control, and adherence to factual knowledge, all at once. With a special focus on text simplification and revision, this tutorial aims to provide an overview of the state-of-the-art natural language generation research from four major aspects -- Data, Models, Human-AI Collaboration, and Evaluation -- and to discuss and showcase a few significant and recent advances: (1) the use of non-retrogressive approaches; (2) the shift from fine-tuning to prompting with large language models; (3) the development of new learnable metrics and fine-grained human evaluation frameworks; (4) a growing body of studies and datasets on non-English languages; (5) the rise of HCI+NLP+Accessibility interdisciplinary research to create real-world writing assistant systems. Comment: To appear at ACL 2024, Tutorial

    Curated Datasets for Use in Automated Media Monitoring and Feedback System: “News Classification System” Dataset, “Government News Classification” Dataset

    Online journalism in India, a growing field that involves news websites and digital media, connects with the Press Information Bureau (PIB), a government agency dedicated to sharing accurate information about government policies and initiatives with journalists. While various news outlets publish diverse articles and opinions on these topics, the government seeks to leverage Artificial Intelligence and Machine Learning for gathering feedback in multiple languages. A notable obstacle to developing such a system is the lack of a readily accessible standard dataset. To address this, two datasets named 'NCS' and 'GNC' are developed, consisting of information from 2020 to 2023 collected through web scraping tools like Parsehub and through manual scraping. NCS stands for the News Classification System dataset and GNC for the Government News Classification dataset. The 'NCS' dataset includes Indian news in Hindi, Marathi, and English, with each item categorized as government-related or not. Then, a Machine Learning model called "Government News Classifier" is built to sort news articles, using the 'NCS' dataset, into either government-related or non-government-related categories. The objective is to use this model to determine whether a news source is discussing topics related to the government. Using this model, we created the 'GNC' dataset, which contains only news articles related to government schemes and policies in Hindi, Marathi, and English. In the GNC dataset, human experts manually classify each news source into three categories: "government favourable," "government non-favourable," or "neutral." In essence, this research emphasizes the importance of having access to a large dataset, which can stimulate more advanced prediction models in this complex field.

    Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

    Proprietary Large Language Models (LLMs), such as ChatGPT, have garnered significant attention due to their exceptional capabilities in handling a diverse range of tasks. Recent studies demonstrate that open-sourced smaller foundational models, such as 7B-size LLaMA, can also display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data. In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following, and explore whether LLMs can be beneficial and further improved for such targeted scenarios. We choose the writing-assistant scenario as the testbed, which includes seven writing tasks. We collect training data for these tasks, reframe them in an instruction-following format, and subsequently refine the LLM, specifically LLaMA, via instruction tuning. Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks. We also conduct more experiments and analyses to offer insights for future work on effectively fine-tuning LLaMA for specific scenarios. Finally, we initiate a discussion regarding the necessity of employing LLMs for only one targeted task, taking into account the efforts required for tuning and the resources consumed during deployment.
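
    To make "reframe them in an instruction-following format" concrete, here is a minimal sketch of turning one source/target pair into a prompt/completion record. The prompt template, the field names, the function `to_instruction_example`, and the grammar-correction sentences are all assumptions made for illustration; the abstract does not specify the template or the seven tasks.

```python
def to_instruction_example(task_instruction, source_text, target_text):
    """Wrap a (source, target) pair in a hypothetical instruction template."""
    prompt = (
        "Below is an instruction that describes a task, paired with an input.\n\n"
        f"### Instruction:\n{task_instruction}\n\n"
        f"### Input:\n{source_text}\n\n"
        "### Response:\n"
    )
    # The model is trained to continue the prompt with the completion.
    return {"prompt": prompt, "completion": target_text}


# Illustrative writing-task pair, not taken from the paper's data.
example = to_instruction_example(
    "Fix the grammatical errors in the sentence.",
    "She go to school every day.",
    "She goes to school every day.",
)
print(example["prompt"] + example["completion"])
```

    Keeping the instruction separate from the input lets the same sentence pair format serve several tasks by changing only the instruction string.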

    XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates

    Text editing is a crucial task that involves modifying text to better align with user intents. However, existing text editing benchmark datasets have limitations in providing only coarse-grained instructions. Consequently, although the edited output may seem reasonable, it often deviates from the intended changes outlined in the gold reference, resulting in low evaluation scores. To comprehensively investigate the text editing capabilities of large language models, this paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing. XATU covers a wide range of topics and text types, incorporating lexical, syntactic, semantic, and knowledge-intensive edits. To enhance interpretability, we leverage high-quality data sources and human annotation, resulting in a benchmark that includes fine-grained instructions and gold-standard edit explanations. By evaluating existing open and closed large language models against our benchmark, we demonstrate the effectiveness of instruction tuning and the impact of underlying architecture across various editing tasks. Furthermore, extensive experimentation reveals the significant role of explanations in fine-tuning language models for text editing tasks. The benchmark will be open-sourced to support reproduction and facilitate future research. Comment: Work in progress

    Language (Technology) is Power: A Critical Survey of "Bias" in NLP

    We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We further find that these papers' proposed quantitative techniques for measuring or mitigating "bias" are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing "bias" in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of "bias" (i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements) and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.