14 research outputs found

    AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models

    Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose a novel AutoBiasTest method that automatically generates sentences for testing bias in PLMs, hence providing a flexible and low-cost alternative. Our approach uses another PLM for generation and controls the generation of sentences by conditioning on social group and attribute terms. We show that generated sentences are natural and similar to human-produced content in terms of word length and diversity. We illustrate that larger models used for generation produce estimates of social bias with lower variance. We find that our bias scores are well correlated with manual templates, but AutoBiasTest highlights biases not captured by these templates due to more diverse and realistic test sentences. By automating large-scale test sentence generation, we enable better estimation of underlying bias distribution.

    Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

    Labeling social-media data for custom dimensions of toxicity and social bias is challenging and labor-intensive. Existing transfer and active learning approaches meant to reduce annotation effort require fine-tuning, which suffers from over-fitting to noise and can cause domain shift with small sample sizes. In this work, we propose a novel Active Transfer Few-shot Instructions (ATF) approach which requires no fine-tuning. ATF leverages the internal linguistic knowledge of pre-trained language models (PLMs) to facilitate the transfer of information from existing pre-labeled datasets (source-domain task) with minimum labeling effort on unlabeled target data (target-domain task). Our strategy can yield positive transfer, achieving a mean AUC gain of 10.5% compared to no transfer with a large 22B-parameter PLM. We further show that annotation of just a few target-domain samples via active learning can be beneficial for transfer, but the impact diminishes with more annotation effort (26% drop in gain between 100 and 2000 annotated examples). Finally, we find that not all transfer scenarios yield a positive gain, which seems related to the PLM's initial performance on the target-domain task. Comment: Accepted to NeurIPS Workshop on Transfer Learning for Natural Language Processing, 2022, New Orleans.

    Conversational Agents for Health and Wellbeing

    Conversational agents have increasingly been deployed in healthcare applications. However, significant challenges remain in developing this technology. Recent research in this area has highlighted that: i) patient safety was rarely evaluated; ii) health outcomes were poorly measured; and iii) no standardised evaluation methods were employed. Conversational agents in healthcare are lagging behind the developments in other domains. This one-day workshop aims to create a roadmap for healthcare conversational agents to develop standardised design and evaluation frameworks. This will prioritise health outcomes and patient safety while ensuring a high-quality user experience. In doing so, this workshop will bring together researchers and practitioners from HCI, healthcare and related speech and chatbot domains to collaborate on these key challenges.

    Challenges in moderating disruptive player behavior in online competitive action games

    Online competitive action games are a very popular form of entertainment. While most are respectfully enjoyed by millions of players, a small group of players engages in disruptive behavior, such as cheating and hate speech. Identifying and subsequently moderating these toxic players is a challenging task. Previous research has only studied specific aspects of this problem using curated data and with limited access to real-world moderation practices. In contrast, our work offers a unique and holistic view of the universal challenges of moderating disruptive behavior in online systems. We combine an analysis of a large dataset from a popular online competitive first-person action title (Call of Duty®: Modern Warfare® II) with insights from stakeholders involved in moderation. We identify six universal challenges related to handling disruptive behaviors in such games. We discuss challenges omitted by prior work, such as handling high-volume imbalanced data or ensuring the comfort of human moderators. We also offer a discussion of possible technical, design, and policy approaches to mitigating these challenges.

    A trust evaluation framework for sensor readings in body area sensor networks

    This paper presents a framework to evaluate the trustworthiness of a Body Area Sensor Network (BASN), in particular of its sensor readings. We show that such trustworthiness is to be interpreted with respect to a certain statement or goal; its evaluation is based on quality aspects derived from observations and opinions from others. We examine relevant quality aspects of sensor readings which correspond to potential deviating behaviors of sensors. We then look at how to derive such qualities from observations, taking uncertainty as well as decay over time into account in the evaluation. We develop an extension of subjective logic for this purpose and we show how we can compute quality properties without storing long time series. We then demonstrate this on two examples: Galvanic Skin Response (GSR) and Electrocardiography (ECG) sensed data.

    Visual fidelity of video prototypes and user feedback: A case study

    This paper addresses the debate regarding the respective merits of high and low fidelity prototypes, in the domain of video prototyping. Video prototyping is a popular tool for interface designers. Despite this, there is practically no research reported to date examining the fidelity of the design representation that the video prototype should manifest. We report a case study where the same design concept was rendered on video in two formats with differing degrees of visual fidelity: animated paper cut-outs (low visual fidelity) versus a video with real actors, edited to simulate computer output (high visual fidelity). A two-pronged comparative evaluation was carried out: a between-subjects questionnaire survey, consisting of AttrakDiff and open-ended questions, completed by 99 participants, and semi-structured qualitative interviews with 9 participants. The results did not reveal any differences regarding the amount or quality of feedback one should expect from a low or a high fidelity video. These results lead us to suggest that the paper cut-out animation is a valid prototype that should be explored more by interaction designers for obtaining early user feedback at low cost.

    Data_Sheet_1_Challenges in moderating disruptive player behavior in online competitive action games.pdf

    Online competitive action games are a very popular form of entertainment. While most are respectfully enjoyed by millions of players, a small group of players engages in disruptive behavior, such as cheating and hate speech. Identifying and subsequently moderating these toxic players is a challenging task. Previous research has only studied specific aspects of this problem using curated data and with limited access to real-world moderation practices. In contrast, our work offers a unique and holistic view of the universal challenges of moderating disruptive behavior in online systems. We combine an analysis of a large dataset from a popular online competitive first-person action title (Call of Duty®: Modern Warfare® II) with insights from stakeholders involved in moderation. We identify six universal challenges related to handling disruptive behaviors in such games. We discuss challenges omitted by prior work, such as handling high-volume imbalanced data or ensuring the comfort of human moderators. We also offer a discussion of possible technical, design, and policy approaches to mitigating these challenges.