14 research outputs found
AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models
Social bias in Pretrained Language Models (PLMs) affects text generation and
other downstream NLP tasks. Existing bias testing methods rely predominantly on
manual templates or on expensive crowd-sourced data. We propose a novel
AutoBiasTest method that automatically generates sentences for testing bias in
PLMs, hence providing a flexible and low-cost alternative. Our approach uses
another PLM for generation and controls the generation of sentences by
conditioning on social group and attribute terms. We show that generated
sentences are natural and similar to human-produced content in terms of word
length and diversity. We illustrate that larger models used for generation
produce estimates of social bias with lower variance. We find that our bias
scores are well correlated with manual templates, but AutoBiasTest highlights
biases not captured by these templates due to more diverse and realistic test
sentences. By automating large-scale test sentence generation, we enable better
estimation of the underlying bias distribution.
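The conditioning on social group and attribute terms described above can be sketched as follows. This is an illustrative assumption, not the paper's actual prompts: the template wording, function name, and example terms are all hypothetical, and a real pipeline would pass each prompt to a generator PLM.

```python
from itertools import product

def build_generation_prompts(groups, attributes):
    """Pair every social group term with every attribute term and build a
    conditioning prompt for each pair (hypothetical template wording)."""
    template = "Write a sentence about {group} that involves the trait {attribute}:"
    return [
        {"group": g, "attribute": a,
         "prompt": template.format(group=g, attribute=a)}
        for g, a in product(groups, attributes)
    ]

# Example: 2 groups x 2 attributes -> 4 conditioned prompts.
prompts = build_generation_prompts(["doctors", "nurses"], ["brilliant", "caring"])
```

Each generated sentence would then be scored by the PLM under test, with bias estimated from score differences across group terms.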
Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions
Labeling social-media data for custom dimensions of toxicity and social bias
is challenging and labor-intensive. Existing transfer and active learning
approaches meant to reduce annotation effort require fine-tuning, which suffers
from over-fitting to noise and can cause domain shift with small sample sizes.
In this work, we propose a novel Active Transfer Few-shot Instructions (ATF)
approach which requires no fine-tuning. ATF leverages the internal linguistic
knowledge of pre-trained language models (PLMs) to facilitate the transfer of
information from existing pre-labeled datasets (source-domain task) with
minimum labeling effort on unlabeled target data (target-domain task). Our
strategy can yield positive transfer achieving a mean AUC gain of 10.5%
compared to no transfer with a large 22b-parameter PLM. We further show that
annotation of just a few target-domain samples via active learning can be
beneficial for transfer, but the impact diminishes with more annotation effort
(26% drop in gain between 100 and 2000 annotated examples). Finally, we find
that not all transfer scenarios yield a positive gain, which seems related to
the PLM's initial performance on the target-domain task.
Comment: Accepted to the NeurIPS Workshop on Transfer Learning for Natural
Language Processing, 2022, New Orleans
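The fine-tuning-free transfer described above can be sketched as a prompt assembled from labeled source-domain examples followed by the unlabeled target instance. This is a minimal sketch under assumptions: the function name, prompt layout, and example texts are hypothetical, and the paper's exact instruction format is not reproduced here.

```python
def atf_prompt(source_examples, target_text, instruction):
    """Assemble a few-shot instruction prompt: an instruction, then labeled
    source-domain examples, then the unlabeled target-domain instance.
    The PLM completes the final label -- no gradient updates are needed."""
    lines = [instruction, ""]
    for text, label in source_examples:
        lines.append(f"Text: {text}\nLabel: {label}\n")
    lines.append(f"Text: {target_text}\nLabel:")
    return "\n".join(lines)

# Hypothetical usage: transfer from a pre-labeled toxicity dataset.
demo = atf_prompt(
    [("you are the worst", "toxic"), ("have a nice day", "not toxic")],
    "nobody asked for your opinion",
    "Classify each text as toxic or not toxic.",
)
```

The PLM's completion after the trailing "Label:" is taken as the prediction for the target-domain sample.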
Conversational Agents for Health and Wellbeing
Conversational agents have increasingly been deployed in healthcare applications. However, significant challenges remain in developing this technology. Recent research in this area has highlighted that: i) patient safety was rarely evaluated; ii) health outcomes were poorly measured; and iii) no standardised evaluation methods were employed. Conversational agents in healthcare are lagging behind developments in other domains. This one-day workshop aims to create a roadmap for healthcare conversational agents to develop standardised design and evaluation frameworks. These will prioritise health outcomes and patient safety while ensuring a high-quality user experience. In doing so, the workshop will bring together researchers and practitioners from HCI, healthcare, and related speech and chatbot domains to collaborate on these key challenges.
Challenges in moderating disruptive player behavior in online competitive action games
Online competitive action games are a very popular form of entertainment. While most are respectfully enjoyed by millions of players, a small group of players engages in disruptive behavior, such as cheating and hate speech. Identifying and subsequently moderating these toxic players is a challenging task. Previous research has only studied specific aspects of this problem using curated data and with limited access to real-world moderation practices. In contrast, our work offers a unique and holistic view of the universal challenges of moderating disruptive behavior in online systems. We combine an analysis of a large dataset from a popular online competitive first-person action title (Call of Duty®: Modern Warfare® II) with insights from stakeholders involved in moderation. We identify six universal challenges related to handling disruptive behaviors in such games. We discuss challenges omitted by prior work, such as handling high-volume imbalanced data or ensuring the comfort of human moderators. We also offer a discussion of possible technical, design, and policy approaches to mitigating these challenges.
A trust evaluation framework for sensor readings in body area sensor networks
This paper presents a framework to evaluate the trustworthiness of a Body Area Sensor Network (BASN), in particular of its sensor readings. We show that such trustworthiness is to be interpreted with respect to a certain statement or goal; its evaluation is based on quality aspects derived from observations and on opinions from others. We examine relevant quality aspects of sensor readings which correspond to potential deviating behaviors of sensors. We then look at how to derive such qualities from observations, taking into account both uncertainty and decay over time. We develop an extension of subjective logic for this purpose and show how quality properties can be computed without storing long time series. We then demonstrate this for two examples: Galvanic Skin Response (GSR) and Electrocardiography (ECG) sensor data.
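The idea of combining evidence, uncertainty, and decay over time without storing a long time series can be sketched with standard subjective-logic opinions over exponentially decayed evidence counts. This is a generic sketch, not the paper's actual extension of subjective logic: the class name, half-life parameterisation, and the mapping from evidence to (belief, disbelief, uncertainty) follow the common beta-evidence formulation and are assumptions here.

```python
import math

class DecayedOpinion:
    """Running positive/negative evidence counts with exponential time decay,
    so an opinion can be updated in O(1) state without storing the series."""

    def __init__(self, half_life):
        self.lam = math.log(2.0) / half_life  # decay rate from half-life
        self.r = 0.0   # decayed positive evidence
        self.s = 0.0   # decayed negative evidence
        self.t = None  # timestamp of the last observation

    def observe(self, t, positive):
        """Decay existing evidence to time t, then add one observation."""
        if self.t is not None:
            decay = math.exp(-self.lam * (t - self.t))
            self.r *= decay
            self.s *= decay
        self.t = t
        if positive:
            self.r += 1.0
        else:
            self.s += 1.0

    def opinion(self):
        """Map evidence to a (belief, disbelief, uncertainty) triple that
        sums to 1, per the standard beta-evidence formulation."""
        k = self.r + self.s + 2.0
        return (self.r / k, self.s / k, 2.0 / k)

# Two positive readings one half-life apart: the first counts half as much.
op = DecayedOpinion(half_life=10.0)
op.observe(0.0, True)
op.observe(10.0, True)
b, d, u = op.opinion()
```

Because only the decayed counts and the last timestamp are kept, memory use is constant regardless of how many readings arrive.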
Visual fidelity of video prototypes and user feedback: A case study
This paper addresses the debate regarding the respective merits of high- and low-fidelity prototypes in the domain of video prototyping. Video prototyping is a popular tool for interface designers. Despite this, practically no research reported to date has examined the fidelity of the design representation that a video prototype should manifest. We report a case study where the same design concept was rendered on video in two formats with differing degrees of visual fidelity: animated paper cut-outs (low visual fidelity) versus a video with real actors, edited to simulate computer output (high visual fidelity). A two-pronged comparative evaluation was carried out: a between-subjects questionnaire survey, consisting of the AttrakDiff instrument and open-ended questions, completed by 99 participants, and semi-structured qualitative interviews with 9 participants. The results did not reveal any differences in the amount or quality of feedback one should expect from a low- or a high-fidelity video. These results lead us to suggest that paper cut-out animation is a valid prototyping technique that interaction designers should explore further for obtaining early user feedback at low cost.
Data_Sheet_1_Challenges in moderating disruptive player behavior in online competitive action games.pdf