14 research outputs found

AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models

Author: Alvarez R. Michael
Anandkumar Anima
Kocielnik Rafal
Prabhumoye Shrimai
Zhang Vivian
Publication date: 14/02/2023

Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose a novel AutoBiasTest method that automatically generates sentences for testing bias in PLMs, hence providing a flexible and low-cost alternative. Our approach uses another PLM for generation and controls the generation of sentences by conditioning on social group and attribute terms. We show that generated sentences are natural and similar to human-produced content in terms of word length and diversity. We illustrate that larger models used for generation produce estimates of social bias with lower variance. We find that our bias scores are well correlated with manual templates, but AutoBiasTest highlights biases not captured by these templates due to more diverse and realistic test sentences. By automating large-scale test sentence generation, we enable better estimation of the underlying bias distribution.

arXiv.org e-Print Archive

Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

Author: Alvarez R. Michael
Anandkumar Anima
Hari Meena
Kangaslahti Sara
Kocielnik Rafal
Prabhumoye Shrimai
Publication date: 21/11/2022

Labeling social-media data for custom dimensions of toxicity and social bias is challenging and labor-intensive. Existing transfer and active learning approaches meant to reduce annotation effort require fine-tuning, which suffers from over-fitting to noise and can cause domain shift with small sample sizes. In this work, we propose a novel Active Transfer Few-shot Instructions (ATF) approach which requires no fine-tuning. ATF leverages the internal linguistic knowledge of pre-trained language models (PLMs) to facilitate the transfer of information from existing pre-labeled datasets (source-domain task) with minimum labeling effort on unlabeled target data (target-domain task). Our strategy can yield positive transfer, achieving a mean AUC gain of 10.5% compared to no transfer with a large 22B-parameter PLM. We further show that annotation of just a few target-domain samples via active learning can be beneficial for transfer, but the impact diminishes with more annotation effort (26% drop in gain between 100 and 2000 annotated examples). Finally, we find that not all transfer scenarios yield a positive gain, which seems related to the PLM's initial performance on the target-domain task.

Comment: Accepted to the NeurIPS Workshop on Transfer Learning for Natural Language Processing, 2022, New Orleans

arXiv.org e-Print Archive

Conversational Agents for Health and Wellbeing

Author: Clark L
Kocaballi AB
Kocielnik R
Laranjo L
Liao QV
Miner A
Moore RJ
Park SY
Quiroz JC
Rezazadegan D
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Conversational agents have increasingly been deployed in healthcare applications. However, significant challenges remain in developing this technology. Recent research in this area has highlighted that: i) patient safety was rarely evaluated; ii) health outcomes were poorly measured; and iii) no standardised evaluation methods were employed. Conversational agents in healthcare are lagging behind the developments in other domains. This one-day workshop aims to create a roadmap for healthcare conversational agents to develop standardised design and evaluation frameworks. This will prioritise health outcomes and patient safety while ensuring a high-quality user experience. In doing so, this workshop will bring together researchers and practitioners from HCI, healthcare and related speech and chatbot domains to collaborate on these key challenges.

OPUS - University of Technology Sydney

Cronfa at Swansea University

Challenges in moderating disruptive player behavior in online competitive action games

Author: Animashree Anandkumar
Arman Dehpanah
Carly Taylor
Claudia Kann
Deshawn Sambrano
Feri Soltani
Grant Cahill
Jacob Morrier
Min Kim
Mitchell Linegar
Nabiha Naqvie
R. Michael Alvarez
Rafal Kocielnik
Zhuofang Li
Publication venue: Frontiers Media S.A.
Publication date: 01/02/2024

Online competitive action games are a very popular form of entertainment. While most are respectfully enjoyed by millions of players, a small group of players engages in disruptive behavior, such as cheating and hate speech. Identifying and subsequently moderating these toxic players is a challenging task. Previous research has only studied specific aspects of this problem using curated data and with limited access to real-world moderation practices. In contrast, our work offers a unique and holistic view of the universal challenges of moderating disruptive behavior in online systems. We combine an analysis of a large dataset from a popular online competitive first-person action title (Call of Duty®: Modern Warfare® II) with insights from stakeholders involved in moderation. We identify six universal challenges related to handling disruptive behaviors in such games. We discuss challenges omitted by prior work, such as handling high-volume imbalanced data or ensuring the comfort of human moderators. We also offer a discussion of possible technical, design, and policy approaches to mitigating these challenges.

Directory of Open Access Journals

A trust evaluation framework for sensor readings in body area sensor networks

Author: Bui T.V.
Kocielnik R.D.
Lukkien J.J.
Verhoeven R.
Publication venue: 'European Alliance for Innovation n.o.'
Publication date: 01/01/2013

This paper presents a framework to evaluate the trustworthiness of a Body Area Sensor Network (BASN), in particular of its sensor readings. We show that such trustworthiness is to be interpreted with respect to a certain statement or goal; its evaluation is based on quality aspects derived from observations and opinions from others. We examine relevant quality aspects of sensor readings which correspond to potential deviating behaviors of sensors. We then look at how to derive such qualities from observations, taking into account uncertainty as well as decay over time. We develop an extension of subjective logic for this purpose and we show how we can compute quality properties without storing long time series. We then demonstrate this on two examples, using Galvanic Skin Response (GSR) and Electrocardiography (ECG) sensor data.

Special Issue on Conversational Agents for Healthcare and Wellbeing

Author: Bickmore T
Clark L
Kocaballi AB
Kocielnik R
Laranjo L
Liao QV
Moore RJ
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/01/2023

OPUS - University of Technology Sydney

Visual fidelity of video prototypes and user feedback: A case study

Author: Banach P
Dhillon B
Emparanza JP
Kocielnik R
Markopoulos P
Politis I
Rączewska A
Publication date: 01/01/2011

This paper addresses the debate regarding the respective merits of high and low fidelity prototypes, in the domain of video prototyping. Video prototyping is a popular tool for interface designers. Despite this, there is practically no research reported to date examining the fidelity of the design representation that the video prototype should manifest. We report a case study where the same design concept was rendered on video in two formats with differing degrees of visual fidelity: animated paper cut-outs (low visual fidelity) versus a video with real actors, edited to simulate computer output (high visual fidelity). A two-pronged comparative evaluation was carried out: a between-subjects questionnaire survey, consisting of AttrakDiff and open-ended questions, completed by 99 participants, and semi-structured qualitative interviews with 9 participants. The results did not reveal any differences regarding the amount or quality of feedback one should expect from a low or a high fidelity video. These results lead us to suggest that the paper cut-out animation is a valid prototype that should be explored more by interaction designers for obtaining early user feedback at low cost.

Crossref

Pure OAI Repository

CUED - Cambridge University Engineering Department