RECAST: Interactive Auditing of Automatic Toxicity Detection Models

Author: Ahmed Muhammed
Chau Duen Horng
Epperson Will
Park Haekyu
Pinel Stephane
Shaikh Omar
Wright Austin P.
Yang Diyi
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/07/2020

As toxic language becomes nearly pervasive online, there has been increasing interest in leveraging advances in natural language processing (NLP), such as very large transformer models, to automatically detect and remove toxic comments. Despite fairness concerns, a lack of adversarial robustness, and the limited explainability of predictions from deep learning systems, there is currently little work on auditing these systems or on helping developers and users understand how they work. We present our ongoing work, RECAST, an interactive tool for examining toxicity detection models by visualizing explanations for predictions and providing alternative wordings for detected toxic speech.

Comment: 8 pages, 3 figures, The eighth International Workshop of Chinese CHI Proceeding

arXiv.org e-Print Archive

Crossref

Towards Socially Responsible AI: Cognitive Bias-Aware Multi-Objective Learning

Author: Ganguly Debasis
Sen Procheta
Publication date: 03/04/2020

Human society has a long history of suffering from cognitive biases that lead to social prejudice and mass injustice. The prevalence of cognitive biases in large volumes of historical data poses a threat: AI systems trained on such data can manifest those biases as unethical and seemingly inhuman predictions. To alleviate this problem, we propose a bias-aware multi-objective learning framework that, given a set of identity attributes (e.g. gender, ethnicity) and a subset of sensitive categories among the possible prediction classes, learns to reduce the frequency of predicting certain combinations of them, e.g. predicting stereotypes such as 'most blacks use abusive language' or 'fear is a virtue of women'. Our experiments on an emotion prediction task with balanced class priors show that a set of baseline bias-agnostic models exhibit cognitive biases with respect to gender, for example that women are more prone to be afraid whereas men are more prone to be angry. In contrast, our proposed bias-aware multi-objective learning methodology is shown to reduce such biases in the predicted emotions.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Detecting East Asian Prejudice on Social Media

Author: Botelho Austin
Broniatowski David
Guest Ella
Hale Scott
Hall Matthew
Margetts Helen
Tromble Rebekah
Vidgen Bertie
Waseem Zeerak
Publication date: 01/01/2020

The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic. It has also raised concerns about the spread of hateful language and prejudice online, especially hostility directed against East Asia. In this paper we report on the creation of a classifier that detects and categorizes social media posts from Twitter into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice, and a neutral class. The classifier achieves an F1 score of 0.83 across all four classes. We provide our final model (coded in Python), a new 20,000-tweet training dataset used to build the classifier, two analyses of hashtags associated with East Asian prejudice, and the annotation codebook. The classifier can be used by other researchers, assisting with both online content moderation processes and further research into the dynamics, prevalence and impact of East Asian prejudice online during this global pandemic.

Comment: 12 page

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive