Reddit dataset for Adverse Drug Reaction
A Reddit dataset for Adverse Drug Reaction (ADR) detection, created with the help of expert annotators.
Characterising and Mitigating Aggregation-Bias in Crowdsourced Toxicity Annotations
Training machine learning (ML) models for natural language processing usually requires large amounts of data, often acquired through crowdsourcing. The way this data is collected and aggregated can affect the outputs of the trained model, for example by ignoring labels that differ from the majority. In this paper we investigate how label aggregation can bias ML results towards certain data samples and propose a methodology to highlight and mitigate this bias. Although our work is applicable to any kind of label aggregation for data subject to multiple interpretations, we focus on the effects of the bias introduced by majority voting on toxicity prediction over sentences. Our preliminary results indicate that we can mitigate the majority-bias and obtain increased prediction accuracy for minority opinions if we take into account the different labels from annotators when training adapted models, rather than relying on the aggregated labels.
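The majority-bias the abstract describes can be illustrated with a minimal sketch of majority-vote label aggregation. The function name and example annotations are illustrative, not taken from the paper:

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate crowdsourced labels for one item by majority vote.

    Ties are broken arbitrarily by insertion order; the key point is
    that minority labels are discarded entirely, which is the source
    of the aggregation bias the paper studies.
    """
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toxicity annotations for one sentence
# (1 = toxic, 0 = not toxic), from three annotators.
annotations = [0, 0, 1]
print(majority_vote(annotations))  # prints 0: the minority "toxic" view is lost
```

Training one adapted model per annotator (or conditioning on annotator identity) instead of training on the single aggregated label is the kind of mitigation the paper's preliminary results point to.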
A Human in the Loop Approach to Capture Bias and Support Media Scientists in News Video Analysis
Bias is inevitable and inherent in any form of communication. News often appears biased to citizens with different political orientations, and is understood differently by news media scholars and the broader public. In this paper we advocate the need for accurate methods for bias identification in video news items, to enable rich analytics capabilities that assist humanities media scholars and social and political scientists. We propose to analyze biases that are typical in video news (including framing, gender, and racial biases) by means of a human-in-the-loop approach that combines text and image analysis with human computation techniques.