Disembodied Machine Learning: On the Illusion of Objectivity in NLP
Machine learning seeks to identify and encode bodies of knowledge within provided datasets. However, data encodes subjective content, which determines the possible outcomes of the models trained on it. Because such subjectivity can enable the marginalisation of parts of society, it is termed (social) 'bias' and is something the community seeks to remove. In this paper, we contextualise this discourse of bias in the ML community against the subjective choices made in the development process. By considering how choices in data and model development construct subjectivity, i.e. the biases represented in a model, we argue that addressing and mitigating biases is near-impossible. This is because both data and ML models are objects for which meaning is made at each step of the development pipeline, from data selection through annotation to model training and analysis. Accordingly, we find the prevalent discourse of bias limited in its ability to address social marginalisation. We recommend being conscientious of this, and accepting that de-biasing methods correct for only a fraction of biases.
Dynabench: Rethinking Benchmarking in NLP
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.
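As a rough illustration of the human-and-model-in-the-loop protocol described above, the sketch below distils the core collection loop into plain Python. The `predict`, `propose`, and `verify` callables are hypothetical stand-ins, not the Dynabench API: an annotator keeps proposing examples until one fools the target model, and only human-validated model errors enter the new dataset.

```python
# Minimal sketch of human-and-model-in-the-loop data collection.
# All callables are hypothetical placeholders, not the Dynabench API.
from typing import Callable, List, Tuple

def collect_adversarial_examples(
    predict: Callable[[str], str],           # target model: text -> predicted label
    propose: Callable[[], Tuple[str, str]],  # annotator: returns (text, gold label)
    verify: Callable[[str, str], bool],      # second annotator confirms the gold label
    n_examples: int,
) -> List[Tuple[str, str]]:
    dataset: List[Tuple[str, str]] = []
    while len(dataset) < n_examples:
        text, gold = propose()
        if predict(text) == gold:
            continue  # model got it right; the annotator tries a harder example
        if verify(text, gold):
            dataset.append((text, gold))  # a human-validated model error
    return dataset
```

Because collected examples are, by construction, model errors that humans agree on, retraining on them targets exactly the weaknesses the current model exhibits.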
Findings from the Hackathon on Understanding Euroscepticism Through the Lens of Textual Data
We present an overview and the results of a shared-task hackathon that took place as part of a research seminar bringing together a variety of experts and young researchers from the fields of political science, natural language processing and computational social science. The task looked at ways to develop novel methods for political text scaling to better quantify political party positions on European integration and Euroscepticism from the transcripts of speeches from three legislative terms of the European Parliament.
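The report does not fix a single method, but the text-scaling idea can be illustrated with a minimal sketch in the spirit of Wordscores-style reference scaling (an illustrative assumption, not the hackathon participants' approach): anchor texts with known positions define a pro-/anti-integration axis, and speeches are placed on that axis by their weighted word overlap with the anchors.

```python
# Minimal sketch of reference-based text scaling; an illustrative
# recipe, not the method used in the hackathon.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def scale_speeches(anchor_texts, anchor_scores, speeches):
    """Place speeches on the axis spanned by scored anchor texts."""
    vec = TfidfVectorizer(sublinear_tf=True)
    A = vec.fit_transform(anchor_texts).toarray()   # anchors x vocab
    S = vec.transform(speeches).toarray()           # speeches x vocab
    # Each word inherits the score mass of the anchors that use it.
    word_scores = (np.asarray(anchor_scores) @ A) / np.maximum(A.sum(axis=0), 1e-9)
    # A speech's position is the tf-idf-weighted mean of its word scores.
    return (S @ word_scores) / np.maximum(S.sum(axis=1), 1e-9)

# Usage sketch, with hypothetical inputs:
# positions = scale_speeches(anchor_texts, anchor_scores=[-1.0, 1.0], speeches=ep_speeches)
```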
Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter
Hate speech in the form of racist and sexist remarks is a common occurrence on social media. For that reason, many social media services address the problem of identifying hate speech, but definitions of hate speech vary markedly and identification remains largely a manual effort. We provide a list of criteria founded in critical race theory, and use them to annotate a publicly available corpus of more than 16k tweets. We analyze the impact of various extra-linguistic features in conjunction with character n-grams for hate-speech detection. We also present a dictionary based on the most indicative words in our data.
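A minimal sketch of the feature setup the abstract describes, using scikit-learn: character n-grams (here lengths 1 to 4) feeding a linear classifier. The placeholder tweets, labels, and the choice of logistic regression are illustrative assumptions, not the released 16k-tweet corpus or the paper's exact configuration.

```python
# Minimal sketch of character n-gram hate-speech classification.
# Data and model choice are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["example tweet one", "example tweet two"]  # placeholder data
labels = ["none", "sexism"]                          # placeholder labels

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 4)),  # char 1- to 4-grams
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)
print(model.predict(["another tweet"]))
```

Extra-linguistic features (e.g. author metadata) would be concatenated with the n-gram features before classification; they are omitted here for brevity.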
HateCheck: Functional Tests for Hate Speech Detection Models
Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model performance due to increasingly well-evidenced systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, we introduce HateCheck, a suite of functional tests for hate speech detection models. We specify 29 model functionalities motivated by a review of previous research and a series of interviews with civil society stakeholders. We craft test cases for each functionality and validate their quality through a structured annotation process. To illustrate HateCheck's utility, we test near-state-of-the-art transformer models as well as two popular commercial models, revealing critical model weaknesses.
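The functional-testing idea translates directly into a small harness: hand-crafted cases are grouped by functionality and scored separately, so a failure points to a specific weakness rather than an aggregate metric. The functionality names and cases below are illustrative placeholders, not the released HateCheck suite.

```python
# Minimal sketch of per-functionality evaluation; cases are
# illustrative placeholders, not the HateCheck test suite.
from collections import defaultdict
from typing import Callable

TEST_CASES = [
    # (functionality, text, expected label)
    ("negated_hate", "I would never say that group X are vermin", "non-hateful"),
    ("reclaimed_slur", "example of in-group reclaimed usage", "non-hateful"),
    ("explicit_derogation", "group X are vermin", "hateful"),
]

def run_functional_tests(predict: Callable[[str], str]) -> None:
    per_func = defaultdict(lambda: [0, 0])  # functionality -> [correct, total]
    for func, text, expected in TEST_CASES:
        per_func[func][1] += 1
        per_func[func][0] += int(predict(text) == expected)
    for func, (correct, total) in sorted(per_func.items()):
        print(f"{func}: {correct}/{total} passed")

# run_functional_tests(my_model.predict)  # any text -> label callable
```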