
    Toxicity

    In research on online comments on social media platforms, different terms are widely used to describe comments that are hateful or disrespectful and thereby poison a discussion. This chapter takes a theoretical perspective on the term toxicity and on related research in computer science. More specifically, it explains how the term is used and why its exact interpretation depends on the platform in question. Further, the chapter discusses the advantages of toxicity over other terms and provides an overview of the available toxic comment datasets. Finally, it introduces the concept of engaging comments as the counterpart of toxic comments, leading to a task that is complementary to the prevention and removal of toxic comments: the fostering and highlighting of engaging comments.

    Multilingual Cross-domain Perspectives on Online Hate Speech

    In this report, we present a study of eight corpora of online hate speech, demonstrating the NLP techniques we used to collect and analyze jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study.
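    The report does not name its tooling, so the following is only an illustrative sketch of one of the listed techniques: keyword extraction via frequency counts and collocation extraction via PMI-ranked bigrams, here using NLTK. The corpus is a hypothetical placeholder.

```python
# Hedged sketch: keyword and collocation extraction over a comment corpus.
# NLTK and the placeholder corpus are assumptions; the report does not
# specify which tools were actually used.
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Placeholder corpus: one string per collected comment.
documents = [
    "example comment text from the corpus",
    "another comment text from the corpus",
]

# Simple whitespace tokenization into a single token stream.
tokens = [tok for doc in documents for tok in doc.lower().split()]

# Keyword extraction via raw frequency counts.
freq = nltk.FreqDist(tokens)
keywords = [word for word, _ in freq.most_common(20)]

# Collocation extraction: bigrams ranked by pointwise mutual information,
# keeping only pairs seen at least 3 times.
measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(3)
collocations = finder.nbest(measures.pmi, 20)

print(keywords)
print(collocations)
```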

    SWSR: A Chinese dataset and lexicon for online sexism detection

    Online sexism has become an increasing concern on social media platforms, as it has affected the healthy development of the Internet and can have negative effects on society. While research on sexism detection is growing, most of it focuses on English as the language and on Twitter as the platform. Our objective here is to broaden the scope of this research by considering the Chinese language on Sina Weibo. We propose the first Chinese sexism dataset, the Sina Weibo Sexism Review (SWSR) dataset, as well as a large Chinese lexicon, SexHateLex, made of abusive and gender-related terms. We introduce our data collection and annotation process, and provide an exploratory analysis of the dataset characteristics to validate its quality and to show how sexism is manifested in Chinese. The SWSR dataset provides labels at different levels of granularity, including (i) sexism or non-sexism, (ii) sexism category, and (iii) target type, which can be exploited, among others, for building computational methods to identify and investigate finer-grained gender-related abusive language. We conduct experiments for the three sexism classification tasks using state-of-the-art machine learning models. Our results show competitive performance, providing a benchmark for sexism detection in the Chinese language, as well as an error analysis highlighting open challenges that need more research in Chinese NLP. The SWSR dataset and SexHateLex lexicon are publicly available.
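    The abstract mentions state-of-the-art machine learning models without naming them here, so the sketch below is only a hedged illustration of how the binary sexism/non-sexism task could be set up, using a character n-gram TF-IDF baseline in scikit-learn. The file name and column names are hypothetical and would need to be adapted to the released SWSR format.

```python
# Hedged sketch of the binary sexism/non-sexism task on SWSR.
# The baseline (char n-gram TF-IDF + logistic regression), the file name,
# and the column names are assumptions, not the authors' actual setup.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical file/column names; adapt to the released SWSR format.
df = pd.read_csv("swsr_comments.csv")
texts, labels = df["text"], df["sexist"]  # sexist: 1 = sexism, 0 = non-sexism

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Character n-grams avoid the need for Chinese word segmentation.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```

    Character n-grams sidestep word segmentation, which makes this a reasonable non-neural baseline for Chinese text; the paper's benchmark models may differ.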