Multilingual Cross-domain Perspectives on Online Hate Speech
In this report, we present a study of eight corpora of online hate speech, demonstrating the NLP techniques we used to collect and analyze jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we focused on text classification, text profiling, and keyword and collocation extraction, along with manual annotation and qualitative study.
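The abstract does not specify which collocation measure was used; a standard choice is pointwise mutual information (PMI) over adjacent word pairs. The following is a minimal sketch of PMI-based collocation extraction, not the authors' exact pipeline:

```python
import math
from collections import Counter

def top_collocations(tokens, top_n=5, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated from unigram and bigram counts in the token stream.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = []
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue  # rare pairs give unreliable PMI estimates
        pmi = math.log2((c / n_bi) /
                        ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        scored.append(((x, y), pmi))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]
```

On a token stream where "social media" recurs, that bigram scores high because its joint frequency exceeds what the individual word frequencies predict.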
Deep Learning for User Comment Moderation
Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism further improves the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.
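The deep, classification-specific attention described here is learned jointly with the RNN; as a rough illustration only, the pooling step it performs over the RNN's hidden states can be sketched in numpy, with the scoring vector `v` standing in for parameters that would normally be trained:

```python
import numpy as np

def attention_pool(hidden_states, v):
    """Pool a sequence of RNN hidden states into one vector.

    hidden_states: (T, d) array, one row per token.
    v: (d,) scoring vector (illustrative; learned in a real model).
    Returns the attention-weighted sum of the hidden states.
    """
    scores = hidden_states @ v               # (T,) unnormalized scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ hidden_states           # (d,) pooled representation
```

The classifier then operates on the pooled vector, so tokens with high attention scores dominate the moderation decision.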
Detecting Online Hate Speech Using Both Supervised and Weakly-Supervised Approaches
In the wake of a polarizing election, social media is laden with hateful content. Context accompanying a hate speech text is useful for identifying hate speech, yet it has been largely overlooked in existing datasets and hate speech detection models. We provide an annotated corpus of hate speech in which context information is preserved. We then propose two types of supervised hate speech detection models that incorporate context information: a logistic regression model with context features and a neural network model with learning components for context. Further, to address various limitations of supervised hate speech classification, including corpus bias and the high cost of annotation, we propose a weakly supervised two-path bootstrapping approach for online hate speech detection that leverages large-scale unlabeled data. This system significantly outperforms hate speech detection systems trained in a supervised manner on manually annotated data. Applying this model to a large quantity of tweets collected before, on, and after election day reveals motivations and patterns of inflammatory language.
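The abstract does not detail its context features; one simple way a linear model can "incorporate context" is to keep bag-of-words counts for the comment and for its context in separate feature blocks, so the classifier learns independent weights for each. The helper below is a hypothetical sketch of that idea, not the paper's feature set:

```python
from collections import Counter

def context_features(comment_tokens, context_tokens, vocab):
    """Build one feature vector from a comment plus its context.

    Comment counts and context counts occupy separate blocks, so a
    logistic regression can weight the same word differently depending
    on whether it appears in the comment or in its context.
    """
    comment_counts = Counter(comment_tokens)
    context_counts = Counter(context_tokens)
    return ([comment_counts[w] for w in vocab] +
            [context_counts[w] for w in vocab])
```

The resulting vectors can be fed to any off-the-shelf linear classifier.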
Assessing the impact of contextual information in hate speech detection
In recent years, hate speech has gained great relevance in social networks
and other virtual media because of its intensity and its relationship with
violent acts against members of protected groups. Due to the large amount of
user-generated content, considerable effort has gone into the research and
development of automatic tools to aid the analysis and moderation of this
speech, at least in its most threatening forms. One limitation of current
approaches to automatic hate speech detection is the lack of context: most
studies and resources work with isolated messages, without any conversational
context or indication of the topic under discussion. This restricts the
information available to decide whether a post on a
social network is hateful or not. In this work, we provide a novel corpus for
contextualized hate speech detection based on user responses to news posts from
media outlets on Twitter. This corpus was collected in the Rioplatense
dialectal variety of Spanish and focuses on hate speech associated with the
COVID-19 pandemic. Classification experiments using state-of-the-art techniques
show evidence that adding contextual information improves hate speech detection
performance for two proposed tasks (binary and multi-label prediction). We make
our code, models, and corpus available for further research.
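The binary and multi-label tasks differ mainly in the output layer: multi-label prediction typically applies an independent sigmoid per label (e.g. per hate-speech target), with binary detection as the single-label special case. A minimal sketch of that thresholding step, assuming a hypothetical `predict_labels` helper over model logits:

```python
import numpy as np

def predict_labels(logits, threshold=0.5):
    """Turn per-label logits into multi-label 0/1 predictions.

    Each label gets an independent sigmoid, so any subset of labels can
    fire; with a single logit this reduces to binary detection.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return (probs >= threshold).astype(int)
```

Per-label thresholds can also be tuned separately when label frequencies are imbalanced.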