Multilingual Cross-domain Perspectives on Online Hate Speech
In this report, we present a study of eight corpora of online hate speech and
demonstrate the NLP techniques that we used to collect and analyze the
jihadist, extremist, racist, and sexist content. Analysis of the multilingual
corpora shows that the different contexts share certain characteristics in
their hateful rhetoric. To expose the main features, we have focused on text
classification, text profiling, keyword and collocation extraction, along with
manual annotation and qualitative study. Comment: 24 pages
Detecting and Monitoring Hate Speech in Twitter
Social Media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted to a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on the society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are many-fold: (1) It introduces the first intelligent system that monitors and visualizes, using social
network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach is a combined LSTM+MLP neural network that takes as input the
embeddings of the tweet’s word, emoji, and expression tokens, enriched by tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literature.
The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union’s Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.
Automated Hate Speech Detection and the Problem of Offensive Language
A key challenge for automatic hate-speech detection on social media is the
separation of hate speech from other instances of offensive language. Lexical
detection methods tend to have low precision because they classify all messages
containing particular terms as hate speech, and previous work using supervised
learning has failed to distinguish between the two categories. We use a
crowd-sourced hate speech lexicon to collect tweets containing hate speech
keywords. We use crowd-sourcing to label a sample of these tweets into three
categories: those containing hate speech, only offensive language, and those
with neither. We train a multi-class classifier to distinguish between these
different categories. Close analysis of the predictions and the errors shows
when we can reliably separate hate speech from other offensive language and
when this differentiation is more difficult. We find that racist and homophobic
tweets are more likely to be classified as hate speech but that sexist tweets
are generally classified as offensive. Tweets without explicit hate keywords
are also more difficult to classify. Comment: To appear in the Proceedings of ICWSM 2017. Please cite that version.
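The abstract's point about lexical detection can be illustrated with a toy sketch. Everything here is an assumption for illustration: the one-term lexicon, the four messages, and their labels are invented placeholders, not the paper's lexicon, data, or method. It only shows why flagging every message that contains a listed term tends to give low precision.

```python
# Toy illustration only: the lexicon and messages are invented placeholders,
# not data from the paper.
LEXICON = {"slurword"}  # hypothetical stand-in for a hate-speech keyword

# (text, gold label) pairs; labels follow the three categories in the abstract.
MESSAGES = [
    ("you are a slurword and should leave", "hate"),
    ("that game was slurword hard lol", "offensive"),
    ("my friend jokingly called me a slurword", "offensive"),
    ("nice weather today", "neither"),
]


def lexical_flag(text, lexicon=LEXICON):
    """Flag a message as hate speech if it contains any lexicon term."""
    return any(token in lexicon for token in text.lower().split())


def precision(messages):
    """Share of flagged messages whose gold label is actually 'hate'."""
    flagged = [label for text, label in messages if lexical_flag(text)]
    if not flagged:
        return 0.0
    return sum(label == "hate" for label in flagged) / len(flagged)


if __name__ == "__main__":
    # Three messages contain the term, but only one is labeled 'hate'.
    print(f"precision of keyword matching: {precision(MESSAGES):.2f}")
```

In this toy set, the keyword rule cannot separate hateful from merely offensive uses of the same term, which is the gap a supervised multi-class classifier is meant to address.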
Towards Measuring Adversarial Twitter Interactions against Candidates in the US Midterm Elections
Adversarial interactions against politicians on social media such as Twitter
have a significant impact on society. In particular, they disrupt substantive
political discussions online, and may discourage people from seeking public
office. In this study, we measure the adversarial interactions against
candidates for the US House of Representatives during the run-up to the 2018 US
general election. We gather a new dataset consisting of 1.7 million tweets
involving candidates, one of the largest corpora focusing on political
discourse. We then develop a new technique for detecting tweets with toxic
content that are directed at any specific candidate. This technique allows us to
more accurately quantify adversarial interactions towards political candidates.
Further, we introduce an algorithm to induce candidate-specific adversarial
terms to capture more nuanced adversarial interactions that previous techniques
may not consider toxic. Finally, we use these techniques to outline the breadth
of adversarial interactions seen in the election, including offensive
name-calling, threats of violence, posting discrediting information, attacks on
identity, and adversarial message repetition.