Detecting and Monitoring Hate Speech in Twitter
Social media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted on social media is a phenomenon that
nowadays raises social alarm, especially when these messages contain hate speech targeted at a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on society. In this paper, we present HaterNet, an intelligent system currently used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech on Twitter. The contributions of this research
are manifold: (1) It introduces the first intelligent system that monitors and visualizes hate speech
in social media using social network analysis techniques. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach is a combined LSTM+MLP neural network that takes as input the embeddings of the
tweet's word, emoji, and expression tokens, enriched with tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literature.

The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union's Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.
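Contribution (4) describes token embeddings "enriched with tf-idf". The abstract does not say how the two are combined, so the sketch below assumes one plausible scheme: each token's embedding vector is scaled by that token's tf-idf weight in its tweet, yielding a per-token sequence that an LSTM could consume. The function names `tfidf_weights` and `enrich`, the smoothed idf formula, the toy tokens, and the 2-dimensional embeddings are all hypothetical illustrations, not HaterNet's actual implementation.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per tokenized document.

    tf is the token count divided by document length; idf uses the
    smoothed form log((1 + N) / (1 + df)) + 1. Both choices are
    assumptions: many tf-idf variants exist.
    """
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            tok: (c / len(doc)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, c in counts.items()
        })
    return weights

def enrich(doc, embeddings, doc_weights, dim):
    """Scale each token embedding by the token's tf-idf weight.

    Hypothetical enrichment scheme. Out-of-vocabulary tokens get a zero
    vector. The output is one vector per token, i.e. a sequence that
    could be fed to a recurrent layer such as an LSTM.
    """
    zero = [0.0] * dim
    return [[doc_weights[tok] * x for x in embeddings.get(tok, zero)] for tok in doc]

# Toy tweets split into word and emoji tokens (emoji as placeholder shortcodes).
docs = [["odio", ":angry:", "odio"], ["hola", ":smile:"]]
# Hypothetical 2-d embeddings; a real system would use pretrained vectors.
emb = {"odio": [1.0, 0.0], ":angry:": [0.0, 1.0],
       "hola": [0.5, 0.5], ":smile:": [0.0, 0.5]}
w = tfidf_weights(docs)
seq = enrich(docs[0], emb, w[0], dim=2)  # weights of the first tweet only
```

Scaling is only one reading of "enriched"; appending the tf-idf weight as an extra feature dimension of each token vector would be an equally plausible alternative.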
A Semantics-Based Measure of Emoji Similarity
Emoji have grown to become one of the most important forms of communication
on the web. With their widespread use, measuring the similarity of emoji has
become an important problem for contemporary text processing since it lies at
the heart of sentiment analysis, search, and interface design tasks. This paper
presents a comprehensive analysis of the semantic similarity of emoji through
embedding models that are learned over machine-readable emoji meanings in the
EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji
sense definitions, and with different training corpora obtained from Twitter
and Google News, we develop and test multiple embedding models to measure emoji
similarity. To evaluate our work, we create a new dataset called EmoSim508,
which assigns human-annotated semantic similarity scores to a set of 508
carefully selected emoji pairs. After validation with EmoSim508, we present a
real-world use case of our emoji embedding models using a sentiment analysis
task and show that our models outperform the previous best-performing emoji
embedding model on this task. The EmoSim508 dataset and our emoji embedding
models are publicly released with this paper and can be downloaded from
http://emojinet.knoesis.org/.

Comment: This paper was accepted at Web Intelligence 2017 as a full paper. In
2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). Leipzig,
Germany: ACM, 201
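Validating embedding models against a human-annotated set such as EmoSim508 generally means scoring each emoji pair with the model and correlating those scores with the human ratings. The abstract does not name the similarity or correlation measure, so this sketch assumes cosine similarity and Spearman's rank correlation, a common pairing for similarity benchmarks. The emoji keys, vectors, and human scores below are invented toy values, not EmoSim508 data, and `evaluate` is a hypothetical helper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (sx * sy)

def evaluate(embeddings, gold_pairs):
    """gold_pairs: list of (emoji_a, emoji_b, human_score) tuples."""
    model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in gold_pairs]
    human = [score for _, _, score in gold_pairs]
    return spearman(model, human)

# Toy embeddings keyed by placeholder emoji names (hypothetical values).
emoji_vecs = {"joy": [1.0, 0.1], "laugh": [0.9, 0.2],
              "sad": [-0.8, 0.3], "cry": [-0.7, 0.4]}
gold = [("joy", "laugh", 3.8), ("sad", "cry", 3.5), ("joy", "sad", 0.4)]
rho = evaluate(emoji_vecs, gold)
```

Spearman's rho compares only the orderings of the two score lists, so it does not require the model's cosine values and the human ratings to share a scale.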