4,453 research outputs found
Detecting and Monitoring Hate Speech in Twitter
Social Media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted to a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on the society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are many-fold: (1) It introduces the first intelligent system that monitors and visualizes, using social
network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach consists of a combination of a LTSM+MLP neural network that takes as input the
tweet’s word, emoji, and expression tokens’ embeddings enriched by the tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literatureThe work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union’s Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledge
Geomatics Applications to Contemporary Social and Environmental Problems in Mexico
Trends in geospatial technologies have led to the development of new powerful analysis and representation techniques that involve processing of massive datasets, some unstructured, some acquired from ubiquitous sources, and some others from remotely located sensors of different kinds, all of which complement the structured information produced on a regular basis by governmental and international agencies. In this chapter, we provide both an extensive revision of such techniques and an insight of the applications of some of these techniques in various study cases in Mexico for various scales of analysis: from regional migration flows of highly qualified people at the country level and the spatio-temporal analysis of unstructured information in geotagged tweets for sentiment assessment, to more local applications of participatory cartography for policy definitions jointly between local authorities and citizens, and an automated method for three dimensional (3D) modelling and visualisation of forest inventorying with laser scanner technology
On the Statistical and Temporal Dynamics of Sentiment Analysis
Despite the broad interest and use of sentiment analysis nowadays, most of the conclusions in current literature are driven by simple statistical representations of sentiment scores. On that basis, the generated sentiment evaluation consists nowadays of encoding and aggregating emotional information from a number of individuals and their populational trends. We hypothesized that the stochastic processes aimed to be measured by sentiment analysis systems will exhibit nontrivial statistical and temporal properties. We established an experimental setup consisting of analyzing the short text messages (tweets) of 6 user groups with different nature (universities, politics, musicians, communication media, technological companies, and financial companies), including in each group ten high-intensity users in their regular generation of traffic on social networks. Statistical descriptors were checked to converge at about 2000 messages for each user, for which messages from the last two weeks were compiled using a custom-made tool. The messages were subsequently processed for sentiment scoring in terms of different lexicons currently available and widely used. Not only the temporal dynamics of the resulting score time series per user was scrutinized, but also its statistical description as given by the score histogram, the temporal autocorrelation, the entropy, and the mutual information. Our results showed that the actual dynamic range of lexicons is in general moderate, and hence not much resolution is given within their end-of-scales. We found that seasonal patterns were more present in the time evolution of the number of tweets, but to a much lesser extent in the sentiment intensity. Additionally, we found that the presence of retweets added negligible effects over standard statistical modes, while it hindered informational and temporal patterns. The innovative Compounded Aggregated Positivity Index developed in this work proved to be characteristic for industries and at ..
- …