156 research outputs found
An analysis of the user occupational class through Twitter content
Social media content can be used as a complementary source to the traditional
methods for extracting and studying collective social attributes. This study focuses on the prediction of the occupational class for a public user profile. Our analysis is conducted on a new annotated corpus of Twitter users, their respective job titles, posted textual content and platform-related attributes. We frame our task as classification using latent feature representations such as word clusters and embeddings. The employed linear and, especially, non-linear methods can predict a userās occupational class with strong accuracy for the coarsest level of a standard occupation taxonomy which includes nine classes. Combined with a qualitative assessment, the derived results confirm the feasibility of our approach in inferring a new user attribute that can be embedded in a multitude of downstream applications
Detecting and Monitoring Hate Speech in Twitter
Social Media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted to a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on the society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are many-fold: (1) It introduces the first intelligent system that monitors and visualizes, using social
network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach consists of a combination of a LTSM+MLP neural network that takes as input the
tweetās word, emoji, and expression tokensā embeddings enriched by the tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literatureThe work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Unionās Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledge
"Go eat a bat, {Chang!}": {A}n Early Look on the Emergence of Sinophobic Behavior on {Web} Communities in the Face of {COVID}-19
The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, many countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited for the dissemination of potentially harmful and disturbing content, such as the spread of conspiracy theories and hateful speech towards specific ethnic groups, in particular towards Chinese people since COVID-19 is believed to have originated from China. In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large-scale datasets from Twitter and 4chan's Politically Incorrect board (/pol/) over a time period of approximately five months and analyze them to investigate whether there is a rise or important differences with regard to the dissemination of Sinophobic content. We find that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like \dspol, and to a lesser extent on mainstream ones like Twitter. Also, using word embeddings over time, we characterize the evolution and emergence of new Sinophobic slurs on both Twitter and /pol/. Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift towards blaming China for the situation, while on /pol/ we find a shift towards using more (and new) Sinophobic slurs
- ā¦