1,270 research outputs found

Twitter sentiment analysis in the era of emojis

Author: Li Mengdi

Twitter has become an important site for national discussion, offering timely updates on public opinion towards any event. Twitter Sentiment Analysis (TSA) can be an effective method for unpacking the insights embodied in public opinion. Various TSA techniques have been developed recently, but little consideration has been given to emojis, a relatively new invention widely shared by Twitter users from different countries, with various demographic characteristics and diverse cultural backgrounds. The ubiquitous adoption of emojis on Twitter provides new opportunities to analyse sentiment expressions in a textual context. Emojis should be included when conducting TSA, as the meaning of a Twitter post and its sentiment can be identified with greater clarity and accuracy when they are taken into account. This research aims to develop novel approaches that handle emojis properly and tackle current open issues in TSA. Consisting of four phases, this thesis presents comprehensive and in-depth research in the field of Emoji Analytics and TSA. Several studies have been conducted to investigate emoji usage on Twitter and evaluate its effects on TSA. The experimental results demonstrate that emojis have become an essential component of Twitter communication and that emoji analysis is an important area of study complementary to TSA, implying promising future research opportunities. A novel TSA methodological framework that collects, pre-processes, analyses and maps citizen sentiments from Twitter to help learn citizens’ moods has been implemented and shown to be effective. The framework identifies the best setting for TSA when emojis are involved, and proposes an effective emoji training heuristic that is feasible for both ternary and multi-class classification of tweets.
In addition, it includes location-based visualisation of user-generated content on geographical maps, which provides an easier-to-understand visual representation of the sentiment. The methodological framework has been shown to be applicable in real-world scenarios and can be used to support research in other fields. Being the first to consider the popularity of emojis on Twitter and include them in performing TSA, this research is considered a pioneering work in the field, suggesting a new direction for TSA in the era of emojis.

Multilingual Twitter Sentiment Classification: The Role of Human Annotators

Author: Grcar Miha
Mozetic Igor
Smailovic Jasmina
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 23/02/2016

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
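
The abstract above refers to annotator agreement measures without naming them here. As a minimal illustration only, the sketch below computes Cohen's kappa for two annotators over ternary sentiment labels. The choice of Cohen's kappa and the function name `cohen_kappa` are assumptions made for this example and are not necessarily the measures or code used by the authors.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Illustrative only: the paper may rely on other agreement measures.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["negative", "neutral", "positive", "positive"]
annotator_2 = ["negative", "neutral", "positive", "positive"]
print(cohen_kappa(annotator_1, annotator_2))
```

A value near 1 indicates agreement well above chance, and a value near 0 indicates agreement no better than chance. Note that this measure treats the labels as unordered categories, whereas the abstract reports that annotators perceive the three classes as ordered.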

arXiv.org e-Print Archive

Directory of Open Access Journals

Digital repository of Slovenian research organizations

Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Author: Felbo Bjarke
Lehmann Sune
Mislove Alan
Rahwan Iyad
Søgaard Anders
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches. Comment: Accepted at EMNLP 2017. Please include EMNLP in any citations. Minor changes from the EMNLP camera-ready version. 9 pages + references and supplementary material.
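
The idea of using emojis as noisy labels can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: the four-emoji label set, the function name `make_distant_examples`, and the rule of emitting one example per distinct emoji found in a tweet are all assumptions made for the example.

```python
# Hypothetical label set for illustration; the paper uses 64 common emojis.
EMOJI_LABELS = ("😂", "😭", "😍", "😡")


def make_distant_examples(tweets, emoji_labels=EMOJI_LABELS):
    """Turn raw tweets into (text, emoji) training pairs.

    The emoji acts as a noisy label and is stripped from the input text,
    so a model has to predict it from the remaining words.
    """
    examples = []
    for tweet in tweets:
        found = [e for e in emoji_labels if e in tweet]
        if not found:
            continue  # tweets without a known emoji give no supervision
        text = tweet
        for e in found:
            text = text.replace(e, "")
        text = " ".join(text.split())  # collapse leftover whitespace
        for e in found:
            examples.append((text, e))
    return examples


tweets = ["so funny 😂😂", "no emoji here", "love it 😍 but sad 😭"]
for text, label in make_distant_examples(tweets):
    print(repr(text), label)
```

No manual annotation is needed to build such pairs, which is what makes the approach scale to very large tweet collections.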

arXiv.org e-Print Archive

Copenhagen University Research Information System

Detecting and Monitoring Hate Speech in Twitter

Author: Camacho-Collados Miguel
Liberatore Federico
Pereira-Kohatsu Juan Carlos
Quijano-Sánchez Lara
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Social Media are sensors in the real world that can be used to measure the pulse of societies. However, the massive and unfiltered feed of messages posted in social media is a phenomenon that nowadays raises social alarms, especially when these messages contain hate speech targeted at a specific individual or group. In this context, governments and non-governmental organizations (NGOs) are concerned about the possible negative impact that these messages can have on individuals or on society. In this paper, we present HaterNet, an intelligent system currently being used by the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that identifies and monitors the evolution of hate speech in Twitter. The contributions of this research are manifold: (1) It introduces the first intelligent system that monitors and visualizes, using social network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification approaches based on different document representation strategies and text classification models. (4) The best approach consists of a combined LSTM+MLP neural network that takes as input the tweet’s word, emoji, and expression tokens’ embeddings enriched by the tf-idf, and obtains an area under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the literature. The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union’s Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.
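
For readers unfamiliar with the reported metric, the area under the ROC curve can be read as the probability that a randomly chosen positive example (here, a hateful tweet) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy scores are assumptions for illustration and have no connection to the HaterNet dataset or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a small illustration; library implementations use a sort-based computation instead.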

Universidad Carlos III de Madrid e-Archivo

Detecting Sarcasm in Multimodal Social Platforms

Author: Bamman D.
Davidov D.
Frome A.
Ghosh D.
Gibbs R.
González-Ibánez R.
Kincaid J. P.
Mikolov T.
Riloff E.
Tepperman J.
Tsur O.
Veale T.
Verstraten P.
Wang Z.
You Q.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. The detection of sarcasm in social media platforms has been applied in the past mainly to textual utterances where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles, or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages where audiovisual content is integrated with the text, making the analysis of a single mode in isolation incomplete. In our work, we first study the relationship between the textual and visual aspects in multimodal posts from three major social media platforms, i.e., Instagram, Tumblr and Twitter, and we run a crowdsourcing task to quantify the extent to which images are perceived as necessary by human annotators. Moreover, we propose two different computational frameworks to detect sarcasm that integrate the textual and visual modalities. The first approach exploits visual semantics trained on an external dataset, and concatenates the semantic features with state-of-the-art textual features. The second method adapts a visual neural network initialized with parameters trained on ImageNet to multimodal sarcastic posts. Results show the positive effect of combining modalities for the detection of sarcasm across platforms and methods. Comment: 10 pages, 3 figures, final version published in the Proceedings of ACM Multimedia 2016.

arXiv.org e-Print Archive