    Lexicon-based bot-aware public emotion mining and sentiment analysis of the Nigerian 2019 presidential election on Twitter

    Online social networks have been widely used as rich platforms for predicting election outcomes in several countries. The vast amount of readily available data on such platforms, coupled with the emerging power of natural language processing algorithms and tools, has made it possible to mine that data and generate foresight into the likely direction of election outcomes. In this paper, lexicon-based public emotion mining and sentiment analysis were conducted to predict the winner of the 2019 presidential election in Nigeria. A total of 224,500 tweets associated with the two most prominent political parties in Nigeria, People's Democratic Party (PDP) and All Progressives Congress (APC), and with the two most prominent presidential candidates who represented these parties in the 2019 elections, Atiku Abubakar and Muhammadu Buhari, were collected between 9th October 2018 and 17th December 2018 via Twitter's streaming API. The tm and NRC libraries in the R environment were used for data cleaning and preprocessing. Botometer was used to detect the presence of automated bots in the preprocessed data, while the NRC Word Emotion Association Lexicon (EmoLex) was used to generate distributions of the subjective public sentiments and emotions surrounding the Nigerian 2019 presidential election. Based on Plutchik's emotion wheel, emotions were grouped into eight categories (sadness, trust, anger, fear, joy, anticipation, disgust, surprise), while sentiments were grouped into two (negative and positive). The results indicate a higher positive and a lower negative sentiment for APC than for PDP. For the presidential aspirants, Atiku has a slightly higher positive and a slightly lower negative sentiment than Buhari. These results point to APC as the predicted winning party and to Atiku as the most preferred candidate in the 2019 presidential election. 
These predictions were corroborated by the actual election results, as APC emerged as the winning party while Buhari and Atiku shared a very close vote margin in the election. Hence, this research indicates that Twitter data can be appropriately used to predict election outcomes and other future offline events. Future research could investigate spatiotemporal dimensions of the prediction.
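
The lexicon-based scoring described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: the tiny lexicon below is a hypothetical stand-in for NRC EmoLex, and the function names (`tokenize`, `score_tweets`, `sentiment_shares`) are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping a word to its associated categories.
# It only mimics the shape of NRC EmoLex; it is not the real lexicon.
TOY_LEXICON = {
    "win": {"joy", "anticipation", "positive"},
    "hope": {"anticipation", "joy", "trust", "positive"},
    "trust": {"trust", "positive"},
    "fraud": {"anger", "disgust", "negative"},
    "fear": {"fear", "negative"},
    "corrupt": {"anger", "disgust", "negative"},
}


def tokenize(text):
    """Lowercase a tweet and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def score_tweets(tweets, lexicon=TOY_LEXICON):
    """Count how often each emotion or sentiment category is triggered."""
    counts = Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            # Words missing from the lexicon contribute nothing.
            counts.update(lexicon.get(token, ()))
    return counts


def sentiment_shares(counts):
    """Return the positive and negative shares of all sentiment hits."""
    total = counts["positive"] + counts["negative"]
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {
        "positive": counts["positive"] / total,
        "negative": counts["negative"] / total,
    }
```

Running `score_tweets` separately on the tweets collected for each party or candidate, and then comparing the resulting `sentiment_shares`, mirrors the kind of comparison the abstract reports.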

    Spatio-temporal distribution analysis of brand interest in social networks

    Social network applications such as Facebook and Twitter have become part of many people’s lives and are used daily by millions of users. On such platforms, users share their emotions, opinions, experiences, and thoughts. Twitter, in particular, is used to discuss diverse topics, including brands and their products and services. In this thesis, we analyse how brand interest is reflected on Twitter and how this platform can be used to monitor what people say about specific brands, as an indicator of brand interest. Brand interest can be defined as the level of interest one has in a brand, and the level of curiosity one has to learn more about a brand. For this work, the volume of tweets is used as a measure of brand interest. Our methodology uses the time, location, and number of brand-related tweets to perform a spatio-temporal analysis. Additionally, we propose a framework for discovering latent patterns (topics) in a large dataset of grouped short messages to analyse brand interest, using Twitter as a data source. We applied a well-known Text Mining technique called Topic Modelling, an unsupervised learning technique for text data that uncovers topics in a collection of documents. This technique provides a convenient way to retrieve information from unstructured text. Topic Modelling has been applied to track events and trends and to uncover topics in domains such as academia, public health, and marketing. The framework consists of training LDA (Latent Dirichlet Allocation) topic models on aggregated tweets, and then applying the model to different documents, also composed of grouped Twitter posts. Furthermore, we describe a set of pre-processing tasks that helped to improve the performance of the topic models, enabling us to obtain better output and thus to analyse it more effectively. 
The experiments demonstrated that Topic Modelling can successfully track people’s discussions on Social Networks over a long period of time, even in massive datasets such as the one used in the current work, and capture those topics spiked by real-life events.
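
The two data-preparation ideas in this abstract, tweet volume as a measure of brand interest and grouping tweets into larger documents before topic modelling, can be sketched in Python as follows. The record layout `(timestamp, location, brand, text)` and the grouping key `(brand, day)` are assumptions made for illustration; the thesis does not specify them here, and the LDA training step itself is left out.

```python
from collections import Counter, defaultdict
from datetime import datetime


def tweet_volume(tweets):
    """Count brand-related tweets per (brand, day, location).

    Each tweet is a hypothetical record:
    (ISO 8601 timestamp, location, brand, text).
    """
    volume = Counter()
    for timestamp, location, brand, _text in tweets:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        volume[(brand, day, location)] += 1
    return volume


def group_into_documents(tweets):
    """Concatenate tweets that share (brand, day) into one document.

    Short messages are merged so that a topic model sees longer texts.
    The grouping criterion used here is an illustrative assumption.
    """
    grouped = defaultdict(list)
    for timestamp, _location, brand, text in tweets:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        grouped[(brand, day)].append(text)
    return {key: " ".join(texts) for key, texts in grouped.items()}
```

The dictionary returned by `group_into_documents` could then be passed, after pre-processing, to any LDA implementation as the training corpus.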

    Comparing and Combining Sentiment Analysis Methods

    Several messages express opinions about events, products, and services, political views or even their author's emotional state and mood. Sentiment analysis has been used in several applications including analysis of the repercussions of events in social networks, analysis of opinions about products and services, and simply to better understand aspects of social communication in Online Social Networks (OSNs). There are multiple methods for measuring sentiments, including lexical-based approaches and supervised machine learning methods. Despite the wide use and popularity of some methods, it is unclear which method is better for identifying the polarity (i.e., positive or negative) of a message, as the current literature does not provide a method of comparison among existing methods. Such a comparison is crucial for understanding the potential limitations, advantages, and disadvantages of popular methods in analyzing the content of OSN messages. Our study aims at filling this gap by presenting comparisons of eight popular sentiment analysis methods in terms of coverage (i.e., the fraction of messages whose sentiment is identified) and agreement (i.e., the fraction of identified sentiments that are in tune with ground truth). We develop a new method that combines existing approaches, providing the best coverage results and competitive agreement. We also present a free Web service called iFeel, which provides an open API for accessing and comparing results across different sentiment methods for a given text. Comment: Proceedings of the first ACM conference on Online social networks (2013) 27-3
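
The two evaluation measures defined in this abstract translate directly into code. The Python sketch below follows those definitions; the convention that a method returns `None` when it cannot identify a sentiment, and the function name `coverage_and_agreement`, are assumptions made for illustration.

```python
def coverage_and_agreement(predictions, ground_truth):
    """Compute coverage and agreement for one sentiment method.

    predictions: list of "positive", "negative", or None, where None
        means the method could not identify a sentiment (assumed here).
    ground_truth: list of "positive" or "negative" labels.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must align")

    # Keep only the messages for which a sentiment was identified.
    identified = [
        (pred, truth)
        for pred, truth in zip(predictions, ground_truth)
        if pred is not None
    ]

    # Coverage: fraction of messages whose sentiment is identified.
    coverage = len(identified) / len(predictions) if predictions else 0.0

    # Agreement: fraction of identified sentiments matching ground truth.
    matches = sum(pred == truth for pred, truth in identified)
    agreement = matches / len(identified) if identified else 0.0

    return coverage, agreement
```

A method that labels few messages can score high agreement with low coverage, which is why the two measures are reported together.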

    Suspended accounts align with the Internet Research Agency misinformation campaign to influence the 2016 US election

    The ongoing debate surrounding the impact of the Internet Research Agency’s (IRA) social media campaign during the 2016 U.S. presidential election has largely overshadowed the involvement of other actors. Our analysis brings to light a substantial group of suspended Twitter users, outnumbering the IRA user group by a factor of 60, who align with the ideologies of the IRA campaign. Our study demonstrates that this group of suspended Twitter accounts significantly influenced individuals categorized as undecided or weak supporters, potentially with the aim of swaying their opinions, as indicated by Granger causality.

    Social Media Text Processing and Semantic Analysis for Smart Cities

    With the rise of Social Media, people obtain and share information almost instantly on a 24/7 basis. Many research areas have tried to gain valuable insights from these large volumes of freely available user-generated content. With the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities, we designed and developed a framework that provides functionalities for parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), including filtering of non-complying tweets, text pre-processing for Portuguese and English, topic modeling, and transportation-specific text classifiers, as well as aggregation and data visualization. We performed an exploratory data analysis of geo-located tweets in 5 different cities: Rio de Janeiro, São Paulo, New York City, London and Melbourne, comprising a total of more than 43 million tweets in a period of 3 months. Furthermore, we performed a large-scale topic modelling comparison between Rio de Janeiro and São Paulo. Interestingly, most of the topics are shared between both cities, which, despite being in the same country, are considered very different regarding population, economy and lifestyle. We take advantage of recent developments in word embeddings and train such representations from the collections of geo-located tweets. We then use a combination of bag-of-embeddings and traditional bag-of-words to train travel-related classifiers in both Portuguese and English to separate travel-related content from unrelated content. We created specific gold-standard data to perform empirical evaluation of the resulting classifiers. Results are in line with research work in other application areas by showing the robustness of using word embeddings to learn word similarities that bag-of-words is not able to capture.
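
The combination of bag-of-words and bag-of-embeddings features mentioned in this abstract can be sketched in Python as below. The sketch assumes that a bag-of-embeddings is the average of the word vectors in a message, which is a common choice but is not confirmed by the abstract. The vocabulary, the two-dimensional vectors, and the function names are hypothetical.

```python
# Hypothetical vocabulary and toy 2-dimensional word vectors.
# Real embeddings would be trained on the collected tweets.
VOCAB = ["bus", "train", "late", "pizza"]
TOY_EMBEDDINGS = {
    "bus": [0.9, 0.1],
    "train": [0.8, 0.2],
    "late": [0.1, 0.7],
    "pizza": [0.0, 0.1],
}


def bag_of_words(tokens, vocab=VOCAB):
    """Return one count per vocabulary word."""
    return [float(tokens.count(word)) for word in vocab]


def bag_of_embeddings(tokens, embeddings=TOY_EMBEDDINGS, dim=2):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def features(text):
    """Concatenate both representations into one feature vector."""
    tokens = text.lower().split()
    return bag_of_words(tokens) + bag_of_embeddings(tokens)
```

The concatenated vector from `features` can be fed to any standard classifier. The embedding part lets related words such as "bus" and "train" produce similar inputs even though their bag-of-words columns differ.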

    TrollBus, An Empirical Study Of Features For Troll Detection

    In today's social network context, the discussion of politics online has become a normal event. Users from all sides of the political spectrum are able to express their opinions freely and discuss their views on various social networks, including Twitter. From 2016 onward, a group of users whose objective is to polarize discussions and sow discord began to gain notoriety on this social network. These accounts are known as Trolls, and they have been linked to several events in recent history, such as interference in elections and the organizing of violent protests. Since their discovery, several approaches have been developed to detect these accounts using machine learning techniques. Existing approaches have used different types of features. The goal of this work is to analyse these approaches, describing their features and comparing them both individually and in groups. To do so, an empirical study was performed, which adapts these features to the Portuguese Twitter community. The necessary data was collected through SocialBus, a tool for the collection, processing and storage of data from social networks, namely Twitter. The set of accounts used to collect the data was obtained from Portuguese political journalists, and the labelling of trolls was performed with a strict set of behavioural rules, aided by a scoring function. A new module for SocialBus was developed, called Trollbus, which performs troll detection in real time. A public dataset was also released. The features of the best model obtained combine an account's profile metadata with the superficial aspects present in its text. The most important feature set proved to be the numerical aspects of the text, and the single most important feature was the presence of political insults.

    In Quest of Significance: Identifying Types of Twitter Sentiment Events that Predict Spikes in Sales

    We study the power of Twitter events to predict consumer sales events by analysing sales for 75 companies from the retail sector and over 150 million tweets mentioning those companies, along with their sentiment. We suggest an approach for event identification on Twitter that extends existing event study methodologies. We also propose a robust method for clustering Twitter events into different types based on their shape, which captures the varying dynamics of information propagation through the social network. We provide empirical evidence that, by differentiating events based on their shape, we can clearly identify types of Twitter events that have more significant power to predict spikes in sales than the aggregated Twitter signal.
