Lexicon-based bot-aware public emotion mining and sentiment analysis of the Nigerian 2019 presidential election on Twitter
Online social networks have been widely used as rich platforms for predicting election outcomes in several countries. The vast amount of readily available data on such platforms, coupled with the growing power of natural language processing algorithms and tools, has made it possible to mine these data and gain foresight into the likely direction of election outcomes. In this paper, lexicon-based public emotion mining and sentiment analysis were conducted to predict the winner of the 2019 presidential election in Nigeria. A total of 224,500 tweets associated with the two most prominent political parties in Nigeria, the People's Democratic Party (PDP) and the All Progressives Congress (APC), and with the two most prominent presidential candidates who represented these parties in the 2019 elections, Atiku Abubakar and Muhammadu Buhari, were collected between 9 October 2018 and 17 December 2018 via Twitter's streaming API. The tm and NRC libraries in the R environment were used for data cleaning and preprocessing. Botometer was used to detect the presence of automated bots in the preprocessed data, while the NRC Word Emotion Association Lexicon (EmoLex) was used to generate distributions of the subjective public sentiments and emotions surrounding the Nigerian 2019 presidential election. Based on Plutchik's wheel of emotions, emotions were grouped into eight categories (sadness, trust, anger, fear, joy, anticipation, disgust, surprise), while sentiments were grouped into two (negative and positive). The results show a higher positive and a lower negative sentiment for APC than for PDP. Similarly, among the presidential candidates, Atiku has a slightly higher positive and a slightly lower negative sentiment than Buhari. These results point to APC as the predicted winning party and Atiku as the most preferred candidate in the 2019 presidential election.
These predictions were corroborated by the actual election results: APC emerged as the winning party, while Buhari and Atiku were separated by a very close vote margin. This research therefore indicates that Twitter data can be used appropriately to predict election outcomes and other offline future events. Future research could investigate the spatiotemporal dimensions of the prediction.
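The bot-aware, lexicon-based counting step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R pipeline: the lexicon entries are toy placeholders (the real EmoLex associates each word with any of Plutchik's eight emotions and with positive/negative sentiment), the per-account bot scores are assumed to be precomputed beforehand (for example with Botometer), and the names `emotion_profile`, `positive_share`, and the 0.5 threshold are hypothetical.

```python
from collections import Counter

# Toy stand-in for NRC EmoLex (word -> associated categories).
# Illustrative entries only; they are NOT copied from the real lexicon.
TOY_LEXICON = {
    "win": {"joy", "anticipation", "trust", "positive"},
    "hope": {"joy", "anticipation", "trust", "positive"},
    "corrupt": {"anger", "disgust", "negative"},
    "fear": {"fear", "negative"},
}


def emotion_profile(tweets, bot_scores, bot_threshold=0.5):
    """Count lexicon categories over tweets, skipping likely-bot accounts.

    bot_scores maps a user id to a precomputed bot score in [0, 1];
    users missing from the mapping are treated as human.
    """
    counts = Counter()
    for tweet in tweets:
        if bot_scores.get(tweet["user"], 0.0) >= bot_threshold:
            continue  # drop tweets from accounts flagged as probable bots
        for token in tweet["text"].lower().split():
            word = token.strip(".,!?;:#@\"'")  # trim surrounding punctuation
            counts.update(TOY_LEXICON.get(word, ()))
    return counts


def positive_share(counts):
    """Fraction of sentiment-bearing lexicon hits that are positive."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total if total else 0.0


tweets = [
    {"user": "u1", "text": "APC will win, we have hope!"},
    {"user": "u2", "text": "A corrupt party. I fear the outcome."},
    {"user": "bot7", "text": "win win win"},
]
profile = emotion_profile(tweets, bot_scores={"u1": 0.1, "u2": 0.2, "bot7": 0.9})
print(profile["positive"], profile["negative"], positive_share(profile))
```

Comparing `positive_share` (and the individual emotion counts) across the tweet sets collected for each party or candidate mirrors the kind of comparison the abstract reports; the bot threshold trades removal of automated accounts against discarding genuine users.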
Spatio-temporal distribution analysis of brand interest in social networks
Social network applications such as Facebook and Twitter have become part of many people's lives and are used daily by millions of users. On these platforms, users share their emotions, opinions, experiences, and thoughts. Twitter, in particular, is used to discuss diverse topics, including brands and their products and services. In this thesis, we analyse how brand interest is reflected on Twitter and how the platform can be used to monitor what people say about specific brands, as an indicator of brand interest. Brand interest can be defined as the level of interest one has in a brand and the level of curiosity one has to learn more about it. In this work, the volume of tweets is used as a measure of brand interest. Our methodology relies on the time, location, and number of brand-related tweets to perform a spatio-temporal analysis.
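Using tweet volume as the brand-interest measure amounts to counting brand-related tweets per time bucket and location. A minimal Python sketch, assuming each tweet already carries an ISO-8601 `created_at` timestamp and a resolved `location` string (both field names, the daily bucket, and the sample data are hypothetical), with a deliberately naive substring match for the brand:

```python
from collections import Counter
from datetime import datetime


def brand_interest(tweets, brand):
    """Count tweets mentioning `brand`, keyed by (day, location).

    Matching is a naive case-insensitive substring test; a real pipeline
    would need proper tokenisation and brand-name disambiguation.
    """
    counts = Counter()
    needle = brand.lower()
    for tweet in tweets:
        if needle in tweet["text"].lower():
            day = datetime.fromisoformat(tweet["created_at"]).date()
            counts[(day, tweet["location"])] += 1
    return counts


tweets = [
    {"text": "Loving my new Acme shoes", "created_at": "2017-05-01T09:15:00", "location": "Lisbon"},
    {"text": "acme store queue is huge", "created_at": "2017-05-01T18:40:00", "location": "Lisbon"},
    {"text": "Acme ad on TV again", "created_at": "2017-05-02T12:00:00", "location": "Porto"},
    {"text": "Nothing to do with brands", "created_at": "2017-05-02T13:00:00", "location": "Porto"},
]
for (day, place), n in sorted(brand_interest(tweets, "Acme").items()):
    print(day, place, n)
```

The resulting (day, location) counts form a spatio-temporal grid that can be plotted as a time series per region or as a map per day.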
Additionally, we propose a framework for discovering latent patterns (topics) in a large dataset of grouped short messages to analyse brand interest, using Twitter as a data source. We applied Topic Modelling, a well-known unsupervised Text Mining technique for uncovering topics in a collection of documents, which provides a convenient way to retrieve information from unstructured text. Topic Modelling has been applied to track events and trends and to uncover topics in domains such as academia, public health, and marketing. The framework consists of training LDA (Latent Dirichlet Allocation) topic models on aggregated tweets and then applying the trained model to other documents, also composed of grouped Twitter posts. Furthermore, we describe a set of pre-processing tasks that improved the performance of the topic models, yielding better output and therefore a better analysis. The experiments demonstrated that Topic Modelling can successfully track people's discussions on social networks, even in massive datasets such as the one used in the current work, and capture those
topics spiked by real-life events.
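The framework's two stages (train LDA on pooled tweets, then apply the trained model to other pooled documents) can be sketched as below. This is a dependency-free Python illustration, not the thesis code: it pools tweets with a hypothetical grouping key and uses a tiny collapsed Gibbs sampler written from scratch, with new documents inferred by keeping the learned topic-word counts fixed. The class name `TinyLDA`, the hyperparameters, and the sample tweets are assumptions; established implementations such as gensim's or scikit-learn's use variational inference instead of Gibbs sampling.

```python
import random
from collections import defaultdict


def pool_tweets(tweets, key):
    """Aggregate short tweets into one pseudo-document (token list) per group key."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[key(tweet)].extend(tweet["text"].lower().split())
    return dict(groups)


class TinyLDA:
    """Minimal collapsed Gibbs sampler for LDA (illustrative, not optimised)."""

    def __init__(self, n_topics, alpha=0.1, beta=0.01, seed=0):
        self.K, self.alpha, self.beta = n_topics, alpha, beta
        self.rng = random.Random(seed)

    def _draw(self, weights):
        # Sample an index with probability proportional to its weight.
        r = self.rng.random() * sum(weights)
        for k, w in enumerate(weights):
            r -= w
            if r <= 0:
                return k
        return len(weights) - 1

    def _phi(self, k, w):
        # Smoothed probability of word id w under topic k, from topic-word counts.
        return (self.nkw[k][w] + self.beta) / (self.nk[k] + self.V * self.beta)

    def _count(self, d, w, k, ndk, learn, delta):
        ndk[d][k] += delta
        if learn:  # topic-word counts only change while training
            self.nkw[k][w] += delta
            self.nk[k] += delta

    def _sample(self, ids, iters, learn):
        """Run Gibbs sweeps over word-id documents and return theta[d][k]."""
        ndk = [[0] * self.K for _ in ids]
        z = [[self.rng.randrange(self.K) for _ in doc] for doc in ids]
        for d, doc in enumerate(ids):
            for w, k in zip(doc, z[d]):
                self._count(d, w, k, ndk, learn, +1)
        for _ in range(iters):
            for d, doc in enumerate(ids):
                for i, w in enumerate(doc):
                    self._count(d, w, z[d][i], ndk, learn, -1)
                    weights = [(ndk[d][j] + self.alpha) * self._phi(j, w) for j in range(self.K)]
                    z[d][i] = self._draw(weights)
                    self._count(d, w, z[d][i], ndk, learn, +1)
        return [[(n + self.alpha) / (len(doc) + self.K * self.alpha) for n in row]
                for row, doc in zip(ndk, ids)]

    def fit(self, docs, iters=100):
        """Learn topics from a list of token lists (e.g. pooled tweets)."""
        words = sorted({t for doc in docs for t in doc})
        self.vocab = {w: i for i, w in enumerate(words)}
        self.V = len(self.vocab)
        self.nkw = [[0] * self.V for _ in range(self.K)]
        self.nk = [0] * self.K
        self._sample([[self.vocab[t] for t in doc] for doc in docs], iters, learn=True)
        return self

    def transform(self, docs, iters=50):
        """Infer topic mixtures for new documents with the learned topics held fixed."""
        ids = [[self.vocab[t] for t in doc if t in self.vocab] for doc in docs]  # drop unseen words
        return self._sample(ids, iters, learn=False)


tweets = [
    {"day": "d1", "text": "acme phone battery dies fast"},
    {"day": "d1", "text": "acme phone screen cracked"},
    {"day": "d2", "text": "acme football sponsor goal match"},
    {"day": "d2", "text": "great match acme sponsor"},
]
train = pool_tweets(tweets, key=lambda t: t["day"])
model = TinyLDA(n_topics=2).fit(list(train.values()))
print(model.transform([["phone", "battery", "unknownword"]]))
```

Pooling matters because single tweets are too short for LDA to estimate stable topic mixtures; the grouping criterion (here a hypothetical `day` field) controls what each pseudo-document represents.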
Comparing and Combining Sentiment Analysis Methods
Many messages express opinions about events, products, and services, political views, or even their authors' emotional state and mood. Sentiment analysis has been used in several applications, including analyzing the repercussions of events in social networks, analyzing opinions about products and services, and simply better understanding aspects of social communication in Online Social Networks (OSNs). There are multiple methods for measuring sentiment, including lexical-based approaches and supervised machine learning methods. Despite the wide use and popularity of some methods, it is unclear which method is better for identifying the polarity (i.e., positive or negative) of a message, because the current literature does not compare existing methods with one another. Such a comparison is crucial for understanding the potential limitations, advantages, and disadvantages of popular methods for analyzing the content of OSN messages. Our study aims to fill this gap by comparing eight popular sentiment analysis methods in terms of coverage (i.e., the fraction of messages whose sentiment is identified) and agreement (i.e., the fraction of identified sentiments that agree with the ground truth). We develop a new method that combines existing approaches, providing the best coverage results and competitive agreement. We also present a free Web service called iFeel, which provides an open API for accessing and comparing the results of different sentiment methods for a given text.
Comment: Proceedings of the First ACM Conference on Online Social Networks (2013) 27-3
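The two evaluation measures defined above, and the idea that combining methods raises coverage, can be made concrete with a short Python sketch. The abstract does not describe how the authors' combined method works, so a plain majority vote over non-abstaining methods stands in for it here; the method names, labels, and data are hypothetical, and `None` marks a message a method could not classify.

```python
from collections import Counter


def coverage_and_agreement(predictions, gold):
    """Coverage: share of messages a method labels at all (None = abstained).
    Agreement: share of labelled messages whose label matches the ground truth."""
    labelled = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(labelled) / len(gold) if gold else 0.0
    agreement = sum(p == g for p, g in labelled) / len(labelled) if labelled else 0.0
    return coverage, agreement


def majority_combine(*method_outputs):
    """Majority vote over the methods that produced a label; ties abstain."""
    combined = []
    for labels in zip(*method_outputs):
        ranked = Counter(label for label in labels if label is not None).most_common()
        if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
            combined.append(None)
        else:
            combined.append(ranked[0][0])
    return combined


gold = ["positive", "negative", "positive", "negative"]
method_a = ["positive", None, "positive", "positive"]
method_b = [None, "negative", None, "negative"]
method_c = ["positive", "negative", None, "negative"]
for name, preds in [("a", method_a), ("b", method_b), ("c", method_c),
                    ("combined", majority_combine(method_a, method_b, method_c))]:
    print(name, coverage_and_agreement(preds, gold))
```

On this toy data each single method leaves some messages unlabelled, while the vote labels all four, which illustrates why a combination can win on coverage while staying competitive on agreement.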
Suspended accounts align with the Internet Research Agency misinformation campaign to influence the 2016 US election
The ongoing debate surrounding the impact of the Internet Research Agency's (IRA) social media campaign during the 2016 U.S. presidential election has largely overshadowed the involvement of other actors. Our analysis brings to light a substantial group of suspended Twitter users who align with the ideologies of the IRA campaign and outnumber the IRA user group by a factor of 60. As indicated by Granger causality analysis, this group of suspended accounts significantly influenced individuals categorized as undecided or weak supporters, potentially with the aim of swaying their opinions.
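Granger causality asks whether past values of one series improve the prediction of another beyond that series' own past. A minimal pure-Python sketch of the standard F test, assuming two aligned activity series (the series, lag order, and function names here are hypothetical; the abstract specifies none of them). It compares a restricted autoregression of `y` on its own lags with an unrestricted one that adds lags of `x`, and returns only the F statistic; a library routine such as `grangercausalitytests` in `statsmodels.tsa.stattools` also reports p-values.

```python
import random


def _rss(rows, target):
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(a[i][c]))  # partial pivoting
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(coef, row))) ** 2
               for row, t in zip(rows, target))


def granger_f(x, y, lags=1):
    """F statistic for H0: lags of x add no predictive power for y beyond y's own lags."""
    restricted, unrestricted, target = [], [], []
    for t in range(lags, len(y)):
        own = [y[t - lag] for lag in range(1, lags + 1)]
        other = [x[t - lag] for lag in range(1, lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
        target.append(y[t])
    rss_r, rss_u = _rss(restricted, target), _rss(unrestricted, target)
    df_num, df_den = lags, len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


rng = random.Random(0)
suspended = [rng.gauss(0, 1) for _ in range(300)]  # hypothetical activity series
undecided = [0.0] + [0.9 * suspended[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 300)]
print(granger_f(suspended, undecided), granger_f(undecided, suspended))
```

With this synthetic data, where `undecided` is built from lagged `suspended`, the first statistic should be large and the reverse one small. A significant result shows predictive precedence, not causation in the everyday sense.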
Social Media Text Processing and Semantic Analysis for Smart Cities
With the rise of social media, people obtain and share information almost instantly, 24/7. Many research areas have tried to gain valuable insights from these large volumes of freely available user-generated content. With the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities, we designed and developed a framework that provides parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), filtering of non-complying tweets, text pre-processing for Portuguese and English, topic modeling, transportation-specific text classifiers, and data aggregation and visualization.
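Collecting tweets from pre-defined bounding boxes and discarding non-complying ones can be sketched as a point-in-rectangle test. This Python sketch assumes that "non-complying" means a tweet whose coordinates fall outside every configured box, and that coordinates arrive as (longitude, latitude) pairs; the box values are rough, illustrative approximations, not the authors' configuration, and the function names are hypothetical.

```python
# Rough, illustrative boxes as (min_lon, min_lat, max_lon, max_lat);
# these are NOT the bounding boxes used by the authors.
BOXES = {
    "London": (-0.51, 51.28, 0.33, 51.69),
    "New York City": (-74.26, 40.48, -73.70, 40.92),
}


def assign_box(lon, lat, boxes=BOXES):
    """Return the name of the first box containing the point, or None."""
    for name, (min_lon, min_lat, max_lon, max_lat) in boxes.items():
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return name
    return None


def filter_geotagged(tweets, boxes=BOXES):
    """Keep tweets whose coordinates fall inside a box, tagging each with its region."""
    kept = []
    for tweet in tweets:
        coords = tweet.get("coordinates")  # assumed (lon, lat) tuple or None
        region = assign_box(*coords, boxes=boxes) if coords else None
        if region is not None:
            kept.append({**tweet, "region": region})
    return kept


sample = [
    {"id": 1, "coordinates": (-0.1276, 51.5072)},   # central London
    {"id": 2, "coordinates": (-73.9857, 40.7484)},  # Manhattan
    {"id": 3, "coordinates": (2.35, 48.86)},        # Paris: outside every box
    {"id": 4, "coordinates": None},                 # no geotag
]
print([(t["id"], t["region"]) for t in filter_geotagged(sample)])
```

Tagging each kept tweet with its region makes the later per-city aggregation a simple group-by on the `region` field.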
We performed an exploratory data analysis of geo-located tweets in 5 different cities (Rio de Janeiro, São Paulo, New York City, London, and Melbourne), comprising more than 43 million tweets over a period of 3 months. Furthermore, we performed a large-scale topic modeling comparison between Rio de Janeiro and São Paulo. Interestingly, most topics are shared by both cities, which, despite being in the same country, are considered very different in terms of population, economy, and lifestyle.
We take advantage of recent developments in word embeddings and train such representations on the collections of geo-located tweets. We then use a combination of bag-of-embeddings and traditional bag-of-words features to train travel-related classifiers in both Portuguese and English that separate travel-related content from unrelated content. We created dedicated gold-standard data to evaluate the resulting classifiers empirically. The results are in line with research in other application areas, showing the robustness of word embeddings in learning word similarities that bag-of-words representations cannot capture.
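The combined bag-of-words and bag-of-embeddings representation can be sketched as concatenating sparse term counts with the average of the tokens' embedding vectors. This Python sketch covers only feature construction, not the classifiers; the vocabulary and two-dimensional embeddings are toy values standing in for embeddings trained on the tweet collections, and the function names are hypothetical.

```python
def bag_of_words(tokens, vocab):
    """Term counts over a fixed vocabulary."""
    return [tokens.count(word) for word in vocab]


def bag_of_embeddings(tokens, embeddings, dim):
    """Mean of the embedding vectors of known tokens (zeros if none are known)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_features(text, vocab, embeddings, dim):
    """Concatenate sparse bag-of-words counts with a dense averaged embedding."""
    tokens = text.lower().split()
    return bag_of_words(tokens, vocab) + bag_of_embeddings(tokens, embeddings, dim)


vocab = ["bus", "delay", "pizza"]
toy_embeddings = {"bus": [1.0, 0.0], "train": [0.8, 0.2], "pizza": [0.0, 1.0]}
print(combined_features("Bus delay on the train", vocab, toy_embeddings, dim=2))
```

Here "train" is outside the bag-of-words vocabulary yet still shifts the dense part toward "bus", a small instance of the word similarity that the abstract says bag-of-words alone cannot capture.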
TrollBus, An Empirical Study Of Features For Troll Detection
In today's social network context, discussing politics online has become commonplace. Users from all sides of the political spectrum can express their opinions freely and discuss their views on various social networks, including Twitter. From 2016 onward, a group of users whose objective is to polarize discussions and sow discord began to gain notoriety on this social network. These accounts are known as trolls, and they have been linked to several events in recent history, such as election interference and the organization of violent protests. Since their discovery, several approaches have been developed to detect these accounts using machine learning techniques.
Existing approaches have used different types of features. The goal of this work is to compare these sets of features, both individually and in groups. To do so, an empirical study was performed that adapts these features to the Portuguese Twitter community. The data were collected through SocialBus, a tool for collecting, processing, and storing data from social networks, namely Twitter. The accounts used for data collection were obtained from Portuguese political journalists, and trolls were labelled using a strict set of behavioural rules aided by a scoring function.
A new SocialBus module, called TrollBus, was developed to perform troll detection in real time. A public dataset was also released.
The best model combines an account's profile metadata with superficial features of its text. The most important feature set proved to be the numerical aspects of the text, and the single most important feature was the presence of political insults.
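A feature vector of the kind described above, combining profile metadata with superficial text features including a political-insult flag, might look like the following Python sketch. The profile field names follow Twitter's v1.1 user object; the specific feature set, the function name, and the Portuguese insult list are hypothetical placeholders, since the abstract does not list the thesis's actual features or lexicon.

```python
import re

# Hypothetical placeholder list; the thesis's actual insult lexicon is not given.
POLITICAL_INSULTS = {"traidor", "corrupto", "palhaço"}


def troll_features(profile, text):
    """Combine account profile metadata with superficial features of a tweet's text."""
    tokens = re.findall(r"\w+", text.lower())
    letters = [c for c in text if c.isalpha()]
    return {
        # Profile metadata (field names as in Twitter's v1.1 user object).
        "followers": profile["followers_count"],
        "friends": profile["friends_count"],
        "default_avatar": int(profile["default_profile_image"]),
        # Superficial, mostly numerical text features.
        "length": len(text),
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "exclamations": text.count("!"),
        "mentions": text.count("@"),
        "hashtags": text.count("#"),
        "has_political_insult": int(any(t in POLITICAL_INSULTS for t in tokens)),
    }


profile = {"followers_count": 12, "friends_count": 480, "default_profile_image": True}
print(troll_features(profile, "Que CORRUPTO! @fulano #vergonha"))
```

Dictionaries like this can be vectorised and fed to any standard classifier, and ablating groups of keys (profile versus text) is one way to compare feature sets individually and in groups.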
In Quest of Significance: Identifying Types of Twitter Sentiment Events that Predict Spikes in Sales
We study the power of Twitter events to predict consumer sales events by analysing sales for 75 companies from the retail sector and over 150 million tweets mentioning those companies, along with their sentiment. We suggest an approach for identifying events on Twitter that extends existing event-study methodologies. We also propose a robust method for clustering Twitter events into different types based on their shape, which captures the varying dynamics of information propagation through the social network. We provide empirical evidence that, by differentiating events based on their shape, we can clearly identify types of Twitter events that have more significant power to predict spikes in sales than the aggregated Twitter signal.
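The abstract does not detail its event-identification or clustering procedure, so the following Python sketch uses a simple stand-in: a trailing mean plus `k` standard deviations flags spikes in a daily tweet-volume series, and each flagged event is cut into a fixed window scaled so its peak equals 1, giving shape vectors that could later be clustered (for example with k-means) into event types. The clustering step is not shown, and the window sizes, threshold, names, and data are hypothetical.

```python
from statistics import mean, pstdev


def detect_spikes(series, window=7, k=3.0):
    """Flag indices whose value exceeds the trailing mean by more than k trailing std devs."""
    spikes = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        if series[t] > mean(past) + k * pstdev(past):
            spikes.append(t)
    return spikes


def event_shape(series, t, before=2, after=4):
    """Window around index t scaled so its peak is 1, making events comparable by shape."""
    window = series[max(0, t - before): t + after + 1]
    peak = max(window)
    return [v / peak for v in window] if peak else list(window)


daily_volume = [10] * 10 + [50, 30, 20, 15, 12, 10] + [10] * 5  # hypothetical counts
spikes = detect_spikes(daily_volume)
shapes = [event_shape(daily_volume, t) for t in spikes]
print(spikes, shapes)
```

Peak-normalising removes differences in absolute volume, so events are compared by how quickly attention builds and decays, which is the shape-based distinction the abstract relies on.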