Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump
Measuring and forecasting opinion trends from real-time social media is a
long-standing goal of big-data analytics. Despite its importance, there has
been no conclusive scientific evidence so far that social media activity can
capture the opinion of the general population. Here we develop a method to
infer the opinion of Twitter users regarding the candidates of the 2016 US
Presidential Election by using a combination of statistical physics of complex
networks and machine learning based on hashtag co-occurrence to develop an
in-domain training set approaching 1 million tweets. We investigate the social
networks formed by the interactions among millions of Twitter users and infer
the support of each user to the presidential candidates. The resulting Twitter
trends follow the New York Times National Polling Average, which represents an
aggregate of hundreds of independent traditional polls, with remarkable
accuracy. Moreover, the Twitter opinion trend precedes the aggregated NYT polls
by 10 days, showing that Twitter can be an early signal of global opinion
trends. Our analytics unleash the power of Twitter to uncover social trends,
from elections and brands to political movements, at a fraction of the cost of
national polls.
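The claim that one time series leads another by a fixed number of days can be checked with a lagged correlation. The following is a minimal sketch, not the authors' method: the function names `pearson` and `best_lead` and the toy series are assumptions made for illustration only.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_lead(signal, reference, max_lag):
    """Find the lag at which `signal` best anticipates `reference`.

    For each lag, signal[t] is compared with reference[t + lag].
    Returns (lag, correlation) for the highest correlation found.
    """
    best = (0, pearson(signal, reference))
    for lag in range(1, max_lag + 1):
        r = pearson(signal[:-lag], reference[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Toy data (hypothetical): the reference is the signal delayed by 3 steps.
twitter = [0, 1, 3, 2, 5, 4, 6, 8, 7, 9, 11, 10]
polls = [0, 0, 0] + twitter[:-3]
lag, r = best_lead(twitter, polls, max_lag=5)
```

On real data the series would be daily opinion shares and poll averages, and the peak correlation would be well below 1.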
A meta-analysis of state-of-the-art electoral prediction from Twitter data
Electoral prediction from Twitter data is an appealing research topic. It
seems relatively straightforward and the prevailing view is overly optimistic.
This is problematic because while simple approaches are assumed to be good
enough, core problems are not addressed. Thus, this paper aims to (1) provide a
balanced and critical review of the state of the art; (2) cast light on the
presumed predictive power of Twitter data; and (3) depict a roadmap to push
forward the field. Hence, a scheme to characterize Twitter prediction methods
is proposed. It covers every aspect from data collection to performance
evaluation, through data processing and vote inference. Using that scheme,
prior research is analyzed and organized to explain the main approaches taken
to date, as well as their weaknesses. This is the first meta-analysis of the
whole body of research regarding electoral prediction from Twitter data. It
reveals that its presumed predictive power regarding electoral prediction has
been rather exaggerated: although social media may provide a glimpse of
electoral outcomes, current research does not provide strong evidence that
it can replace traditional polls. Finally, future lines of research, along with
a set of requirements they must fulfill, are provided. Comment: 19 pages, 3 tables
Identifying Purpose Behind Electoral Tweets
Tweets pertaining to a single event, such as a national election, can number
in the hundreds of millions. Automatically analyzing them is beneficial in many
downstream natural language applications such as question answering and
summarization. In this paper, we propose a new task: identifying the purpose
behind electoral tweets--why do people post election-oriented tweets? We show
that identifying purpose is correlated with the related problems of sentiment
and emotion detection, yet is significantly different. Detecting purpose has a
number of applications including detecting the mood of the electorate,
estimating the popularity of policies, identifying key issues of contention,
and predicting the course of events. We create a large dataset of electoral
tweets and annotate a few thousand tweets for purpose. We develop a system that
automatically classifies electoral tweets as per their purpose, obtaining an
accuracy of 43.56% on an 11-class task and an accuracy of 73.91% on a 3-class
task (both accuracies well above the most-frequent-class baseline). Finally, we
show that resources developed for emotion detection are also helpful for
detecting purpose.
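The most-frequent-class baseline mentioned above is the accuracy obtained by always predicting the commonest label. A minimal sketch of the comparison follows; the label names and the tiny data set are hypothetical and are not taken from the paper.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent gold label."""
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)

def accuracy(gold, predicted):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Hypothetical 3-class example, for illustration only.
gold = ["support", "oppose", "support", "other", "support", "oppose"]
pred = ["support", "oppose", "oppose", "other", "support", "oppose"]

baseline = majority_baseline_accuracy(gold)  # "support" covers 3 of 6 items
acc = accuracy(gold, pred)                   # 5 of 6 predictions are correct
```

A classifier is informative only to the extent that `acc` exceeds `baseline`, which is why multi-class accuracies are reported against it.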
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has emerged as an alternative to opinion polls for collecting public
opinion, but as a passive data source it still poses many challenges, such as
lack of structure, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure proves to be more robust
than polling, our study also suggests that the former can advantageously
complement the latter in opinion prediction.
Organized Behavior Classification of Tweet Sets using Supervised Learning Methods
During the 2016 US elections, Twitter experienced unprecedented levels of
propaganda and fake news through the collaboration of bots and hired persons,
the ramifications of which are still being debated. This work proposes an
approach to identify the presence of organized behavior in tweets. The Random
Forest, Support Vector Machine, and Logistic Regression algorithms are each
used to train a model with a data set of 850 records consisting of 299 features
extracted from tweets gathered during the 2016 US presidential election. The
features represent user and temporal synchronization characteristics to capture
coordinated behavior. These models are trained to classify tweet sets among the
categories: organic vs organized, political vs non-political, and pro-Trump vs
pro-Hillary vs neither. The random forest algorithm performs best, with
greater than 95% average accuracy and F-measure scores for each category. The
most valuable features for classification are identified as user-based
features, with media use and marking tweets as favorites being the most
dominant. Comment: 51 pages, 5 figures
Analyzing the Digital Traces of Political Manipulation: The 2016 Russian Interference Twitter Campaign
Until recently, social media was seen to promote democratic discourse on
social and political issues. However, this powerful communication platform has
come under scrutiny for allowing hostile actors to exploit online discussions
in an attempt to manipulate public opinion. A case in point is the ongoing U.S.
Congress' investigation of Russian interference in the 2016 U.S. election
campaign, with Russia accused of using trolls (malicious accounts created to
manipulate) and bots to spread misinformation and politically biased
information. In this study, we explore the effects of this manipulation
campaign, taking a closer look at users who re-shared the posts produced on
Twitter by the Russian troll accounts publicly disclosed by the U.S. Congress
investigation. We collected a dataset with over 43 million election-related
posts shared on Twitter between September 16 and October 21, 2016, by about 5.7
million distinct users. This dataset included accounts associated with the
identified Russian trolls. We use label propagation to infer the ideology of
all users based on the news sources they shared. This method enables us to
classify a large number of users as liberal or conservative with precision and
recall above 90%. Conservatives retweeted Russian trolls about 31 times more
often than liberals and produced 36 times more tweets. Additionally, most retweets
of troll content originated from two Southern states: Tennessee and Texas.
Using state-of-the-art bot detection techniques, we estimated that about 4.9%
and 6.2% of liberal and conservative users respectively were bots. Text
analysis on the content shared by trolls reveals that they had a mostly
conservative, pro-Trump agenda. Although an ideologically broad swath of
Twitter users was exposed to Russian trolls in the period leading up to the
2016 U.S. Presidential election, it was mainly conservatives who helped amplify
their message.
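Label propagation, as used above to infer user ideology, spreads a small set of known labels through a graph. The sketch below is a generic majority-vote variant and is not claimed to be the authors' algorithm: the graph layout, the node names, and the function `propagate_labels` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def propagate_labels(edges, seeds, max_iters=20):
    """Spread seed labels over an undirected graph by neighbor majority vote.

    edges: iterable of (u, v) pairs; seeds: dict of node -> fixed label.
    Unlabeled nodes adopt the strict majority label of their labeled
    neighbors; ties leave the node unchanged. Updates are synchronous.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    labels = dict(seeds)
    for _ in range(max_iters):
        updates = {}
        for node in sorted(neighbors):
            if node in seeds:
                continue  # seed labels never change
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            ranked = votes.most_common()
            if not ranked:
                continue  # no labeled neighbors yet
            if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
                continue  # tie: keep the current state
            if labels.get(node) != ranked[0][0]:
                updates[node] = ranked[0][0]
        if not updates:
            break  # converged
        labels.update(updates)
    return labels

# Hypothetical graph: users linked to news outlets they shared, plus one
# user-user link. Outlet labels are the seeds.
edges = [("alice", "outlet_L"), ("alice", "outlet_L2"), ("bob", "outlet_C"),
         ("carol", "outlet_L"), ("carol", "outlet_C"), ("carol", "bob")]
seeds = {"outlet_L": "liberal", "outlet_L2": "liberal",
         "outlet_C": "conservative"}
result = propagate_labels(edges, seeds)
```

In this toy graph, `carol` is tied after the first pass and is resolved only once `bob` has received a label, which shows how labels reach users beyond the direct neighbors of the seeds.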