64 research outputs found
On predictability of rare events leveraging social media: a machine learning perspective
Information extracted from social media streams has been leveraged to
forecast the outcome of a large number of real-world events, from political
elections to stock market fluctuations. An increasing number of studies
demonstrate how the analysis of social media conversations provides cheap
access to the wisdom of the crowd. However, extents and contexts in which such
forecasting power can be effectively leveraged are still unverified at least in
a systematic way. It is also unclear how social-media-based predictions compare
to those based on alternative information sources. To address these issues,
here we develop a machine learning framework that leverages social media
streams to automatically identify and predict the outcomes of soccer matches.
We focus in particular on matches in which at least one of the possible
outcomes is deemed as highly unlikely by professional bookmakers. We argue that
sport events offer a systematic approach for testing the predictive power of
social media, and allow us to compare such power against the rigorous baselines
set by external sources. Despite such strict baselines, our framework yields
above 8% marginal profit when used to inform simple betting strategies. The
system is based on real-time sentiment analysis and exploits data collected
immediately before the games, allowing for informed bets. We discuss the
rationale behind our approach, describe the learning framework, its prediction
performance and the return it provides as compared to a set of betting
strategies. To test our framework we use both historical Twitter data from the
2014 FIFA World Cup games, and real-time Twitter data collected by monitoring
the conversations about all soccer matches of four major European tournaments
(FA Premier League, Serie A, La Liga, and Bundesliga), and the 2014 UEFA
Champions League, during the period between Oct. 25th 2014 and Nov. 26th 2014.
Comment: 10 pages, 10 tables, 8 figures
On the influence of social bots in online protests. Preliminary findings of a Mexican case study
Social bots can affect online communication among humans. We study this
phenomenon by focusing on #YaMeCanse, the most active protest hashtag in the
history of Twitter in Mexico. Accounts using the hashtag are classified using
the BotOrNot bot detection tool. Our preliminary analysis suggests that bots
played a critical role in disrupting online communication about the protest
movement.
Comment: 10 pages
Assessing candidate preference through web browsing history
Predicting election outcomes is of considerable interest to candidates, political scientists, and the public at large. We propose the use of Web browsing history as a new indicator of candidate preference among the electorate, one that has potential to overcome a number of the drawbacks of election polls. However, there are a number of challenges that must be overcome to effectively use Web browsing for assessing candidate preference, including the lack of suitable ground truth data and the heterogeneity of user populations in time and space. We address these challenges, and show that the resulting methods can shed considerable light on the dynamics of voters' candidate preferences in ways that are difficult to achieve using polls.
Accepted manuscript
Beating the news using social media: the case study of American Idol
We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV shows as an example of a well defined electoral phenomenon that each week draws millions of votes in the USA. This event can be considered as a basic test in a simplified environment to assess the predictive power of Twitter signals. We provide evidence that Twitter activity during the time span defined by the TV show airing and the voting period following it correlates with the contestants' ranking and allows the anticipation of the voting outcome. Twitter data from the show and the voting period of the season finale have been analyzed to attempt the winner prediction ahead of the airing of the official result. We also show that the fraction of tweets that contain geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, showing that strong regional polarizations occur. The geolocalized data are crucial for the correct prediction of the final outcome of the show, pointing out the importance of considering information beyond the aggregated Twitter signal. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real time gathering of quantitative indicators that may be able to anticipate the future unfolding of opinion formation events.
Trump vs. Hillary: What went Viral during the 2016 US Presidential Election
In this paper, we present quantitative and qualitative analysis of the top
retweeted tweets (viral tweets) pertaining to the US presidential elections
from September 1, 2016 to Election Day on November 8, 2016. For every day, we
tagged the top 50 most retweeted tweets as supporting or attacking either
candidate or as neutral/irrelevant. Then we analyzed the tweets in each class
for: general trends and statistics; the most frequently used hashtags, terms,
and locations; the most retweeted accounts and tweets; and the most shared news
and links. In all we analyzed the 3,450 most viral tweets that grabbed the most
attention during the US election and were retweeted in total 26.3 million times
accounting for over 40% of the total tweet volume pertaining to the US election in
the aforementioned period. Our analysis of the tweets highlights some of the
differences between the social media strategies of both candidates, the
penetration of their messages, and the potential effect of attacks on both
Comment: Paper to appear in Springer SocInfo 201
Twitter-based analysis of the dynamics of collective attention to political parties
Large-scale data from social media have a significant potential to describe
complex phenomena in real world and to anticipate collective behaviors such as
information spreading and social trends. One specific case of study is
represented by the collective attention to the action of political parties. Not
surprisingly, researchers and stakeholders tried to correlate parties' presence
on social media with their performances in elections. Despite the many efforts,
results are still inconclusive since this kind of data is often very noisy and
significant signals could be covered by (largely unknown) statistical
fluctuations. In this paper we consider the number of tweets (tweet volume) of
a party as a proxy of collective attention to the party, identify the dynamics
of the volume, and show that this quantity has some information on the
elections outcome. We find that the distribution of the tweet volume for each
party follows a log-normal distribution with a positive autocorrelation of the
volume over short terms, which indicates the volume has large fluctuations of
the log-normal distribution yet with a short-term tendency. Furthermore, by
measuring the ratio of two consecutive daily tweet volumes, we find that the
evolution of the daily volume of a party can be described by means of a
geometric Brownian motion (i.e., the logarithm of the volume moves randomly
with a trend). Finally, we determine the optimal period of averaging tweet
volume for reducing fluctuations and extracting short-term tendencies. We
conclude that the tweet volume is a good indicator of parties' success in the
elections when considered over an optimal time window. Our study identifies the
statistical nature of collective attention to political issues and sheds light
on how to model the dynamics of collective attention in social media.
Comment: 16 pages, 7 figures, 3 tables. Published in PLoS ONE
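The geometric Brownian motion description in the abstract above can be illustrated with a minimal sketch. It assumes daily tweet volumes are available as a plain list of positive numbers; the function names (`log_ratios`, `estimate_log_drift_and_volatility`, `moving_average`) are hypothetical and are not taken from the paper. Under a geometric Brownian motion, the logarithms of the ratios of consecutive daily volumes are independent and normally distributed, so their sample mean and standard deviation give rough estimates of the trend and the size of the fluctuations of the log-volume. The moving average stands in for the "period of averaging" the abstract mentions; how the paper selects the optimal window is not stated there and is not reproduced here.

```python
import math
import statistics


def log_ratios(volumes):
    """Log of the ratio of each daily volume to the previous day's volume."""
    return [math.log(later / earlier)
            for earlier, later in zip(volumes, volumes[1:])]


def estimate_log_drift_and_volatility(volumes):
    """Sample mean and standard deviation of the daily log-ratios.

    Under a geometric Brownian motion these estimate the per-day trend
    and fluctuation size of the log-volume (an illustrative estimator,
    not necessarily the one used in the paper).
    """
    ratios = log_ratios(volumes)
    return statistics.mean(ratios), statistics.stdev(ratios)


def moving_average(volumes, window):
    """Average the volume over a sliding window of `window` days."""
    return [sum(volumes[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(volumes))]


if __name__ == "__main__":
    # Hypothetical volumes growing by 10% per day, for illustration only.
    daily = [100.0, 110.0, 121.0, 133.1]
    drift, volatility = estimate_log_drift_and_volatility(daily)
    print(drift, volatility)
    print(moving_average(daily, 2))
```

A longer averaging window smooths more of the day-to-day fluctuation but reacts more slowly to a change in trend, which is the trade-off behind looking for an optimal window.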
A meta-analysis of state-of-the-art electoral prediction from Twitter data
Electoral prediction from Twitter data is an appealing research topic. It
seems relatively straightforward and the prevailing view is overly optimistic.
This is problematic because while simple approaches are assumed to be good
enough, core problems are not addressed. Thus, this paper aims to (1) provide a
balanced and critical review of the state of the art; (2) cast light on the
presumed predictive power of Twitter data; and (3) depict a roadmap to push
forward the field. Hence, a scheme to characterize Twitter prediction methods
is proposed. It covers every aspect from data collection to performance
evaluation, through data processing and vote inference. Using that scheme,
prior research is analyzed and organized to explain the main approaches taken
up to date but also their weaknesses. This is the first meta-analysis of the
whole body of research regarding electoral prediction from Twitter data. It
reveals that its presumed predictive power regarding electoral prediction has
been rather exaggerated: although social media may provide a glimpse of
electoral outcomes, current research does not provide strong evidence that
it can replace traditional polls. Finally, future lines of research along with
a set of requirements they must fulfill are provided.
Comment: 19 pages, 3 tables
Testing Propositions Derived from Twitter Studies: Generalization and Replication in Computational Social Science
Replication is an essential requirement for scientific discovery. The current study aims to generalize and replicate 10 propositions made in previous Twitter studies using a representative dataset. Our findings suggest 6 out of 10 propositions could not be replicated due to the variations of data collection, analytic strategies employed, and inconsistent measurements. The study's contributions are twofold: First, it systematically summarized and assessed some important claims in the field, which can inform future studies. Second, it proposed a feasible approach to generating a random sample of Twitter users and its associated ego networks, which might serve as a solution for answering social-scientific questions at the individual level without accessing the complete data archive.
published_or_final_version
Leaders in Social Networks, the Delicious Case
Finding pertinent information is not limited to search engines. Online communities can amplify the influence of a small number of power users for the benefit of all other users. Users' information foraging in depth and breadth can be greatly enhanced by choosing suitable leaders. For instance in delicious.com, users subscribe to leaders' collections, which leads to a deeper and wider reach not achievable with search engines. To consolidate such collective search, it is essential to utilize the leadership topology and identify influential users. Google's PageRank, as a successful search algorithm in the World Wide Web, turns out to be less effective in networks of people. We thus devise an adaptive and parameter-free algorithm, the LeaderRank, to quantify user influence. We show that LeaderRank outperforms PageRank in terms of ranking effectiveness, as well as robustness against manipulations and noisy data. These results suggest that leaders who are aware of their clout may reinforce the development of social networks, and thus the power of collective search.
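The abstract above does not spell out the LeaderRank update rule, so the following is only a sketch under stated assumptions. It follows the formulation commonly described for LeaderRank: a "ground" node is linked in both directions to every user, each user starts with one unit of score and the ground node with none, score is repeatedly split evenly along outgoing links (from follower to leader), and at convergence the ground node's score is shared equally among the users. The function name `leaderrank`, the edge format, and the stopping tolerance are illustrative choices, not taken from the paper.

```python
def leaderrank(edges, tol=1e-10, max_iter=10000):
    """Score users from (follower, leader) pairs; assumes no duplicate pairs.

    Sketch of a ground-node random walk; not the authors' reference code.
    """
    nodes = sorted({n for pair in edges for n in pair})
    ground = object()  # extra node linked to and from every user

    # Outgoing links: every user points to the ground node and to its leaders.
    out = {n: [ground] for n in nodes}
    out[ground] = list(nodes)
    for follower, leader in edges:
        out[follower].append(leader)

    score = {n: 1.0 for n in nodes}
    score[ground] = 0.0

    for _ in range(max_iter):
        new = dict.fromkeys(out, 0.0)
        for src, targets in out.items():
            share = score[src] / len(targets)  # split score evenly
            for dst in targets:
                new[dst] += share
        delta = max(abs(new[n] - score[n]) for n in out)
        score = new
        if delta < tol:
            break

    # Hand the ground node's score back to the users in equal parts.
    bonus = score[ground] / len(nodes)
    return {n: score[n] + bonus for n in nodes}


if __name__ == "__main__":
    # Hypothetical toy network: a and b follow c, and c follows d.
    ranks = leaderrank([("a", "c"), ("b", "c"), ("c", "d")])
    for user, value in sorted(ranks.items()):
        print(user, round(value, 4))
```

Because the ground node makes the network strongly connected, this formulation needs no damping parameter of the kind PageRank uses, which is consistent with the abstract's description of the algorithm as parameter-free.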