64 research outputs found

    On predictability of rare events leveraging social media: a machine learning perspective

    Full text link
    Information extracted from social media streams has been leveraged to forecast the outcome of a large number of real-world events, from political elections to stock market fluctuations. An increasing amount of studies demonstrates how the analysis of social media conversations provides cheap access to the wisdom of the crowd. However, extents and contexts in which such forecasting power can be effectively leveraged are still unverified at least in a systematic way. It is also unclear how social-media-based predictions compare to those based on alternative information sources. To address these issues, here we develop a machine learning framework that leverages social media streams to automatically identify and predict the outcomes of soccer matches. We focus in particular on matches in which at least one of the possible outcomes is deemed as highly unlikely by professional bookmakers. We argue that sport events offer a systematic approach for testing the predictive power of social media, and allow to compare such power against the rigorous baselines set by external sources. Despite such strict baselines, our framework yields above 8% marginal profit when used to inform simple betting strategies. The system is based on real-time sentiment analysis and exploits data collected immediately before the games, allowing for informed bets. We discuss the rationale behind our approach, describe the learning framework, its prediction performance and the return it provides as compared to a set of betting strategies. To test our framework we use both historical Twitter data from the 2014 FIFA World Cup games, and real-time Twitter data collected by monitoring the conversations about all soccer matches of four major European tournaments (FA Premier League, Serie A, La Liga, and Bundesliga), and the 2014 UEFA Champions League, during the period between Oct. 25th 2014 and Nov. 26th 2014.Comment: 10 pages, 10 tables, 8 figure

    On the influence of social bots in online protests. Preliminary findings of a Mexican case study

    Full text link
    Social bots can affect online communication among humans. We study this phenomenon by focusing on #YaMeCanse, the most active protest hashtag in the history of Twitter in Mexico. Accounts using the hashtag are classified using the BotOrNot bot detection tool. Our preliminary analysis suggests that bots played a critical role in disrupting online communication about the protest movement.Comment: 10 page

    Assessing candidate preference through web browsing history

    Full text link
    Predicting election outcomes is of considerable interest to candidates, political scientists, and the public at large. We propose the use of Web browsing history as a new indicator of candidate preference among the electorate, one that has potential to overcome a number of the drawbacks of election polls. However, there are a number of challenges that must be overcome to effectively use Web browsing for assessing candidate preference—including the lack of suitable ground truth data and the heterogeneity of user populations in time and space. We address these challenges, and show that the resulting methods can shed considerable light on the dynamics of voters’ candidate preferences in ways that are difficult to achieve using polls.Accepted manuscrip

    Beating the news using social media: the case study of American Idol

    Get PDF
    We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV shows as an example of a well defined electoral phenomenon that each week draws millions of votes in the USA. This event can be considered as basic test in a simplified environment to assess the predictive power of Twitter signals. We provide evidence that Twitter activity during the time span defined by the TV show airing and the voting period following it correlates with the contestants ranking and allows the anticipation of the voting outcome. Twitter data from the show and the voting period of the season finale have been analyzed to attempt the winner prediction ahead of the airing of the official result. We also show that the fraction of tweets that contain geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, showing that strong regional polarizations occur. The geolocalized data are crucial for the correct prediction of the final outcome of the show, pointing out the importance of considering information beyond the aggregated Twitter signal. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real time gathering of quantitative indicators that may be able to anticipate the future unfolding of opinion formation events

    Trump vs. Hillary: What went Viral during the 2016 US Presidential Election

    Get PDF
    In this paper, we present quantitative and qualitative analysis of the top retweeted tweets (viral tweets) pertaining to the US presidential elections from September 1, 2016 to Election Day on November 8, 2016. For everyday, we tagged the top 50 most retweeted tweets as supporting or attacking either candidate or as neutral/irrelevant. Then we analyzed the tweets in each class for: general trends and statistics; the most frequently used hashtags, terms, and locations; the most retweeted accounts and tweets; and the most shared news and links. In all we analyzed the 3,450 most viral tweets that grabbed the most attention during the US election and were retweeted in total 26.3 million times accounting over 40% of the total tweet volume pertaining to the US election in the aforementioned period. Our analysis of the tweets highlights some of the differences between the social media strategies of both candidates, the penetration of their messages, and the potential effect of attacks on bothComment: Paper to appear in Springer SocInfo 201

    Twitter-based analysis of the dynamics of collective attention to political parties

    Get PDF
    Large-scale data from social media have a significant potential to describe complex phenomena in real world and to anticipate collective behaviors such as information spreading and social trends. One specific case of study is represented by the collective attention to the action of political parties. Not surprisingly, researchers and stakeholders tried to correlate parties' presence on social media with their performances in elections. Despite the many efforts, results are still inconclusive since this kind of data is often very noisy and significant signals could be covered by (largely unknown) statistical fluctuations. In this paper we consider the number of tweets (tweet volume) of a party as a proxy of collective attention to the party, identify the dynamics of the volume, and show that this quantity has some information on the elections outcome. We find that the distribution of the tweet volume for each party follows a log-normal distribution with a positive autocorrelation of the volume over short terms, which indicates the volume has large fluctuations of the log-normal distribution yet with a short-term tendency. Furthermore, by measuring the ratio of two consecutive daily tweet volumes, we find that the evolution of the daily volume of a party can be described by means of a geometric Brownian motion (i.e., the logarithm of the volume moves randomly with a trend). Finally, we determine the optimal period of averaging tweet volume for reducing fluctuations and extracting short-term tendencies. We conclude that the tweet volume is a good indicator of parties' success in the elections when considered over an optimal time window. Our study identifies the statistical nature of collective attention to political issues and sheds light on how to model the dynamics of collective attention in social media.Comment: 16 pages, 7 figures, 3 tables. Published in PLoS ON

    A meta-analysis of state-of-the-art electoral prediction from Twitter data

    Full text link
    Electoral prediction from Twitter data is an appealing research topic. It seems relatively straightforward and the prevailing view is overly optimistic. This is problematic because while simple approaches are assumed to be good enough, core problems are not addressed. Thus, this paper aims to (1) provide a balanced and critical review of the state of the art; (2) cast light on the presume predictive power of Twitter data; and (3) depict a roadmap to push forward the field. Hence, a scheme to characterize Twitter prediction methods is proposed. It covers every aspect from data collection to performance evaluation, through data processing and vote inference. Using that scheme, prior research is analyzed and organized to explain the main approaches taken up to date but also their weaknesses. This is the first meta-analysis of the whole body of research regarding electoral prediction from Twitter data. It reveals that its presumed predictive power regarding electoral prediction has been rather exaggerated: although social media may provide a glimpse on electoral outcomes current research does not provide strong evidence to support it can replace traditional polls. Finally, future lines of research along with a set of requirements they must fulfill are provided.Comment: 19 pages, 3 table

    Testing Propositions Derived from Twitter Studies: Generalization and Replication in Computational Social Science

    Get PDF
    Replication is an essential requirement for scientific discovery. The current study aims to generalize and replicate 10 propositions made in previous Twitter studies using a representative dataset. Our findings suggest 6 out of 10 propositions could not be replicated due to the variations of data collection, analytic strategies employed, and inconsistent measurements. The study’s contributions are twofold: First, it systematically summarized and assessed some important claims in the field, which can inform future studies. Second, it proposed a feasible approach to generating a random sample of Twitter users and its associated ego networks, which might serve as a solution for answering social-scientific questions at the individual level without accessing the complete data archive.published_or_final_versio

    Leaders in Social Networks, the Delicious Case

    Get PDF
    Finding pertinent information is not limited to search engines. Online communities can amplify the influence of a small number of power users for the benefit of all other users. Users' information foraging in depth and breadth can be greatly enhanced by choosing suitable leaders. For instance in delicious.com, users subscribe to leaders' collection which lead to a deeper and wider reach not achievable with search engines. To consolidate such collective search, it is essential to utilize the leadership topology and identify influential users. Google's PageRank, as a successful search algorithm in the World Wide Web, turns out to be less effective in networks of people. We thus devise an adaptive and parameter-free algorithm, the LeaderRank, to quantify user influence. We show that LeaderRank outperforms PageRank in terms of ranking effectiveness, as well as robustness against manipulations and noisy data. These results suggest that leaders who are aware of their clout may reinforce the development of social networks, and thus the power of collective search
    • …
    corecore