44,709 research outputs found

    Real-Time Classification of Twitter Trends

    Get PDF
    Social media users give rise to social trends as they share about common interests, which can be triggered by different reasons. In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with following four types: 'news', 'ongoing events', 'memes', and 'commemoratives'. While previous research has analyzed trending topics in a long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This would allow to provide a filtered subset of trends to end users. We analyze and experiment with a set of straightforward language-independent features based on the social spread of trends to categorize them into the introduced typology. Our method provides an efficient way to accurately categorize trending topics without need of external data, enabling news organizations to discover breaking news in real-time, or to quickly identify viral memes that might enrich marketing decisions, among others. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.Comment: Pre-print of article accepted for publication in Journal of the American Society for Information Science and Technology copyright @ 2013 (American Society for Information Science and Technology

    OMG U got flu? Analysis of shared health messages for bio-surveillance

    Get PDF
    Background: Micro-blogging services such as Twitter offer the potential to crowdsource epidemics in real-time. However, Twitter posts ('tweets') are often ambiguous and reactive to media trends. In order to ground user messages in epidemic response we focused on tracking reports of self-protective behaviour such as avoiding public gatherings or increased sanitation as the basis for further risk analysis. Results: We created guidelines for tagging self protective behaviour based on Jones and Salath\'e (2009)'s behaviour response survey. Applying the guidelines to a corpus of 5283 Twitter messages related to influenza like illness showed a high level of inter-annotator agreement (kappa 0.86). We employed supervised learning using unigrams, bigrams and regular expressions as features with two supervised classifiers (SVM and Naive Bayes) to classify tweets into 4 self-reported protective behaviour categories plus a self-reported diagnosis. In addition to classification performance we report moderately strong Spearman's Rho correlation by comparing classifier output against WHO/NREVSS laboratory data for A(H1N1) in the USA during the 2009-2010 influenza season. Conclusions: The study adds to evidence supporting a high degree of correlation between pre-diagnostic social media signals and diagnostic influenza case data, pointing the way towards low cost sensor networks. We believe that the signals we have modelled may be applicable to a wide range of diseases

    Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump

    Full text link
    Measuring and forecasting opinion trends from real-time social media is a long-standing goal of big-data analytics. Despite its importance, there has been no conclusive scientific evidence so far that social media activity can capture the opinion of the general population. Here we develop a method to infer the opinion of Twitter users regarding the candidates of the 2016 US Presidential Election by using a combination of statistical physics of complex networks and machine learning based on hashtags co-occurrence to develop an in-domain training set approaching 1 million tweets. We investigate the social networks formed by the interactions among millions of Twitter users and infer the support of each user to the presidential candidates. The resulting Twitter trends follow the New York Times National Polling Average, which represents an aggregate of hundreds of independent traditional polls, with remarkable accuracy. Moreover, the Twitter opinion trend precedes the aggregated NYT polls by 10 days, showing that Twitter can be an early signal of global opinion trends. Our analytics unleash the power of Twitter to uncover social trends from elections, brands to political movements, and at a fraction of the cost of national polls

    Crowdsourced real-world sensing: sentiment analysis and the real-time web

    Get PDF
    The advent of the real-time web is proving both challeng- ing and at the same time disruptive for a number of areas of research, notably information retrieval and web data mining. As an area of research reaching maturity, sentiment analysis oers a promising direction for modelling the text content available in real-time streams. This paper reviews the real-time web as a new area of focus for sentiment analysis and discusses the motivations and challenges behind such a direction

    A Latent Source Model for Nonparametric Time Series Classification

    Full text link
    For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren't actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a "weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%.Comment: Advances in Neural Information Processing Systems (NIPS 2013
    corecore