Real-Time Classification of Twitter Trends
Social media users give rise to social trends as they share about common
interests, which can be triggered by different reasons. In this work, we
explore the types of triggers that spark trends on Twitter, introducing a
typology with the following four types: 'news', 'ongoing events', 'memes', and
'commemoratives'. While previous research has analyzed trending topics over the
long term, we look at the earliest tweets that produce a trend, with the aim of
categorizing trends early on. This would make it possible to provide a filtered subset of
trends to end users. We analyze and experiment with a set of straightforward
language-independent features based on the social spread of trends to
categorize them into the introduced typology. Our method provides an efficient
way to accurately categorize trending topics without the need for external data,
enabling news organizations to discover breaking news in real-time, or to
quickly identify viral memes that might enrich marketing decisions, among
others. The analysis of social features also reveals patterns associated with
each type of trend, such as tweets about ongoing events being shorter as many
were likely sent from mobile devices, or memes having more retweets originating
from a few trend-setters.
Comment: Pre-print of article accepted for publication in Journal of the
American Society for Information Science and Technology copyright @ 2013
(American Society for Information Science and Technology)
OMG U got flu? Analysis of shared health messages for bio-surveillance
Background: Micro-blogging services such as Twitter offer the potential to
crowdsource epidemics in real-time. However, Twitter posts ('tweets') are often
ambiguous and reactive to media trends. In order to ground user messages in
epidemic response we focused on tracking reports of self-protective behaviour
such as avoiding public gatherings or increased sanitation as the basis for
further risk analysis. Results: We created guidelines for tagging
self-protective behaviour based on Jones and Salathé (2009)'s behaviour response
survey. Applying the guidelines to a corpus of 5283 Twitter messages related to
influenza-like illness showed a high level of inter-annotator agreement (kappa
0.86). We employed supervised learning using unigrams, bigrams and regular
expressions as features with two supervised classifiers (SVM and Naive Bayes)
to classify tweets into 4 self-reported protective behaviour categories plus a
self-reported diagnosis. In addition to classification performance we report
moderately strong Spearman's Rho correlation by comparing classifier output
against WHO/NREVSS laboratory data for A(H1N1) in the USA during the 2009-2010
influenza season. Conclusions: The study adds to evidence supporting a high
degree of correlation between pre-diagnostic social media signals and
diagnostic influenza case data, pointing the way towards low cost sensor
networks. We believe that the signals we have modelled may be applicable to a
wide range of diseases.
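
The inter-annotator agreement quoted in the abstract above (kappa 0.86) can be illustrated with a minimal sketch. It assumes the two-annotator Cohen's kappa; the abstract does not say which kappa variant was used, and the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: two annotators, one label per item. The abstract does
    not state which kappa variant the authors computed.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels, not data from the study.
annotator_1 = ["avoid", "avoid", "hygiene", "hygiene"]
annotator_2 = ["avoid", "avoid", "hygiene", "avoid"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 items (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5.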
Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump
Measuring and forecasting opinion trends from real-time social media is a
long-standing goal of big-data analytics. Despite its importance, there has
been no conclusive scientific evidence so far that social media activity can
capture the opinion of the general population. Here we develop a method to
infer the opinion of Twitter users regarding the candidates of the 2016 US
Presidential Election by using a combination of statistical physics of complex
networks and machine learning based on hashtags co-occurrence to develop an
in-domain training set approaching 1 million tweets. We investigate the social
networks formed by the interactions among millions of Twitter users and infer
the support of each user to the presidential candidates. The resulting Twitter
trends follow the New York Times National Polling Average, which represents an
aggregate of hundreds of independent traditional polls, with remarkable
accuracy. Moreover, the Twitter opinion trend precedes the aggregated NYT polls
by 10 days, showing that Twitter can be an early signal of global opinion
trends. Our analytics unleash the power of Twitter to uncover social trends
ranging from elections and brands to political movements, at a fraction of the
cost of national polls.
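
One common way to estimate how far one daily series leads another, as in the 10-day lead reported in the abstract above, is to shift the series against each other and keep the shift with the highest Pearson correlation. The sketch below illustrates that idea only; it is not claimed to be the authors' procedure, and the function names and toy numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def best_lead(leading, lagging, max_lag):
    """Return (lag, r): the shift in samples at which `leading`
    correlates best with the later values of `lagging`."""
    if len(leading) != len(lagging):
        raise ValueError("series must have equal length")
    best_lag, best_r = 0, pearson(leading, lagging)
    for lag in range(1, max_lag + 1):
        if len(leading) - lag < 2:
            break  # too few overlapping points to correlate
        # Pair leading[t] with lagging[t + lag].
        r = pearson(leading[:len(leading) - lag], lagging[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Hypothetical toy series: `polls` repeats `twitter` two steps later.
twitter = [1, 3, 2, 5, 4, 6, 8, 7]
polls = [0, 0, 1, 3, 2, 5, 4, 6]
lag, r = best_lead(twitter, polls, max_lag=4)
```

On the toy series the best shift is 2 samples, where the overlapping segments are identical.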
Crowdsourced real-world sensing: sentiment analysis and the real-time web
The advent of the real-time web is proving both challenging and at the same
time disruptive for a number of areas of research, notably information
retrieval and web data mining. As an area of research reaching maturity,
sentiment analysis offers a promising direction for modelling the text content
available in real-time streams. This paper reviews the real-time web as a new
area of focus for sentiment analysis and discusses the motivations and
challenges behind such a direction.
A Latent Source Model for Nonparametric Time Series Classification
For classifying time series, a nearest-neighbor approach is widely used in
practice with performance often competitive with or better than more elaborate
methods such as neural networks, decision trees, and support vector machines.
We develop theoretical justification for the effectiveness of
nearest-neighbor-like classification of time series. Our guiding hypothesis is
that in many applications, such as forecasting which topics will become trends
on Twitter, there aren't actually that many prototypical time series to begin
with, relative to the number of time series we have access to, e.g., topics
become trends on Twitter only in a few distinct manners whereas we can collect
massive amounts of Twitter data. To operationalize this hypothesis, we propose
a latent source model for time series, which naturally leads to a "weighted
majority voting" classification rule that can be approximated by a
nearest-neighbor classifier. We establish nonasymptotic performance guarantees
of both weighted majority voting and nearest-neighbor classification under our
model accounting for how much of the time series we observe and the model
complexity. Experimental results on synthetic data show weighted majority
voting achieving the same misclassification rate as nearest-neighbor
classification while observing less of the time series. We then use weighted
majority to forecast which news topics on Twitter become trends, where we are
able to detect such "trending topics" in advance of Twitter 79% of the time,
with a mean early advantage of 1 hour and 26 minutes, a true positive rate of
95%, and a false positive rate of 4%.
Comment: Advances in Neural Information Processing Systems (NIPS 2013)
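
The "weighted majority voting" rule named in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's exact formulation: it assumes squared Euclidean distance, exponentially decaying vote weights with a hypothetical scale parameter `gamma`, and no time-shift alignment between series.

```python
import math


def squared_distance(observed, series):
    """Squared Euclidean distance over the observed prefix only."""
    return sum((x - y) ** 2 for x, y in zip(observed, series))


def weighted_majority_vote(observed, labeled_series, gamma=1.0):
    """Each training series votes for its own label with weight
    exp(-gamma * distance); the label with the largest total wins.

    `gamma` is an assumed tuning parameter for this sketch.
    """
    votes = {}
    for series, label in labeled_series:
        weight = math.exp(-gamma * squared_distance(observed, series))
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)


def nearest_neighbor(observed, labeled_series):
    """Label of the single closest training series."""
    _, label = min(
        labeled_series,
        key=lambda pair: squared_distance(observed, pair[0]),
    )
    return label


# Hypothetical toy training set: (time series, label).
training = [
    ([0, 0, 0], "no_trend"),
    ([1, 2, 3], "trend"),
    ([1, 2, 4], "trend"),
]
# Only the first two samples of the new series have been observed.
prediction = weighted_majority_vote([1, 2], training)
```

As `gamma` grows, the closest series dominates the vote and the rule behaves like the nearest-neighbor classifier, which matches the approximation described in the abstract.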