Can Cascades be Predicted?
On many social networking web sites such as Facebook and Twitter, resharing
or reposting functionality allows users to share others' content with their own
friends or followers. As content is reshared from user to user, large cascades
of reshares can form. While a growing body of research has focused on analyzing
and characterizing such cascades, a recent, parallel line of work has argued
that the future trajectory of a cascade may be inherently unpredictable. In
this work, we develop a framework for addressing cascade prediction problems.
On a large sample of photo reshare cascades on Facebook, we find strong
performance in predicting whether a cascade will continue to grow in the
future. We find that the relative growth of a cascade becomes more predictable
as we observe more of its reshares, that temporal and structural features are
key predictors of cascade size, and that, initially, breadth rather than depth
in a cascade is a better indicator of larger cascades. This prediction
performance is robust in the sense that multiple distinct classes of features
all achieve similar performance. We also discover that temporal features are
predictive of a cascade's eventual shape. Observing independent cascades of the
same content, we find that although these cascades differ greatly in size, we
can still predict which will end up the largest.
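The abstract names temporal and structural features of the early reshares as key predictors and contrasts breadth with depth. As a rough illustration only, the sketch below computes a few such features from the first k reshares of a cascade. The input format (a time-ordered list of (user, parent, t) tuples, with "root" standing for the original poster), the function name early_features, and the particular features chosen are assumptions for this example, not the feature set used in the paper.

```python
from collections import Counter


def early_features(reshares, k):
    """Temporal and structural features of the first k reshares of a cascade.

    Hypothetical input format: a time-ordered list of (user, parent, t)
    tuples, where `parent` is the user whose post was reshared ("root" for
    the original poster) and `t` is seconds since the original post.
    """
    observed = reshares[:k]
    depth = {}
    for user, parent, _ in observed:
        # Resharing the original post puts a user at depth 1; resharing
        # someone else's reshare puts them one level below that user.
        depth[user] = depth.get(parent, 0) + 1
    per_level = Counter(depth.values())  # number of reshares at each depth
    return {
        # Temporal: time needed to collect the observed reshares.
        "time_to_k": observed[-1][2] if observed else None,
        # Structural: how deep and how wide the partial cascade is.
        "depth": max(per_level, default=0),
        "max_breadth": max(per_level.values(), default=0),
        "root_reshares": per_level.get(1, 0),
    }


if __name__ == "__main__":
    example = [("a", "root", 10), ("b", "root", 25), ("c", "a", 40),
               ("d", "root", 70), ("e", "c", 95)]
    print(early_features(example, 5))
    # {'time_to_k': 95, 'depth': 3, 'max_breadth': 3, 'root_reshares': 3}
```

In the example, three users reshare the original post directly while the chain a, c, e reaches depth 3, so a wide-but-shallow start and a narrow-but-deep start produce different feature values.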
360 Quantified Self
Wearable devices with a wide range of sensors have contributed to the rise of
the Quantified Self movement, where individuals log everything ranging from the
number of steps they have taken, to their heart rate, to their sleeping
patterns. Sensors do not, however, typically sense the social and ambient
environment of the users, such as general lifestyle attributes or information
about their social network. This means that both the users themselves and the
medical practitioners privy to the wearable sensor data have only a narrow
view of the individual, limited mainly to certain aspects of their physical
condition.
In this paper, we describe a number of use cases showing how social media can
complement check-up and sensor data to give a more holistic view of
individuals' health, a perspective we call the 360 Quantified Self.
Health-related information can be obtained from sources as diverse as food
photo sharing, location check-ins, or profile pictures. Additionally,
information from a person's ego network can shed light on the social dimension
of wellbeing, which is widely acknowledged to be of utmost importance, even
though such information is currently rarely used for medical diagnosis. We
articulate a long-term vision describing the technical advances and variety of
data needed to achieve an integrated system encompassing Electronic Health
Records (EHR), data from wearable devices, and information derived from social
media.
Comment: QCRI Technical Report
Analyzing the Language of Food on Social Media
We investigate the predictive power behind the language of food on social
media. We collect a corpus of over three million food-related posts from
Twitter and demonstrate that many latent population characteristics can be
directly predicted from this data: overweight rate, diabetes rate, political
leaning, and home geographical location of authors. For all tasks, our
language-based models significantly outperform the majority-class baselines.
Performance is further improved with more complex natural language processing,
such as topic modeling. We analyze which textual features have the most predictive
power for these datasets, providing insight into the connections between the
language of food, geographic locale, and community characteristics. Lastly, we
design and implement an online system for real-time query and visualization of
the dataset. Visualization tools, such as geo-referenced heatmaps,
semantics-preserving wordclouds and temporal histograms, allow us to discover
more complex, global patterns mirrored in the language of food.
Comment: An extended abstract of this paper will appear in IEEE Big Data 201
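The abstract measures its language-based models against majority-class baselines. A minimal sketch of that reference point follows, assuming a hypothetical binary labeling (for example "high" versus "low" overweight rate) and a hypothetical function name majority_baseline_accuracy: always predict the most frequent training label and score that prediction on held-out data. A language-based model is informative only if it beats this number.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    if not test_labels:
        raise ValueError("test_labels must be non-empty")
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority)
    return hits / len(test_labels)


if __name__ == "__main__":
    train = ["high", "low", "high", "high"]
    test = ["high", "low", "low", "high", "high"]
    print(majority_baseline_accuracy(train, test))  # 0.6
```

With skewed labels this baseline can already be high, which is why outperforming it, rather than raw accuracy alone, is the meaningful comparison.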