Using Text Similarity to Detect Social Interactions not Captured by Formal Reply Mechanisms
In modeling social interaction online, it is important to understand when
people are reacting to each other. Many systems have explicit indicators of
replies, such as threading in discussion forums or replies and retweets in
Twitter. However, it is likely these explicit indicators capture only part of
people's reactions to each other; thus, computational social science approaches
that use them to infer relationships or influence are likely to miss the mark.
This paper explores the problem of detecting non-explicit responses, presenting
a new approach that uses tf-idf similarity between a user's own tweets and
recent tweets by people they follow. Based on a month's worth of posting data
from 449 ego networks in Twitter, this method demonstrates that it is likely
that at least 11% of reactions are not captured by the explicit reply and
retweet mechanisms. Further, these uncaptured reactions are not evenly
distributed between users: some users, who create replies and retweets without
using the official interface mechanisms, are much more responsive to followees
than they appear. This suggests that detecting non-explicit responses is an
important consideration in mitigating biases and building more accurate models
when using these markers to study social interaction and information diffusion.
Comment: A final version of this work was published in the 2015 IEEE 11th
International Conference on e-Science (e-Science
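The tf-idf comparison described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, the 0.3 threshold, and the function names (`tfidf_vectors`, `cosine`, `likely_responses`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real tweets would
    # need handling for URLs, mentions, and hashtags.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed idf (an assumed variant) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: term_freq[t] * idf[t] for t in term_freq})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def likely_responses(own_tweet, followee_tweets, threshold=0.3):
    """Flag followee tweets whose similarity to own_tweet meets the threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    vectors = tfidf_vectors([own_tweet] + followee_tweets)
    own = vectors[0]
    flagged = []
    for index, vec in enumerate(vectors[1:]):
        score = cosine(own, vec)
        if score >= threshold:
            flagged.append((index, score))
    return flagged
```

For example, `likely_responses("the new espresso machine downtown is amazing", ["tried the espresso machine downtown today", "quarterly earnings call at noon"])` flags only the first followee tweet, since the second shares no terms with the user's own tweet.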
PocketCare: Tracking the Flu with Mobile Phones using Partial Observations of Proximity and Symptoms
Mobile phones provide a powerful sensing platform that researchers may adopt
to understand proximity interactions among people and the diffusion, through
these interactions, of diseases, behaviors, and opinions. However, it remains a
challenge to track the proximity-based interactions of a whole community and
then model the social diffusion of diseases and behaviors starting from the
observations of a small fraction of the volunteer population. In this paper, we
propose a novel approach that tries to connect these sparse
observations using a model of how individuals interact with each other and how
social interactions happen in terms of a sequence of proximity interactions. We
apply our approach to track the spreading of flu in the spatial-proximity
network of a 3000-people university campus by mobilizing 300 volunteers from
this population to monitor nearby mobile phones through Bluetooth scanning and
to report daily the flu symptoms they experience and observe around them. Our aim is to predict the
likelihood for an individual to get flu based on how often her/his daily
routine intersects with those of the volunteers. Thus, we use the daily
routines of the volunteers to build a model of the volunteers as well as of the
non-volunteers. Our results show that we can predict flu infection two weeks
ahead of time with an average precision from 0.24 to 0.35 depending on the
amount of information. This precision is six to nine times higher than with a
random-guess model. At the population level, we can predict the infectious
population in a two-week window with an r-squared value of 0.95 (a random-guess
model obtains an r-squared value of 0.2). These results point to an innovative
approach for tracking individuals who have interacted with people showing
symptoms, allowing us to warn those in danger of infection and to inform health
researchers about the progression of contact-induced diseases
SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Social networking websites allow users to create and share content. Big
information cascades of post resharing can form as users of these sites reshare
others' posts with their friends and followers. One of the central challenges
in understanding such cascading behaviors is in forecasting information
outbreaks, where a single post becomes widely popular by being reshared by many
users. In this paper, we focus on predicting the final number of reshares of a
given post. We build on the theory of self-exciting point processes to develop
a statistical model that allows us to make accurate predictions. Our model
requires no training or expensive feature engineering. It results in a simple
and efficiently computable formula that allows us to answer questions, in
real-time, such as: Given a post's resharing history so far, what is our
current estimate of its final number of reshares? Is the post resharing cascade
past the initial stage of explosive growth? And, which posts will be the most
reshared in the future? We validate our model using one month of complete
Twitter data and demonstrate a strong improvement in predictive accuracy over
existing approaches. Our model gives only 15% relative error in predicting
final size of an average information cascade after observing it for just one
hour.
Comment: 10 pages, published in KDD 201
Prediction of Human Trajectory Following a Haptic Robotic Guide Using Recurrent Neural Networks
Social intelligence is an important requirement for enabling robots to
collaborate with people. In particular, human path prediction is an essential
capability for robots in that it prevents potential collision with a human and
allows the robot to safely make larger movements. In this paper, we present a
method for predicting the trajectory of a human who follows a haptic robotic
guide without using sight, which is valuable for assistive robots that aid the
visually impaired. We apply a deep learning method based on recurrent neural
networks using multimodal data: (1) human trajectory, (2) movement of the
robotic guide, (3) haptic input data measured from the physical interaction
between the human and the robot, (4) human depth data. We collected actual
human trajectory and multimodal response data through indoor experiments. Our
model outperformed the baseline result while using only the robot data with the
observed human trajectory, and it shows even better results when using
additional haptic and depth data.
Comment: 6 pages, Submitted to IEEE World Haptics Conference 201
Improving customer churn prediction by data augmentation using pictorial stimulus-choice data
The purpose of this paper is to determine the added value of pictorial stimulus-choice data in customer churn prediction. Using Random Forests and 5x2-fold cross-validation, this study analyzes how much pictorial stimulus-choice data and survey data increase the AUC of a churn model over and above administrative, operational and complaints data. The finding is that pictorial stimulus-choice data significantly increase the AUC of models with administrative and operational data. The practical implication of this finding is that companies should start considering mining pictorial data from social media sites (e.g. Pinterest) in order to augment their internal customer database. This study is original in that it is the first that assesses the added value of pictorial stimulus-choice data in predictive models. This is important because more and more social media websites are focusing on pictures.
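The evaluation protocol named in the abstract above (five repetitions of 2-fold cross-validation, scored by AUC) can be sketched as follows. This is an illustrative sketch, not the study's pipeline: the Random Forest is replaced by a caller-supplied `fit_predict` callback (a hypothetical name), the splits are not stratified, and the seed is arbitrary.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Labels are assumed to be 0 or 1.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def five_by_two_cv(rows, labels, fit_predict, seed=0):
    """Run 5 repetitions of 2-fold cross-validation and return the 10 AUC values.

    fit_predict(train_rows, train_labels, test_rows) is a hypothetical
    callback standing in for any classifier; it must return one score per
    test row, with higher scores meaning "more likely to churn".
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    results = []
    for _ in range(5):
        rng.shuffle(indices)
        half = len(indices) // 2
        fold_a, fold_b = indices[:half], indices[half:]
        # Each half serves once as the training set and once as the test set.
        for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
            scores = fit_predict(
                [rows[i] for i in train],
                [labels[i] for i in train],
                [rows[i] for i in test],
            )
            results.append(auc([labels[i] for i in test], scores))
    return results
```

Running `five_by_two_cv` once on the baseline feature set and once on the augmented feature set, with the same seed, gives ten paired AUC values whose differences can then be compared.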
Information is not a Virus, and Other Consequences of Human Cognitive Limits
The many decisions people make about what to pay attention to online shape
the spread of information in online social networks. Due to the constraints of
available time and cognitive resources, the ease of discovery strongly impacts
how people allocate their attention to social media content. As a consequence,
the position of information in an individual's social feed, as well as explicit
social signals about its popularity, determine whether it will be seen, and the
likelihood that it will be shared with followers. Accounting for these
cognitive limits simplifies mechanics of information diffusion in online social
networks and explains puzzling empirical observations: (i) information
generally fails to spread in social media and (ii) highly connected people are
less likely to re-share information. Studies of information diffusion on
different social media platforms reviewed here suggest that the interplay
between human cognitive limits and network structure differentiates the spread
of information from other social contagions, such as the spread of a virus
through a population.
Comment: accepted for publication in Future Interne
Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump
Measuring and forecasting opinion trends from real-time social media is a
long-standing goal of big-data analytics. Despite its importance, there has
been no conclusive scientific evidence so far that social media activity can
capture the opinion of the general population. Here we develop a method to
infer the opinion of Twitter users regarding the candidates of the 2016 US
Presidential Election by using a combination of statistical physics of complex
networks and machine learning based on hashtag co-occurrence to develop an
in-domain training set approaching 1 million tweets. We investigate the social
networks formed by the interactions among millions of Twitter users and infer
the support of each user to the presidential candidates. The resulting Twitter
trends follow the New York Times National Polling Average, which represents an
aggregate of hundreds of independent traditional polls, with remarkable
accuracy. Moreover, the Twitter opinion trend precedes the aggregated NYT polls
by 10 days, showing that Twitter can be an early signal of global opinion
trends. Our analytics unleash the power of Twitter to uncover social trends,
from elections and brands to political movements, at a fraction of the cost of
national polls