    Using Text Similarity to Detect Social Interactions not Captured by Formal Reply Mechanisms

    In modeling social interaction online, it is important to understand when people are reacting to each other. Many systems have explicit indicators of replies, such as threading in discussion forums or replies and retweets in Twitter. However, it is likely that these explicit indicators capture only part of people's reactions to each other; thus, computational social science approaches that use them to infer relationships or influence are likely to miss the mark. This paper explores the problem of detecting non-explicit responses, presenting a new approach that uses tf-idf similarity between a user's own tweets and recent tweets by people they follow. Based on a month's worth of posting data from 449 ego networks in Twitter, this method demonstrates that it is likely that at least 11% of reactions are not captured by the explicit reply and retweet mechanisms. Further, these uncaptured reactions are not evenly distributed between users: some users, who create replies and retweets without using the official interface mechanisms, are much more responsive to followees than they appear. This suggests that detecting non-explicit responses is an important consideration in mitigating biases and building more accurate models when using these markers to study social interaction and information diffusion. Comment: A final version of this work was published in the 2015 IEEE 11th International Conference on e-Science (e-Science
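
As a rough illustration of the idea in this abstract, the sketch below scores a user's tweet against recent followee tweets with tf-idf cosine similarity and flags those above a threshold. It is a minimal sketch, not the paper's method: the whitespace tokenizer, the smoothed idf formula, the function names, and the threshold value are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            # Smoothed idf (an assumption); terms seen everywhere keep weight.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_reactions(own_tweet, followee_tweets, threshold=0.2):
    """Return (index, similarity) for followee tweets at or above the threshold.

    The threshold is hypothetical; the abstract does not state one.
    """
    vectors = tfidf_vectors([own_tweet] + list(followee_tweets))
    own, others = vectors[0], vectors[1:]
    scored = [(i, cosine(own, other)) for i, other in enumerate(others)]
    return [(i, sim) for i, sim in scored if sim >= threshold]
```

Calling `candidate_reactions(tweet, recent_followee_tweets)` returns the followee tweets that the user may have been reacting to without using an explicit reply or retweet; in practice the corpus for the idf term and the time window for "recent" would need to be chosen with care.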

    PocketCare: Tracking the Flu with Mobile Phones using Partial Observations of Proximity and Symptoms

    Mobile phones provide a powerful sensing platform that researchers may adopt to understand proximity interactions among people and the diffusion, through these interactions, of diseases, behaviors, and opinions. However, it remains a challenge to track the proximity-based interactions of a whole community and then model the social diffusion of diseases and behaviors starting from the observations of a small fraction of the volunteer population. In this paper, we propose a novel approach that tries to connect these sparse observations using a model of how individuals interact with each other and how social interactions happen in terms of a sequence of proximity interactions. We apply our approach to track the spreading of flu in the spatial-proximity network of a 3,000-person university campus by mobilizing 300 volunteers from this population to monitor nearby mobile phones through Bluetooth scanning and to report daily on flu symptoms in themselves and in those around them. Our aim is to predict the likelihood for an individual to get flu based on how often her/his daily routine intersects with those of the volunteers. Thus, we use the daily routines of the volunteers to build a model of the volunteers as well as of the non-volunteers. Our results show that we can predict flu infection two weeks ahead of time with an average precision from 0.24 to 0.35 depending on the amount of information. This precision is six to nine times higher than with a random-guess model. At the population level, we can predict the infectious population in a two-week window with an r-squared value of 0.95 (a random-guess model obtains an r-squared value of 0.2). These results point to an innovative approach for tracking individuals who have interacted with people showing symptoms, allowing us to warn those in danger of infection and to inform health researchers about the progression of contact-induced diseases.

    SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

    Social networking websites allow users to create and share content. Large information cascades of post resharing can form as users of these sites reshare others' posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is in forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real time, such as: Given a post's resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And, which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting the final size of an average information cascade after observing it for just one hour. Comment: 10 pages, published in KDD 201

    Prediction of Human Trajectory Following a Haptic Robotic Guide Using Recurrent Neural Networks

    Social intelligence is an important requirement for enabling robots to collaborate with people. In particular, human path prediction is an essential capability for robots in that it prevents potential collision with a human and allows the robot to safely make larger movements. In this paper, we present a method for predicting the trajectory of a human who follows a haptic robotic guide without using sight, which is valuable for assistive robots that aid the visually impaired. We apply a deep learning method based on recurrent neural networks using multimodal data: (1) human trajectory, (2) movement of the robotic guide, (3) haptic input data measured from the physical interaction between the human and the robot, and (4) human depth data. We collected actual human trajectory and multimodal response data through indoor experiments. Our model outperformed the baseline result while using only the robot data with the observed human trajectory, and it shows even better results when using additional haptic and depth data. Comment: 6 pages, Submitted to IEEE World Haptics Conference 201

    Improving customer churn prediction by data augmentation using pictorial stimulus-choice data

    The purpose of this paper is to determine the added value of pictorial stimulus-choice data in customer churn prediction. Using Random Forests and 5×2-fold cross-validation, this study analyzes how much pictorial stimulus-choice data and survey data increase the AUC of a churn model over and above administrative, operational and complaints data. The finding is that pictorial stimulus-choice data significantly increases the AUC of models with administrative and operational data. The practical implication of this finding is that companies should start considering mining pictorial data from social media sites (e.g. Pinterest) in order to augment their internal customer database. This study is original in that it is the first that assesses the added value of pictorial stimulus-choice data in predictive models. This is important because more and more social media websites are focusing on pictures.
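
The evaluation protocol named in this abstract, five repetitions of two-fold cross-validation scored by AUC, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the splits are not stratified by churn label, and the classifier itself (a Random Forest in the paper) is left out.

```python
import random


def five_by_two_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for 5 repetitions of 2-fold CV.

    Each repetition shuffles the indices, cuts them in half, and uses each
    half once for training and once for testing, giving 10 pairs in total.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(5):
        rng.shuffle(indices)
        fold_a, fold_b = sorted(indices[:half]), sorted(indices[half:])
        yield fold_a, fold_b
        yield fold_b, fold_a


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Quadratic in the number of samples; fine for a sketch, slow for big data.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Comparing two feature sets then amounts to fitting the same model on each of the ten training halves with and without the extra features, and comparing the ten resulting AUC values on the matching test halves.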

    Information is not a Virus, and Other Consequences of Human Cognitive Limits

    The many decisions people make about what to pay attention to online shape the spread of information in online social networks. Due to the constraints of available time and cognitive resources, the ease of discovery strongly impacts how people allocate their attention to social media content. As a consequence, the position of information in an individual's social feed, as well as explicit social signals about its popularity, determines whether it will be seen and the likelihood that it will be shared with followers. Accounting for these cognitive limits simplifies the mechanics of information diffusion in online social networks and explains puzzling empirical observations: (i) information generally fails to spread in social media and (ii) highly connected people are less likely to re-share information. Studies of information diffusion on different social media platforms reviewed here suggest that the interplay between human cognitive limits and network structure differentiates the spread of information from other social contagions, such as the spread of a virus through a population. Comment: accepted for publication in Future Interne

    Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump

    Measuring and forecasting opinion trends from real-time social media is a long-standing goal of big-data analytics. Despite its importance, there has been no conclusive scientific evidence so far that social media activity can capture the opinion of the general population. Here we develop a method to infer the opinion of Twitter users regarding the candidates of the 2016 US Presidential Election by using a combination of statistical physics of complex networks and machine learning based on hashtag co-occurrence to develop an in-domain training set approaching 1 million tweets. We investigate the social networks formed by the interactions among millions of Twitter users and infer the support of each user for the presidential candidates. The resulting Twitter trends follow the New York Times National Polling Average, which represents an aggregate of hundreds of independent traditional polls, with remarkable accuracy. Moreover, the Twitter opinion trend precedes the aggregated NYT polls by 10 days, showing that Twitter can be an early signal of global opinion trends. Our analytics unleash the power of Twitter to uncover social trends, from elections and brands to political movements, at a fraction of the cost of national polls.
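
One generic way to check whether one daily series leads another, as this abstract reports for the Twitter trend and the polling average, is to correlate the two at a range of lags and take the best one. The sketch below assumes that approach; the abstract does not say how the 10-day lead was measured, and the function names and inputs here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lead(signal, reference, max_lag):
    """Return (lag, r) for the lag in 0..max_lag at which signal[t]
    correlates best with reference[t + lag].

    A positive best lag means the signal moves ahead of the reference.
    """
    n = min(len(signal), len(reference))
    scores = []
    for lag in range(max_lag + 1):
        # Pair each signal value with the reference value `lag` days later.
        r = pearson(signal[:n - lag], reference[lag:n])
        scores.append((lag, r))
    return max(scores, key=lambda pair: pair[1])
```

With `signal` as a daily social media support series and `reference` as the matching daily polling average, `best_lead(signal, reference, max_lag=30)` would return the lead, in days, at which the two agree most closely. Correlation at a lag is only suggestive, since both series may follow a shared trend.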