A survey of location inference techniques on Twitter
The increasing popularity of the social networking service Twitter has made it a larger part of day-to-day communication, strengthening social relationships and information dissemination. Conversations on Twitter are now being explored as indicators within early warning systems to warn of imminent natural disasters such as earthquakes and to aid prompt emergency responses to crime. Producers are privileged to have limitless access to market perception through consumer comments on social media and microblogs. Targeted advertising can be made more effective using user profile information such as demography, interests and location. While these applications have proven beneficial, the ability to infer the location of Twitter users effectively has even greater value. However, accurately identifying where a message originated or where its author is located remains a challenge, which continues to drive research in this area. In this paper, we survey a range of techniques applied to infer the location of Twitter users, from inception to the state of the art. We find significant improvements over time in granularity and accuracy, driven by refinements to algorithms and the inclusion of more spatial features.
PocketCare: Tracking the Flu with Mobile Phones using Partial Observations of Proximity and Symptoms
Mobile phones provide a powerful sensing platform that researchers may adopt
to understand proximity interactions among people and the diffusion, through
these interactions, of diseases, behaviors, and opinions. However, it remains a
challenge to track the proximity-based interactions of a whole community and
then model the social diffusion of diseases and behaviors starting from the
observations of a small fraction of the volunteer population. In this paper, we
propose a novel approach that tries to connect together these sparse
observations using a model of how individuals interact with each other and how
social interactions happen in terms of a sequence of proximity interactions. We
apply our approach to track the spreading of flu in the spatial-proximity
network of a 3000-people university campus by mobilizing 300 volunteers from
this population to monitor nearby mobile phones through Bluetooth scanning and
to report daily any flu symptoms in themselves and in people around them. Our aim is to predict the
likelihood that an individual gets the flu based on how often her/his daily
routine intersects with those of the volunteers. Thus, we use the daily
routines of the volunteers to build a model of the volunteers as well as of the
non-volunteers. Our results show that we can predict flu infection two weeks
ahead of time with an average precision from 0.24 to 0.35 depending on the
amount of information. This precision is six to nine times higher than with a
random guess model. At the population level, we can predict infectious
population in a two-week window with an r-squared value of 0.95 (a random-guess
model obtains an r-squared value of 0.2). These results point to an innovative
approach for tracking individuals who have interacted with people showing
symptoms, allowing us to warn those in danger of infection and to inform health
researchers about the progression of contact-induced diseases.
DeepCity: A Feature Learning Framework for Mining Location Check-ins
The extension of online social networks into geographical space has resulted in
a large amount of user check-in data. Understanding check-ins can help to build
appealing applications, such as location recommendation. In this paper, we
propose DeepCity, a feature learning framework based on deep learning, to
profile users and locations, with respect to user demographic and location
category prediction. Both of the predictions are essential for social network
companies to increase user engagement. The key contribution of DeepCity is a
task-specific random walk, which uses the location and user
properties to guide the feature learning to be specific to each prediction
task. Experiments conducted on 42M check-ins in three cities collected from
Instagram have shown that DeepCity achieves a superior performance and
outperforms other baseline models significantly.
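The abstract does not spell out how the task-specific random walk is built, so the following is only a minimal sketch of the general idea of a property-guided walk, not DeepCity's actual procedure. It assumes a toy bipartite check-in graph and a hypothetical `weights` table that makes some nodes more likely to be visited; the names `biased_walk`, `graph`, and `weights` are illustrative.

```python
import random

def biased_walk(graph, weights, start, length, rng):
    """Return a walk of up to `length` nodes, picking each hop in proportion to its weight."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop early
        # Nodes without an explicit weight default to 1.0 (assumption).
        hop_weights = [weights.get(n, 1.0) for n in neighbors]
        walk.append(rng.choices(neighbors, weights=hop_weights, k=1)[0])
    return walk

# Toy bipartite graph: users link to the locations they checked in at.
graph = {
    "u1": ["cafe", "gym"],
    "u2": ["cafe", "museum"],
    "cafe": ["u1", "u2"],
    "gym": ["u1"],
    "museum": ["u2"],
}
# Hypothetical task-specific weights, e.g. favoring locations relevant to one task.
weights = {"cafe": 3.0, "gym": 1.0, "museum": 1.0}

walk = biased_walk(graph, weights, "u1", 6, random.Random(0))
print(walk)
```

Walks generated this way could then be fed to a skip-gram style embedding model; changing `weights` per prediction task is what would make the resulting features task-specific in this sketch.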
Where You Are Is What You Do: On Inferring Offline Activities From Location Data
In this paper we investigate the ability of modern machine learning
algorithms to infer basic offline activities, e.g., shopping and dining,
from location data. Using anonymized data of thousands of users of a prominent
location-based social network, we empirically demonstrate that not only
does state-of-the-art machine learning excel at the task at hand (F1 score > 0.9), but
tabular models are also among the best performers. The findings we report here
not only fill an existing gap in the literature, but also highlight the
potential risks of such capabilities given the ubiquity of location data and
the high accessibility of tabular machine learning models.
Comment: Accepted to IEEE ICDM Workshops 202
Demographic Inference and Representative Population Estimates from Multilingual Social Media Data
Social media provide access to behavioural data at an unprecedented scale and
granularity. However, using these data to understand phenomena in a broader
population is difficult due to their non-representativeness and the bias of
statistical inference tools towards dominant languages and groups. While
demographic attribute inference could be used to mitigate such bias, current
techniques are almost entirely monolingual and fail to work in a global
environment. We address these challenges by combining multilingual demographic
inference with post-stratification to create a more representative population
sample. To learn demographic attributes, we create a new multimodal deep neural
architecture for joint classification of age, gender, and organization-status
of social media users that operates in 32 languages. This method substantially
outperforms current state of the art while also reducing algorithmic bias. To
correct for sampling biases, we propose fully interpretable multilevel
regression methods that estimate inclusion probabilities from inferred joint
population counts and ground-truth population counts. In a large experiment
over multilingual heterogeneous European regions, we show that our demographic
inference and bias correction together allow for more accurate estimates of
populations and make a significant step towards representative social sensing
in downstream applications with multilingual social media.
Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19)
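To make the bias-correction idea concrete, here is a minimal sketch of plain cell-wise post-stratification. It is a simplification: the paper describes multilevel regression methods, which are not reproduced here. The cell labels, counts, and per-cell means are invented for illustration, and the function names are hypothetical.

```python
def inclusion_probabilities(sample_counts, census_counts):
    """Share of each population cell that appears in the social media sample."""
    return {cell: sample_counts[cell] / census_counts[cell] for cell in census_counts}

def poststratified_mean(cell_means, census_counts):
    """Reweight per-cell sample means by each cell's share of the true population."""
    total = sum(census_counts.values())
    return sum(cell_means[c] * census_counts[c] / total for c in census_counts)

def naive_mean(cell_means, sample_counts):
    """Unweighted sample mean, which inherits the sample's demographic skew."""
    total = sum(sample_counts.values())
    return sum(cell_means[c] * sample_counts[c] / total for c in sample_counts)

# Invented example: younger users are heavily over-represented in the sample.
sample_counts = {"18-39": 300, "40+": 100}    # inferred from profiles (assumed)
census_counts = {"18-39": 1000, "40+": 3000}  # ground-truth population (assumed)
cell_means = {"18-39": 0.8, "40+": 0.4}       # some outcome measured per cell

probs = inclusion_probabilities(sample_counts, census_counts)
print(naive_mean(cell_means, sample_counts))
print(poststratified_mean(cell_means, census_counts))
```

In this toy case the naive mean is pulled toward the over-represented younger cell, while the post-stratified mean weights each cell by its census share.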
Loglinear model selection and human mobility
Methods for selecting loglinear models were among Steve Fienberg’s research interests from the start of his long and fruitful career. After reviewing the string of papers on loglinear models that can be partly attributed to Steve’s contributions and influential ideas, we develop a new algorithm for selecting graphical loglinear models that is suitable for analyzing hyper-sparse contingency tables. We show how multi-way contingency tables can be used to represent patterns of human mobility. We analyze a dataset of geolocated tweets from South Africa comprising 46 million latitude/longitude locations of 476,601 Twitter users, summarized as a contingency table with 214 variables.
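As a rough illustration of how mobility data might be summarized as a multi-way contingency table, the sketch below assumes one binary variable per region (whether a user was ever observed there). The abstract does not state how its 214 variables are defined, so this encoding, the region names, and the function name are assumptions. Only non-empty cells are stored, which is one simple way to handle a hyper-sparse table.

```python
from collections import Counter

def contingency_table(user_regions, regions):
    """Count users per combination of binary 'visited region' indicators."""
    table = Counter()
    for visited in user_regions.values():
        # One cell per user: a tuple of 0/1 flags in the fixed order of `regions`.
        cell = tuple(int(r in visited) for r in regions)
        table[cell] += 1
    return table

# Hypothetical regions and users, invented for illustration.
regions = ["A", "B", "C"]
user_regions = {
    "user1": {"A"},
    "user2": {"A", "B"},
    "user3": {"A"},
    "user4": {"C"},
}

table = contingency_table(user_regions, regions)
for cell, count in sorted(table.items()):
    print(cell, count)
print(len(table), "non-empty cells out of", 2 ** len(regions))
```

With many binary variables the number of possible cells grows exponentially while the number of non-empty cells is bounded by the number of users, which is why a sparse mapping is used here instead of a dense array.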