A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and the
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research has focused on the new
challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim to offer an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. Treating the
Twitter network, tweet content, and tweet context as potential inputs, we then
show in a structured way how each problem depends on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
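The abstract mentions evaluation metrics without naming them. As an illustration only, assuming the distance-based metrics that are common in the Twitter geolocation literature (mean and median error distance, and accuracy within 161 km, i.e. 100 miles), a minimal sketch of computing them from predicted and true coordinates might look like this. The function names and the 161 km default are assumptions, not taken from the survey.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_metrics(predicted, actual, threshold_km=161.0):
    """Mean/median error distance and the share of predictions within threshold_km.

    predicted, actual: equal-length sequences of (lat, lon) tuples.
    threshold_km: assumed cut-off (161 km ~ 100 miles), a hypothetical default.
    """
    errors = [haversine_km(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual)]
    return {
        "mean_km": mean(errors),
        "median_km": median(errors),
        "acc_at_threshold": sum(e <= threshold_km for e in errors) / len(errors),
    }


if __name__ == "__main__":
    preds = [(40.71, -74.01), (34.05, -118.24), (51.51, -0.13)]
    gold = [(40.73, -73.99), (37.77, -122.42), (51.51, -0.13)]
    print(distance_metrics(preds, gold))
```

Reporting the median alongside the mean is a common choice because a few predictions on the wrong continent can dominate the mean error.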
Characterizing Information Diets of Social Media Users
With the widespread adoption of social media sites like Twitter and Facebook,
there has been a shift in the way information is produced and consumed.
Earlier, the only producers of information were traditional news organizations,
which broadcast the same carefully-edited information to all consumers over
mass media channels. Now, by contrast, in online social media any user can be a
producer of information, and every user selects which other users she connects
to, thereby choosing the information she consumes. Moreover, the personalized
recommendations that most social media sites provide also shape
the information consumed by individual users. In this work, we define the concept
of an information diet, i.e., the topical distribution of a given set of
information items (e.g., tweets), to characterize the information produced
and consumed by various types of users on Twitter, a popular social media platform. At
a high level, we find that (i) popular users mostly produce very specialized
diets focusing on only a few topics; in fact, news organizations (e.g.,
NYTimes) produce much more focused diets on social media as compared to their
mass media diets, (ii) most users' consumption diets are primarily focused
towards one or two topics of their interest, and (iii) the personalized
recommendations provided by Twitter help to mitigate some of the topical
imbalances in the users' consumption diets, by adding information on diverse
topics apart from the users' primary topics of interest.
Comment: In Proceedings of the International AAAI Conference on Web and Social
Media (ICWSM), Oxford, UK, May 201
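Since an information diet is defined as the topical distribution of a set of items, it can be sketched as normalized topic counts. This minimal sketch assumes each item already carries a single topic label; how topics are inferred is not shown and is not the paper's method. `top_k_share` is a hypothetical helper for gauging how focused a diet is.

```python
from collections import Counter


def information_diet(item_topics):
    """Topical distribution (topic -> share) of a set of items.

    item_topics: iterable with one (assumed pre-assigned) topic label per item.
    """
    counts = Counter(item_topics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


def top_k_share(diet, k=2):
    """Fraction of the diet covered by its k largest topics (1.0 = fully focused)."""
    return sum(sorted(diet.values(), reverse=True)[:k])


if __name__ == "__main__":
    consumed = ["politics", "politics", "sports", "politics", "tech"]
    diet = information_diet(consumed)
    print(diet)
    print(top_k_share(diet, k=2))
```

For the five consumed tweets above, the diet is 0.6 politics, 0.2 sports, and 0.2 tech, so one or two topics cover most of it, the kind of focus the abstract reports for consumption diets.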
Creating Full Individual-level Location Timelines from Sparse Social Media Data
In many domain applications, a continuous timeline of human locations is
critical, for example for understanding where a disease may
spread or how traffic flows. While data sources such as GPS trackers or Call
Data Records are temporally rich, they are expensive, often not publicly
available, or collected only in select locations, which restricts their wide use.
Conversely, geo-located social media data are publicly and freely available,
but their sparsity makes full timeline inference especially challenging. We
propose a stochastic framework, Intermediate Location
Computing (ILC), which uses prior knowledge about human mobility patterns to
predict every missing location from an individual's social media timeline. We
compare ILC with a state-of-the-art RNN baseline as well as methods that are
optimized for next-location prediction only. For three major cities, ILC
predicts the top-1 location for every missing location in a timeline, at 1- and
2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all
compared methods). ILC also outperforms the RNN in
low-data settings, both with a very small number of users (under 50) and
with more users but sparser timelines. In general, the RNN model
needs more users to achieve the same performance as ILC. Overall,
this work illustrates the tradeoff between prior knowledge of heuristics and
more data, for an important societal problem: filling in entire timelines
using freely available but sparse social media data.
Comment: 10 pages, 8 figures, 2 tables
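To make the task setup concrete, consider a timeline discretized into fixed slots (e.g. hourly) with top-1 accuracy measured over the missing slots. The following is a deliberately naive gap-filling baseline. It is not ILC and ignores any stochastic mobility model. For each missing slot it simply picks the user's most frequent observed location at the same time of day, falling back to their most frequent location overall. All names are hypothetical.

```python
from collections import Counter, defaultdict


def fill_timeline(observed, n_slots, slots_per_day=24):
    """Fill every missing slot of a discretized timeline (naive baseline, not ILC).

    observed: dict slot_index -> location id, from sparse geotagged posts.
    slots_per_day: 24 for 1-hour resolution, 12 for 2-hour resolution.
    """
    # Count observed locations per time-of-day slot.
    by_time_of_day = defaultdict(Counter)
    for slot, loc in observed.items():
        by_time_of_day[slot % slots_per_day][loc] += 1
    overall = Counter(observed.values())
    default = overall.most_common(1)[0][0] if overall else None

    timeline = []
    for slot in range(n_slots):
        if slot in observed:
            timeline.append(observed[slot])  # keep real observations
        elif by_time_of_day[slot % slots_per_day]:
            timeline.append(by_time_of_day[slot % slots_per_day].most_common(1)[0][0])
        else:
            timeline.append(default)  # no data for this time of day
    return timeline


def top1_accuracy(predicted, truth, missing_slots):
    """Share of held-out (missing) slots whose predicted location is correct."""
    hits = sum(predicted[s] == truth[s] for s in missing_slots)
    return hits / len(missing_slots)


if __name__ == "__main__":
    obs = {0: "home", 9: "work", 24: "home", 33: "work", 20: "gym", 47: "home"}
    tl = fill_timeline(obs, n_slots=48)
    print(tl[:12])
```

A baseline like this only exploits daily periodicity. The abstract's point is that richer mobility priors help most when few users or sparse timelines are available.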
#greysanatomy vs. #yankees: Demographics and Hashtag Use on Twitter
Demographics, in particular gender, age, and race, are a key predictor of
human behavior. Despite the significant effect of demographics, most
scientific studies using online social media do not consider this factor,
mainly due to the lack of such information. In this work, we use
state-of-the-art face analysis software to infer gender, age, and race from
profile images of 350K Twitter users from New York. For the period from
November 1, 2014 to October 31, 2015, we study which hashtags are used by
different demographic groups. Though we find considerable overlap for the most
popular hashtags, there are also many group-specific hashtags.
Comment: This is a preprint of an article appearing at ICWSM 201
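One simple way to surface group-specific hashtags, assuming each post already carries an inferred demographic label, is to count hashtag uses per group and keep hashtags that a single group dominates. The `min_share` and `min_count` thresholds are hypothetical, and the paper's own analysis may differ. Raw shares are sensitive to group sizes, so a fuller analysis would likely normalize by each group's population.

```python
from collections import Counter, defaultdict


def group_specific_hashtags(posts, min_share=0.8, min_count=5):
    """Hashtags where one demographic group accounts for most uses.

    posts: iterable of (group_label, hashtag) pairs.
    Returns hashtag -> (dominant_group, share) for hashtags used at least
    min_count times whose dominant group has at least min_share of uses.
    """
    usage = defaultdict(Counter)
    for group, tag in posts:
        usage[tag.lower()][group] += 1  # treat hashtags case-insensitively
    result = {}
    for tag, by_group in usage.items():
        total = sum(by_group.values())
        if total < min_count:
            continue  # too rare to judge
        group, n = by_group.most_common(1)[0]
        share = n / total
        if share >= min_share:
            result[tag] = (group, share)
    return result


if __name__ == "__main__":
    posts = ([("group_1", "#tag_a")] * 9 + [("group_2", "#tag_a")]
             + [("group_2", "#tag_b")] * 6 + [("group_1", "#tag_b")] * 4)
    print(group_specific_hashtags(posts))
```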
What You Like: Generating Explainable Topical Recommendations for Twitter Using Social Annotations
With over 500 million tweets posted per day on Twitter, it is difficult for
users to discover interesting content in the deluge of uninteresting
posts. In this work, we present a novel, explainable topical recommendation
system that uses social annotations to help Twitter users discover
tweets on topics of their interest. A major challenge in using traditional
rating-dependent recommendation systems, like collaborative filtering and
content-based systems, in high-volume social networks is that, due to attention
scarcity, most items do not get any ratings. Additionally, most
Twitter users are passive consumers (44% of users never tweet), which makes it
very difficult to use user ratings for generating recommendations. Further, a
key challenge in developing recommendation systems is that in many cases users
reject relevant recommendations if they are totally unfamiliar with the
recommended item. Providing a suitable explanation for why the item is
recommended significantly improves the acceptability of recommendations. Because
it is a topical recommendation system, our method can present
simple topical explanations for the generated recommendations. Comparisons with
state-of-the-art matrix-factorization-based collaborative filtering, content-based,
and social recommendations demonstrate the efficacy of the proposed
approach.
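Explaining a recommendation by its topics can be illustrated with a minimal sketch. It assumes topic sets for the user and for each candidate tweet are already available. Inferring them from social annotations, as the paper does, is not shown. Tweets are ranked by the number of shared topics, and those shared topics double as the explanation. The names and the overlap-count scoring rule are assumptions, not the authors' system.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: str
    topics: frozenset  # assumed pre-computed topic labels


def recommend_with_explanations(user_topics, candidates, k=3):
    """Rank tweets by topical overlap with the user's interests.

    Returns up to k (tweet_id, explanation) pairs; the explanation lists the
    topics the tweet shares with the user's interests.
    """
    scored = []
    for tweet in candidates:
        shared = user_topics & tweet.topics
        if shared:
            scored.append((len(shared), tweet.tweet_id, sorted(shared)))
    # More shared topics first; tweet_id breaks ties deterministically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [
        (tweet_id, "Recommended because of your interest in: " + ", ".join(shared))
        for _, tweet_id, shared in scored[:k]
    ]


if __name__ == "__main__":
    cands = [
        Tweet("t1", frozenset({"tech"})),
        Tweet("t2", frozenset({"politics", "tech"})),
        Tweet("t3", frozenset({"sports"})),
    ]
    for rec in recommend_with_explanations({"politics", "tech"}, cands):
        print(rec)
```

Because the ranking uses only topics rather than ratings, it still works for passive users who never tweet or rate items, which is the situation the abstract highlights.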
- …