Accurate Local Estimation of Geo-Coordinates for Social Media Posts
Associating geo-coordinates with the content of social media posts can
enhance many existing applications and services and enable a host of new ones.
Unfortunately, a majority of social media posts are not tagged with
geo-coordinates. Even when location data is available, it may be inaccurate,
very broad or sometimes fictitious. Contemporary location estimation approaches
based on analyzing the content of these posts can identify only broad areas
such as a city, which limits their usefulness. To address these shortcomings,
this paper proposes a methodology to narrowly estimate the geo-coordinates of
social media posts with high accuracy. The methodology relies solely on the
content of these posts and prior knowledge of the wide geographical region from
where the posts originate. An ensemble of language models, which are smoothed
over non-overlapping sub-regions of a wider region, lies at the heart of the
methodology. Experimental evaluation using a corpus of over half a million
tweets from New York City shows that the approach, on average, estimates
locations of tweets to within just 2.15 km of their actual positions.
Comment: In Proceedings of the 26th International Conference on Software
Engineering and Knowledge Engineering, pp. 642 - 647, 201
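To make the idea of per-sub-region language models concrete, here is a minimal sketch. It is not the paper's method: it assumes a single fixed grid of non-overlapping cells, unigram models with additive smoothing, and whitespace tokenization, and it returns the centroid of the best-scoring cell. The function names (`cell_of`, `train`, `estimate`) and the parameters `lat0`, `lon0`, `step`, and `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cell_of(lat, lon, lat0, lon0, step):
    """Map a coordinate to a non-overlapping grid cell (row, col)."""
    return (int((lat - lat0) // step), int((lon - lon0) // step))

def train(posts, lat0, lon0, step):
    """Build one unigram count table per grid cell.

    posts: iterable of (text, lat, lon) tuples with known coordinates.
    Returns (per-cell word counts, per-cell centroids, vocabulary).
    """
    counts = defaultdict(Counter)
    coords = defaultdict(list)
    for text, lat, lon in posts:
        cell = cell_of(lat, lon, lat0, lon0, step)
        counts[cell].update(text.lower().split())
        coords[cell].append((lat, lon))
    centroids = {
        cell: (sum(p[0] for p in pts) / len(pts),
               sum(p[1] for p in pts) / len(pts))
        for cell, pts in coords.items()
    }
    vocab = {w for cnt in counts.values() for w in cnt}
    return counts, centroids, vocab

def estimate(text, counts, centroids, vocab, alpha=1.0):
    """Return the centroid of the cell whose smoothed model best explains text."""
    words = text.lower().split()
    vocab_size = len(vocab)
    best_cell, best_score = None, -math.inf
    for cell, cnt in counts.items():
        total = sum(cnt.values())
        # Additive (Laplace-style) smoothing keeps unseen words from zeroing the score.
        score = sum(
            math.log((cnt[w] + alpha) / (total + alpha * vocab_size))
            for w in words
        )
        if score > best_score:
            best_cell, best_score = cell, score
    return centroids[best_cell]
```

A finer grid gives narrower estimates but leaves fewer training posts per cell, which is the tension that smoothing is meant to ease.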
When and Where: Predicting Human Movements Based on Social Spatial-Temporal Events
Predicting both the time and the location of human movements is valuable but
challenging for a variety of applications. To address this problem, we propose
an approach considering both the periodicity and the sociality of human
movements. We first define a new concept, Social Spatial-Temporal Event (SSTE),
to represent social interactions among people. For the time prediction, we
characterise the temporal dynamics of SSTEs with an ARMA (AutoRegressive Moving
Average) model. To dynamically capture the SSTE kinetics, we propose a Kalman
Filter based learning algorithm to learn and incrementally update the ARMA
model as a new observation becomes available. For the location prediction, we
propose a ranking model where the periodicity and the sociality of human
movements are simultaneously taken into consideration for improving the
prediction accuracy. Extensive experiments conducted on real data sets validate
our proposed approach.
Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization
Geographically annotated social media is extremely valuable for modern
information retrieval. However, when researchers can only access
publicly-visible data, one quickly finds that social media users rarely publish
location information. In this work, we provide a method which can geolocate the
overwhelming majority of active Twitter users, independent of their location
sharing preferences, using only publicly-visible Twitter data.
Our method infers an unknown user's location by examining their friends'
locations. We frame the geotagging problem as an optimization over a social
network with a total variation-based objective and provide a scalable and
distributed algorithm for its solution. Furthermore, we show how a robust
estimate of the geographic dispersion of each user's ego network can be used as
a per-user accuracy measure which is effective at removing outlying errors.
Leave-many-out evaluation shows that our method is able to infer location for
101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag
over 80% of public tweets.
Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014, Compton, Ryan,
David Jurgens, and David Allen. "Geotagging one hundred million twitter
accounts with total variation minimization." Big Data (Big Data), 2014 IEEE
International Conference on. IEEE, 201
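As a rough illustration of inferring a user's location from friends' locations, the sketch below repeatedly assigns each unlocated user the medoid of their located friends, meaning the friend location with the smallest summed great-circle distance to the others. This is a simplified stand-in, not the paper's scalable distributed algorithm or its exact objective; the names `haversine_km`, `medoid`, and `propagate`, and the `rounds` parameter, are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def medoid(points):
    """Return the input point minimizing the summed distance to all points."""
    return min(points, key=lambda p: sum(haversine_km(p, q) for q in points))

def propagate(friends, known, rounds=5):
    """Spread locations over a social graph.

    friends: dict mapping user -> list of friend ids.
    known:   dict mapping user -> (lat, lon) for users with ground truth.
    Users in `known` are never overwritten.
    """
    est = dict(known)
    for _ in range(rounds):
        updates = {}
        for user, nbrs in friends.items():
            if user in known:
                continue
            pts = [est[n] for n in nbrs if n in est]
            if pts:
                # A median-like choice is robust to a few far-away friends.
                updates[user] = medoid(pts)
        est.update(updates)
    return est
```

Because the medoid is a median-like statistic, a single distant friend does not drag the estimate away from the cluster where most friends live.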
Hete-CF: Social-Based Collaborative Filtering Recommendation using Heterogeneous Relations
Collaborative filtering algorithms have been widely used in recommender
systems. However, they often suffer from the data sparsity and cold start
problems. With the increasing popularity of social media, these problems may be
solved by using social-based recommendation. Social-based recommendation, as an
emerging research area, uses social information to help mitigate the data
sparsity and cold start problems, and it has been demonstrated that the
social-based recommendation algorithms can efficiently improve the
recommendation performance. However, few of the existing algorithms have
considered using multiple types of relations within one social network. In this
paper, we investigate the social-based recommendation algorithms on
heterogeneous social networks and propose Hete-CF, a Social Collaborative
Filtering algorithm using heterogeneous relations. Distinct from the existing
methods, Hete-CF can effectively utilize multiple types of relations in a
heterogeneous social network. In addition, Hete-CF is a general approach and
can be used in arbitrary social networks, including event based social
networks, location based social networks, and any other types of heterogeneous
information networks associated with social information. The experimental
results on two real-world data sets, DBLP (a typical heterogeneous information
network) and Meetup (a typical event based social network) show the
effectiveness and efficiency of our algorithm.
Location Prediction: Communities Speak Louder than Friends
Humans are social animals; they interact with different communities of
friends to conduct different activities. The literature shows that human
mobility is constrained by their social relations. In this paper, we
investigate the social impact of a person's communities on his mobility,
instead of all friends from his online social networks. This study can be
particularly useful, as certain social behaviors are influenced by specific
communities but not all friends. To achieve our goal, we first develop a
measure to characterize a person's social diversity, which we term `community
entropy'. Through analysis of two real-life datasets, we demonstrate that a
person's mobility is influenced only by a small fraction of his communities and
the influence depends on the social contexts of the communities. We then
exploit machine learning techniques to predict users' future movement based on
their communities' information. Extensive experiments demonstrate the
prediction's effectiveness.
Comment: ACM Conference on Online Social Networks 2015, COSN 201
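The abstract does not spell out how "community entropy" is computed. As one plausible reading, the sketch below measures social diversity as the Shannon entropy of how a user's friends are spread across communities. This definition is an assumption for illustration and may differ from the paper's; the function name `community_entropy` and its input format are hypothetical.

```python
import math
from collections import Counter

def community_entropy(friend_communities):
    """Shannon entropy (in bits) of a user's friends across communities.

    friend_communities: list with one community label per friend.
    Returns 0.0 when all friends share one community (or there are none);
    larger values mean friends are spread more evenly over more communities.
    """
    counts = Counter(friend_communities)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Under this reading, a user whose friends all belong to one community scores zero, while a user with friends split evenly over four communities scores two bits.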
Creating Full Individual-level Location Timelines from Sparse Social Media Data
In many domain applications, a continuous timeline of human locations is
critical; for example for understanding possible locations where a disease may
spread, or the flow of traffic. While data sources such as GPS trackers or Call
Data Records are temporally-rich, they are expensive, often not publicly
available or garnered only in select locations, restricting their wide use.
Conversely, geo-located social media data are publicly and freely available,
but present challenges especially for full timeline inference due to their
sparse nature. We propose a stochastic framework, Intermediate Location
Computing (ILC) which uses prior knowledge about human mobility patterns to
predict every missing location from an individual's social media timeline. We
compare ILC with a state-of-the-art RNN baseline as well as methods that are
optimized for next-location prediction only. For three major cities, ILC
predicts the top 1 location for all missing locations in a timeline, at 1- and
2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all
compared methods). Specifically, ILC also outperforms the RNN in low-data
settings: both with a very small number of users (under 50) and with more
users but sparser timelines. In general, the RNN model
needs a higher number of users to achieve the same performance as ILC. Overall,
this work illustrates the tradeoff between prior knowledge of heuristics and
more data, for an important societal problem of filling in entire timelines
using freely available, but sparse social media data.
Comment: 10 pages, 8 figures, 2 tables
Latent Space Model for Multi-Modal Social Data
With the emergence of social networking services, researchers enjoy the
increasing availability of large-scale heterogeneous datasets capturing online
user interactions and behaviors. Traditional analysis of techno-social systems
data has focused mainly on describing either the dynamics of social
interactions, or the attributes and behaviors of the users. However,
overwhelming empirical evidence suggests that the two dimensions affect one
another, and therefore they should be jointly modeled and analyzed in a
multi-modal framework. The benefits of such an approach include the ability to
build better predictive models, leveraging social network information as well
as user behavioral signals. To this purpose, here we propose the Constrained
Latent Space Model (CLSM), a generalized framework that combines Mixed
Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA)
incorporating a constraint that forces the latent space to concurrently
describe the multiple data modalities. We derive an efficient inference
algorithm based on Variational Expectation Maximization that has a
computational cost linear in the size of the network, thus making it feasible
to analyze massive social datasets. We validate the proposed framework on two
problems: prediction of social interactions from user attributes and behaviors,
and behavior prediction exploiting network information. We perform experiments
with a variety of multi-modal social systems, spanning location-based social
networks (Gowalla), social media services (Instagram, Orkut), e-commerce and
review sites (Amazon, Ciao), and finally citation networks (Cora). The results
indicate significant improvement in prediction accuracy over state-of-the-art
methods, and demonstrate the flexibility of the proposed approach for
addressing a variety of different learning problems commonly occurring with
multi-modal social data.
Comment: 12 pages, 7 figures, 2 tables
Diffusion of Lexical Change in Social Media
Computer-mediated communication is driving fundamental changes in the nature
of written language. We investigate these changes by statistical analysis of a
dataset comprising 107 million Twitter messages (authored by 2.7 million unique
user accounts). Using a latent vector autoregressive model to aggregate across
thousands of words, we identify high-level patterns in diffusion of linguistic
change over the United States. Our model is robust to unpredictable changes in
Twitter's sampling rate, and provides a probabilistic characterization of the
relationship of macro-scale linguistic influence to a set of demographic and
geographic predictors. The results of this analysis offer support for prior
arguments that focus on geographical proximity and population size. However,
demographic similarity -- especially with regard to race -- plays an even more
central role, as cities with similar racial demographics are far more likely to
share linguistic influence. Rather than moving towards a single unified
"netspeak" dialect, language evolution in computer-mediated communication
reproduces existing fault lines in spoken American English.
Comment: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e11311
- …