A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular standout success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Impact of the spatial context on human communication activity
Technology development produces terabytes of data generated by human
activity in space and time. This enormous amount of data, often called big data,
is becoming crucial for delivering new insights to decision makers. It contains
behavioral information on different types of human activity influenced by many
external factors such as geographic information and weather forecasts. Early
recognition and prediction of those human behaviors are of great importance in
many societal applications such as health care, risk management, and urban
planning. In this paper, we investigate relevant geographical areas
based on their categories of human activities (e.g., working and shopping),
which are identified from geographic information (i.e., OpenStreetMap). We use
spectral clustering followed by the k-means clustering algorithm, based on a
TF-IDF cosine similarity metric. We evaluate the quality of the observed clusters
using silhouette coefficients, which are estimated based on the
similarities of the temporal patterns of mobile communication activity. The area
clusters are further used to explain typical or exceptional communication
activities. We demonstrate the study using a real dataset containing 1 million
Call Detail Records. This type of analysis and its application are important
for analyzing the dependency of human behaviors on external factors, and for
uncovering hidden relationships, unknown correlations, and other useful
information that can support decision-making.
Comment: 12 pages, 11 figures
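The similarity and evaluation steps named in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the area categories, the activity counts, and the cluster labels below are made-up toy values, the labels are supplied by hand in place of the spectral and k-means clustering step (which is not implemented here), and all function names are hypothetical. It only shows how a TF-IDF cosine similarity between areas and a mean silhouette coefficient over temporal activity patterns might be computed.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn lists of category tokens into sparse TF-IDF vectors (dicts).

    Assumes every document is non-empty. idf = log(N / df), so a term
    that occurs in every document gets weight 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_silhouette(distance, labels):
    """Mean silhouette coefficient for a distance matrix and cluster labels.

    Requires at least two clusters; singleton clusters score 0 by convention.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        same = [distance[i][j] for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(distance[i][j] for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy data (hypothetical): category tags per area, as might be derived
# from map data, and call counts per area for three time slots.
areas = [
    ["office", "office", "bank"],
    ["office", "bank", "cafe"],
    ["shop", "mall", "cafe"],
    ["shop", "shop", "mall"],
]
activity = [[9, 8, 1], [8, 9, 2], [1, 2, 9], [2, 1, 8]]

# Similarity between areas, the input a clustering step would consume.
vecs = tfidf_vectors(areas)
similarity = [[cosine_similarity(u, v) for v in vecs] for u in vecs]

# Hand-assigned labels standing in for the clustering output.
labels = [0, 0, 1, 1]

# Silhouette computed on cosine distance between temporal activity patterns.
patterns = [dict(enumerate(row)) for row in activity]
distance = [[1.0 - cosine_similarity(p, q) for q in patterns] for p in patterns]
score = mean_silhouette(distance, labels)
```

A score close to 1 indicates that areas grouped together also show similar temporal activity, while a score near or below 0 suggests the grouping does not match the activity patterns.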