A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim to offer an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives
How did the popularity of the Greek Prime Minister evolve in 2015? How did
the predominant sentiment about him vary during that period? Were there any
controversial sub-periods? What other entities were related to him during these
periods? To answer these questions, one needs to analyze archived documents and
data about the query entities, such as old news articles or social media
archives. In particular, user-generated content posted in social networks, like
Twitter and Facebook, can be seen as a comprehensive documentation of our
society, and thus meaningful analysis methods over such archived data are of
immense value for sociologists, historians and other interested parties who
want to study the history and evolution of entities and events. To this end, in
this paper we propose an entity-centric approach to analyze social media
archives and we define measures that allow studying how entities were reflected
in social media in different time periods and under different aspects, like
popularity, attitude, controversiality, and connectedness with other entities.
A case study using a large Twitter archive of four years illustrates the
insights that can be gained by such an entity-centric and multi-aspect
analysis.
Comment: This is a preprint of an article accepted for publication in the
International Journal on Digital Libraries (2018)
Social Media Text Processing and Semantic Analysis for Smart Cities
With the rise of social media, people obtain and share information almost
instantly on a 24/7 basis. Many research areas have tried to gain valuable
insights from these large volumes of freely available user-generated content.
With the goal of extracting knowledge from social media streams that might be
useful in the context of intelligent transportation systems and smart cities,
we designed and developed a framework that provides functionalities for
parallel collection of geo-located tweets from multiple pre-defined bounding
boxes (cities or regions), including filtering of non-complying tweets, text
pre-processing for Portuguese and English, topic modeling, and
transportation-specific text classifiers, as well as aggregation and data
visualization.
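As a rough illustration of the bounding-box filtering step described above, the following Python sketch keeps only tweets whose coordinates fall inside one of several pre-defined boxes. The box coordinates (approximate values), the tweet dictionary layout, and the names `BOXES`, `locate`, and `filter_tweets` are assumptions made for this example; they are not taken from the framework itself.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A box is (west_lon, south_lat, east_lon, north_lat).
BoundingBox = Tuple[float, float, float, float]

# Approximate, illustrative boxes only (assumed values, not from the paper).
BOXES: Dict[str, BoundingBox] = {
    "london": (-0.51, 51.28, 0.33, 51.69),
    "new_york": (-74.26, 40.49, -73.70, 40.92),
}


def locate(lon: float, lat: float, boxes: Dict[str, BoundingBox]) -> Optional[str]:
    """Return the name of the first box containing the point, or None."""
    for name, (west, south, east, north) in boxes.items():
        if west <= lon <= east and south <= lat <= north:
            return name
    return None


def filter_tweets(tweets: Iterable[dict], boxes: Dict[str, BoundingBox]) -> List[dict]:
    """Keep geo-located tweets that fall inside a box; drop all others."""
    kept = []
    for tweet in tweets:
        coords = tweet.get("coordinates")  # assumed to be a (lon, lat) pair
        if coords is None:
            continue  # non-complying: no geo-location attached
        lon, lat = coords
        city = locate(lon, lat, boxes)
        if city is not None:
            kept.append({**tweet, "city": city})
    return kept
```

For example, calling `filter_tweets` on a tweet located in central London, one in Paris, and one without coordinates would keep only the first and tag it with the hypothetical label `"london"`.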
We performed an exploratory data analysis of geo-located tweets in 5
different cities: Rio de Janeiro, São Paulo, New York City, London and
Melbourne, comprising a total of more than 43 million tweets in a period of 3
months. Furthermore, we performed a large-scale topic modelling comparison
between Rio de Janeiro and São Paulo. Interestingly, most of the topics are
shared between both cities, which, despite being in the same country, are
considered very different regarding population, economy and lifestyle.
We take advantage of recent developments in word embeddings and train such
representations from the collections of geo-located tweets. We then use a
combination of bag-of-embeddings and traditional bag-of-words to train
travel-related classifiers in both Portuguese and English to separate
travel-related content from unrelated content. We created specific gold-standard data
to perform empirical evaluation of the resulting classifiers. Results are in
line with research work in other application areas by showing the robustness of
using word embeddings to learn word similarities that bag-of-words is not able
to capture.
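One way to picture the combination of bag-of-words and bag-of-embeddings features mentioned above is the minimal Python sketch below. It assumes that the bag-of-embeddings vector is the average of the word vectors in a tweet and that the two representations are simply concatenated; the abstract does not specify either choice. The toy vocabulary, the two-dimensional embeddings, and all function names are hypothetical.

```python
from typing import Dict, List


def bag_of_words(tokens: List[str], vocabulary: List[str]) -> List[float]:
    """Count how often each vocabulary word occurs in the tokens."""
    index = {word: i for i, word in enumerate(vocabulary)}
    counts = [0.0] * len(vocabulary)
    for token in tokens:
        if token in index:
            counts[index[token]] += 1.0
    return counts


def bag_of_embeddings(
    tokens: List[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the embeddings of known tokens (zero vector if none are known)."""
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # out-of-vocabulary tokens are skipped
        for i, value in enumerate(vector):
            total[i] += value
        known += 1
    if known == 0:
        return total
    return [value / known for value in total]


def combined_features(
    tokens: List[str],
    vocabulary: List[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Concatenate the bag-of-words and bag-of-embeddings vectors."""
    return bag_of_words(tokens, vocabulary) + bag_of_embeddings(tokens, embeddings, dim)
```

The resulting vector could then be passed to any standard classifier. Here the word "metro" contributes through its embedding even though it is absent from the bag-of-words vocabulary, which is the kind of similarity information a pure bag-of-words model would miss.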