A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure
LORE: a model for the detection of fine-grained locative references in tweets
Extracting geospatially rich knowledge from tweets is of utmost importance for location-based systems in emergency services to raise situational awareness about a given crisis-related incident, such as earthquakes, floods, car accidents, terrorist attacks, shooting attacks, etc. The problem is that the majority of tweets are not geotagged, so we need to resort to the messages in the search of geospatial evidence. In this context, we present LORE, a location-detection system for tweets that leverages the geographic database GeoNames together with linguistic knowledge through NLP techniques. One of the main contributions of this model is to capture fine-grained complex locative references, ranging from geopolitical entities and natural geographic references to points of interest and traffic ways. LORE outperforms state-of-the-art open-source location-extraction systems (i.e. Stanford NER, spaCy, NLTK and OpenNLP), achieving an unprecedented trade-off between precision and recall. Therefore, our model provides not only a quantitative advantage over other well-known systems in terms of performance but also a qualitative advantage in terms of the diversity and semantic granularity of the locative references extracted from the tweets.
Financial support for this research has been provided by the Spanish Ministry of Science, Innovation and Universities [grant number RTC 2017-6389-5], and the European Union's Horizon 2020 research and innovation program [grant number 101017861: project SMARTLAGOON]. We also thank Universidad de Granada for their financial support to the first author through the Becas de Iniciacion para estudiantes de Master 2018 del Plan Propio de la UGR.
Fernández-Martínez, NJ.; Periñán-Pascual, C. (2021). LORE: a model for the detection of fine-grained locative references in tweets. Onomázein. (52):195-225. https://doi.org/10.7764/onomazein.52.111952255
Predictive Analysis on Twitter: Techniques and Applications
Predictive analysis of social media data has attracted considerable attention
from the research community as well as the business world because of the
essential and actionable information it can provide. Over the years, extensive
experimentation and analysis for insights have been carried out using Twitter
data in various domains such as healthcare, public health, politics, social
sciences, and demographics. In this chapter, we discuss techniques, approaches
and state-of-the-art applications of predictive analysis of Twitter data.
Specifically, we present fine-grained analysis involving aspects such as
sentiment, emotion, and the use of domain knowledge in the coarse-grained
analysis of Twitter data for making decisions and taking actions, and relate a
few success stories.
Semantics-driven event clustering in Twitter feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms, each using different information sources (textual, temporal, geographic or community features), have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over baseline. We find that assigning semantic information to every individual tweet only results in worse F1 performance compared to the baseline. If, however, semantics are assigned at a coarser, hashtag level, the improvement over baseline is substantial and significant in both precision and recall.
Location Reference Recognition from Texts: A Survey and Comparison
A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching–based, statistical learning–based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs.
Towards Real-Time, Country-Level Location Classification of Worldwide Tweets
In contrast to much previous work that has focused on location classification
of tweets restricted to a specific country, here we undertake the task in a
broader context by classifying global tweets at the country level, which is so
far unexplored in a real-time scenario. We analyse the extent to which a
tweet's country of origin can be determined by making use of eight
tweet-inherent features for classification. Furthermore, we use two datasets,
collected a year apart from each other, to analyse the extent to which a model
trained from historical tweets can still be leveraged for classification of new
tweets. With classification experiments on all 217 countries in our datasets,
as well as on the top 25 countries, we offer some insights into the best use of
tweet-inherent features for an accurate country-level classification of tweets.
We find that the use of a single feature, such as the use of tweet content
alone -- the most widely used feature in previous work -- leaves much to be
desired. Choosing an appropriate combination of both tweet content and metadata
can actually lead to substantial improvements of between 20% and 50%. We
observe that tweet content, the user's self-reported location and the user's
real name, all of which are inherent in a tweet and available in a real-time
scenario, are particularly useful to determine the country of origin. We also
experiment on the applicability of a model trained on historical tweets to
classify new tweets, finding that the choice of a particular combination of
features whose utility does not fade over time can actually lead to comparable
performance, avoiding the need to retrain. However, the difficulty of achieving
accurate classification increases slightly for countries with multiple
commonalities, especially for English- and Spanish-speaking countries. Comment: Accepted for publication in IEEE Transactions on Knowledge and Data
Engineering (IEEE TKDE)
Sentiment analysis during Hurricane Sandy in emergency response
Sentiment analysis has been widely researched in the domain of online review sites with the aim of generating summarized opinions of users about different aspects of products. However, there has been little work focusing on identifying the polarity of sentiments expressed by users during disaster events. Identifying such sentiments from online social networking sites can help emergency responders understand the dynamics of the network, e.g., users' main concerns, panics, and the emotional impacts of interactions among members. In this paper, we perform a sentiment analysis of tweets posted on Twitter during the disastrous Hurricane Sandy and visualize online users' sentiments on a geographical map centered around the hurricane. We show how users' sentiments change according not only to their locations, but also to their distance from the disaster. In addition, we study how the divergence of sentiments in a tweet posted during the hurricane affects the tweet's retweetability. We find that extracting sentiments during a disaster may help emergency responders develop stronger situational awareness of the disaster zone itself.