Exploiting contextual information for fine-grained tweet geolocation
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative; DSO Laboratories
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Owing to the worldwide coverage of its users and the
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts have focused on the
new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim to offer an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
the Twitter network, tweet content, and tweet context as potential inputs, we
then systematically highlight how the problems depend on these inputs. Each
dependency is illustrated by a comprehensive review of the corresponding
strategies adopted in state-of-the-art approaches. In addition, we briefly
review two related problems, i.e., semantic location prediction and
point-of-interest recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
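The abstract does not spell out which metrics the survey covers. As a hedged illustration, two metrics widely used for coordinate predictions are the great-circle error distance (summarized by its mean and median) and Acc@161, the fraction of predictions within 161 km (100 miles) of the true location. The function names and the mean Earth radius constant below are assumptions of this sketch, not definitions taken from the survey.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant for this sketch)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_metrics(predicted, gold, threshold_km=161.0):
    """Mean/median error distance and Acc@threshold for paired (lat, lon) lists."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    errors = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in zip(predicted, gold)]
    return {
        "mean_km": mean(errors),
        "median_km": median(errors),
        # share of predictions whose error is at most threshold_km
        f"acc@{threshold_km:g}km": sum(e <= threshold_km for e in errors) / len(errors),
    }
```

For example, `distance_metrics([(1.3521, 103.8198), (40.7128, -74.0060)], [(1.2903, 103.8520), (34.0522, -118.2437)])` pairs one near miss within Singapore with a New York versus Los Angeles error, so its `acc@161km` entry is 0.5 while the mean error is dominated by the cross-country miss. Reporting the median alongside the mean is a common way to limit the influence of such outliers.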
A Transformer-based Framework for POI-level Social Post Geolocation
POI-level geo-information of social posts is critical to many location-based
applications and services. However, the multimodality, complexity, and
diversity of social media data and their platforms limit the performance of
inferring such fine-grained locations and of their downstream applications. To
address this issue, we present a general transformer-based framework for social
post geolocation at the POI level, which builds upon pre-trained language
models and also considers non-textual data. To this end, inputs are categorized
to handle different social data, and an optimal combination strategy is
provided for feature representations. Moreover, a uniform representation of
hierarchy is proposed to learn temporal information, and a concatenated version
of encodings is employed to better capture feature-wise positions. Experimental
results on various social datasets demonstrate that three variants of our
proposed framework outperform multiple state-of-the-art baselines by a large
margin in terms of accuracy and distance error metrics.
Comment: Full paper of 12 pages plus 4 additional pages of references (18 pages
in total in the arXiv version), with one figure and 5 tables. This paper was
submitted to ECIR 2023 for review
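The abstract only states that encodings are concatenated to better capture feature-wise positions; it does not give the construction. A minimal sketch of one possible reading, assuming each input token carries an embedding, a feature type, and its position within that feature: the token embedding, a standard sinusoidal position encoding, and a one-hot feature-type vector are concatenated rather than summed. The categories in `FEATURE_TYPES` and all function names are hypothetical.

```python
import math

# Hypothetical input categories; the paper's actual categorization is not given here.
FEATURE_TYPES = ("text", "user", "time")

def sinusoidal_position(pos, dim):
    """Sinusoidal encoding: sin/cos pairs at geometrically spaced frequencies (dim must be even)."""
    if dim % 2:
        raise ValueError("dim must be even")
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc

def feature_type_onehot(feature_type):
    """One-hot vector marking which input category a token belongs to."""
    if feature_type not in FEATURE_TYPES:
        raise ValueError(f"unknown feature type: {feature_type}")
    return [1.0 if t == feature_type else 0.0 for t in FEATURE_TYPES]

def encode_inputs(tokens, pos_dim=4):
    """tokens: iterable of (embedding, feature_type, position_within_feature).

    Each output row is [embedding | position encoding | feature-type one-hot],
    so position and type occupy their own dimensions instead of being added
    onto the embedding.
    """
    return [
        list(emb) + sinusoidal_position(pos, pos_dim) + feature_type_onehot(ftype)
        for emb, ftype, pos in tokens
    ]
```

Because positions are counted within each feature, the first text token and the first user token receive the same position encoding and are told apart only by the type one-hot. Concatenation keeps these signals separable at the cost of a wider input than the usual additive scheme.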
LORE: a model for the detection of fine-grained locative references in tweets
Extracting geospatially rich knowledge from tweets is of utmost importance for location-based systems in emergency services, which must raise situational awareness about crisis-related incidents such as earthquakes, floods, car accidents, terrorist attacks, and shootings. However, most tweets are not geotagged, so geospatial evidence must be sought in the message text itself. In this context, we present LORE, a location-detection system for tweets that combines the geographic database GeoNames with linguistic knowledge through NLP techniques. One of the main contributions of this model is its ability to capture fine-grained, complex locative references, ranging from geopolitical entities and natural geographic references to points of interest and traffic ways. LORE outperforms state-of-the-art open-source location-extraction systems (i.e., Stanford NER, spaCy, NLTK, and OpenNLP), achieving an unprecedented trade-off between precision and recall. Therefore, our model provides not only a quantitative advantage over other well-known systems in terms of performance but also a qualitative advantage in terms of the diversity and semantic granularity of the locative references extracted from tweets.
Financial support for this research has been provided by the Spanish Ministry of Science, Innovation and Universities [grant number RTC 2017-6389-5] and the European Union's Horizon 2020 research and innovation program [grant number 101017861: project SMARTLAGOON]. We also thank Universidad de Granada for their financial support to the first author through the Becas de Iniciacion para estudiantes de Master 2018 del Plan Propio de la UGR.
Fernández-Martínez, NJ.; Periñán-Pascual, C. (2021). LORE: a model for the detection of fine-grained locative references in tweets. Onomázein. (52):195-225. https://doi.org/10.7764/onomazein.52.111952255
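The abstract does not describe LORE's rules. As a much simpler, hedged illustration of the gazetteer side only, the sketch below performs longest-match-first lookup of token n-grams against a toy gazetteer. The entries, feature labels, and function name are hypothetical stand-ins for GeoNames data; the linguistic analysis LORE adds, and any handling of ambiguous names, is omitted.

```python
import re

# Toy gazetteer (hypothetical stand-in for GeoNames): token tuple -> feature label.
GAZETTEER = {
    ("granada",): "city",
    ("sierra", "nevada"): "mountain range",
    ("gran", "vía"): "street",
    ("alhambra",): "point of interest",
}
MAX_NGRAM = max(len(name) for name in GAZETTEER)

def find_locations(text):
    """Return (name, label) pairs, preferring the longest gazetteer match at each position."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "vía" stays one token
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                found.append((" ".join(key), GAZETTEER[key]))
                i += n  # skip past the matched n-gram
                break
        else:
            i += 1  # no match starting here; advance one token
    return found
```

For instance, `find_locations("Flooding on Gran Vía near the Alhambra, Granada")` yields a street, a point of interest, and a city, illustrating the mix of granularities the abstract mentions. Preferring the longest match keeps multi-word names such as "Sierra Nevada" from being split into shorter spurious hits.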
Exploiting user and venue characteristics for fine-grained tweet geolocation
National Research Foundation (NRF) Singapore under International Research Centre @ Singapore Funding Initiative; DS
Fine-grained geolocation of tweets in temporal proximity
Singapore National Research Foundation under International Research Centres in Singapore Funding Initiative
The role of geographic knowledge in sub-city level geolocation algorithms
Geolocation of microblog messages has been extensively investigated in the
literature. Many proposed solutions achieve good results at the city level.
Existing approaches are mainly data-driven (i.e., they rely on a training
phase). However, the development of algorithms for geolocation at the sub-city
level is still an open problem, partly due to the absence of good training
datasets. In this thesis, we investigate the role that external geographic
knowledge can play in geolocation approaches. We show how different
geographical data sources can be combined with a semantic layer to achieve
reasonably accurate sub-city level geolocation. Moreover, we propose a
knowledge-based method, called Sherloc, to accurately geolocate messages at
the sub-city level by exploiting toponyms in the message that may refer to
specific places in the target geographical area. Sherloc exploits the semantics
associated with toponyms contained in gazetteers and embeds them into a
metric space that captures the semantic distance among them. This allows
toponyms to be represented as points and indexed by a spatial access method,
so that we can identify the terms semantically closest to a microblog message
that also form a cluster with respect to their spatial locations. In contrast to
state-of-the-art methods, Sherloc requires no prior training, is not limited
to geolocating on a fixed spatial grid, and has been shown experimentally to
infer locations at the sub-city level with higher accuracy
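The abstract describes Sherloc as embedding gazetteer toponyms in a metric space, retrieving those semantically closest to a message, and keeping the ones that also cluster spatially. A minimal sketch of that idea under stated assumptions: embeddings are toy vectors, semantic closeness is cosine similarity, the spatial access method is replaced by a brute-force scan, and the "cluster" is the largest set of retrieved toponyms within `radius_km` of one of them. None of these names or parameters come from the thesis.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption of this sketch)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def approx_km(p, q):
    """Equirectangular distance approximation in km; adequate at sub-city scale."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    x = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    y = math.radians(q[0] - p[0])
    return EARTH_RADIUS_KM * math.hypot(x, y)

def geolocate(message_vec, toponyms, k=5, radius_km=1.0):
    """toponyms: list of (name, embedding, (lat, lon)).

    Keep the k toponyms semantically closest to the message, then return the
    centroid and names of the largest group lying within radius_km of one member.
    """
    ranked = sorted(toponyms, key=lambda t: cosine(message_vec, t[1]), reverse=True)[:k]
    if not ranked:
        return None
    best = []
    for _, _, center in ranked:  # brute-force scan stands in for a spatial index
        members = [t for t in ranked if approx_km(center, t[2]) <= radius_km]
        if len(members) > len(best):
            best = members
    # plain coordinate averaging, acceptable for points only a few km apart
    lat = sum(t[2][0] for t in best) / len(best)
    lon = sum(t[2][1] for t in best) / len(best)
    return (lat, lon), [t[0] for t in best]
```

The spatial step is what lets a semantically strong but isolated match lose to several slightly weaker matches that agree on one neighbourhood; with toy data in which the single most similar toponym lies about 2 km from three mutually close ones, the function returns the centroid of the three.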
Location Mention Prediction from Disaster Tweets
While utilizing Twitter data for crisis management is of interest to various response authorities, a critical challenge hindering the use of such data is the scarcity of automated tools that extract and resolve geolocation information. This dissertation focuses on the Location Mention Prediction (LMP) problem, which consists of the Location Mention Recognition (LMR) and Location Mention Disambiguation (LMD) tasks. Our work studies two main factors that influence the robustness of LMP systems: (i) the dataset used to train the model, and (ii) the learning model. As for the training dataset, we study the best training and evaluation strategies for exploiting existing datasets and tools at the onset of disaster events. We emphasize that the size of the training data matters and recommend considering the data domain, the disaster domain, and geographical proximity when training LMR models. We further construct the public IDRISI datasets, the largest English and the first Arabic datasets to date for the LMP tasks. Rigorous analysis and experiments show that, compared with existing datasets, the IDRISI datasets are diverse and generalize across domains and geographies. As for the learning models, the LMP tasks are understudied in the disaster management domain. To address this, we reformulate LMR and LMD modeling and evaluation to better suit the requirements of response authorities. Moreover, we introduce competitive, state-of-the-art LMR and LMD models and compare them against a representative set of baselines for both Arabic and English.
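The dissertation reformulates LMR evaluation, and the abstract does not give the new definition. For reference, a common baseline way to score LMR is exact-match span precision, recall, and F1; the sketch below assumes each location mention is a hypothetical `(tweet_id, start, end)` character span and counts a prediction as correct only if all three fields match a gold span.

```python
def span_prf(gold_spans, pred_spans):
    """Exact-match precision, recall, and F1 over (tweet_id, start, end) mention spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # predicted spans that exactly match a gold span
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold spans `{(1, 0, 7), (1, 20, 28), (2, 5, 12)}` and predictions `{(1, 0, 7), (2, 5, 10)}`, only one prediction matches exactly, giving precision 0.5, recall 1/3, and F1 0.4. The truncated span `(2, 5, 10)` earns no credit under exact matching, which is one reason partial-match or task-specific variants are sometimes preferred for crisis use.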
- …