44 research outputs found
On the Accuracy of Hyper-local Geotagging of Social Media Content
Social media users share billions of items per year, only a small fraction of
which is geotagged. We present a data- driven approach for identifying
non-geotagged content items that can be associated with a hyper-local
geographic area by modeling the location distributions of hyper-local n-grams
that appear in the text. We explore the trade-off between accuracy, precision
and coverage of this method. Further, we explore differences across content
received from multiple platforms and devices, and show, for example, that
content shared via different sources and applications produces significantly
different geographic distributions, and that it is best to model and predict
location for items according to their source. Our findings show the potential
and the bounds of a data-driven approach to geotag short social media texts,
and offer implications for all applications that use data-driven approaches to
locate content.Comment: 10 page
LORE: a model for the detection of fine-grained locative references in tweets
[EN] Extracting geospatially rich knowledge from tweets is of utmost importance for location-based systems in emergency services to raise situational awareness about a given crisis-related incident, such as earthquakes, floods, car accidents, terrorist attacks, shooting attacks, etc. The problem is that the majority of tweets are not geotagged, so we need to resort to the messages in the search of geospatial evidence. In this context, we present LORE, a location-detection system for tweets that leverages the geographic database GeoNames together with linguistic knowledge through NLP techniques. One of the main contributions of this model is to capture fine-grained complex locative references, ranging from geopolitical entities and natural geographic references to points of interest and traffic ways. LORE outperforms state-of-the-art open-source location-extraction systems (i.e. Stanford NER, spaCy, NLTK and OpenNLP), achieving an unprecedented trade-off between precision and recall. Therefore, our model provides not only a quantitative advantage over other well-known systems in terms of performance but also a qualitative advantage in terms of the diversity and semantic granularity of the locative references extracted from the tweets.Financial support for this research has been provided by the Spanish Ministry of Science, Innovation and Universities [grant number RTC 2017-6389-5], and the European Union's Horizon 2020 research and innovation program [grant number 101017861: project SMARTLAGOON]. We also thank Universidad de Granada for their financial support to the first author through the Becas de Iniciacion para estudiantes de Master 2018 del Plan Propio de la UGR.Fernández-MartĂnez, NJ.; Periñán-Pascual, C. (2021). LORE: a model for the detection of fine-grained locative references in tweets. Onomázein. (52):195-225. https://doi.org/10.7764/onomazein.52.111952255
Towards Real-Time, Country-Level Location Classification of Worldwide Tweets
In contrast to much previous work that has focused on location classification
of tweets restricted to a specific country, here we undertake the task in a
broader context by classifying global tweets at the country level, which is so
far unexplored in a real-time scenario. We analyse the extent to which a
tweet's country of origin can be determined by making use of eight
tweet-inherent features for classification. Furthermore, we use two datasets,
collected a year apart from each other, to analyse the extent to which a model
trained from historical tweets can still be leveraged for classification of new
tweets. With classification experiments on all 217 countries in our datasets,
as well as on the top 25 countries, we offer some insights into the best use of
tweet-inherent features for an accurate country-level classification of tweets.
We find that the use of a single feature, such as the use of tweet content
alone -- the most widely used feature in previous work -- leaves much to be
desired. Choosing an appropriate combination of both tweet content and metadata
can actually lead to substantial improvements of between 20\% and 50\%. We
observe that tweet content, the user's self-reported location and the user's
real name, all of which are inherent in a tweet and available in a real-time
scenario, are particularly useful to determine the country of origin. We also
experiment on the applicability of a model trained on historical tweets to
classify new tweets, finding that the choice of a particular combination of
features whose utility does not fade over time can actually lead to comparable
performance, avoiding the need to retrain. However, the difficulty of achieving
accurate classification increases slightly for countries with multiple
commonalities, especially for English and Spanish speaking countries.Comment: Accepted for publication in IEEE Transactions on Knowledge and Data
Engineering (IEEE TKDE
A survey of location inference techniques on Twitter
The increasing popularity of the social networking service, Twitter, has made it more involved in day-to-day communications, strengthening social relationships and information dissemination. Conversations on Twitter are now being explored as indicators within early warning systems to alert of imminent natural disasters such as earthquakes and aid prompt emergency responses to crime. Producers are privileged to have limitless access to market perception from consumer comments on social media and microblogs. Targeted advertising can be made more effective based on user profile information such as demography, interests and location. While these applications have proven beneficial, the ability to effectively infer the location of Twitter users has even more immense value. However, accurately identifying where a message originated from or an author’s location remains a challenge, thus essentially driving research in that regard. In this paper, we survey a range of techniques applied to infer the location of Twitter users from inception to state of the art. We find significant improvements over time in the granularity levels and better accuracy with results driven by refinements to algorithms and inclusion of more spatial features