Geocoding location expressions in Twitter messages: A preference learning method
Resolving location expressions in text to the correct physical location, also known as geocoding or grounding, is complicated by the fact that so many places around the world share the same name. Correct resolution is made even more difficult when there is little context to determine which place is intended, as in a 140-character Twitter message, or when location cues from different sources conflict, as may be the case among different metadata fields of a Twitter message. We used supervised machine learning to weigh the different fields of the Twitter message and the features of a world gazetteer to create a model that prefers the correct gazetteer candidate when resolving the extracted expression. We evaluated our model using the F1 measure and compared it to similar algorithms. Our method outperformed state-of-the-art competitors.
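The candidate-preference idea above can be sketched as a weighted scoring function over gazetteer entries. The feature names (`log_population`, `metadata_agrees`) and the weights here are illustrative placeholders, not the paper's actual learned model:

```python
# Minimal sketch: rank gazetteer candidates for an extracted location
# expression with a linear scoring function. In the paper the weights are
# learned from annotated data; here they are hand-set for illustration.

def score(candidate, weights):
    """Weighted linear score over a candidate's features."""
    return sum(weights[f] * candidate.get(f, 0.0) for f in weights)

def resolve(expression, gazetteer, weights):
    """Prefer the highest-scoring gazetteer entry matching the expression."""
    candidates = [g for g in gazetteer if g["name"] == expression]
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, weights))

# Two "Paris" entries: the hypothetical metadata_agrees feature encodes
# agreement with other fields of the tweet (e.g. the user's profile location).
gazetteer = [
    {"name": "Paris", "country": "FR", "log_population": 6.3, "metadata_agrees": 1.0},
    {"name": "Paris", "country": "US", "log_population": 4.4, "metadata_agrees": 0.0},
]
weights = {"log_population": 0.5, "metadata_agrees": 2.0}
best = resolve("Paris", gazetteer, weights)  # prefers the French capital
```

A real system would learn the weights from labeled pairs of (correct, incorrect) candidates, which is what makes this a preference-learning formulation rather than plain classification.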
A Coherent Unsupervised Model for Toponym Resolution
Toponym Resolution, the task of assigning a location mention in a document to a geographic referent (i.e., latitude/longitude), plays a pivotal role in analyzing location-aware content. However, the ambiguities of natural language and the huge number of possible interpretations for toponyms constitute formidable hurdles for this task. In this paper, we study the problem of toponym resolution with no additional information other than a gazetteer and no training data. We demonstrate that the dearth of sufficiently large annotated data makes supervised methods less capable of generalizing. Our proposed method estimates the geographic scope of documents and leverages the connections between nearby place names as evidence to resolve toponyms. We explore the interactions between multiple interpretations of mentions and the relationships between different toponyms in a document to build a model that finds the most coherent resolution. Our model is evaluated on three news corpora, two from the literature and one collected and annotated by us; then, we compare our methods to state-of-the-art unsupervised and supervised techniques. We also examine three commercial products: Reuters OpenCalais, Yahoo! YQL Placemaker, and Google Cloud Natural Language API. The evaluation shows that our method outperforms the unsupervised technique as well as Reuters OpenCalais and Google Cloud Natural Language API on all three corpora; our method also shows performance close to that of the state-of-the-art supervised method and outperforms it when the test data has 40% or more toponyms that are not seen in the training data.
Comment: 9 pages (+1 page of references), WWW '18: Proceedings of the 2018 World Wide Web Conference
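The coherence intuition above (nearby place names disambiguate each other) can be sketched as choosing, for each toponym, the interpretation that minimizes total pairwise geographic distance across the document. The brute-force search and the specific candidate coordinates are illustrative, not the paper's actual inference procedure:

```python
# Minimal sketch: pick the jointly most coherent interpretation of each
# toponym by minimizing the sum of pairwise great-circle distances.
import itertools
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def most_coherent(candidates_per_toponym):
    """Exhaustively search interpretation combinations; return the cheapest."""
    best, best_cost = None, float("inf")
    for combo in itertools.product(*candidates_per_toponym):
        cost = sum(haversine_km(p, q)
                   for p, q in itertools.combinations(combo, 2))
        if cost < best_cost:
            best, best_cost = combo, cost
    return best

# "London" and "Paris" each have a European and a North American reading.
london = [(51.51, -0.13), (42.98, -81.25)]   # London, UK vs. London, Ontario
paris  = [(48.86, 2.35), (33.66, -95.56)]    # Paris, France vs. Paris, Texas
combo = most_coherent([london, paris])       # selects the European pair
```

Exhaustive search is exponential in the number of toponyms; a practical system would prune candidates (e.g., by the document's estimated geographic scope, as the abstract describes) before joint resolution.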
How Do People Describe Locations During a Natural Disaster: An Analysis of Tweets from Hurricane Harvey
Social media platforms, such as Twitter, have been increasingly used by people during natural disasters to share information and request help. Hurricane Harvey was a category 4 hurricane that devastated Houston, Texas, USA in August 2017 and caused catastrophic flooding in the Houston metropolitan area. Hurricane Harvey also saw widespread use of social media by the general public in response to this major disaster, and geographic locations are key pieces of information described in many of the social media messages. A geoparsing system, or geoparser, can be used to automatically extract and locate the described locations, which can help first responders reach the people in need. While a number of geoparsers have already been developed, it is unclear how effective they are in recognizing and geo-locating the locations described by people during natural disasters. To fill this gap, this work seeks to understand how people describe locations during a natural disaster by analyzing a sample of tweets posted during Hurricane Harvey. We then identify the limitations of existing geoparsers in processing these tweets, and discuss possible approaches to overcoming these limitations.
A pragmatic guide to geoparsing evaluation
Abstract: Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics, and data used to compare state-of-the-art systems. Evaluation is made further inconsistent, and even unrepresentative of real-world usage, by the lack of distinction between different types of toponyms, which necessitates new guidelines, a consolidation of metrics, and a detailed toponym taxonomy, with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task definition: clarified via corpus-linguistic analysis proposing a fine-grained Pragmatic Taxonomy of Toponyms. (Part 2) Metrics: discussed and reviewed for a rigorous evaluation, including recommendations for NER/geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called GeoWebNews to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained geotagging and toponym resolution (geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models.
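The two evaluation stages the abstract distinguishes can be sketched with their common metrics: precision/recall/F1 for geotagging (did we extract the right toponym mentions?) and accuracy within a distance threshold for geocoding (how far is the predicted coordinate from gold?). The 161 km threshold (100 miles) is a convention from the geoparsing literature; the exact-string matching here is a simplification of span-level matching:

```python
# Minimal sketch of the two geoparsing evaluation stages.

def geotag_prf(predicted, gold):
    """Precision, recall, F1 over extracted toponyms (exact-match simplification)."""
    tp = len(set(predicted) & set(gold))
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def accuracy_at_k(errors_km, k=161):
    """Fraction of resolved toponyms within k km of the gold coordinates."""
    return sum(e <= k for e in errors_km) / len(errors_km)

# Hypothetical system output vs. gold annotations for one document:
p, r, f1 = geotag_prf(["Houston", "Texas"], ["Houston", "Harvey"])
acc = accuracy_at_k([12.0, 200.0, 45.0])  # per-toponym geocoding errors in km
```

Separating the two stages matters because a system can extract the right mentions but resolve them to the wrong places, or vice versa; reporting only one number hides which stage failed.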