Location reference recognition from texts: A survey and comparison
A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to the process of recognizing location references in texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing, is lacking. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing them into four groups based on their underlying functional principle: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references across the world. Results from this evaluation can help inform future methodological developments for location reference recognition and can help guide the selection of proper approaches based on application needs.
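To make the gazetteer matching-based category and the notion of "correctness" concrete, here is a minimal sketch, not taken from the survey itself. It assumes a tiny hypothetical gazetteer (`GAZETTEER`), greedy longest-match lookup, and exact span matching for precision, recall, and F1; the function names and the matching criterion are illustrative assumptions, and the survey's own evaluation protocol may differ.

```python
import re

# Hypothetical toy gazetteer; real systems draw on far larger place-name resources.
GAZETTEER = {"new york", "houston", "port-au-prince"}

def recognize_locations(text, gazetteer=GAZETTEER, max_len=3):
    """Return (start, end, surface) character spans found by greedy longest match."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[\w-]+", text)]
    spans = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            if text[start:end].lower() in gazetteer:
                spans.append((start, end, text[start:end]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans

def precision_recall_f1(predicted, gold):
    """Exact-match scores over (start, end) spans; extra tuple fields are ignored."""
    pred = {(s[0], s[1]) for s in predicted}
    ref = {(s[0], s[1]) for s in gold}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

spans = recognize_locations("Flooding in Houston and New York today")
print([s[2] for s in spans])
```

A pure lookup like this cannot tell a place name from a same-spelled ordinary word or personal name, which is one reason the other three categories exist.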
Emergent Forms of Online Sociality in Disasters Arising from Natural Hazards
Disasters arising from natural hazards are associated with the breakdown of existing structures, but they also result in the creation of new social ties as those affected self-organize and solve problems. This dissertation focuses on emergent forms of sociality that arise in the context of crisis. Specifically, it considers collaborative work practices, social network structures, and organizational forms that emerge on social media during disasters arising from natural hazards. Social media platforms support highly distributed social environments, and the forms of sociality that emerge in these contexts are affected by the affordances of their technical features, especially those that more or less successfully facilitate the creation of a shared information space. Thus, this dissertation is organized around two important aspects of social media spaces: the availability of an explicitly shared site of work and the availability of a visible, legible record of activity.
This dissertation investigates the forms of sociality that emerge during disasters in three social media activities: retweeting, crisis mapping in OpenStreetMap (OSM), and Twitter reply conversations. These three activities differ in the availability of an explicitly shared site of work and of a visible record of activity. The studies of retweeting and reply conversations investigate the Twitter activity in response to the 2012 Hurricane Sandy, the second costliest hurricane in US history and, at the time, the most tweeted-about event to date. The analysis of crisis mapping in OpenStreetMap, an open, editable, volunteer-based map of the world, focuses on the OSM activity after the 2010 Haiti earthquake, which was the first major disaster event supported by OpenStreetMap.
For these investigations, the dissertation elaborates and develops human-centered data science methods, a set of methodological approaches that both harness the power of computational techniques and account for the highly situated nature of social activity in crisis. Finally, the dissertation positions the findings from the three studies within the larger context of high-tempo, high-volume social media activity and highlights how the framework of the two intersecting dimensions of the shared information space reveals larger patterns within the emergent forms of sociality across contexts.
Automated Assessment of the Aftermath of Typhoons Using Social Media Texts
Disasters are one of the major threats to economies and human societies, causing substantial losses of human life, property, and infrastructure. Understanding, preventing, and reducing such disasters has been a persistent endeavor, and the popularization of social media offers new opportunities to enhance disaster management through crowdsourcing. However, social media data is also characterized by extreme brevity, heavy noise, and informal language. The existing literature has not fully addressed these drawbacks, or it relies on extensive manual effort to tackle them.
The major focus of this research is on constructing a holistic framework to exploit social media data in typhoon damage assessment. The scope of this research covers data collection, relevance classification, location extraction, and damage assessment, with a range of approaches used to overcome the drawbacks of social media data. Moreover, semi-supervised or unsupervised approaches are prioritized in forming the framework to minimize manual intervention.
In data collection, a query expansion strategy is adopted to optimize the search recall of typhoon-relevant information retrieval. Multiple filtering strategies are developed to screen the keywords and maintain relevance to the search topics as the keywords are updated. A classifier based on a convolutional neural network is presented for relevance classification, with hashtags and word clusters as extra input channels to augment the information. In location extraction, a model is constructed by integrating Bidirectional Long Short-Term Memory and Conditional Random Fields. Feature noise correction layers and label smoothing are leveraged to handle the noisy training data. Finally, a multi-instance multi-label classifier identifies the damage relations in four categories, and the damage categories of a message are integrated with the damage description score to obtain a damage severity score for the message.
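Two of the ingredients named above can be illustrated in isolation. The sketch below is not the thesis's implementation: it assumes the standard uniform form of label smoothing and shows only the decoding step of a CRF layer (Viterbi search over per-token scores plus tag-transition scores). The function names, the BIO tag set, and all scores are hypothetical values chosen for illustration; in a BiLSTM-CRF the per-token scores would come from the recurrent network.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: move epsilon of the probability mass to all classes."""
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]

def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions: one dict per token mapping tag -> score.
    transitions: dict mapping (previous_tag, current_tag) -> score (default 0.0).
    """
    best = {t: emissions[0][t] for t in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in tags:
            # Best previous tag for reaching `cur` at this position.
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

TAGS = ["O", "B-LOC", "I-LOC"]
# Hypothetical transition score discouraging I-LOC directly after O.
TRANSITIONS = {("O", "I-LOC"): -10.0}
# Hypothetical per-token scores for the tokens "in", "New", "York".
EMISSIONS = [
    {"O": 2.0, "B-LOC": 0.0, "I-LOC": 0.0},
    {"O": 0.0, "B-LOC": 1.0, "I-LOC": 1.5},
    {"O": 0.0, "B-LOC": 0.0, "I-LOC": 2.0},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
print(smooth_labels([0.0, 1.0, 0.0]))
```

In this toy input the second token scores highest as I-LOC on its own, but the transition penalty makes the decoder prefer B-LOC there, which is the kind of sequence-level consistency a CRF layer adds on top of per-token predictions.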
A case study is conducted to verify the effectiveness of the framework. The outcomes indicate that the approaches and models developed in this study significantly improve the classification of social media texts, especially under the framework of semi-supervised or unsupervised learning. Moreover, the results of damage assessment from social media data are remarkably consistent with the official statistics, which demonstrates the practicality of the proposed damage scoring scheme.