Data and methods for Gazetteer Independent Toponym Resolution

Abstract

This thesis examines the computational task of Toponym Resolution (TR) from multiple perspectives. In its common form, the task requires transforming a place name (e.g. Washington) into a grounded representation of that place, typically a point (latitude, longitude) geometry. In recent years, TR systems have advanced beyond heuristic techniques to more complex machine-learned classifiers, and impressive gains have been made. Despite these advances, a number of issues with the task remain. This thesis examines aspects of typical TR approaches critically and proposes solutions and new methods. In particular, I am critical of the dependence of existing approaches on gazetteer matching and of their under-utilization of complex geometric data types. I also outline some of the shortcomings of existing toponym corpora and detail a new corpus and annotation tool that I helped to develop.

In earlier work, I explored whether TR systems could be built without depending on gazetteer lookups. That work, which I expand on and review in this thesis, showed that competitive accuracies can be achieved without these human-curated resources. Additionally, I demonstrate through error analysis that the largest advantage of a gazetteer matching component lies in ontology correction and matching, not in disambiguation or grounding.

These new approaches are tested on pre-existing TR corpora, as well as on a new corpus in a novel domain. In detailing the new corpus, I remark on many of the challenges and design decisions that Toponym Resolution entails, and I propose a new evaluation metric.
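Since TR output is typically a single (latitude, longitude) point, one widely used way to score it is the great-circle distance between the predicted and gold points, often summarized as mean error and as accuracy within 161 km (100 miles). The following is a minimal sketch under that point-only assumption. It is not the new evaluation metric proposed in this thesis, and the names `Point`, `great_circle_km`, and `evaluate` are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


@dataclass(frozen=True)
class Point:
    """Hypothetical point geometry for a resolved toponym, in decimal degrees."""
    lat: float
    lon: float


def great_circle_km(a: Point, b: Point) -> float:
    """Haversine (great-circle) distance in kilometres between two points."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point rounding cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def evaluate(predicted, gold, threshold_km=161.0):
    """Return (accuracy within threshold_km, mean error in km) for paired points."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    errors = [great_circle_km(p, g) for p, g in zip(predicted, gold)]
    accuracy = sum(e <= threshold_km for e in errors) / len(errors)
    mean_error = sum(errors) / len(errors)
    return accuracy, mean_error


if __name__ == "__main__":
    dc = Point(38.9072, -77.0369)   # Washington, D.C.
    wa = Point(47.7511, -120.7401)  # approximate centre of Washington State
    # Gold is D.C. in both cases; the second prediction picks the wrong "Washington".
    print(evaluate([dc, wa], [dc, dc]))
```

Because every place is reduced to a single point, this scheme cannot tell a prediction that falls inside a large region apart from one that falls just outside it.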