Geographically constrained information retrieval

Abstract

Eighteen percent of information seekers demand geographically intelligent information retrieval systems (Sanderson and Kohler, 2004). State-of-the-art information retrieval (IR) systems lack the geographical intelligence needed to answer geography-dependent questions effectively. This thesis addresses two research objectives: (1) how to mine and analyze the geographical information (GI) implicit in texts, and (2) how to use the geographical knowledge obtained in this way to build models for answering geography-dependent questions. We assume that every document and every search query has a geographical scope (i.e., where the events described are situated). To exploit the notion of geographical scope, we first developed techniques to detect the geographical scope of documents and to resolve the scope when the indications are complex or inconsistent. The thesis then turns to problems whose solutions may be improved by incorporating the notion of geographical scope, namely (i) toponym resolution, i.e., determining which place is referred to when an ambiguous place name (toponym) is used, (ii) query expansion, the enrichment of queries often used in IR, and (iii) relevance ranking. The toponym resolution strategy prefers candidate places in top-ranked scopes, and the query expansion strategy prefers place names in commonly shared scopes. The relevance ranking strategy incorporates scope information into the score calculation. New evaluation metrics that measure small discrepancies among toponym and scope resolution systems are also proposed. The scope and toponym resolution strategies achieved scores of 70% to 90% against human annotators. The query expansion and relevance ranking strategies outperformed state-of-the-art IR systems by 9%.
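
The idea that toponym resolution prefers candidate places in top-ranked scopes can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Candidate` type, the `resolve_toponym` function, the place and region names, and the rule of picking the candidate whose best containing scope ranks highest are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A possible referent of an ambiguous place name (hypothetical type)."""
    name: str
    scopes: frozenset  # regions that contain this place (illustrative data)


def resolve_toponym(candidates, ranked_scopes):
    """Pick the candidate lying in the highest-ranked scope.

    ranked_scopes lists the document's detected scopes, best first.
    Candidates outside every listed scope share the worst possible rank.
    Ties go to the candidate that appears first in the input.
    """
    if not candidates:
        return None
    rank = {scope: i for i, scope in enumerate(ranked_scopes)}
    worst = len(ranked_scopes)

    def best_rank(candidate):
        # Smallest rank index among the scopes containing this candidate.
        return min((rank.get(s, worst) for s in candidate.scopes), default=worst)

    return min(candidates, key=best_rank)


# Illustrative usage: "London" in a document whose top-ranked scope is Canada.
london_uk = Candidate("London, UK", frozenset({"United Kingdom", "Europe"}))
london_on = Candidate("London, Ontario", frozenset({"Canada", "North America"}))
choice = resolve_toponym([london_uk, london_on], ["Canada", "United States", "Europe"])
```

In this toy case the Ontario candidate is selected because one of its containing regions is the top-ranked scope, whereas the UK candidate only matches a lower-ranked one.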
