Extracting geographical tags from webpages is a well-motivated application in
many domains. In illicit domains with unusual language models, like human
trafficking, extracting geotags with both high precision and recall is a
challenging problem. In this paper, we describe a geotag extraction framework
in which context, constraints and the openly available Geonames knowledge base
work in tandem in an Integer Linear Programming (ILP) model to achieve good
performance. In preliminary empirical investigations, the framework improves
precision by 28.57% and F-measure by 36.9% on a difficult human trafficking
geotagging task compared to a machine learning-based baseline. The method is
already being integrated into an existing knowledge base construction system
widely used by US law enforcement agencies to combat human trafficking.Comment: 6 pages, GeoRich 2017 workshop at ACM SIGMOD conferenc