A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects
GeoCLEF 2006: the CLEF 2006 cross-language geographic information retrieval track overview
After being a pilot track in 2005, GeoCLEF advanced to be a regular track within CLEF 2006. The
purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR): retrieval for
topics with a geographic specification. For GeoCLEF 2006, twenty-five search topics were defined by the
organizing groups for searching English, German, Portuguese and Spanish document collections. Topics were
translated into English, German, Portuguese, Spanish and Japanese. Several topics in 2006 were significantly
more geographically challenging than in 2005. Seventeen groups submitted 149 runs (up from eleven groups and
117 runs in GeoCLEF 2005). The groups used a variety of approaches, including geographic bounding boxes,
named entity extraction and external knowledge bases (geographic thesauri, ontologies and gazetteers)
GeoCLEF 2007: the CLEF 2007 cross-language geographic information retrieval track overview
GeoCLEF ran as a regular track for the second time within the Cross
Language Evaluation Forum (CLEF) 2007. The purpose of GeoCLEF is to test
and evaluate cross-language geographic information retrieval (GIR): retrieval
for topics with a geographic specification. GeoCLEF 2007 consisted of two
subtasks. A search task ran for the third time and a query classification task was
organized for the first time. For the GeoCLEF 2007 search task, twenty-five search
topics were defined by the organizing groups for searching English, German,
Portuguese and Spanish document collections. All topics were translated into
English, Indonesian, Portuguese, Spanish and German. Several topics in 2007
were geographically challenging. Thirteen groups submitted 108 runs. The
groups used a variety of approaches. For the classification task, a query log
from a search engine was provided and the groups needed to identify the
queries with a geographic scope and the geographic components within the
local queries
An evaluation resource for geographic information retrieval
In this paper we present an evaluation resource for geographic information retrieval developed within the Cross Language Evaluation
Forum (CLEF). The GeoCLEF track is dedicated to the evaluation of geographic information retrieval systems. The resource
encompasses more than 600,000 documents, 75 topics so far, and more than 100,000 relevance judgments for these topics. Geographic
information retrieval requires an evaluation resource which represents realistic information needs and which is geographically
challenging. Some experimental results and analysis are reported
Development and evaluation of a geographic information retrieval system using fine grained toponyms
Geographic information retrieval (GIR) is concerned with returning information in response to an information need, typically expressed in terms of a thematic and spatial component linked by a spatial relationship. However, evaluation initiatives have often failed to show significant differences between simple text baselines and more complex spatially enabled GIR approaches. We explore the effectiveness of three systems (a text baseline, spatial query expansion, and a full GIR system utilizing both text and spatial indexes) at retrieving documents from a corpus describing mountaineering expeditions, centred around fine grained toponyms. To allow evaluation, we use user generated content (UGC) in the form of metadata associated with individual articles to build a test collection of queries and judgments. The test collection allowed us to demonstrate that a GIR-based method significantly outperformed a text baseline for all but very specific queries associated with very small query radii. We argue that such approaches to test collection development have much to offer in the evaluation of GIR
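The idea of a query radius around a toponym, as mentioned in this abstract, can be illustrated with a great-circle distance filter. This is only a minimal sketch under stated assumptions: the function names `haversine_km` and `within_radius`, the document-to-coordinate mapping, and the use of a mean Earth radius are hypothetical choices for illustration, not the system evaluated in the paper.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical model)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(
    docs: Dict[str, Tuple[float, float]],
    centre: Tuple[float, float],
    radius_km: float,
) -> List[str]:
    """Return ids of documents whose (hypothetical) footprint point lies
    within radius_km of the query centre."""
    lat0, lon0 = centre
    return [
        doc_id
        for doc_id, (lat, lon) in docs.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]
```

A full GIR system would combine such a spatial filter with a text ranking; the sketch covers only the spatial part.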
POIReviewQA: A Semantically Enriched POI Retrieval and Question Answering Dataset
Many services that perform information retrieval for Points of Interest (POI)
utilize a Lucene-based setup with spatial filtering. While this type of system
is easy to implement, it does not make use of semantics but relies on direct
word matches between a query and reviews, leading to a loss in both precision
and recall. To study the challenging task of semantically enriching POIs from
unstructured data in order to support open-domain search and question answering
(QA), we introduce a new dataset POIReviewQA. It consists of 20k questions
(e.g. "is this restaurant dog friendly?") for 1022 Yelp business types. For each
question we sampled 10 reviews, and annotated each sentence in the reviews
as to whether it answers the question and what the corresponding answer is. To test a
system's ability to understand the text we adopt an information retrieval
evaluation by ranking all the review sentences for a question based on the
likelihood that they answer this question. We build a Lucene-based baseline
model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model
achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a
challenging problem for future research by the GIR community. The resulting
technology can help exploit the thematic content of web documents and social
media for characterisation of locations
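The ranking-based evaluation described in this abstract reports mean average precision (MAP). A minimal sketch of how MAP can be computed from ranked relevance labels is given here; the function names and the input format (one list of 0/1 labels per question, ordered from highest to lowest rank) are assumptions for illustration, not the authors' evaluation code.

```python
from typing import List, Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision for one question.

    ranked_labels holds 1 (sentence answers the question) or 0,
    ordered from the highest-ranked sentence to the lowest.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[int]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

In this sketch, a question with no relevant sentence contributes an average precision of 0; other conventions skip such questions instead.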
Unnamed locations, underspecified regions, and other linguistic phenomena in geographic annotation of water-based locations
This short paper investigates how locations in or close to
water masses in topics and documents (e.g. rivers, seas,
oceans) are referred to. For this study, 13 topics from the
GeoCLEF topics 2005-2008 aiming at documents on rivers,
oceans, or sea names were selected and the corresponding
relevant documents retrieved and manually annotated. Results of the geographic annotation indicate that i) topics aiming at locations close to water contain a wide variety of spatial relations (indicated by different prepositions), ii)
unnamed locations can be generated on-the-fly by referring
to movable objects (e.g. ships, planes) travelling along a
path, iii) underspecified regions are referenced by proximity
or distance or directional relations. In addition, several
generic expressions (e.g. "in international waters") are frequently used, but refer to different underspecified regions
Challenges to evaluation of multilingual geographic information retrieval in GeoCLEF
This is the third year of the evaluation of
geographic information retrieval (GeoCLEF)
within the Cross-Language Evaluation Forum
(CLEF). GeoCLEF 2006 presented topics and
documents in four languages (English,
German, Portuguese and Spanish). After two
years of evaluation we are beginning to
understand the challenges both of geographic
information retrieval from text and of
evaluating the results of geographic
information retrieval. This poster enumerates
some of these challenges to evaluation and
comments on the limitations encountered in the
first two evaluations
The DIGMAP geo-temporal web gazetteer service
This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval