21 research outputs found
Extending a geo-catalogue with matching capabilities
To achieve semantic interoperability, geo-spatial applications need to be equipped with tools able to understand user terminology that is typically different from the one enforced by standards. In this paper we summarize our experience in providing a semantic extension to the geo-catalogue of the Autonomous Province of Trento (PAT) in Italy. The semantic extension is based on the adoption of the S-Match semantic matching tool and on the use of a specifically designed faceted ontology codifying domain specific knowledge. We also briefly report our experience in the integration of the ontology with the geo-spatial ontology GeoWordNet
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects
Linking geographic vocabularies through WordNet
The linked open data (LOD) paradigm has emerged as a promising approach to structuring and sharing geospatial information. One of the major obstacles to this vision lies in the difficulties found in the automatic integration between heterogeneous vocabularies and ontologies that provides the semantic backbone of the growing constellation of open geo-knowledge bases. In this article, we show how to utilize WordNet as a semantic hub to increase the integration of LOD. With this purpose in mind, we devise Voc2WordNet, an unsupervised mapping technique between a given vocabulary and WordNet, combining intensional and extensional aspects of the geographic terms. Voc2WordNet is evaluated against a sample of human-generated alignments with the OpenStreetMap (OSM) Semantic Network, a crowdsourced geospatial resource, and the GeoNames ontology, the vocabulary of a large digital gazetteer. These empirical results indicate that the approach can obtain high precision and recall
{YAGO}2: A Spatially and Temporally Enhanced Knowledge Base from {Wikipedia}
We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 80 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95\% of the facts in YAGO2. In this paper, we present the extraction methodology, the integration of the spatio-temporal dimension, and our knowledge representation SPOTL, an extension of the original SPO-triple model to time and space
GEIR: a Full-Fledged Geographically Enhanced Information Retrieval Solution
With the development of search engines (e.g. Google, Bing, Yahoo, etc.), people is ambitiously expecting higher quality and improvements of current technologies. Bringing human intelligence features to these tools, like the ability to find implicit information through semantics, is one of the must prominent research lines in Computer Science. Information semantics is a very wide concept, as wide as the human capability to interpret, in particular, the analysis of geographical semantics gives the possibility to associate information with a place. It is estimated that more than 70\% of all information in the world has some kind of geographic features \cite{Jones04}. In 2012, Ed Parsons, a GeoSpatial Technologist from Google, reported that between 30\% and 40\% of the user queries at Google search engine contain geographic references \cite{Parsons12}.
This thesis addresses the field of geographic information extraction and retrieval in unstructured texts. This process includes the identification of spatial features in textual documents, the data indexing, the manipulation of the relevance of the identified geographic entities and the multi-criteria retrieval according to the thematic and geographic information.
The main contributions of this work include a custom geographic knowledge base, built from the combination of GeoNames and WordNet; a Natural Language Processing and knowledge based heuristics for Toponym Recognition and Toponym Disambiguation; and a geographic relevance weighting model that supports non-spatial indexing and simple ranking combination approaches. The validity of each one of these components is supported by practical experiments that show their effectiveness in different scenarios and their alignment with state of the art solutions.
In addition, it also constitutes a main contribution of this work GEIR, a general purpose GIR framework that includes the implementations of the above described components and brings the possibility of implementing new ones and test their performance within an end to end GIR system
Enhancing the online discovery of geospatial data through taxonomy, folksonomy and semantic annotations
Spatial data infrastructures (SDIs) are meant to facilitate dissemination and consumption of spatial data, amongst others, through publication and discovery of spatial metadata in geoportals. However, geoportals are often known to geoinformation communities only and present technological limitations which make it difficult for general purpose web search engines to discover and index the data catalogued in (or registered with) a geoportal. The mismatch between standard spatial metadata content and the search terms that Web users employ when looking for spatial data, presents a further barrier to spatial data discovery. The need arises for creating and sharing spatial metadata that is discoverable by general purpose web search engines and users alike. Using folksonomies and semantic annotations appears as an option to eliminate the mismatch and to publish the metadata for discovery on the Web. Based on an analysis of search query terms employed when searching for spatial data on the Web, a taxonomy of search terms is constructed. The taxonomy constitutes the basis towards understanding how web resources in general, and HTML pages with standard spatial metadata in particular, can be documented so that they are discoverable by general purpose web search engines. We illustrate the use of the constructed taxonomy in semantic annotation of web resources, such as HTML pages with spatial metadata on the Web
Toponym Disambiguation in Information Retrieval
In recent years, geography has acquired a great importance in the context of
Information Retrieval (IR) and, in general, of the automated processing of
information in text. Mobile devices that are able to surf the web and at the
same time inform about their position are now a common reality, together
with applications that can exploit this data to provide users with locally
customised information, such as directions or advertisements. Therefore,
it is important to deal properly with the geographic information that is
included in electronic texts. The majority of such kind of information is
contained as place names, or toponyms.
Toponym ambiguity represents an important issue in Geographical Information
Retrieval (GIR), due to the fact that queries are geographically constrained.
There has been a struggle to nd speci c geographical IR methods
that actually outperform traditional IR techniques. Toponym ambiguity
may constitute a relevant factor in the inability of current GIR systems to
take advantage from geographical knowledge. Recently, some Ph.D. theses
have dealt with Toponym Disambiguation (TD) from di erent perspectives,
from the development of resources for the evaluation of Toponym Disambiguation
(Leidner (2007)) to the use of TD to improve geographical scope
resolution (Andogah (2010)). The Ph.D. thesis presented here introduces
a TD method based on WordNet and carries out a detailed study of the
relationship of Toponym Disambiguation to some IR applications, such as
GIR, Question Answering (QA) and Web retrieval.
The work presented in this thesis starts with an introduction to the applications
in which TD may result useful, together with an analysis of the
ambiguity of toponyms in news collections. It could not be possible to
study the ambiguity of toponyms without studying the resources that are
used as placename repositories; these resources are the equivalent to language
dictionaries, which provide the di erent meanings of a given word.Buscaldi, D. (2010). Toponym Disambiguation in Information Retrieval [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8912Palanci
Geographical queries reformulation using a parallel association rules generator to build spatial taxonomies
Geographical queries need a special process of reformulation by information retrieval systems (IRS) due to their specificities and hierarchical structure. This fact is ignored by most of web search engines. In this paper, we propose an automatic approach for building a spatial taxonomy, that models’ the notion of adjacency that will be used in the reformulation of the spatial part of a geographical query. This approach exploits the documents that are in top of the retrieved list when submitting a spatial entity, which is composed of a spatial relation and a noun of a city. Then, a transactional database is constructed, considering each document extracted as a transaction that contains the nouns of the cities sharing the country of the submitted query’s city. The algorithm frequent pattern growth (FP-growth) is applied to this database in his parallel version (parallel FP-growth: PFP) in order to generate association rules, that will form the country’s taxonomy in a Big Data context. Experiments has been conducted on Spark and their results show that query reformulation using the taxonomy constructed based on our proposed approach improves the precision and the effectiveness of the IRS