716 research outputs found
Web based knowledge extraction and consolidation for automatic ontology instantiation
The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation
Spatio-textual indexing for geographical search on the web
Many web documents refer to specific geographic localities and many
people include geographic context in queries to web search engines. Standard
web search engines treat the geographical terms in the same way as other terms.
This can result in failure to find relevant documents that refer to the place of
interest using alternative related names, such as those of included or nearby
places. This can be overcome by associating text indexing with spatial indexing
methods that exploit geo-tagging procedures to categorise documents with
respect to geographic space. We describe three methods for spatio-textual
indexing based on multiple spatially indexed text indexes, attaching spatial
indexes to the document occurrences of a text index, and merging text index
access results with results of access to a spatial index of documents. These
schemes are compared experimentally with a conventional text index search
engine, using a collection of geo-tagged web documents, and are shown to be
able to compete in speed and storage performance with pure text indexing
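
The third scheme described above (merging the results of a text index lookup with the results of a spatial index lookup) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `SpatioTextualIndex`, the fixed-size grid used as the spatial index, the `CELL_DEG` cell size, and the single point footprint per document. The abstract does not say which spatial or text index structures the authors used.

```python
from collections import defaultdict

CELL_DEG = 1.0  # hypothetical grid cell size in degrees


def cell_of(lat, lon):
    """Map a coordinate to the integer grid cell that contains it."""
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))


class SpatioTextualIndex:
    """Toy index: an inverted text index plus a grid spatial index."""

    def __init__(self):
        self.text_index = defaultdict(set)     # term -> document ids
        self.spatial_index = defaultdict(set)  # grid cell -> document ids
        self.footprints = {}                   # document id -> (lat, lon)

    def add(self, doc_id, text, lat, lon):
        """Index a geo-tagged document under its terms and its grid cell."""
        for term in text.lower().split():
            self.text_index[term].add(doc_id)
        self.spatial_index[cell_of(lat, lon)].add(doc_id)
        self.footprints[doc_id] = (lat, lon)

    def search(self, terms, min_lat, min_lon, max_lat, max_lon):
        """Return ids of documents that contain all terms and lie in the box."""
        # Text lookup: documents containing every query term.
        postings = [self.text_index.get(t.lower(), set()) for t in terms]
        text_hits = set.intersection(*postings) if postings else set()

        # Spatial lookup: visit cells overlapping the box, then filter exactly.
        spatial_hits = set()
        lo_row, lo_col = cell_of(min_lat, min_lon)
        hi_row, hi_col = cell_of(max_lat, max_lon)
        for row in range(lo_row, hi_row + 1):
            for col in range(lo_col, hi_col + 1):
                for doc_id in self.spatial_index.get((row, col), ()):
                    lat, lon = self.footprints[doc_id]
                    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                        spatial_hits.add(doc_id)

        # Merge step: keep only documents returned by both indexes.
        return sorted(text_hits & spatial_hits)
```

Keeping the two indexes separate means each can be built and queried independently, at the cost of a merge step per query. The other two schemes in the abstract couple the structures more tightly.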
The DIGMAP geo-temporal web gazetteer service
This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval
Flabase: towards the creation of a flamenco music knowledge base
Online information about flamenco music is scattered over different sites and knowledge bases. Unfortunately, there is no common repository that indexes all these data. In this work, information related to flamenco music is gathered from general knowledge bases (e.g., Wikipedia, DBpedia), music encyclopedias (e.g., MusicBrainz), and specialized flamenco websites, and is then integrated into a new knowledge base called FlaBase. As resources from different data sources do not share common identifiers, a process of pair-wise entity resolution has been performed. FlaBase contains information about 1,174 artists, 76 palos (flamenco genres), 2,913 albums, 14,078 tracks, and 771 Andalusian locations. It is freely available in RDF and JSON formats. In addition, a method for entity recognition and disambiguation for FlaBase has been created. The system can recognize and disambiguate FlaBase entity references in Spanish texts with an f-measure value of 0.77. We applied it to biographical texts present in FlaBase. By using the extracted information, the knowledge base is populated with relevant information and a semantic graph is created connecting the entities of FlaBase. Artist relevance is then computed over the graph and evaluated according to a flamenco expert's criteria. Accuracy of results shows a high degree of quality and completeness of the knowledge base
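
To make the pair-wise entity resolution step concrete, here is a minimal sketch that links records from two sources by comparing normalised names. It is not the FlaBase method: the string-similarity measure (`difflib.SequenceMatcher`), the accent-stripping normalisation, the `0.9` threshold, and the identifiers in the usage note are all assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def resolve_pairs(source_a, source_b, threshold=0.9):
    """Compare every record of source_a with every record of source_b.

    Each source maps a local identifier to an entity name. Returns the
    (id_a, id_b) pairs whose normalised names are similar enough.
    """
    matches = []
    for id_a, name_a in source_a.items():
        for id_b, name_b in source_b.items():
            score = SequenceMatcher(
                None, normalise(name_a), normalise(name_b)
            ).ratio()
            if score >= threshold:  # hypothetical cut-off
                matches.append((id_a, id_b))
    return matches
```

For example, with the hypothetical inputs `{"wiki:1": "Paco de Lucía"}` and `{"mb:9": "paco de lucia"}`, the two records normalise to the same string and are linked. Comparing all pairs is quadratic in the number of records, so a real pipeline would likely restrict comparisons to candidate blocks first.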
DBpedia Spotlight: Shedding Light on the Web of Documents
Interlinking text documents with Linked Open Data enables
the Web of Data to be used as background knowledge within
document-oriented applications such as search and faceted
browsing. As a step towards interconnecting the Web of
Documents with the Web of Data, we developed DBpedia
Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users
to configure the annotations to their specific needs through
the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the
state of the art in disambiguation, and evaluate our results
in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our
system. DBpedia Spotlight is shared as open source and
deployed as a Web Service freely available for public use
Yago - a core of semantic knowledge
We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains roughly 900,000 entities and 5,000,000 facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as hasWonPrize). The facts have been automatically extracted from the unification of Wikipedia and WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships, and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques
Geospatial Semantics
Geospatial semantics is a broad field that involves a variety of research
areas. The term semantics refers to the meaning of things, and is in contrast
with the term syntactics. Accordingly, studies on geospatial semantics usually
focus on understanding the meaning of geographic entities as well as their
counterparts in the cognitive and digital world, such as cognitive geographic
concepts and digital gazetteers. Geospatial semantics can also facilitate the
design of geographic information systems (GIS) by enhancing the
interoperability of distributed systems and developing more intelligent
interfaces for user interactions. During the past years, a lot of research has
been conducted, approaching geospatial semantics from different perspectives,
using a variety of methods, and targeting different problems. Meanwhile, the
arrival of big geo data, especially the large amount of unstructured text data
on the Web, and the fast development of natural language processing methods
enable new research directions in geospatial semantics. This chapter,
therefore, provides a systematic review on the existing geospatial semantic
research. Six major research areas are identified and discussed, including
semantic interoperability, digital gazetteers, geographic information
retrieval, geospatial Semantic Web, place semantics, and cognitive geographic
concepts. Comment: Yingjie Hu (2017). Geospatial Semantics. In Bo Huang, Thomas J. Cova,
and Ming-Hsiang Tsou et al. (Eds): Comprehensive Geographic Information
Systems, Elsevier. Oxford, U
Automatic reconstruction of itineraries from descriptive texts
This thesis is part of the PERDIDO project, whose objectives are the extraction and reconstruction of itineraries from textual documents. The work was carried out in collaboration between the LIUPPA laboratory of the Université de Pau et des Pays de l'Adour (France), the Advanced Information Systems group (IAAA) of the Universidad de Zaragoza, and the COGIT laboratory of IGN (France). The aim of this thesis is to design an automatic system that extracts displacements from travel guides or itinerary descriptions and represents them on a map. We propose an approach for the automatic representation of itineraries described in natural language. Our proposal is divided into two main tasks. The first aims to identify and extract, from texts describing itineraries, information such as spatial entities and expressions of motion or perception. The goal of the second task is the reconstruction of the itinerary. Our proposal combines local information extracted through natural language processing with data extracted from external geographic sources (for example, gazetteers). The spatial information annotation stage uses an approach that combines part-of-speech tagging and lexico-syntactic patterns (a cascade of transducers) to annotate spatial named entities and expressions of motion and perception. A first contribution to the first task is toponym disambiguation, a problem that is still poorly solved within Named Entity Recognition (NER) and is essential in geographic information retrieval.
We propose an unsupervised georeferencing algorithm based on a clustering technique that can disambiguate the toponyms found in external geographic resources and, at the same time, locate unreferenced toponyms. We propose a generic graph model for the automatic reconstruction of itineraries, in which each node represents a place and each edge represents a path linking two places. The originality of our model is that, in addition to the usual elements (paths and waypoints), it can represent other elements involved in the description of an itinerary, such as visual landmarks. A minimum spanning tree is computed from a weighted graph to automatically obtain an itinerary in the form of a graph. Each edge of the initial graph is weighted using a multi-criteria analysis method that combines qualitative and quantitative criteria. The values of these criteria are determined from information extracted from the text and information from external geographic resources. For example, information generated by natural language processing, such as spatial relations describing an orientation (e.g., heading south), is combined with the geographic coordinates of places found in the resources to determine the value of the "spatial relation" criterion. In addition, based on the definition of the concept of itinerary and on the information used in language to describe an itinerary, we modeled a spatial information annotation language adapted to the description of displacements, drawing on the recommendations of the TEI (Text Encoding Initiative) consortium.
Finally, the different stages of our approach were implemented and evaluated on a multilingual corpus of trail and hiking descriptions (French, Spanish, Italian)
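
The reconstruction step described above, computing a minimum spanning tree over a weighted graph of places, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which spanning-tree algorithm is used, so Kruskal's algorithm is a choice made here, and the function name `minimum_spanning_tree`, the place names, and the edge weights in the usage note are hypothetical. In the thesis the weights come from a multi-criteria analysis, which is not reproduced.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm with a union-find structure.

    nodes: iterable of hashable place identifiers.
    edges: list of (weight, place_u, place_v) tuples; lower weight means
           a more plausible link between two places.
    Returns the list of edges kept in the spanning tree, in the order
    they were selected.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the set representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two separate components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree
```

With hypothetical places `A`, `B`, `C`, `D` and edges `(1, A, B)`, `(4, A, C)`, `(2, B, C)`, `(5, B, D)`, `(3, C, D)`, the tree keeps `A-B`, `B-C`, and `C-D`, which reads as the itinerary A, B, C, D.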