
592 research outputs found

A geo-temporal information extraction service for processing descriptive metadata in digital libraries

Author: Borbinha José
Manguinhas H.
Martins Bruno
Siabato Vaca Willington Libardo
Publication venue: E.T.S.I. en Topografía, Geodesia y Cartografía (UPM)
Publication date: 01/01/2009

In the context of digital map libraries, resources are usually described by metadata records that define the relevant subject, location, time-span, format and keywords. With regard to locations and time-spans, metadata records are often incomplete, or they provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. using place and period names in textual expressions) to geo-spatial coordinates and time-spans. A prototype system implementing the proposed methods is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.
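The step from place and period names to coordinates and time-spans can be illustrated with a minimal sketch. It is not the paper's prototype: the in-memory `GAZETTEER` and `PERIODS` dictionaries, their entries, and the function name `extract_geotemporal` are all hypothetical stand-ins (the paper relies on a Web gazetteer service instead).

```python
import re

# Hypothetical in-memory gazetteer: place name -> (latitude, longitude).
# Assumption: this dict stands in for the Web gazetteer service used in the paper.
GAZETTEER = {
    "lisbon": (38.72, -9.14),
    "madrid": (40.42, -3.70),
}

# Hypothetical period names mapped to (start_year, end_year).
PERIODS = {
    "19th century": (1801, 1900),
}

# Matches explicit year ranges such as "1750-1760".
YEAR_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")


def extract_geotemporal(text):
    """Return (places, spans) found in a free-text metadata description."""
    lowered = text.lower()
    # Keep every gazetteer entry whose name occurs as a whole word.
    places = {
        name: coords
        for name, coords in GAZETTEER.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    }
    # Explicit year ranges first, then named periods.
    spans = [(int(start), int(end)) for start, end in YEAR_RANGE.findall(text)]
    spans += [span for name, span in PERIODS.items() if name in lowered]
    return places, spans
```

For example, `extract_geotemporal("Map of Lisbon, 1750-1760")` yields the Lisbon coordinates and the span `(1750, 1760)`. A real service would also have to handle ambiguous names and vague temporal expressions.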

Archivo Digital UPM

Named Entity Extraction and Disambiguation: The Reinforcement Effect.

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2011

Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the semantic web. Although these topics are highly dependent, almost no existing work examines this dependency. The aim of this paper is to examine the dependency and show how each affects the other. We conducted experiments with a set of descriptions of holiday homes, with the aim of extracting and disambiguating toponyms as a representative example of named entities. We experimented with three approaches to disambiguation in order to infer the country of the holiday home. We examined how the effectiveness of extraction influences the effectiveness of disambiguation and, reciprocally, how filtering out ambiguous names (an activity that depends on the disambiguation process) improves the effectiveness of extraction. Since this, in turn, may improve the effectiveness of disambiguation again, it shows that extraction and disambiguation may reinforce each other.
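The interplay between filtering ambiguous names and inferring a country can be sketched as follows. The abstract does not specify the three disambiguation approaches, so the fractional voting scheme below is an assumption made purely for illustration. The `CANDIDATES` table, its entries, and the `max_ambiguity` parameter are hypothetical.

```python
from collections import Counter

# Hypothetical candidate table: toponym -> countries that have a place of that name.
CANDIDATES = {
    "paris": ["FR", "US"],
    "lyon": ["FR"],
    "nice": ["FR"],
}


def infer_country(toponyms, candidates=CANDIDATES, max_ambiguity=None):
    """Infer one country for a description from its extracted toponyms.

    Each toponym splits a single vote evenly among its candidate countries.
    Names with more than `max_ambiguity` candidates are filtered out first
    (assumed filter; None disables it).
    """
    votes = Counter()
    for name in toponyms:
        countries = candidates.get(name.lower(), [])
        if not countries:
            continue  # not in the candidate table, so it cannot vote
        if max_ambiguity is not None and len(countries) > max_ambiguity:
            continue  # too ambiguous, dropped before voting
        for country in countries:
            votes[country] += 1 / len(countries)
    return votes.most_common(1)[0][0] if votes else None
```

With this table, `infer_country(["Paris", "Lyon"])` returns `"FR"`, because the unambiguous "Lyon" outweighs the split vote for "Paris". Setting `max_ambiguity=1` drops "Paris" from the vote altogether.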

CiteSeerX

Maastricht University Research Portal

University of Twente Research Information

Entity-Centric Text Mining for Historical Documents

Author: Coll Ardanuy Maria
Publication date: 07/07/2017

Georg-August-University Göttingen

Defining and identifying the roles of geographic references within text

Author: Southall Humphrey
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2003

Crossref

Portsmouth University Research Portal (Pure)

Annotation of Toponyms in TEI Digital Literary Editions and Linking to the Web of Data

Author: Brando Carmen
Frontini Francesca
Jacquot Clémence
Jolivet Vincent
Riguet Marine
Publication venue: 'Coimbra University Press'
Publication date: 01/07/2016

This paper discusses the challenges and benefits of annotating place names in literary texts and literary criticism. We first highlight the problems of encoding spatial information in digital editions using the TEI format, by means of two manual annotation experiments and a discussion of various cases. This leads to the question of how to use existing semantic web resources to complement and enrich toponym markup, in particular to provide mentions with precise geo-referencing. Finally, the automatic annotation of a large corpus shows the potential of visualizing places from texts, illustrated by an analysis of the evolution of literary life from a spatial and geographical point of view. Keywords: digital literary studies; toponyms; semantic web; geographic databases; maps and visualizations

Directory of Open Access Journals

An Algorithm to Extract Jamaican Geographic Locations from News Articles – Using NLP Techniques

Author: Mansingh Gunjan
Wright Jean-Mark
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2014

Natural Language Processing (NLP) has long been used to extract information from large bodies of text. NLP is often used to intelligently parse large volumes of data where the manual alternative may be infeasible. Named Entity Recognition (NER) is used to extract named entities such as people, places or organizations from text written in natural language. Using NER, NLP algorithms can be created to extract mentions of geographic locations of different types from current and archived news articles. This information can be used to add a spatial window into previously flat datasets, allowing users to access information by filtering on location. The derived information can be used to support intelligent decision making and to inform expert systems. This paper describes the development of an algorithm that uses the principles of both NLP and NER to extract references to geographic locations within news articles. The algorithm has been developed using the NLTK and Pattern Web Toolkit for Python and performs with a precision and accuracy above eighty (80) percent.

AIS Electronic Library (AISeL)

Thesauri Design to improve access to Cartographic Heritage in the context of the Spatial Data Infrastructures

Author: Bernabe Poveda Miguel Angel
Fernández Wyttenbach Alberto
Vilches-Blázquez LM.
Álvarez Mabel
Publication venue: E.T.S.I. en Topografía, Geodesia y Cartografía (UPM)
Publication date: 01/01/2008

Geographic Information Technologies, ontologies, Geographic Information, interoperability. Access by historians and document management specialists to documentary collections and old cartography is at times arduous because maps are scattered across different map libraries. It would therefore be of interest to be able to access all available information remotely, following the Spatial Data Infrastructures (SDI) guidelines. DIGMAP is a cooperative project between six European countries that proposes to develop a solution for indexing, searching and browsing, through a thesaurus, the European collections of digitized historical maps. It will be possible to match the maps in the collection with the geographic areas they cover and to find those areas on them, based on standard and open data models. These results will be useful for local digital map libraries, especially those of historical and cultural importance, or as interoperable components for wider, distributed systems. The project takes advantage of thesaurus technologies to develop three major subsystems: the gazetteer, for managing and providing access to multilingual information about geographic features, such as geographic coordinates, names of places or areas, and historical events related to those points, places or areas (with dates or time intervals); the authority file, for maintaining and providing easy access to author information and for identifying authors and disambiguating them from similar and duplicate entries; and the map services, for providing access to map information in raster and vector formats. The paper presents an early overview of the DIGMAP project, focusing particularly on the multilingual thesauri aspects, with the aim of providing solutions for the cartographic heritage framework by applying the main Spatial Data Infrastructures guidelines. ISO 19112 (Spatial Referencing by Geographic Identifiers) and its geographic identifier issues will also be considered.

CiteSeerX

Archivo Digital UPM