    A geo-temporal information extraction service for processing descriptive metadata in digital libraries

    In the context of digital map libraries, resources are usually described by metadata records that define the relevant subject, location, time-span, format and keywords. Regarding locations and time-spans, metadata records are often incomplete or provide information in a form that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geo-temporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to go from human-made geo-temporal referencing (i.e. using place and period names in textual expressions) to geo-spatial coordinates and time-spans. A prototype system implementing the proposed methods is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.
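
    As a rough illustration of going from place and period names to coordinates and time-spans, the sketch below replaces the Web gazetteer service with a tiny in-memory table. PLACES, PERIODS, extract_geotemporal and the approximate coordinates are hypothetical and are not the paper's system or data. The matching is plain whole-word lookup, a deliberately simpler technique than the text mining methods the paper describes.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a Web gazetteer service: a tiny in-memory table.
# Coordinates are approximate and purely illustrative.
PLACES = {"lisbon": (38.72, -9.14), "porto": (41.15, -8.61)}
# Hypothetical period-name table mapping named periods to year ranges.
PERIODS = {"18th century": (1701, 1800), "19th century": (1801, 1900)}
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def _mentions(name: str, text: str) -> bool:
    """True if `name` occurs in `text` as a whole word or phrase (case-insensitive)."""
    return re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE) is not None


@dataclass
class GeoTemporalRef:
    places: dict      # place name -> (lat, lon)
    time_spans: dict  # period label -> (start_year, end_year)

    def overall_span(self):
        """Smallest year range covering every extracted time span, or None."""
        if not self.time_spans:
            return None
        starts, ends = zip(*self.time_spans.values())
        return min(starts), max(ends)


def extract_geotemporal(text: str) -> GeoTemporalRef:
    """Map place and period names in free text to coordinates and year ranges."""
    places = {n: c for n, c in PLACES.items() if _mentions(n, text)}
    spans = {n: s for n, s in PERIODS.items() if _mentions(n, text)}
    # Explicit four-digit years (1000-2099) become single-year spans.
    for year in YEAR.findall(text):
        spans[year] = (int(year), int(year))
    return GeoTemporalRef(places, spans)


ref = extract_geotemporal("Map of Lisbon and Porto, 18th century, revised in 1850.")
print(ref, ref.overall_span())
```

    The overall_span helper shows one simple way to collapse several period mentions into the single time-span a metadata record usually expects.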

    Named Entity Extraction and Disambiguation: The Reinforcement Effect

    Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the semantic web. Although these tasks are highly interdependent, almost no existing work examines this dependency. The aim of this paper is to examine the dependency and show how each task affects the other. We conducted experiments with a set of descriptions of holiday homes, aiming to extract and disambiguate toponyms as a representative example of named entities. We experimented with three approaches to disambiguation, with the purpose of inferring the country of the holiday home. We examined how the effectiveness of extraction influences the effectiveness of disambiguation and, reciprocally, how filtering out ambiguous names (an activity that depends on the disambiguation process) improves the effectiveness of extraction. Since this, in turn, may improve the effectiveness of disambiguation again, it shows that extraction and disambiguation may reinforce each other.
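
    The reinforcement loop can be sketched as follows. The sketch assumes a hypothetical gazetteer that maps each name to the countries where a place with that name exists. Extraction is naive capitalized-token lookup, and disambiguation is a majority vote over candidate countries. The filter drops names that are too ambiguous or inconsistent with the inferred country. All of these, along with the function names, are illustrative choices and not the three approaches evaluated in the paper.

```python
from collections import Counter

# Hypothetical gazetteer: toponym -> countries where a place with that name exists.
GAZETTEER = {
    "Paris": {"FR", "US"},
    "Nice": {"FR"},
    "Lyon": {"FR"},
    "Holiday": {"US"},  # an ordinary word that is also a place name
    "Victoria": {"CA", "AU", "SC", "MT", "GB"},
}


def extract(tokens, stoplist):
    """Naive extraction: capitalized tokens found in the gazetteer, minus filtered names."""
    return [t for t in tokens if t[:1].isupper() and t in GAZETTEER and t not in stoplist]


def disambiguate(toponyms):
    """Infer the country by majority vote over every country a toponym could denote."""
    votes = Counter(c for t in toponyms for c in GAZETTEER[t])
    return votes.most_common(1)[0][0] if votes else None


def reinforce(tokens, max_ambiguity=2, max_rounds=5):
    """Alternate extraction and disambiguation until neither changes."""
    country, stoplist = None, set()
    for _ in range(max_rounds):
        toponyms = extract(tokens, stoplist)
        new_country = disambiguate(toponyms)
        # Filtering depends on disambiguation: drop names that are too ambiguous
        # or that cannot refer to a place in the inferred country.
        new_stoplist = stoplist | {
            t for t in toponyms
            if len(GAZETTEER[t]) > max_ambiguity or new_country not in GAZETTEER[t]
        }
        if new_country == country and new_stoplist == stoplist:
            break
        country, stoplist = new_country, new_stoplist
    return country, extract(tokens, stoplist)


tokens = "Lovely Holiday home near Nice and Lyon , close to Paris and Victoria".split()
print(reinforce(tokens))
```

    In this toy run, the first round infers FR and filters out "Holiday" and "Victoria". The second round then confirms the same country from a cleaner toponym list, and the loop stops.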

    Defining and identifying the roles of geographic references within text

    Annotation of Toponyms in TEI Digital Literary Editions and Linking to the Web of Data

    This paper aims to discuss the challenges and benefits of annotating place names in literary texts and literary criticism. We shall first highlight the problems of encoding spatial information in digital editions using the TEI format, by means of two manual annotation experiments and the discussion of various cases. This will lead to the question of how to use existing semantic web resources to complement and enrich toponym markup, in particular to provide mentions with precise geo-referencing. Finally, the automatic annotation of a large corpus will show the potential of visualizing places from texts, by illustrating an analysis of the evolution of literary life from a spatial and geographical point of view. Keywords: digital literary studies; toponyms; semantic web; geographic databases; maps and visualizations
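
    Below is a minimal sketch of toponym markup linked to the Web of Data. Each known place name in a sentence is wrapped in a TEI placeName element, and its ref attribute holds a URI. The links mapping and the example.org URIs are hypothetical placeholders for identifiers that would come from a semantic web resource. The matching is simple string lookup, not the automatic annotation pipeline used in the paper.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def annotate_toponyms(sentence, links):
    """Return a TEI <p> in which each toponym found in `links` is wrapped in
    <placeName ref="URI">; `links` maps a place name to a (hypothetical) URI."""
    p = ET.Element(f"{{{TEI_NS}}}p")
    p.text = sentence
    if not links:
        return p
    # Try longer names first, so multi-word names win over shorter overlapping ones.
    names = sorted(links, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    p.text, pos, last = "", 0, None
    for m in pattern.finditer(sentence):
        gap = sentence[pos:m.start()]
        if last is None:
            p.text += gap    # text before the first toponym
        else:
            last.tail = gap  # text between two toponyms
        last = ET.SubElement(p, f"{{{TEI_NS}}}placeName", ref=links[m.group(1)])
        last.text = m.group(1)
        pos = m.end()
    if last is None:
        p.text += sentence[pos:]
    else:
        last.tail = sentence[pos:]
    return p


links = {
    "Tours": "https://example.org/place/tours",
    "Paris": "https://example.org/place/paris",
}
para = annotate_toponyms("Balzac left Tours for Paris.", links)
print(ET.tostring(para, encoding="unicode"))
```

    Keeping the surrounding text in text and tail preserves the original sentence exactly, so the markup can be stripped again without loss.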

    An Algorithm to Extract Jamaican Geographic Locations from News Articles – Using NLP Techniques

    Natural Language Processing (NLP) has long been used to extract information from large bodies of text. NLP is often used to intelligently parse large volumes of data where manual processing would be infeasible. Named Entity Recognition (NER) is used to extract named entities such as people, places or organizations from natural-language text. Using NER, NLP algorithms can be built to extract mentions of different types of geographic locations from current and archived news articles. This information can add a spatial window to previously flat datasets, allowing users to access information by filtering on location. The derived information can be used to support intelligent decision making and to influence expert systems. This paper describes the development of an algorithm that uses the principles of both NLP and NER to extract references to geographic locations within news articles. The algorithm has been developed using the NLTK and Pattern toolkits for Python and achieves precision and accuracy above eighty (80) percent.
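
    The paper's algorithm relies on NER with NLTK and Pattern. The sketch below swaps in a simpler technique, matching against a gazetteer of Jamaica's 14 parishes. It shows the extraction step and how precision and recall would be computed against a hand-annotated gold list. The function names and the example article are hypothetical.

```python
import re

# The 14 parishes of Jamaica as a small gazetteer. Gazetteer matching is a
# simpler stand-in for the NLTK/Pattern NER pipeline described in the paper.
PARISHES = [
    "Kingston", "St. Andrew", "St. Thomas", "Portland", "St. Mary", "St. Ann",
    "Trelawny", "St. James", "Hanover", "Westmoreland", "St. Elizabeth",
    "Manchester", "Clarendon", "St. Catherine",
]
# Try longer names first, so multi-word names win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(PARISHES, key=len, reverse=True))) + r")\b"
)


def extract_locations(article):
    """Parish mentions in order of appearance (repeats kept)."""
    return _PATTERN.findall(article)


def precision_recall(predicted, gold):
    """Set-based precision and recall of extracted names against a gold annotation."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


article = ("Flooding hit St. Andrew and St. Ann; Kingston roads were closed "
           "while Manchester United fans watched the match.")
found = extract_locations(article)
print(found, precision_recall(found, ["St. Andrew", "St. Ann", "Kingston"]))
```

    Gazetteer matching is easy to audit, but it cannot tell the parish of Manchester from Manchester United. The sentence context that an NER model uses is meant to handle that kind of case.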

    Thesauri Design to improve access to Cartographic Heritage in the context of the Spatial Data Infrastructures

    Historians and document management specialists sometimes find it arduous to access archival collections and old cartography, because maps are scattered across different map libraries. It would therefore be of interest to be able to access all available information remotely, following the Spatial Data Infrastructure (SDI) guidelines. DIGMAP is a cooperative project between six European countries that proposes to develop a solution for indexing, searching and browsing European collections of digitized historical maps through a thesaurus. It will be possible to match the maps with the geographic areas each one covers and to locate places on them, based on standard and open data models. These results will be useful for local digital map libraries, especially those of historical and cultural importance, or as interoperable components of wider, distributed systems. The project takes advantage of thesauri technologies to develop three major subsystems. The first is the gazetteer, for managing and providing access to multilingual information about geographic features, such as geographic coordinates, names of places or areas, and historical events related to those points, places or areas (with dates or time intervals). The second is the authority file, for maintaining and providing easy access to author information, and for identifying authors and disambiguating them from similar and duplicate authors. The third is the map services, for providing access to map information in raster and vector formats. The paper presents an early overview of the DIGMAP project. It focuses particularly on the multilingual thesauri aspects, with the aim of providing solutions for the cartographic heritage framework that apply the main Spatial Data Infrastructure guidelines. ISO 19112 (Spatial Reference by Geographic Identifiers) and its geographic identifier issues will also be considered. Keywords: Geographic Information Technologies, ontologies, Geographic Information, interoperability.
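
    One way to model the gazetteer subsystem described in this abstract is sketched below. Each entry carries a geographic identifier (loosely in the spirit of ISO 19112), names in several languages, coordinates, and dated historical events. The class and field names, the "hyp:" identifier scheme and the sample data are hypothetical and are not DIGMAP's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class HistoricalEvent:
    description: str
    start_year: int
    end_year: int


@dataclass
class GazetteerEntry:
    identifier: str                 # hypothetical geographic identifier
    names: dict                     # language code -> place name
    lat: float
    lon: float
    events: list = field(default_factory=list)  # HistoricalEvent items


class Gazetteer:
    def __init__(self, entries):
        self._by_id = {e.identifier: e for e in entries}
        # Index every distinct name variant case-insensitively,
        # so a query in any language resolves to the same entry.
        self._by_name = {}
        for e in entries:
            for name in {n.casefold() for n in e.names.values()}:
                self._by_name.setdefault(name, []).append(e)

    def lookup(self, name):
        """All entries having `name` as one of their names, in any language."""
        return self._by_name.get(name.casefold(), [])

    def events_between(self, identifier, start, end):
        """Events at a place whose [start_year, end_year] interval overlaps [start, end]."""
        entry = self._by_id[identifier]
        return [ev for ev in entry.events if ev.start_year <= end and ev.end_year >= start]


lisbon = GazetteerEntry(
    "hyp:lisbon",
    {"pt": "Lisboa", "es": "Lisboa", "en": "Lisbon", "fr": "Lisbonne"},
    38.72, -9.14,
    [HistoricalEvent("Great earthquake", 1755, 1755)],
)
gazetteer = Gazetteer([lisbon])
print(gazetteer.lookup("Lisbonne"), gazetteer.events_between("hyp:lisbon", 1700, 1800))
```

    Storing events as closed year intervals makes "what happened here in this period" a simple overlap test. A real gazetteer would likely need finer-grained or uncertain dates.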