Toponym matching through deep neural networks
Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common sub-strings in order to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due to transliterations or to changes in language and culture over time. In this article, we present a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or non-matching. The proposed network architecture uses recurrent nodes to build representations from the sequences of bytes that correspond to the strings that are to be matched. These representations are then combined and passed to feed-forward nodes, finally leading to a classification decision. We present the results of a wide-ranging evaluation on the performance of the proposed method, using a large dataset collected from the GeoNames gazetteer. These results show that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.
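The abstract's central critique is that string similarity metrics depend on shared sub-strings. A minimal sketch, using Python's difflib as a stand-in for such a metric (the 0.8 threshold is an illustrative choice, not from the article), shows why these baselines fail on transliterated names written in different scripts:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Common-substring similarity, the kind of baseline the article critiques."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Classify a toponym pair as matching when similarity clears the threshold."""
    return similarity(a, b) >= threshold

# Variant spellings share many characters and score well...
print(match("Lisboa", "Lisbon"))   # True
# ...but transliterations across scripts share no bytes at all:
print(match("Moskva", "Москва"))   # False
```

A byte-level recurrent model, as proposed in the article, can instead learn such character replacements from labeled matching/non-matching pairs rather than relying on literal overlap.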
Automatic Identification of Addresses: A Systematic Literature Review
Cruz, P., Vanneschi, L., Painho, M., & Rita, P. (2022). Automatic Identification of Addresses: A Systematic Literature Review. ISPRS International Journal of Geo-Information, 11(1), 1-27. https://doi.org/10.3390/ijgi11010011

The work by Leonardo Vanneschi, Marco Painho and Paulo Rita was supported by Fundação para a Ciência e a Tecnologia (FCT) within the project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC). The work by Prof. Leonardo Vanneschi was also partially supported by FCT, Portugal, through funding of project AICE (DSAIPA/DS/0113/2019).

Address matching continues to play a central role at various levels, through geocoding and data integration from different sources, with a view to promoting activities such as urban planning, location-based services, and the construction of databases like those used in census operations. However, the task of address matching continues to face several challenges, such as non-standard or incomplete address records or addresses written in more complex languages. In order to better understand how current limitations can be overcome, this paper conducted a systematic literature review focused on automated approaches to address matching and their evolution across time. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed, resulting in a final set of 41 papers published between 2002 and 2021, the great majority of which are after 2017, with Chinese authors leading the way. The main findings revealed a consistent move from more traditional approaches to deep learning methods based on semantics, encoder-decoder architectures, and attention mechanisms, as well as the very recent adoption of hybrid approaches making increased use of spatial constraints and entities. The adoption of evolutionary-based approaches and privacy-preserving methods stands among the research gaps to address in future studies.
An automated approach for geocoding tabular itineraries
Historical itineraries, often accessible as lists or tables describing places visited in sequence, are abundant resources and also important objects of study for humanities scholars. This article advances a novel method for automatically geocoding tabular itineraries, combining approximate string matching with a cost optimization algorithm based on dynamic programming. Experiments with a dataset of historical itineraries, with ground-truth geocoding annotations provided by domain experts and also leveraging the GeoNames gazetteer, attest to the effectiveness of the proposed method. The obtained results show that while approximate string matching can already achieve very low median errors, with many toponyms matching exactly against GeoNames entries, the combination with cost optimization can significantly improve results in terms of the average distance towards the correct disambiguations.
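The core idea, picking one gazetteer candidate per stop so that the itinerary as a whole is geographically coherent, can be sketched with dynamic programming over the candidate lattice. This is a minimal illustration, not the article's implementation: the toy gazetteer, its coordinates, and the plain Euclidean travel cost are all assumptions for demonstration.

```python
from math import dist

# Toy gazetteer: toponym -> candidate (lat, lon) entries (illustrative values only).
# Each ambiguous stop has a "near" and a "far" candidate.
CANDIDATES = {
    "A": [(0.0, 0.0), (50.0, 50.0)],
    "B": [(1.0, 1.0), (51.0, 49.0)],
    "C": [(2.0, 0.5)],
}

def geocode_itinerary(stops):
    """Pick one candidate per stop so consecutive stops stay close,
    minimizing total travel cost via dynamic programming."""
    prev_costs = [0.0] * len(CANDIDATES[stops[0]])
    back = []  # back[i][j]: best predecessor index for candidate j of stop i+1
    for i in range(1, len(stops)):
        cur, prv = CANDIDATES[stops[i]], CANDIDATES[stops[i - 1]]
        costs, ptrs = [], []
        for c in cur:
            best_j = min(range(len(prv)),
                         key=lambda j: prev_costs[j] + dist(prv[j], c))
            costs.append(prev_costs[best_j] + dist(prv[best_j], c))
            ptrs.append(best_j)
        back.append(ptrs)
        prev_costs = costs
    # Backtrack the cheapest path through the lattice
    j = min(range(len(prev_costs)), key=prev_costs.__getitem__)
    path = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return [CANDIDATES[s][k] for s, k in zip(stops, path)]

# The far-away duplicates are rejected because they inflate the path cost:
print(geocode_itinerary(["A", "B", "C"]))  # [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
```

In the article's setting the per-candidate cost would also fold in the approximate string-matching score against the GeoNames entry, not just inter-stop distance.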
Geocoding location expressions in Twitter messages: A preference learning method
Resolving location expressions in text to the correct physical location, also known as geocoding or grounding, is complicated by the fact that so many places around the world share the same name. Correct resolution is made even more difficult when there is little context to determine which place is intended, as in a 140-character Twitter message, or when location cues from different sources conflict, as may be the case among different metadata fields of a Twitter message. We used supervised machine learning to weigh the different fields of the Twitter message and the features of a world gazetteer to create a model that prefers the correct gazetteer candidate when resolving the extracted expression. We evaluated our model using the F1 measure and compared it to similar algorithms. Our method outperformed state-of-the-art competitors.
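The preference-learning idea reduces to scoring each gazetteer candidate with learned feature weights and preferring the highest-scoring one. The sketch below is purely illustrative: the feature names, weights, and candidate entries are hypothetical, standing in for whatever a trained model would produce.

```python
# Hypothetical weights, as a preference model might learn from labeled pairs:
# agreement with the tweet's user-location field outweighs raw population.
WEIGHTS = {"name_in_user_location": 2.0, "log_population": 0.5}

def score(candidate, weights=WEIGHTS):
    """Weighted sum of gazetteer/message features for one candidate."""
    return sum(weights[f] * v for f, v in candidate["features"].items())

def resolve(candidates):
    """Prefer the candidate with the highest learned score."""
    return max(candidates, key=score)

paris_fr = {"id": "Paris, FR",
            "features": {"name_in_user_location": 0.0, "log_population": 6.3}}
paris_tx = {"id": "Paris, TX",
            "features": {"name_in_user_location": 1.0, "log_population": 4.4}}

# A strong user-location cue overrides the population prior:
print(resolve([paris_fr, paris_tx])["id"])  # Paris, TX
```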
Geospatial Semantics
Geospatial semantics is a broad field that involves a variety of research areas. The term semantics refers to the meaning of things, and is in contrast with the term syntactics. Accordingly, studies on geospatial semantics usually focus on understanding the meaning of geographic entities as well as their counterparts in the cognitive and digital world, such as cognitive geographic concepts and digital gazetteers. Geospatial semantics can also facilitate the design of geographic information systems (GIS) by enhancing the interoperability of distributed systems and developing more intelligent interfaces for user interactions. During the past years, a lot of research has been conducted, approaching geospatial semantics from different perspectives, using a variety of methods, and targeting different problems. Meanwhile, the arrival of big geo data, especially the large amount of unstructured text data on the Web, and the fast development of natural language processing methods enable new research directions in geospatial semantics. This chapter, therefore, provides a systematic review of the existing geospatial semantics research. Six major research areas are identified and discussed, including semantic interoperability, digital gazetteers, geographic information retrieval, geospatial Semantic Web, place semantics, and cognitive geographic concepts.

Comment: Yingjie Hu (2017). Geospatial Semantics. In Bo Huang, Thomas J. Cova, and Ming-Hsiang Tsou et al. (Eds): Comprehensive Geographic Information Systems, Elsevier. Oxford, U
Things and Strings and More: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence, Topic Modeling, and Word Embedding
Place name disambiguation, i.e., toponym disambiguation or toponym resolution, is the task of correctly identifying a place from a set of places sharing a common name. It contributes to a variety of tasks such as knowledge extraction, query answering, geographic information retrieval, and automatic tagging. Disambiguation quality relies on the ability to correctly identify and interpret contextual clues, complicating the task for short texts. Here I propose a novel approach to the disambiguation of place names from short texts that integrates three models: entity co-occurrence, topic modeling, and word embedding. The first model uses Linked Data to identify related entities to improve disambiguation quality. The second model uses topic modeling to differentiate places based on the terms used to describe them. The third model uses word embeddings to uncover the semantic relatedness between places and contexts. I evaluate this approach using a corpus of short texts collected through web scraping, determine the suitable weights for the models, and demonstrate that the combined model, i.e., the Things and Strings Model, outperforms benchmark systems such as DBpedia Spotlight, TextRazor, and Open Calais by up to 85% in F-score and 46% in Precision at 1. A web service is built to demonstrate the proposed method, and it can serve as a building block for applications that need place name recognition and disambiguation.
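Combining the three component models amounts to a weighted linear fusion of their per-candidate scores. The sketch below is a minimal illustration only: the weights, score values, and candidate places are hypothetical, not the values determined in the article.

```python
# Hypothetical weights for the three component models (the article tunes
# suitable weights empirically; these values are illustrative only).
W = {"cooccurrence": 0.4, "topic": 0.3, "embedding": 0.3}

def combined_score(scores):
    """Linear combination of the per-candidate scores from the three models."""
    return sum(W[m] * scores[m] for m in W)

def disambiguate(candidates):
    """Pick the candidate place the combined model prefers."""
    return max(candidates, key=lambda c: combined_score(c["scores"]))

springfield_il = {"place": "Springfield, IL",
                  "scores": {"cooccurrence": 0.9, "topic": 0.7, "embedding": 0.8}}
springfield_ma = {"place": "Springfield, MA",
                  "scores": {"cooccurrence": 0.2, "topic": 0.3, "embedding": 0.1}}

print(disambiguate([springfield_il, springfield_ma])["place"])  # Springfield, IL
```

In the article's combined model the co-occurrence scores come from Linked Data entities, the topic scores from topic modeling over place descriptions, and the embedding scores from context/place semantic relatedness; the fusion step itself is this kind of weighted preference.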