8 research outputs found

    A Comparison of Cartographic and Toponymic Databases in a Multilingual Environment: A Methodology for Detecting Redundancies Using ETL and GIS Tools

    Toponymy, a discipline that cuts across geography, linguistics, and history, finds one of its main supports in cartography. Because cadastral cartography covers the territory exhaustively, it and its toponymy are well suited to systematic geographical analysis. Moreover, cadastre and geographical names are part of the geographic reference data according to Annex 1 of the INSPIRE directive. This work presents the design, implementation, and application of a methodology based on Geographic Information Systems and Extract, Transform, and Load (ETL) tools for detecting coincidences between the cadastral geoinformation and the official gazetteer of the province of Gipuzkoa, Spain. Methodologically, this study proposes a solution to the issues raised by bilingualism in the study area. The problem is approached a priori, during data pre-processing, and a posteriori, by applying semantic criteria. The results show a match between the datasets of close to 40%. This evidences the uniqueness and richness of the analyzed source and its outstanding potential contribution to the official toponymic corpus.

    An Application-Oriented Implementation of Hexagonal On-the-Fly Binning Metrics for City-Scale Georeferenced Social Media Data

    The use of georeferenced social media data (GSMD) for informing municipal policy-making has significant potential, particularly in addressing pressing socio-environmental challenges. Geospatial dashboards have emerged as a powerful tool for knowledge communication and supporting urban sustainability. However, there has been little emphasis on how to display GSMD and make them more accessible, partly due to their complex nature. Existing visualization tools lack sophisticated methods, especially for complex urban contexts, and the methodological choice can significantly impact the interpretation of results. In this study, we propose the use of hexagonal binning as an interactive visualization method and assess three different on-the-fly binning metrics for mapping GSMD. We expand the use of the signed chi metric for spatial purposes and apply it in a case study in Bonn, Germany. We evaluate the advantages and disadvantages of the proposed metrics and visualizations, and highlight the challenges of visualizing GSMD, particularly in the context of Instagram. Our findings highlight the importance of using appropriate context-dependent visualization methods when analyzing data at the municipal level.
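
The abstract above does not spell out the signed chi metric, so the following is only a minimal sketch under a stated assumption: that it follows the commonly used form (observed × norm − expected) / sqrt(expected), computed per bin, where norm rescales the observed total to the expected total. The function names, the dictionary layout, and the handling of bins with zero expectation are all hypothetical choices, not the authors' implementation.

```python
import math


def signed_chi(observed, expected, norm=1.0):
    """Signed chi for one bin, ASSUMING the common definition
    (observed * norm - expected) / sqrt(expected)."""
    if expected <= 0:
        # Assumption: a bin with no expected posts is treated as neutral.
        return 0.0
    return (observed * norm - expected) / math.sqrt(expected)


def signed_chi_bins(observed_counts, expected_counts):
    """Apply signed chi to every hexagonal bin.

    Both arguments map a (hypothetical) bin id to a count. The observed
    counts are rescaled so that both totals are comparable.
    """
    total_observed = sum(observed_counts.values())
    total_expected = sum(expected_counts.values())
    norm = total_expected / total_observed if total_observed else 1.0
    return {
        bin_id: signed_chi(observed_counts.get(bin_id, 0), expected, norm)
        for bin_id, expected in expected_counts.items()
    }
```

Positive values mark bins where a topic is over-represented relative to the baseline, negative values mark under-representation, and dividing by the square root of the expectation dampens bins that have very little data.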

    Accommodations deduplication

    The problem addressed is the deduplication of accommodations. Deduplication is a special case of entity resolution (ER) that consists of grouping different representations of the same entity, usually coming from different sources. It is a complex process that requires several phases, the most common being blocking and pair resolution. In addition to these, a new phase, clustering, is introduced that was not considered in previous work. We aim to build a framework that covers the different phases and to design a clustering strategy that maximizes precision at the highest possible recall.
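
The three phases named in the abstract above (blocking, pair resolution, clustering) can be sketched roughly as follows. This is an illustrative toy, not the framework the abstract describes: the grid-cell blocking key, the name-similarity threshold, the record fields (`name`, `lat`, `lon`), and the use of connected components as the clustering step are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record, cell=0.01):
    # Blocking (hypothetical rule): bucket records by a coarse lat/lon grid cell
    # so that only nearby records are compared with each other.
    return (round(record["lat"] / cell), round(record["lon"] / cell))


def is_match(a, b, threshold=0.85):
    # Pair resolution (hypothetical rule): case-insensitive name similarity.
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def deduplicate(records):
    """Return clusters of record indices believed to be the same entity."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    # Clustering stand-in: connected components over matched pairs (union-find).
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if is_match(records[i], records[j]):
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Connected components merge transitively, so a single false match can join two otherwise clean clusters; a precision-oriented clustering strategy would need a stricter rule. Grid blocking also misses duplicates that fall on opposite sides of a cell boundary.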

    Multi-Source Spatial Entity Extraction and Linkage

    Deduplicating a places database

    We consider the problem of resolving duplicates in a database of places, where a place is defined as any entity that has a name and a physical location. When other auxiliary attributes like phone and full address are not available, deduplication based solely on names and approximate location becomes an exceptionally challenging problem that requires both domain knowledge and local geographical knowledge. For example, the pair “Newpark Mall Gap Outlet” and “Newpark Mall Sears Outlet” has a high string similarity, but determining that they are different requires the domain knowledge that they represent two different store names in the same mall. Similarly, in most parts of the world, a local business called “Central Park Cafe” might simply be referred to as “Central Park”, except in New York, where the keyword “Cafe” in the name becomes important to differentiate it from the famous park in the city. In this paper, we present a language model that can encapsulate both domain knowledge and local geographical knowledge. We also present unsupervised techniques that can learn such a model from a database of places. Finally, we present deduplication techniques based on such a model, and we demonstrate, using real datasets, that our techniques are much more effective than simple TF-IDF based models in resolving duplicates. Our techniques are used in production at Facebook for deduplicating the Places database.