
    The role of geographic knowledge in sub-city level geolocation

    Geolocation of microblog messages has been extensively investigated in the literature. Many solutions have been proposed that achieve good results at the city level. Existing approaches are mainly data-driven (i.e., they rely on a training phase). However, the development of algorithms for geolocation at the sub-city level is still an open problem. In this paper, we investigate the role that external geographic knowledge can play in geolocation approaches. We show how different geographical data sources can be combined with a semantic layer within a knowledge base to achieve reasonably accurate sub-city level geolocation.
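    A minimal sketch of the general idea, not the paper's actual system: hypothetical point-of-interest entries from two geographic sources are merged into a small knowledge base with a semantic layer of categories, and a message is geolocated at sub-city level by matching its terms against that base. All names, categories, and coordinates below are illustrative.

    from collections import defaultdict

    # Hypothetical entries from two geographic data sources: (name, category, lat, lon).
    source_a = [("Mercato Centrale", "market", 43.7764, 11.2531),
                ("Ponte Vecchio", "bridge", 43.7679, 11.2531)]
    source_b = [("Central Market", "market", 43.7764, 11.2531)]

    # Semantic layer: map category labels onto shared concepts.
    concept = {"market": "commerce", "bridge": "infrastructure"}

    # Build the knowledge base: token -> list of (concept, lat, lon).
    kb = defaultdict(list)
    for name, cat, lat, lon in source_a + source_b:
        for token in name.lower().split():
            kb[token].append((concept[cat], lat, lon))

    def locate(message):
        """Return the average coordinates of knowledge-base entries matched by message tokens."""
        hits = [entry for tok in message.lower().split() for entry in kb.get(tok, [])]
        if not hits:
            return None
        lat = sum(h[1] for h in hits) / len(hits)
        lon = sum(h[2] for h in hits) / len(hits)
        return lat, lon

    print(locate("great food at the central market today"))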

    A Coherent Unsupervised Model for Toponym Resolution

    Toponym Resolution, the task of assigning a location mention in a document to a geographic referent (i.e., latitude/longitude), plays a pivotal role in analyzing location-aware content. However, the ambiguities of natural language and the huge number of possible interpretations for toponyms constitute formidable hurdles for this task. In this paper, we study the problem of toponym resolution with no additional information other than a gazetteer and no training data. We demonstrate that the dearth of sufficiently large annotated datasets makes supervised methods less capable of generalizing. Our proposed method estimates the geographic scope of documents and leverages the connections between nearby place names as evidence to resolve toponyms. We explore the interactions between multiple interpretations of mentions and the relationships between different toponyms in a document to build a model that finds the most coherent resolution. Our model is evaluated on three news corpora, two from the literature and one collected and annotated by us; we then compare our method to state-of-the-art unsupervised and supervised techniques. We also examine three commercial products: Reuters OpenCalais, Yahoo! YQL Placemaker, and Google Cloud Natural Language API. The evaluation shows that our method outperforms the unsupervised technique as well as Reuters OpenCalais and Google Cloud Natural Language API on all three corpora; our method also shows performance close to that of the state-of-the-art supervised method and outperforms it when the test data has 40% or more toponyms that are not seen in the training data. (Comment: 9 pages plus 1 page of references, WWW '18: Proceedings of the 2018 World Wide Web Conference.)
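    A minimal sketch of the coherence idea described above, under assumed data: each toponym mention has several gazetteer interpretations, and the combination minimizing total pairwise geographic distance is selected. The paper's actual model also estimates document scope and uses richer evidence; the candidates below are made up for illustration.

    from itertools import product
    from math import radians, sin, cos, asin, sqrt

    def haversine(p, q):
        """Great-circle distance in km between two (lat, lon) points."""
        lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(a))

    # Hypothetical gazetteer candidates (lat, lon) for each toponym mention.
    candidates = {
        "Paris":      [(48.8566, 2.3522), (33.6609, -95.5555)],   # France vs. Texas
        "Versailles": [(48.8049, 2.1204), (38.0526, -84.7299)],   # France vs. Kentucky
    }

    def most_coherent(cands):
        """Pick one interpretation per toponym, minimizing total pairwise distance."""
        names = list(cands)
        best, best_cost = None, float("inf")
        for combo in product(*(cands[n] for n in names)):
            cost = sum(haversine(combo[i], combo[j])
                       for i in range(len(combo)) for j in range(i + 1, len(combo)))
            if cost < best_cost:
                best, best_cost = dict(zip(names, combo)), cost
        return best

    print(most_coherent(candidates))  # both mentions resolve to the French interpretations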

    A Comparison of Cartographic and Toponymic Databases in a Multilingual Environment: A Methodology for Detecting Redundancies Using ETL and GIS Tools

    Toponymy, a transversal discipline for geography, linguistics, and history, finds one of its main supports in cartography. Owing to its exhaustive coverage of the territory, cadastral cartography and its toponymy have ideal characteristics for developing systematic geographical analyses. Moreover, the cadastre and geographical names are part of the geographic reference data according to Annex 1 of the INSPIRE directive. This work presents the design, implementation, and application of a methodology based on Geographic Information Systems and Extract, Transform, and Load (ETL) tools for detecting coincidences between the cadastral geoinformation and the official gazetteer of the province of Gipuzkoa, Spain. Methodologically, this study proposes a solution to the issues raised by bilingualism in the study area. This problem is addressed both a priori, during the preliminary data treatment, and a posteriori, by applying semantic criteria. The results show a match rate between the datasets of close to 40%. In this way, the uniqueness and richness of the analyzed source and its outstanding contribution to the potential integration of the official toponymic corpus are evidenced.
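    A minimal sketch of one match-detection step under assumed inputs, not the paper's ETL workflow: a cadastral toponym and a gazetteer entry are flagged as coincident when their normalized names agree closely and their coordinates fall within a distance threshold, with a tiny bilingual alias table standing in for the semantic criteria.

    from difflib import SequenceMatcher
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(a, b):
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(h))

    # Hypothetical Basque/Spanish aliases used to normalize names before comparison.
    aliases = {"donostia": "san sebastian", "gipuzkoa": "guipuzcoa"}

    def normalize(name):
        name = name.lower().strip()
        return aliases.get(name, name)

    def is_match(cadastral, gazetteer, name_thr=0.85, dist_km=1.0):
        """cadastral/gazetteer: (name, (lat, lon)). True if the two records coincide."""
        sim = SequenceMatcher(None, normalize(cadastral[0]), normalize(gazetteer[0])).ratio()
        return sim >= name_thr and haversine_km(cadastral[1], gazetteer[1]) <= dist_km

    print(is_match(("Donostia", (43.3183, -1.9812)), ("San Sebastian", (43.3208, -1.9850))))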

    Geocoding location expressions in Twitter messages: A preference learning method

    Resolving location expressions in text to the correct physical location, also known as geocoding or grounding, is complicated by the fact that many places around the world share the same name. Correct resolution is made even more difficult when there is little context to determine which place is intended, as in a 140-character Twitter message, or when location cues from different sources conflict, as may be the case among different metadata fields of a Twitter message. We used supervised machine learning to weight the different fields of the Twitter message and the features of a world gazetteer, creating a model that prefers the correct gazetteer candidate when resolving the extracted expression. We evaluated our model using the F1 measure and compared it to similar algorithms. Our method achieved higher results than state-of-the-art competitors.
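    A rough sketch of the candidate-preference idea under assumed features, not the authors' feature set or classifier: each gazetteer candidate for an extracted expression gets a feature vector built from the tweet's fields and gazetteer attributes, a classifier is trained on correct-versus-incorrect candidates, and the highest-scoring candidate is preferred. The feature values and training pairs are invented for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical features per (expression, candidate) pair:
    # [name similarity, log candidate population, matches user-profile country, matches tweet geotag country]
    X_train = np.array([
        [0.9, 6.8, 1, 1],   # correct candidate
        [0.9, 3.1, 0, 0],   # wrong candidate sharing the same name
        [0.7, 5.5, 1, 0],   # correct candidate
        [0.4, 2.0, 0, 0],   # wrong candidate
    ])
    y_train = np.array([1, 0, 1, 0])  # 1 = correct resolution

    model = LogisticRegression().fit(X_train, y_train)

    def prefer(candidates):
        """candidates: list of (gazetteer_id, feature_vector); return the preferred id."""
        scores = model.predict_proba(np.array([f for _, f in candidates]))[:, 1]
        return candidates[int(np.argmax(scores))][0]

    cands = [("London,GB", [0.95, 7.0, 1, 1]), ("London,CA-ON", [0.95, 5.6, 0, 0])]
    print(prefer(cands))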

    Elucidating the Role of Neighborhood Deprivation in Hypertensive Disorders of Pregnancy

    This dissertation examined risk factors for hypertensive disorders of pregnancy (HDP) — specifically, whether neighborhood socioeconomic deprivation exacerbates individual socioeconomic disadvantage (deprivation amplification) to increase the likelihood of developing HDP. To select the optimal areal unit at which to investigate HDP, geographic proxies for neighborhoods were explored. A thematic review qualitatively examined nontraditional neighborhood boundaries identified through internet sources. Data from 2008–2012 Miami-Dade County, Florida birth records (n=121,421) and the U.S. Census Bureau were used for the remaining analyses. Ordinary least squares (OLS) and geographically weighted regression (GWR) analyses empirically compared the proportion of HDP prevalence explained by six areal units: census block groups, census tracts, ZIP code tabulation areas (ZCTAs), and three types of natural neighborhoods — census units clustered based on an eight-item Neighborhood Deprivation Index. Multilevel logistic regression examined relationships between HDP, neighborhood deprivation, and individual-level factors. Odds ratios (OR) and adjusted odds ratios (aOR) were calculated. The thematic review found 22 potential alternatives to census boundaries developed through techniques such as crowd-sourcing and qualitative research. In the sensitivity analysis, census tracts aggregated at the scale of ZCTAs performed twice as well as any other model (GWR R² = 0.27) and were used as the Aim 3 unit of analysis. In the multilevel logistic regression, HDP was associated with moderate (aOR=1.13; CI: 1.05, 1.21) and high neighborhood deprivation (aOR=1.16; CI: 1.07, 1.26). Compared with mothers with private insurance, uninsured women (aOR=1.69; CI: 1.56, 1.84) and Medicaid recipients (aOR=1.12; CI: 1.05, 1.18) had higher HDP odds. Non-Hispanic Black women's HDP odds were 1.58 times those of non-Hispanic White women. Cross-level interactions — between neighborhood deprivation and educational attainment, and between neighborhood deprivation and insurance status — did not reach statistical significance. Private-sector neighborhood boundaries hold promise for developing new public health tools. Because they are relatively easy to generate from census data, natural neighborhoods may balance tradition and innovation. While no evidence of deprivation amplification was found, the results suggested that individual-level and neighborhood deprivation are HDP risk factors. Interventions that target expectant mothers in deprived neighborhoods — particularly non-Hispanic Black and Hispanic women who lack health insurance — may help reduce HDP prevalence and disparities.
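    For readers unfamiliar with the reported measures, a minimal sketch of how an adjusted odds ratio and its 95% confidence interval are obtained from a logistic regression coefficient and its standard error. The coefficient and standard error below are not from the dissertation's data; they are back-calculated from the reported high-deprivation estimate (aOR 1.16; CI: 1.07, 1.26) purely to illustrate the arithmetic.

    import math

    # Illustrative log-odds estimate and standard error (back-calculated, not original data).
    beta, se = 0.148, 0.042

    odds_ratio = math.exp(beta)              # adjusted odds ratio (aOR)
    ci_low = math.exp(beta - 1.96 * se)      # lower bound of the 95% Wald confidence interval
    ci_high = math.exp(beta + 1.96 * se)     # upper bound

    print(f"aOR = {odds_ratio:.2f} (95% CI: {ci_low:.2f}, {ci_high:.2f})")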

    Automatic gazetteer enrichment with user-geocoded data

    Geographical knowledge resources, or gazetteers, that are enriched with local information have the potential to add geographic precision to information retrieval. We have identified sources of novel local gazetteer entries in crowd-sourced OpenStreetMap and Wikimapia geotags that include geo-coordinates. We created a fuzzy match algorithm using machine learning (SVM) that checks for both approximate spelling and approximate geocoding in order to find duplicates between the crowd-sourced tags and the gazetteer, in an effort to absorb those tags that are novel. For each crowd-sourced tag, our algorithm generates candidate matches from the gazetteer and then ranks those candidates based on word form or geographical relations between each tag and gazetteer candidate. We compared a baseline of edit distance for candidate ranking to an SVM-trained candidate ranking model on a city-level location tag matching task. Experimental results show that the SVM greatly outperforms the baseline.
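    A simplified sketch of the matching idea under assumed data, not the authors' trained model: each crowd-sourced tag is paired with gazetteer candidates, features capture approximate spelling (string similarity) and approximate geocoding (distance in km), and an SVM scores the pairs so that likely duplicates can be filtered out before novel tags are absorbed. The labelled pairs below are invented for illustration.

    import numpy as np
    from math import radians, sin, cos, asin, sqrt
    from difflib import SequenceMatcher
    from sklearn.svm import SVC

    def km(a, b):
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(h))

    def features(tag, cand):
        """tag/cand: (name, (lat, lon)) -> [name similarity, geographic distance in km]."""
        sim = SequenceMatcher(None, tag[0].lower(), cand[0].lower()).ratio()
        return [sim, km(tag[1], cand[1])]

    # Hypothetical labelled pairs: 1 = duplicate of an existing gazetteer entry.
    pairs = [(("St Peters Basilica", (41.9022, 12.4539)), ("Saint Peter's Basilica", (41.9022, 12.4539)), 1),
             (("St Peters Basilica", (41.9022, 12.4539)), ("St. Petersburg", (59.9311, 30.3609)), 0),
             (("Joes Cafe", (40.7128, -74.0060)), ("Joe's Cafe", (40.7130, -74.0062)), 1),
             (("Joes Cafe", (40.7128, -74.0060)), ("Joe's Cafe", (34.0522, -118.2437)), 0)]

    X = np.array([features(t, c) for t, c, _ in pairs])
    y = np.array([label for _, _, label in pairs])
    clf = SVC(kernel="linear").fit(X, y)

    new_pair = (("Sant Peter Basilica", (41.9021, 12.4540)), ("Saint Peter's Basilica", (41.9022, 12.4539)))
    print(clf.predict(np.array([features(*new_pair)])))  # 1 -> treated as a duplicate, not a novel entry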

    Ontology-driven urban issues identification from social media.

    Cities worldwide face many issues directly related to the urban space, especially infrastructure. Most of these urban issues affect the lives of both residents and visitors. For example, people may report a car parked on a footpath that forces pedestrians to walk on the road, or a huge pothole that causes traffic congestion. Besides being related to the urban space, urban issues generally demand action from city authorities. There are many Location-Based Social Networks (LBSN) in the smart-cities domain worldwide where people report urban issues in a structured way and local authorities become aware of them so they can be fixed. With the advent of social networks such as Facebook and Twitter, people tend to complain in an unstructured, sparse, and unpredictable way, making it difficult to identify the urban issues being reported. Social media data, especially Twitter messages, photos, and check-ins, have played an important role in smart cities. A key problem is the challenge of identifying specific and relevant conversations when processing noisy crowdsourced data.
    In this context, this research investigates computational methods to provide automated identification of urban issues shared in social media streams. Most related work relies on classifiers based on machine learning techniques such as Support Vector Machines (SVM), Naïve Bayes, and Decision Trees, and faces problems concerning semantic knowledge representation, human readability, and inference capability. Aiming to overcome this semantic gap, this research investigates ontology-driven Information Extraction (IE) from the perspective of urban issues, since such issues can be semantically linked in LBSN platforms. Therefore, this work proposes an Urban Issues Domain Ontology (UIDO) to enable the identification and classification of urban issues in an automated approach that focuses mainly on the thematic and geographical facets. An experimental evaluation demonstrates that the performance of the proposed approach is competitive with the most commonly used machine learning algorithms applied to this particular domain.
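    A toy sketch of the ontology-driven idea using rdflib; the namespace, class names, and keywords are illustrative stand-ins, not the actual UIDO vocabulary. Urban-issue classes are declared with keyword annotations, and a social media post is classified by matching its tokens against those keywords.

    from rdflib import Graph, Literal, Namespace, RDF, RDFS
    from rdflib.namespace import OWL

    # Illustrative namespace; not the real UIDO IRI.
    UIDO = Namespace("http://example.org/uido#")
    g = Graph()

    # Declare two urban-issue classes with keyword annotations (labels are made up).
    for cls, words in [(UIDO.Pothole, ["pothole", "hole", "asphalt"]),
                       (UIDO.IllegalParking, ["parked", "sidewalk", "footpath"])]:
        g.add((cls, RDF.type, OWL.Class))
        g.add((cls, RDFS.subClassOf, UIDO.UrbanIssue))
        for w in words:
            g.add((cls, UIDO.keyword, Literal(w)))

    def classify(post):
        """Return the urban-issue classes whose keywords appear in the post."""
        tokens = set(post.lower().split())
        hits = set()
        for cls in g.subjects(RDFS.subClassOf, UIDO.UrbanIssue):
            if any(str(kw) in tokens for kw in g.objects(cls, UIDO.keyword)):
                hits.add(cls)
        return hits

    print(classify("huge pothole on main street is causing traffic congestion"))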