
    Historical collaborative geocoding

    Recent digital developments have provided large data sets that can increasingly easily be accessed and used. These data sets often contain indirect localisation information, such as historical addresses. Historical geocoding is the process of transforming this indirect localisation information into direct localisation that can be placed on a map, which enables spatial analysis and cross-referencing. Many efficient geocoders exist for current addresses, but they do not deal with the temporal aspect and are based on a strict hierarchy (..., city, street, house number) that is hard or impossible to use with historical data. Indeed, historical data are full of uncertainties (temporal aspect, semantic aspect, spatial precision, confidence in the historical source, ...) that cannot be resolved, as there is no way to go back in time to check. We propose an open source, open data, extensible solution for geocoding that is based on building gazetteers composed of geohistorical objects extracted from historical topographical maps. Once the gazetteers are available, geocoding a historical address is a matter of finding the geohistorical object in the gazetteers that best matches the historical address. The matching criteria are customisable and cover several dimensions (fuzzy semantic, fuzzy temporal, scale, spatial precision, ...). As the goal is to facilitate historical work, we also propose web-based user interfaces that help geocode addresses (individually or in batch mode) and display them over current or historical topographical maps, so that they can be checked and collaboratively edited. The system is tested on the city of Paris for the 19th-20th centuries, shows a high return rate, and is fast enough to be used interactively. Comment: WORKING PAPER
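    The matching step can be pictured as scoring each gazetteer candidate along the dimensions listed above and keeping the best-scoring geohistorical object. The following Python sketch illustrates that idea under simple assumptions (string-ratio semantics, interval overlap for time, map precision in metres); the weights, field names and example entries are hypothetical and not the authors' implementation.

        # Minimal sketch of multi-dimensional fuzzy matching against a gazetteer
        # (illustrative only; weights and fields are assumptions).
        from difflib import SequenceMatcher

        def name_similarity(query, candidate):
            """Fuzzy semantic similarity between queried and gazetteer names."""
            return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

        def temporal_overlap(query_range, candidate_range):
            """Fraction of the queried time range covered by the candidate's validity."""
            q_start, q_end = query_range
            c_start, c_end = candidate_range
            overlap = max(0, min(q_end, c_end) - max(q_start, c_start))
            return overlap / max(1, q_end - q_start)

        def score(query_name, query_years, candidate, weights=(0.6, 0.3, 0.1)):
            """Weighted combination of semantic, temporal and precision criteria."""
            w_sem, w_temp, w_prec = weights
            sem = name_similarity(query_name, candidate["name"])
            temp = temporal_overlap(query_years, candidate["valid"])
            prec = 1.0 / (1.0 + candidate["precision_m"] / 100.0)  # finer maps score higher
            return w_sem * sem + w_temp * temp + w_prec * prec

        # Hypothetical gazetteer entries extracted from two map editions.
        gazetteer = [
            {"name": "Rue de la Paix", "valid": (1820, 1900), "precision_m": 10},
            {"name": "Rue de la Paix (annexed)", "valid": (1900, 1950), "precision_m": 5},
        ]
        best = max(gazetteer, key=lambda c: score("rue de la paix", (1860, 1870), c))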

    Developing a global risk engine

    Risk analysis is a critical link in the reduction of casualties and damages due to earthquakes. Recognition of this relation has led to a rapid rise in demand for accurate, reliable and flexible risk assessment software. However, there is a significant disparity between the high-quality scientific data developed by researchers and the availability of versatile, open and user-friendly risk analysis tools to meet the demands of end-users. In the past few years several open-source software packages have been developed that play an important role in seismic research, such as OpenSHA and OpenSEES. There is, however, still a gap when it comes to open-source risk assessment tools and software. In order to fill this gap, the Global Earthquake Model (GEM) has been created. GEM is an internationally sanctioned program initiated by the OECD that aims to build independent, open standards to calculate and communicate earthquake risk around the world. This initiative started with a one-year pilot project named GEM1, during which a number of existing risk software packages were evaluated. After a critical review of the results it was concluded that none of them was adequate for GEM requirements and that a new object-oriented tool therefore had to be developed. This paper presents a summary of some of the best-known applications used in risk analysis, highlighting the main aspects that were considered for the development of this risk platform. The research carried out to gather the information needed to build this tool covered four areas: the information technology approach, seismic hazard resources, vulnerability assessment methodologies and sources of exposure data. The main aspects and findings for each of these areas are presented, as well as how these features were incorporated into the current risk engine. Currently, the risk engine is capable of predicting human or economic losses worldwide for both deterministic and probabilistic events, using vulnerability curves. A first version of GEM will become available at the end of 2013. Until then the risk engine will continue to be developed by a growing community of developers, using a dedicated open-source platform.
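    As a rough illustration of how a vulnerability curve turns ground shaking into a loss estimate (not the GEM engine itself), the sketch below interpolates a mean loss ratio from an intensity measure and scales it by the exposed value of each asset; the curve values and exposure figures are made up.

        # Illustrative vulnerability-curve loss calculation (hypothetical numbers).
        import numpy as np

        # Hypothetical curve: intensity measure levels (e.g. PGA in g) vs. mean loss ratio.
        iml = np.array([0.1, 0.2, 0.3, 0.5, 0.8])
        mean_loss_ratio = np.array([0.01, 0.08, 0.20, 0.55, 0.90])

        def expected_loss(intensity, exposed_value):
            """Interpolate the loss ratio at the given intensity and scale by exposure."""
            ratio = np.interp(intensity, iml, mean_loss_ratio)
            return ratio * exposed_value

        # Deterministic scenario: shaking intensities and exposed values at three sites.
        intensities = np.array([0.15, 0.35, 0.60])
        values = np.array([1.0e6, 2.5e6, 0.8e6])
        scenario_loss = expected_loss(intensities, values).sum()
        print(f"Scenario loss: {scenario_loss:,.0f}")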

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects

    APREGOAR: Development of a geospatial database applied to local news in Lisbon

    Project Work presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and Science. There is valuable information in unstructured text format about the location, timing, and nature of events available in digital news content. Several ongoing efforts already attempt to extract event details from digital news sources, but often not with the nuance needed to accurately represent where things actually happen. Alternatively, journalists could manually associate attributes to events described in their articles while publishing, improving accuracy and confidence in these spatial and temporal attributes. These attributes could then be immediately available for evaluating thematic, temporal, and spatial coverage of an agency's content, as well as improving the user experience of content exploration by providing additional dimensions that can be filtered.
Though the technology of assigning geospatial and temporal dimensions for consumer-facing applications is not novel, it has yet to be applied at scale to the news. Additionally, most existing systems only support a single point definition of article locations, which may not well represent the actual place(s) of the events described within. This work defines an open source web application and underlying spatial database that support i) the association of multiple polygons representing where each event occurs and time frames associated with the events, in line with the traditional thematic attributes associated with news articles; ii) the contextualization of each article via the addition of inline event maps to clarify to readers where the events of the article occur; and iii) the exploration of the added corpora via thematic, spatial, and temporal filters that display results in interactive coverage maps and lists of articles and events. The project was applied to the greater Lisbon area of Portugal. In addition to the above functionality, this project builds progressive gazetteers that can be reused as place associations, or for further meta-analysis of place as it is colloquially understood. It demonstrates the ease with which these additional dimensions may be incorporated with high confidence in definition accuracy, managed, and leveraged to improve news agency content management, reader understanding, and researcher exploration, or extracted for combination with other datasets to provide additional insights.
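    A minimal sketch of the kind of data model described above (not APREGOAR's actual schema): each event carries one or more polygons and a time frame, articles reference their events, and a simple temporal filter returns the article-event pairs that overlap a query window. The class and field names are assumptions made for illustration.

        # Illustrative event/article model with polygon footprints and time frames.
        from dataclasses import dataclass, field
        from datetime import date

        @dataclass
        class Event:
            label: str
            polygons: list      # GeoJSON-style coordinate rings, one list per polygon
            start: date
            end: date

        @dataclass
        class Article:
            title: str
            themes: list
            events: list = field(default_factory=list)

        def events_in_window(articles, window_start, window_end):
            """Temporal filter: (article, event) pairs whose time frame overlaps the window."""
            return [(art, ev) for art in articles for ev in art.events
                    if ev.start <= window_end and ev.end >= window_start]

        # Hypothetical example: an event reported in a Lisbon neighbourhood.
        festa = Event("Festas de Lisboa", polygons=[[(-9.133, 38.712), (-9.127, 38.712),
                      (-9.127, 38.716), (-9.133, 38.716), (-9.133, 38.712)]],
                      start=date(2022, 6, 12), end=date(2022, 6, 13))
        article = Article("Festivities return to Alfama", themes=["culture"], events=[festa])
        print(events_in_window([article], date(2022, 6, 1), date(2022, 6, 30)))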

    A workflow for geocoding South African addresses

    Many industries have long been utilizing Geographical Information Systems (GIS) for spatial analysis. In many parts of the world, however, GIS has gained less popularity because of inaccurate geocoding methods and a lack of data standardization. Commercial services can also be expensive and, as such, smaller businesses have been reluctant to make a financial commitment to spatial analytics. This thesis discusses the challenges specific to South Africa as well as the challenges inherent in bad address data. The main goal of this research is to highlight the potential error rates of geocoded user-captured address data and to provide a workflow that can be followed to reduce the error rate without intensive manual data cleansing. We developed a six-step workflow and software package to prepare address data for spatial analysis and determine the potential error rate. We used three methods of geocoding: a gazetteer postal code file, a free web API and an international commercial product. To protect the privacy of the clients and the businesses, addresses were aggregated to postcode or suburb centroid precision. Geocoding results were analysed before and after each step. Two businesses were analysed: a mid-to-large business with a large, structured client address database and a small private business with a 20-year-old unstructured client address database. The companies are from two completely different industries, the larger in the financial industry and the smaller an independent magazine publisher.
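    The sketch below shows, in schematic form, the kind of fallback and aggregation steps such a workflow can involve (this is not the thesis's code): normalise the raw address, try an ordered series of geocoding sources, and replace the exact point with a postcode or suburb centroid to preserve privacy. The function and parameter names are assumptions.

        # Schematic geocoding fallback with privacy-preserving aggregation.
        import re

        def normalise(address):
            """Basic cleanup: uppercase, strip punctuation, collapse whitespace."""
            address = re.sub(r"[^\w\s]", " ", address.upper())
            return re.sub(r"\s+", " ", address).strip()

        def geocode_with_fallback(address, geocoders):
            """Try each geocoding callable in turn; return (source_name, (lat, lon)) or (None, None).
            `geocoders` is an ordered mapping of source name -> callable returning coordinates or None."""
            clean = normalise(address)
            for source, fn in geocoders.items():
                result = fn(clean)
                if result is not None:
                    return source, result
            return None, None

        def to_centroid(latlon, centroids, area_code):
            """Replace the exact point with its postcode/suburb centroid (privacy step)."""
            return centroids.get(area_code, latlon)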

    Suomenkielisen geojäsentimen kehittäminen (Developing a Finnish-language geoparser: how to obtain location information from unstructured text data)

    Ever more data is available and shared through the internet. The big data masses often have a spatial dimension and can take many forms, one of which is digital texts, such as articles or social media posts. The geospatial links in these texts are made through place names, also called toponyms, but traditional GIS methods are unable to deal with the fuzzy linguistic information. This creates the need to transform the linguistic location information into an explicit coordinate form. Several geoparsers have been developed to recognize and locate toponyms in free-form texts: the task of these systems is to be a reliable source of location information. Geoparsers have been applied to topics ranging from disaster management to literary studies. The major language of study in geoparser research has been English, and geoparsers tend to be language-specific, which threatens to leave smaller languages, and the perspectives expressed in them, unexplored. This thesis seeks to answer three research questions related to geoparsing: What are the most advanced geoparsing methods? What linguistic and geographical features complicate this multi-faceted problem? And how should the reliability and usability of geoparsers be evaluated? The major contributions of this work are an open-source geoparser for Finnish texts, Finger, and two test datasets, or corpora, for testing Finnish geoparsers. One of the datasets consists of tweets and the other of news articles. All of these resources, including the relevant code for acquiring the test data and evaluating the geoparser, are shared openly. Geoparsing can be divided into two sub-tasks: recognizing toponyms amid text flows and resolving them to the correct coordinate location, possibly among several candidates. Both tasks have seen a recent turn to deep learning methods and models, where the input texts are encoded as, for example, word embeddings. Geoparsers are evaluated against gold standard datasets where toponyms and their coordinates are marked. Performance is measured with equivalence-based metrics for toponym recognition and distance-based metrics for toponym resolution. Finger uses a toponym recognition classifier built on a Finnish BERT model and a simple gazetteer query to resolve the toponyms to coordinate points. The program outputs structured geodata containing the input texts together with the recognized toponyms and their coordinate locations. While the datasets represent different text types in terms of formality and topics, there is little difference in performance when evaluating Finger against them. The overall performance is comparable to that of geoparsers for English texts. Error analysis reveals multiple error sources, caused either by the inherent ambiguity of the studied language and the geographical world, or by the processing itself, for example by the lemmatizer. Finger can be improved in multiple ways, such as refining how it analyzes texts and creating more comprehensive evaluation datasets. Similarly, the geoparsing task should move towards more complex linguistic and geographical descriptions than just toponyms and coordinate points. Finger is not, in its current state, a ready source of geodata. However, the system has potential to be the first step for Finnish geoparsers and a stepping stone for future applied research.
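    The two evaluation measures mentioned above can be made concrete with a short sketch (not Finger's own evaluation code): exact-match F1 over recognised toponyms and great-circle error distance between predicted and gold coordinates.

        # Illustrative evaluation metrics for toponym recognition and resolution.
        from math import radians, sin, cos, asin, sqrt

        def recognition_f1(predicted, gold):
            """Exact-match F1 over sets of (document_id, toponym) pairs."""
            tp = len(predicted & gold)
            precision = tp / len(predicted) if predicted else 0.0
            recall = tp / len(gold) if gold else 0.0
            return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

        def haversine_km(p1, p2):
            """Great-circle distance in kilometres between two (lat, lon) points."""
            lat1, lon1, lat2, lon2 = map(radians, (*p1, *p2))
            a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
            return 2 * 6371 * asin(sqrt(a))

        def mean_resolution_error(pairs):
            """`pairs`: list of ((lat, lon) predicted, (lat, lon) gold) tuples."""
            return sum(haversine_km(p, g) for p, g in pairs) / len(pairs)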

    Placenames analysis in historical texts: tools, risks and side effects

    This article presents an approach combining linguistic analysis, geographic information retrieval and visualization in order to go from toponym extraction in historical texts to projection on customizable maps. The toolkit is released under an open source license; it features bootstrapping options, geocoding and disambiguation algorithms, as well as cartographic processing. The software setting is designed to be adaptable to various historical contexts: it can be extended by further automatically processed or user-curated gazetteers, used directly on texts or plugged into a larger processing pipeline. I provide an example of the issues raised by generic extraction and show the benefits of an integrated knowledge-based approach, data cleaning and filtering.
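    One widely used knowledge-based disambiguation heuristic (a generic sketch, not necessarily this toolkit's algorithm) is to rank homonymous gazetteer candidates by a combination of place importance and proximity to toponyms already resolved in the same text:

        # Generic toponym disambiguation heuristic: population plus document context.
        from math import dist, log

        def rank_candidates(candidates, resolved_points):
            """`candidates`: dicts with 'name', 'lat', 'lon', 'population';
            `resolved_points`: (lat, lon) tuples of toponyms already resolved in the text."""
            def score(c):
                importance = log(1 + c.get("population", 0))
                if resolved_points:
                    nearest = min(dist((c["lat"], c["lon"]), p) for p in resolved_points)
                    proximity = 1.0 / (1.0 + nearest)   # crude, in degrees
                else:
                    proximity = 0.0
                return importance + 5.0 * proximity
            return sorted(candidates, key=score, reverse=True)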

    ORÁCULO: Detection of Spatiotemporal Hot Spots of Conflict-Related Events Extracted from Online News Sources

    Dissertation presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and Science. Achieving situational awareness in peace operations requires understanding where and when conflict-related activity is most intense. However, the irregular nature of most factions hinders the use of remote sensing, while winning the trust of the host populations to allow the collection of wide-ranging human intelligence is a slow process. Thus, our proposed solution, ORÁCULO, is an information system which detects spatiotemporal hot spots of conflict-related activity by analyzing the patterns of events extracted from online news sources, allowing immediate situational awareness. To do so, it combines a closed-domain supervised event extractor with emerging hot spots analysis of event space-time cubes. The prototype of ORÁCULO was tested on tweets scraped from the Twitter accounts of local and international news sources covering the Central African Republic Civil War, and its test results show that it achieved near state-of-the-art event extraction performance, significant overlap with a reference event dataset, and strong correlation with the hot spots space-time cube generated from the reference event dataset, proving the viability of the proposed solution. Future work will focus on improving the event extraction performance and on testing ORÁCULO in cooperation with peacekeeping organizations. Keywords: event extraction, natural language understanding, spatiotemporal analysis, peace operations, open-source intelligence.
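    A space-time cube of the kind fed into emerging hot spot analysis can be sketched as follows (illustrative only, not ORÁCULO's implementation): events are binned into regular spatial cells and weekly time slices, and the per-cell counts over consecutive weeks are what the hot spot statistics operate on. Cell size, epoch and the example coordinates are assumptions.

        # Illustrative construction of an event space-time cube.
        from collections import Counter
        from datetime import date

        def spacetime_bin(lat, lon, day, cell_deg=0.25, epoch=date(2020, 1, 1)):
            """Map an event to a (row, col, week) cube cell."""
            row = int(lat // cell_deg)
            col = int(lon // cell_deg)
            week = (day - epoch).days // 7
            return row, col, week

        def build_cube(events):
            """`events`: iterable of (lat, lon, date); returns event counts per cube cell."""
            return Counter(spacetime_bin(lat, lon, d) for lat, lon, d in events)

        # Hypothetical conflict-related events extracted from news reports.
        events = [(4.36, 18.55, date(2020, 3, 2)), (4.37, 18.56, date(2020, 3, 4)),
                  (6.32, 21.98, date(2020, 3, 10))]
        cube = build_cube(events)   # cells whose counts rise over successive weeks
                                    # are candidates for emerging hot spots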

    Elucidating the Role of Neighborhood Deprivation in Hypertensive Disorders of Pregnancy

    This dissertation examined risk factors for hypertensive disorders of pregnancy (HDP), specifically whether neighborhood socioeconomic deprivation exacerbates individual socioeconomic disadvantage (deprivation amplification) to increase the likelihood of developing HDP. To select the optimal areal unit at which to investigate HDP, geographic proxies for neighborhoods were explored. A thematic review qualitatively examined nontraditional neighborhood boundaries identified through internet sources. Data from 2008–2012 Miami-Dade County, Florida birth records (n=121,421) and the U.S. Census Bureau were used for the remaining analyses. Ordinary least squares (OLS) and geographically weighted regression (GWR) analyses empirically compared the proportion of HDP prevalence explained by six areal units: census block groups, census tracts, ZIP code tabulation areas (ZCTAs), and three types of natural neighborhood (census units clustered based on an eight-item Neighborhood Deprivation Index). Multilevel logistic regression examined relationships between HDP, neighborhood deprivation, and individual-level factors. Odds ratios (OR) and adjusted odds ratios (aOR) were calculated. The thematic review found 22 potential alternatives to census boundaries developed through techniques such as crowd-sourcing and qualitative research. In the sensitivity analysis, census tracts aggregated at the scale of ZCTAs performed twice as well as any other model (GWR R² = 0.27) and were used as the unit of analysis for Aim 3. In the multilevel logistic regression, HDP was associated with moderate (aOR=1.13; CI: 1.05, 1.21) and high neighborhood deprivation (aOR=1.16; CI: 1.07, 1.26). Compared with mothers with private insurance, uninsured women (aOR=1.69; CI: 1.56, 1.84) and Medicaid recipients (aOR=1.12; CI: 1.05, 1.18) had higher HDP odds. Non-Hispanic Black women’s HDP odds were 1.58 times those of non-Hispanic White women. Cross-level interactions, between neighborhood deprivation and educational attainment and between neighborhood deprivation and insurance status, did not reach statistical significance. Private-sector neighborhood boundaries hold promise for developing new public health tools. Because they are relatively easy to generate from census data, natural neighborhoods may balance tradition and innovation. While no evidence of deprivation amplification was found, results suggested that individual-level and neighborhood deprivation are HDP risk factors. Interventions that target expectant mothers in deprived neighborhoods, particularly non-Hispanic Black and Hispanic women who lack health insurance, may help reduce HDP prevalence and disparities.
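    For readers unfamiliar with the measure, an odds ratio such as those reported above compares the odds of HDP between two groups; the short sketch below computes a crude OR from a hypothetical 2x2 table (the counts are invented, and the dissertation's aORs additionally adjust for covariates via multilevel regression).

        # Crude odds ratio from a 2x2 exposure-by-outcome table (hypothetical counts).
        def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
            """OR = (a / b) / (c / d)."""
            return (exposed_cases / exposed_noncases) / (unexposed_cases / unexposed_noncases)

        # e.g. HDP cases among mothers in high- vs. low-deprivation neighborhoods.
        print(round(odds_ratio(580, 9420, 510, 9490), 2))   # -> 1.15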

    Detecting the Boundaries of Urban Areas in India: A Dataset for Pixel-Based Image Classification in Google Earth Engine

    Urbanization often occurs in an unplanned and uneven manner, resulting in profound changes in patterns of land cover and land use. Understanding these changes is fundamental for devising environmentally responsible approaches to economic development in the rapidly urbanizing countries of the emerging world. One indicator of urbanization is built-up land cover that can be detected and quantified at scale using satellite imagery and cloud-based computational platforms. This process requires reliable and comprehensive ground-truth data for supervised classification and for validation of classification products. We present a new dataset for India, consisting of 21,030 polygons from across the country that were manually classified as “built-up” or “not built-up,” which we use for supervised image classification and detection of urban areas. As a large and geographically diverse country that has been undergoing an urban transition, India represents an ideal context to develop and test approaches for the detection of features related to urbanization. We perform the analysis in Google Earth Engine (GEE) using three types of classifiers, based on imagery from Landsat 7 and Landsat 8 as inputs. The methodology produces high-quality maps of built-up areas across space and time. Although the dataset can facilitate supervised image classification in any platform, we highlight its potential use in GEE for temporal large-scale analysis of the urbanization process. Our methodology can easily be applied to other countries and regions
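    A hedged sketch of the general Earth Engine workflow the abstract describes (not the authors' exact script) is shown below; the collection ID, band names, asset path, year and label property are assumptions made for illustration.

        # Supervised built-up classification in the Google Earth Engine Python API (sketch).
        import ee
        ee.Initialize()

        # Median Landsat 8 surface-reflectance composite over one year (assumed collection/bands).
        bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
        composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                     .filterDate('2016-01-01', '2016-12-31')
                     .median()
                     .select(bands))

        # Labeled polygons ('class' = 1 for built-up, 0 otherwise), uploaded as a hypothetical asset.
        labels = ee.FeatureCollection('users/example/india_builtup_polygons')

        # Sample pixels under the polygons and train a random forest classifier.
        training = composite.sampleRegions(collection=labels, properties=['class'], scale=30)
        classifier = ee.Classifier.smileRandomForest(100).train(
            features=training, classProperty='class', inputProperties=bands)

        # Classify the composite into a built-up / not built-up map.
        built_up = composite.classify(classifier)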