406 research outputs found

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to responding to this issue and to fulfilling users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects.

    Development and evaluation of a geographic information retrieval system using fine grained toponyms

    Geographic information retrieval (GIR) is concerned with returning information in response to an information need, typically expressed in terms of a thematic and a spatial component linked by a spatial relationship. However, evaluation initiatives have often failed to show significant differences between simple text baselines and more complex spatially enabled GIR approaches. We explore the effectiveness of three systems (a text baseline, spatial query expansion, and a full GIR system utilizing both text and spatial indexes) at retrieving documents from a corpus describing mountaineering expeditions, centred around fine-grained toponyms. To allow evaluation, we use user-generated content (UGC), in the form of metadata associated with individual articles, to build a test collection of queries and judgments. The test collection allowed us to demonstrate that a GIR-based method significantly outperformed a text baseline for all but very specific queries associated with very small query radii. We argue that such approaches to test collection development have much to offer in the evaluation of GIR.
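As a hedged sketch of what a spatially enabled GIR system over fine-grained toponyms might do, the snippet below filters candidate documents by a query radius and blends an existing text score with a linear distance decay. The document set, the coordinates, the blending weight `alpha`, and the decay scheme are illustrative assumptions, not details taken from the paper.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def spatial_rerank(docs, qlat, qlon, radius_km, alpha=0.5):
    """Drop documents whose toponym lies outside the query radius, then
    blend the text score with a linear distance decay (hypothetical scheme)."""
    scored = []
    for doc in docs:
        d = haversine_km(qlat, qlon, doc["lat"], doc["lon"])
        if d <= radius_km:
            spatial = 1.0 - d / radius_km  # 1 at the query point, 0 at the radius
            scored.append((alpha * doc["text_score"] + (1 - alpha) * spatial, doc["id"]))
    return sorted(scored, reverse=True)

# Invented mountaineering reports, each georeferenced to a fine-grained toponym
docs = [
    {"id": "A", "lat": 46.0, "lon": 7.0, "text_score": 0.5},
    {"id": "B", "lat": 46.2, "lon": 7.0, "text_score": 0.9},
    {"id": "C", "lat": 47.0, "lon": 7.0, "text_score": 0.8},  # ~111 km away, filtered out
]
print(spatial_rerank(docs, 46.0, 7.0, radius_km=50.0))
```

Note that with a very small radius almost everything is filtered out and the text score dominates among the few survivors, loosely mirroring the paper's observation that a plain text baseline is hard to beat for very small query radii.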

    Semantically-Enriched Search Engine for Geoportals: A Case Study with ArcGIS Online

    Many geoportals, such as ArcGIS Online, are established with the goal of improving geospatial data reusability and achieving intelligent knowledge discovery. However, according to previous research, most existing geoportals adopt Lucene-based techniques to achieve their core search functionality, which has a limited ability to capture the user's search intentions. To better understand a user's search intention, query expansion can be used to enrich the user's query by adding semantically similar terms. In the context of geoportals and geographic information retrieval, we advocate the idea of semantically enriching a user's query from both geospatial and thematic perspectives. In the geospatial aspect, we propose to enrich a query by using both place partonomy and distance decay. In terms of the thematic aspect, concept expansion and embedding-based document similarity are used to infer the implicit information hidden in a user's query. This semantic query expansion framework is implemented as a semantically-enriched search engine using ArcGIS Online as a case study. A benchmark dataset is constructed to evaluate the proposed framework. Our evaluation results show that the proposed semantic query expansion framework is very effective in capturing a user's search intention and significantly outperforms a well-established baseline (Lucene's practical scoring function) with more than 3.0 increments in DCG@K (K=3, 5, 10). Comment: 18 pages; accepted to AGILE 2020 as a full paper. GitHub code repository: https://github.com/gengchenmai/arcgis-online-search-engin
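Since the evaluation above is reported in DCG@K, a minimal sketch of that metric may help. This uses the classic formulation (the gain at rank 1 plus rel_i / log2(i) for ranks i >= 2); the exact variant used in the paper, and the graded judgments below, are assumptions for illustration.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the top-k results.

    `relevances` lists graded relevance scores in rank order; the gain at
    rank i >= 2 is discounted by log2(i)."""
    dcg = 0.0
    for i, rel in enumerate(relevances[:k], start=1):
        dcg += rel if i == 1 else rel / math.log2(i)
    return dcg

# Hypothetical graded judgments (3 = highly relevant ... 0 = irrelevant)
ranked = [3, 2, 3, 0, 1, 2]
print(round(dcg_at_k(ranked, 3), 3))  # → 6.893
```

A gain of "more than 3.0" in this metric is substantial: it is roughly the effect of placing one additional highly relevant document at rank 1.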

    Geospatial Semantics

    Geospatial semantics is a broad field that involves a variety of research areas. The term semantics refers to the meaning of things, and is in contrast with the term syntactics. Accordingly, studies on geospatial semantics usually focus on understanding the meaning of geographic entities as well as their counterparts in the cognitive and digital world, such as cognitive geographic concepts and digital gazetteers. Geospatial semantics can also facilitate the design of geographic information systems (GIS) by enhancing the interoperability of distributed systems and developing more intelligent interfaces for user interactions. In the past years, a large body of research has been conducted, approaching geospatial semantics from different perspectives, using a variety of methods, and targeting different problems. Meanwhile, the arrival of big geo data, especially the large amount of unstructured text data on the Web, and the fast development of natural language processing methods enable new research directions in geospatial semantics. This chapter, therefore, provides a systematic review of existing geospatial semantics research. Six major research areas are identified and discussed: semantic interoperability, digital gazetteers, geographic information retrieval, geospatial Semantic Web, place semantics, and cognitive geographic concepts. Comment: Yingjie Hu (2017). Geospatial Semantics. In Bo Huang, Thomas J. Cova, and Ming-Hsiang Tsou et al. (Eds.): Comprehensive Geographic Information Systems. Elsevier, Oxford, U

    Geospatial crowdsourced data fitness analysis for spatial data infrastructure based disaster management actions

    The reporting of disasters has changed from official media reports to citizen reporters who are at the disaster scene. This kind of crowd-based reporting, related to disasters or any other events, is often identified as 'Crowdsourced Data' (CSD). CSD is freely and widely available thanks to current technological advancements. The quality of CSD is often problematic, as it is typically created by citizens of varying skills and backgrounds. CSD is generally considered unstructured, and its quality remains poorly defined. Moreover, locations may be missing from CSD, and the quality of any available locations may be uncertain. Traditional data quality assessment methods and parameters are also often incompatible with the unstructured nature of CSD due to its undocumented nature and missing metadata. Although other research has identified credibility and relevance as possible CSD quality assessment indicators, the available assessment methods for these indicators are still immature. In the 2011 Australian floods, citizens and disaster management administrators used the Ushahidi Crowd-mapping platform and the Twitter social media platform to extensively communicate flood-related information, including hazards, evacuations, help services, road closures and property damage. This research designed a CSD quality assessment framework and tested the quality of the 2011 Australian floods' Ushahidi Crowdmap and Twitter data. In particular, it explored a number of aspects, namely location availability and location quality assessment, semantic extraction of hidden location toponyms, and analysis of the credibility and relevance of reports. The research was conducted based on the Design Science (DS) research method, which is often utilised in Information Science (IS) research. Location availability of the Ushahidi Crowdmap and Twitter data was assessed, and the quality of available locations was evaluated by comparison with three different datasets, i.e. Google Maps, OpenStreetMap (OSM) and the Queensland Department of Natural Resources and Mines' (QDNRM) road data. Missing locations were semantically extracted using Natural Language Processing (NLP) and gazetteer lookup techniques. The credibility of the Ushahidi Crowdmap dataset was assessed using a naive Bayesian Network (BN) model commonly utilised in spam email detection. CSD relevance was assessed by adapting Geographic Information Retrieval (GIR) relevance assessment techniques, which are also utilised in the IT sector. Thematic and geographic relevance were assessed using the Term Frequency-Inverse Document Frequency Vector Space Model (TF-IDF VSM) and NLP based on semantic gazetteers. Results of the CSD location comparison showed that the combined use of non-authoritative and authoritative data improved location determination. The semantic location analysis results indicated some improvement in the location availability of the tweets and Crowdmap data; however, the quality of the new locations remained uncertain. The results of the credibility analysis revealed that spam email detection approaches are feasible for CSD credibility detection; however, it was critical to train the model in a controlled environment using structured training, including modified training samples. The use of GIR techniques for CSD relevance analysis provided promising results. A separate relevance-ranked list of the same CSD data was prepared through manual analysis, and the results revealed that the two lists generally agreed, which indicated the system's potential to analyse relevance in a similar way to humans. This research showed that CSD fitness analysis can potentially improve the accuracy, reliability and currency of CSD, and may be utilised to fill information gaps in authoritative sources. The integrated and autonomous CSD qualification framework presented provides a guide for flood disaster first responders and could be adapted to support other forms of emergencies.
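The thematic relevance step above relies on the TF-IDF vector space model; a minimal sketch of that technique applied to crowdsourced reports follows. The toy reports, query, and weighting details are assumptions for illustration, not the thesis's actual data or implementation.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors plus the IDF table for a small tokenized corpus."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented crowdsourced reports and a flood-related query
reports = [
    "flood water rising near bridge".split(),
    "road closure due to flood".split(),
    "concert tickets on sale".split(),
]
query = "flood road closure".split()

vecs, idf = tfidf(reports)
qvec = {t: idf.get(t, 0.0) for t in query}  # unseen query terms get weight 0
ranking = sorted(range(len(reports)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)
print(ranking[0])  # → 1 (the "road closure due to flood" report)
```

An automatic ranking like this can then be compared against a manually prepared relevance-ranked list, as the thesis does, to check that the system orders reports in a human-like way.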

    Generating approximate region boundaries from heterogeneous spatial information: an evolutionary approach

    Spatial information takes different forms in different applications, ranging from accurate coordinates in geographic information systems to the qualitative abstractions used in artificial intelligence and spatial cognition. As a result, existing spatial information processing techniques tend to be tailored towards one type of spatial information, and cannot readily be extended to cope with the heterogeneity of spatial information that often arises in practice. In applications such as geographic information retrieval, on the other hand, approximate boundaries of spatial regions need to be constructed using whatever spatial information can be obtained. Motivated by this observation, we propose a novel methodology for generating spatial scenarios that are compatible with available knowledge. By suitably discretizing space, this task is translated into a combinatorial optimization problem, which is solved using a hybridization of two well-known meta-heuristics: genetic algorithms and ant colony optimization. The result is a flexible method that can cope with both quantitative and qualitative information, and can easily be adapted to the specific needs of particular applications. Experiments with geographic data demonstrate the potential of the approach.
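To make the "discretize, then optimize" idea concrete, here is a toy genetic algorithm over a grid: each individual flags which cells belong to the candidate region, and fitness counts how many constraints the region satisfies. This sketch uses a plain GA only (the paper hybridizes GA with ant colony optimization), and the cells, constraints, and parameters are all invented.

```python
import random

def ga_region(cells, constraints, pop_size=30, generations=100, seed=0):
    """Evolve a region over discretized space.

    Each individual is one bit per cell (in / out of the region); fitness is
    the number of satisfied constraints, each a predicate over the cell set."""
    rng = random.Random(seed)
    n = len(cells)

    def fitness(bits):
        region = {c for c, b in zip(cells, bits) if b}
        return sum(1 for ok in constraints if ok(region))

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]           # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:             # occasional bit-flip mutation
                child[rng.randrange(n)] ^= 1
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return {c for c, b in zip(cells, best) if b}, fitness(best)

# A 3x3 grid and three invented pieces of spatial knowledge
cells = [(r, c) for r in range(3) for c in range(3)]
constraints = [
    lambda region: (0, 0) in region,       # a point known to lie inside
    lambda region: (2, 2) not in region,   # a point known to lie outside
    lambda region: len(region) >= 2,       # the region is not a single cell
]
region, fit = ga_region(cells, constraints)
print(fit)  # all three constraints are expected to be satisfiable
```

Qualitative knowledge ("X is inside", "Y is outside") and quantitative knowledge (coordinates, sizes) both reduce to predicates here, which is what lets such a method mix heterogeneous spatial information.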

    Web-based discovery and dissemination of multidimensional geographic information

    A spatial data clearinghouse is an electronic facility for searching, viewing, transferring, ordering, advertising, and disseminating spatial data from numerous sources via the Internet. Governments and other institutions have been implementing spatial data clearinghouses to minimise data duplication and thus reduce the cost of spatial data acquisition. Underlying these clearinghouses are geoportals and databases of geospatial metadata. A geoportal is an access point of a spatial data clearinghouse, and metadata is data that describes data. The success of a clearinghouse's spatial data discovery system depends on its ability to communicate the contents of geospatial metadata by providing both visual and analytical assistance to a user. The model currently adopted by the geographic information community was inherited from generic information systems and thus, to an extent, ignores the spatial characteristics of geographic data. Consequently, research in Geographic Information Retrieval (GIR) has focussed on the spatial aspects of web-based data discovery and acquisition. This thesis considers how the process of GIR from geoportals can be enhanced through multidimensional visualisation served by web-based geographic data sources. An approach is proposed for the presentation of search results in ontology-assisted GIR. Also proposed is an approach for the visualisation of multidimensional geographic data from web-based data sources. These approaches are implemented in two prototypes, the Geospatial Database Online Visualisation Environment (GeoDOVE) and the Spatio-Temporal Ontological Relevance Model (STORM). A discussion of their design, implementation and evaluation is presented. The results suggest that ontology-assisted visualisation can improve a user's ability to identify the most relevant multidimensional geographic datasets from a set of search results. Additional results suggest that it is possible to offer the proposed visualisation approaches on existing geoportal frameworks. The implication of the results is that multidimensional visualisation should be considered by the wider geographic information community as an alternative to historic approaches for presenting search results on geoportals, such as the textual ranked list and two-dimensional maps. EThOS - Electronic Theses Online Service. University of Newcastle upon Tyne, United Kingdom.

    Investigating behavioural and computational approaches for defining imprecise regions

    People often communicate with reference to informally agreed places, such as “the city centre”. However, views of the spatial extent of such areas may vary, resulting in imprecise regions. We compare perceptions of Sheffield’s City Centre from a street survey to extents derived from various web-based sources. Such automated approaches have advantages of speed, cost and repeatability. We show that footprints from web sources are often in concordance with models derived from more labour-intensive methods. Notable exceptions, however, were found with sources advertising or selling residential property. Agreement between sources was measured by aggregating them to identify locations of consensus.
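A minimal sketch of measuring agreement by aggregation, assuming each web-derived footprint has first been rasterized to a set of grid cells (the footprints and threshold below are invented): cells that enough sources include form the consensus extent of the imprecise region.

```python
from collections import Counter

def consensus_region(footprints, threshold):
    """Return the grid cells that at least `threshold` footprints include."""
    counts = Counter()
    for cells in footprints:
        counts.update(cells)
    return {cell for cell, n in counts.items() if n >= threshold}

# Three hypothetical web-derived footprints of "the city centre", as grid cells
footprints = [
    {(0, 0), (0, 1)},
    {(0, 0), (1, 1)},
    {(0, 0), (0, 1), (1, 0)},
]
print(sorted(consensus_region(footprints, 2)))  # → [(0, 0), (0, 1)]
```

Raising the threshold yields a tighter "core" of the region, while lowering it yields a broader extent, which is one simple way to visualise where sources agree and disagree.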