16,208 research outputs found

    From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

    Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality. Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research.

    A Geospatial Cyberinfrastructure for Urban Economic Analysis and Spatial Decision-Making

    Urban economic modeling and effective spatial planning are critical tools towards achieving urban sustainability. However, in practice, many technical obstacles, such as information islands, poor documentation of data and lack of software platforms to facilitate virtual collaboration, are challenging the effectiveness of decision-making processes. In this paper, we report on our efforts to design and develop a geospatial cyberinfrastructure (GCI) for urban economic analysis and simulation. This GCI provides an operational graphic user interface, built upon a service-oriented architecture to allow (1) widespread sharing and seamless integration of distributed geospatial data; (2) an effective way to address the uncertainty and positional errors encountered in fusing data from diverse sources; (3) the decomposition of complex planning questions into atomic spatial analysis tasks and the generation of a web service chain to tackle such complex problems; and (4) capturing and representing provenance of geospatial data to trace its flow in the modeling task. The Greater Los Angeles Region serves as the test bed. We expect this work to contribute to effective spatial policy analysis and decision-making through the adoption of advanced GCI and to broaden the application coverage of GCI to include urban economic simulations.

    Conflating point of interest (POI) data: A systematic review of matching methods

    Point of interest (POI) data provide digital representations of places in the real world, and have been increasingly used to understand human-place interactions, support urban management, and build smart cities. Many POI datasets have been developed, which often have different geographic coverages, attribute focuses, and data quality. From time to time, researchers may need to conflate two or more POI datasets in order to build a better representation of the places in the study areas. While various POI conflation methods have been developed, a systematic review is lacking, and consequently, it is difficult for researchers new to POI conflation to quickly grasp and use these existing methods. This paper fills that gap. Following the protocol of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), we conduct a systematic review by searching through three bibliographic databases using reproducible syntax to identify related studies. We then focus on a main step of POI conflation, i.e., POI matching, and systematically summarize and categorize the identified methods. Current limitations and future opportunities are discussed afterwards. We hope that this review can provide some guidance for researchers interested in conflating POI datasets for their research.

    Management and Conflation of Multiple Representations within an Open Federation Platform

    Building up spatial data infrastructures involves dealing with heterogeneous data sources, which often contain inconsistencies and contradictions. One main reason for those inconsistencies is that one and the same real-world phenomenon is often stored in multiple representations within different databases. The goal of this paper is to describe how the problems arising from multiple representations can be dealt with in spatial data infrastructures, focusing especially on the concepts developed within the Nexus project of the University of Stuttgart, which is implementing an open, federated infrastructure for context-aware applications. A main part of this contribution explains the efforts made to resolve the conflicts that occur between multiple representations during conflation or merging processes, so that applications obtain consolidated views of the underlying data.

    A System for Aligning Geographical Entities from Large Heterogeneous Sources

    Aligning points of interest (POIs) from heterogeneous geographical data sources is an important task that helps extend map data with information from different datasets. This task poses several challenges, including differences in type hierarchies, labels (different formats, languages, and levels of detail), and deviations in the coordinates. Scalability is another major issue, as global-scale datasets may have tens or hundreds of millions of entities. In this paper, we propose the GeographicaL Entities AligNment (GLEAN) system for efficiently matching large geographical datasets based on spatial partitioning with an adaptable margin. In particular, we introduce a text similarity measure based on the local-context relevance of tokens used in combination with sentence embeddings. We then propose a scalable type embedding model. Finally, we demonstrate that our proposed system can efficiently handle the alignment of large datasets while improving the quality of alignments using the proposed entity similarity measure.
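
    The idea of spatial partitioning with a margin can be illustrated with a minimal sketch. This is not the GLEAN implementation: it assumes planar coordinates, a fixed (not adaptable) margin, and hypothetical helper names (`cells_with_margin`, `candidate_pairs`). Entities from one source are indexed in every grid cell they could reach within the margin, so a pair that straddles a cell boundary is still compared.

```python
import math
from collections import defaultdict


def cells_with_margin(x, y, cell, margin):
    """Return every grid cell that the square of half-width `margin` around (x, y) overlaps."""
    xs = range(math.floor((x - margin) / cell), math.floor((x + margin) / cell) + 1)
    ys = range(math.floor((y - margin) / cell), math.floor((y + margin) / cell) + 1)
    return [(i, j) for i in xs for j in ys]


def candidate_pairs(source_a, source_b, cell=1.0, margin=0.1):
    """Pair entities of two sources that lie within `margin` of each other."""
    # Index source B in all cells it may reach, so boundary cases are not lost.
    index = defaultdict(list)
    for name, (x, y) in source_b.items():
        for key in cells_with_margin(x, y, cell, margin):
            index[key].append(name)

    pairs = set()
    for name, (x, y) in source_a.items():
        # Source A entities are only looked up in their own (home) cell.
        home = (math.floor(x / cell), math.floor(y / cell))
        for other in index.get(home, []):
            ox, oy = source_b[other]
            if math.hypot(x - ox, y - oy) <= margin:
                pairs.add((name, other))
    return pairs


# Hypothetical toy data: the two cafes sit on opposite sides of a cell boundary.
a = {"cafe_a": (0.98, 0.50), "shop_a": (3.20, 3.20)}
b = {"cafe_b": (1.03, 0.52), "shop_b": (5.00, 5.00)}
pairs = candidate_pairs(a, b)
print(pairs)
```

    In a full system, the surviving candidate pairs would then be scored with label and type similarity; here only the coordinate filter is shown.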

    Acronyms as an integral part of multi–word term recognition - A token of appreciation

    Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multi-word terms from a domain-specific corpus. It uses a range of methods to normalize three types of term variation: orthographic, morphological and syntactic. Acronyms, which represent a highly productive type of term variation, were not supported. In this study, we describe how the functionality of FlexiTerm has been extended to recognize acronyms and incorporate them into the term conflation process. The main contribution of this study is not acronym recognition per se, but rather its integration with other types of term variation into the term conflation process. We evaluated the effects of term conflation in the context of information retrieval as one of its most prominent applications. On average, relative recall increased by 32 percentage points, whereas index compression factor increased by 7 percentage points. Therefore, evidence suggests that integration of acronyms provides non-trivial improvement of term conflation.
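
    A toy sketch of term conflation, for illustration only. It is not FlexiTerm's algorithm: the normalization rules are deliberately naive, and the acronym table `ACRONYMS` and the function names are hypothetical. Each variant is mapped to a canonical key, and variants sharing a key are grouped under one representative.

```python
import re
from collections import defaultdict

# Hypothetical acronym table; a real system would extract these from the corpus.
ACRONYMS = {"bp": "blood pressure"}
STOPWORDS = {"of", "the", "in", "for"}


def normalize(term):
    """Map a term variant to a crude canonical key (a sorted tuple of tokens)."""
    text = term.lower().strip()
    # Acronym variation: expand known acronyms to their full form.
    text = ACRONYMS.get(text, text)
    # Orthographic variation: treat hyphens and dashes as spaces.
    text = re.sub(r"[-\u2013\u2014]", " ", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    # Morphological variation: strip a trailing plural "s" (naive stand-in for stemming).
    tokens = [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]
    # Syntactic variation: drop function words and ignore token order.
    return tuple(sorted(t for t in tokens if t not in STOPWORDS))


def conflate(terms):
    """Group term variants that share the same canonical key."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    return dict(groups)


variants = ["blood-pressure", "Blood pressures", "pressure of the blood", "BP", "heart rate"]
groups = conflate(variants)
for key, members in groups.items():
    print(" ".join(key), "<-", members)
```

    With the acronym table removed, "BP" would form its own group, which is the kind of missed link that integrating acronyms into the conflation step is meant to avoid.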

    Conceptual and application issues in the implementation of object-oriented GIS

    The adoption of object-oriented technology for spatial data modeling is becoming a significant trend in GIS. This research explores the concepts of Object-Oriented GIS (OOGIS) and illustrates its versatility in two case studies. OOGIS provides a feature-based, intuitive representation of real-world features. The study emphasizes the fundamental concepts of inheritance, polymorphism, and encapsulation in OOGIS and explores schema design, long transactions, and versioning. Further, the study discusses the advantages of OOGIS in the management and analysis of geospatial data. The case studies demonstrate both the conceptual basis of OOGIS and specific functionality including behavior, methods, versioning, long transactions and data locking. OOGIS demonstrates many advantages over the traditional entity-relationship model in database maintenance and functionality.

    Automatic Geospatial Data Conflation Using Semantic Web Technologies

    Duplicate geospatial data collection and maintenance are an extensive problem across Australian government organisations. This research examines how Semantic Web technologies can be used to automate the geospatial data conflation process. The research presents a new approach in which OWL ontologies generated from output data models and geospatial data represented as RDF triples form the basis of the solution, while SWRL rules serve as the core that automates the geospatial data conflation process.