
    W3C PROV to describe provenance at the dataset, feature and attribute levels in a distributed environment

    Provenance, a metadata component referring to the origin of, and the processes undertaken to obtain, a specific geographic digital feature or product, is crucial for evaluating the quality of spatial information and for reproducing and replicating geospatial processes. However, the heterogeneity and complexity of geospatial processes, which can modify part or all of a dataset's content, make it necessary to describe geospatial provenance at the dataset, feature, and attribute levels. This paper presents the application of W3C PROV, a generic specification for expressing provenance records, to representing geospatial data provenance at these different levels. In particular, W3C PROV is applied to feature models, where geospatial phenomena are represented as individual features described with spatial (points, lines, polygons, etc.) and non-spatial (names, measures, etc.) attributes. The paper first analyses the potential of the ISO 19115 and W3C PROV models for representing geospatial provenance in a distributed environment at the three levels of granularity. Next, an approach for applying the generic W3C PROV provenance model to the geospatial environment is presented. As a proof of concept, we provide an application of W3C PROV to describe geospatial provenance at the feature and attribute levels; the use case is a conflation of a U.S. Geological Survey dataset with a National Geospatial-Intelligence Agency dataset. Finally, an example of how to capture the provenance resulting from workflows and chain executions with PROV is also presented. The application uses a web processing service, which enables geospatial processing in a distributed system and allows the provenance information to be captured at the feature and attribute levels based on the W3C PROV ontology.
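    Feature- and attribute-level PROV statements of the kind this abstract describes can be sketched with a few hand-rolled triples. The identifiers (`ex:...`) and the record layout below are illustrative, not the paper's actual schema; only the `prov:` relation names come from the W3C PROV vocabulary.

```python
# Minimal sketch of PROV-style statements for a feature-level conflation
# step, with one attribute-level derivation. All ex: identifiers are
# hypothetical; the prov: relation names are from the W3C PROV vocabulary.

statements = [
    # The conflated feature is derived from both source features.
    ("ex:feature_conflated", "prov:wasDerivedFrom", "ex:feature_usgs"),
    ("ex:feature_conflated", "prov:wasDerivedFrom", "ex:feature_nga"),
    # The conflation activity generated the new feature.
    ("ex:feature_conflated", "prov:wasGeneratedBy", "ex:conflation_run_1"),
    # Attribute-level provenance: the name attribute came from the USGS record.
    ("ex:feature_conflated/name", "prov:wasDerivedFrom", "ex:feature_usgs/name"),
]

def derived_from(entity, stmts):
    """List the direct sources of an entity or attribute."""
    return [o for s, r, o in stmts if s == entity and r == "prov:wasDerivedFrom"]

print(derived_from("ex:feature_conflated", statements))
# ['ex:feature_usgs', 'ex:feature_nga']
```

    The same query answered at the attribute level (`ex:feature_conflated/name`) returns only the USGS source, which is the granularity the paper argues for.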

    Curating scientific information in knowledge infrastructures

    Interpreting observational data is a fundamental task in the sciences, particularly in the earth and environmental sciences, where observational data are increasingly acquired, curated, and published systematically by environmental research infrastructures. Typically subject to substantial processing, observational data are used by research communities, their research groups, and individual scientists, who interpret such primary data for their meaning in the context of research investigations. The result of interpretation is information, meaningful secondary or derived data, about the observed environment. Research infrastructures and research communities are thus essential to evolving uninterpreted observational data into information. In digital form, the classical bearers of information are the commonly known "(elaborated) data products," for instance maps. In such form, meaning is generally implicit, e.g., in map colour coding, and thus largely inaccessible to machines. The systematic acquisition, curation, possible publishing, and further processing of information gained in observational data interpretation, as machine-readable data with machine-readable meaning, is not common practice among environmental research infrastructures. For a use case in aerosol science, we elucidate these problems and present a Jupyter-based prototype infrastructure that exploits a machine learning approach to interpretation and could support a research community in interpreting observational data and, more importantly, in curating and further using the resulting information about a studied natural phenomenon.
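    The core idea, pairing primary data with an explicit machine-readable meaning rather than leaving it implicit in a plot, can be sketched in a few lines. The threshold rule below stands in for the paper's machine learning classifier, and the label vocabulary and field names are illustrative assumptions.

```python
# Sketch: evolving uninterpreted observations into machine-readable
# information. A trivial threshold rule is assumed in place of the
# prototype's machine learning classifier; labels are illustrative.

def interpret(observations, threshold=100.0):
    """Attach an explicit, machine-readable meaning to raw particle counts."""
    mean = sum(observations) / len(observations)
    label = ("aerosol:new_particle_formation_event"
             if mean > threshold else "aerosol:non_event")
    return {
        "data": observations,           # primary (uninterpreted) data
        "meaning": label,               # secondary, machine-readable information
        "method": "threshold-on-mean",  # provenance of the interpretation
    }

record = interpret([80.0, 150.0, 140.0])
print(record["meaning"])
# aerosol:new_particle_formation_event
```

    The point is that `record["meaning"]` is queryable by a machine, whereas the same information encoded as colour in a map or plot would not be.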

    Geospatial Workflows and Trust: a Use Case for Provenance

    At first glance, the Astronomer by Vermeer, Tutankhamun's burial mask, and a geospatial workflow may appear to have nothing in common. However, a commonality exists: each of these items can have a record of provenance detailing its history. Provenance is a record that shows who did what to an object, where this happened, and how and why these actions took place. In the geospatial domain, provenance can be used to track and analyze the changes data have undergone in a workflow, and can facilitate scientific reproducibility. Collecting provenance from geospatial workflows and finding effective ways to use it is an important application. When using geospatial data in a workflow, it is important to determine whether the data and workflow are trustworthy. This study examines whether provenance can be collected from a geospatial workflow. Each workflow examined is a use case for a specific type of geospatial problem. The collected provenance is then used to determine workflow trust and content trust for each of the workflows examined. The results show that provenance can be collected from a geospatial workflow in a way that is useful to additional applications, such as provenance interchange, and that content trust and workflow trust can be estimated from the collected provenance. The simple workflow had a content trust value of 0.83 (trustworthy) and a workflow trust value of 0.44 (untrustworthy). Two additional workflows were examined for content trust and workflow trust. The methods used to calculate content trust and workflow trust could also be extended to other types of geospatial data and workflows. Future research could include complete automation of the provenance collection and trust calculations, as well as examining additional techniques for deciding trust in relation to workflows.
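    The abstract does not give the study's exact trust formulas, so the sketch below assumes the simplest plausible one: trust as the mean of per-step (or per-source) scores taken from the provenance record, with a 0.5 cutoff separating trustworthy from untrustworthy. The individual scores are hypothetical.

```python
# Illustrative sketch of estimating trust from provenance records.
# The averaging formula, the 0.5 cutoff, and the per-step scores are
# assumptions for illustration, not the study's published method.

def trust(step_scores):
    """Average the trust scores of each provenance-recorded step."""
    return sum(step_scores) / len(step_scores)

def is_trustworthy(score, cutoff=0.5):
    return score >= cutoff

content_trust = trust([0.9, 0.8, 0.8])    # hypothetical per-source scores
workflow_trust = trust([0.6, 0.4, 0.3])   # hypothetical per-step scores

print(round(content_trust, 2), is_trustworthy(content_trust))
print(round(workflow_trust, 2), is_trustworthy(workflow_trust))
```

    With these hypothetical inputs the content trust lands near the paper's 0.83 (trustworthy) while the workflow trust falls below the cutoff (untrustworthy), mirroring the split the study reports for its simple workflow.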

    Geospatial queries on data collection using a common provenance model

    Other grants: Xavier Pons is the recipient of an ICREA Academia Excellence in Research Grant (2016-2020). Lineage information is the part of the metadata that describes "what," "when," "who," "how," and "where" geospatial data were generated. If it is well presented and queryable, lineage becomes very useful for inferring data quality, tracing error sources, and increasing trust in geospatial information. In addition, if the lineage of a collection of datasets can be related and presented together, datasets, process chains, and methodologies can be compared. This paper proposes extending process-step lineage descriptions into four explicit levels of abstraction (process run, tool, algorithm, and functionality). Including functionality and algorithm descriptions as part of lineage provides high-level information that is independent of the details of the software used. It therefore becomes possible to transform lineage metadata that initially documents specific processing steps into a reusable workflow that describes a set of operations as a processing chain. This paper presents a system that provides lineage information as a service in a distributed environment. The system is complemented by an integrated provenance web application capable of visualizing and querying a provenance graph composed of the lineage of a collection of datasets. The International Organization for Standardization (ISO) 19115 family of standards was combined with the World Wide Web Consortium (W3C) provenance initiative (W3C PROV) to integrate the provenance of a collection of datasets. To represent lineage elements, the ISO 19115-2 lineage class names were chosen, because they more precisely express the names of the geospatial objects involved. The relationship naming conventions of W3C PROV are used to represent relationships among these elements. The elements and relationships are presented in a queryable graph.
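    The four abstraction levels proposed for a process step can be sketched as one record per step, queried at the software-independent functionality level. The field values (tool names, dates, algorithms) below are invented for illustration, and the ISO 19115-2 class names are not reproduced.

```python
# Sketch of the four explicit abstraction levels for process-step lineage.
# All field values are illustrative, not from the paper's system.

steps = [
    {
        "process_run": "2021-03-01T10:00 run #42",  # concrete execution
        "tool": "MiraMon GIS v8",                   # specific software used
        "algorithm": "bilinear resampling",         # method, software-independent
        "functionality": "raster reprojection",     # highest-level description
    },
    {
        "process_run": "2021-03-01T10:05 run #43",
        "tool": "GDAL 3.2 gdal_calc",
        "algorithm": "per-pixel NDVI",
        "functionality": "vegetation index computation",
    },
]

def tools_for(lineage, functionality):
    """Query a lineage collection at the software-independent level:
    which tools have been used to realise a given functionality?"""
    return [s["tool"] for s in lineage if s["functionality"] == functionality]

print(tools_for(steps, "raster reprojection"))
# ['MiraMon GIS v8']
```

    Because the query keys on functionality rather than on a tool or run, two datasets produced with different software can still be recognised as outputs of the same processing chain, which is what makes the lineage reusable as a workflow description.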

    A Geospatial Cyberinfrastructure for Urban Economic Analysis and Spatial Decision-Making

    Urban economic modeling and effective spatial planning are critical tools for achieving urban sustainability. In practice, however, many technical obstacles, such as information islands, poor documentation of data, and a lack of software platforms to facilitate virtual collaboration, challenge the effectiveness of decision-making processes. In this paper, we report on our efforts to design and develop a geospatial cyberinfrastructure (GCI) for urban economic analysis and simulation. This GCI provides an operational graphical user interface, built upon a service-oriented architecture, to allow (1) widespread sharing and seamless integration of distributed geospatial data; (2) an effective way to address the uncertainty and positional errors encountered in fusing data from diverse sources; (3) the decomposition of complex planning questions into atomic spatial analysis tasks and the generation of a web service chain to tackle such complex problems; and (4) the capture and representation of the provenance of geospatial data to trace its flow through the modeling task. The Greater Los Angeles Region serves as the test bed. We expect this work to contribute to effective spatial policy analysis and decision-making through the adoption of advanced GCI and to broaden the application coverage of GCI to include urban economic simulations.
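    Point (3), decomposing a complex planning question into atomic tasks executed as a chain, can be sketched with plain functions standing in for the GCI's web services. The task names, the sample parcel data, and the attribute values are all illustrative assumptions; the real system invokes distributed services rather than local functions.

```python
# Sketch of decomposing a planning question into atomic spatial-analysis
# tasks run as a service chain, recording provenance of each step.
# Task names and data are hypothetical stand-ins for the GCI's services.
# (Requires Python 3.9+ for the dict merge operator `|`.)

def buffer_zones(parcels):
    return [p | {"buffered": True} for p in parcels]

def overlay_land_use(parcels):
    return [p | {"land_use": "commercial"} for p in parcels]

def score_accessibility(parcels):
    return [p | {"score": 0.7} for p in parcels]

# A complex question becomes an ordered chain of atomic services.
chain = [buffer_zones, overlay_land_use, score_accessibility]

def run_chain(data, chain):
    provenance = []              # which service touched the data, in order
    for step in chain:
        data = step(data)
        provenance.append(step.__name__)
    return data, provenance

result, prov = run_chain([{"id": "parcel-1"}], chain)
print(prov)
# ['buffer_zones', 'overlay_land_use', 'score_accessibility']
```

    Keeping the provenance list alongside the result is the simplest form of point (4): the data's flow through the modeling task can be traced after the fact.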

    Big Data Analytics in Static and Streaming Provenance

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016. With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces the relationships of entities over time, providing a unique view of the over-time behavior under study. However, provenance can be overwhelming in both volume and complexity, and the forecasting potential that provenance now offers creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance, and presents a stream processing framework for online processing of provenance data at high arrival rates. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that reduces the high dimensionality while effectively supporting mining tasks such as clustering, classification, and association rule mining; the temporal representation can further be applied to streaming provenance. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.
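    The forward/backward distinction can be sketched as follows: if derivation edges are indexed online as they arrive, a backward query, whose target entity is unknown until after the stream has passed, can still be answered by walking the index transitively. The entity names and edges below are illustrative; the dissertation's actual framework is not modelled.

```python
# Sketch of answering backward provenance queries over a stream of
# derivation edges (derived, source). Edges and entity names are
# illustrative.

from collections import defaultdict

class ProvStream:
    """Index derivation edges online so that backward queries on targets
    unknown in advance can still be answered after the fact."""

    def __init__(self):
        self.sources = defaultdict(set)   # derived -> its direct sources

    def ingest(self, derived, source):
        self.sources[derived].add(source)

    def backward(self, entity):
        """All ancestors of an entity (everything it was derived from)."""
        seen, stack = set(), [entity]
        while stack:
            for src in self.sources[stack.pop()]:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen

stream = ProvStream()
for edge in [("forecast", "model_run"),
             ("model_run", "sensor_a"),
             ("model_run", "sensor_b")]:
    stream.ingest(*edge)

print(sorted(stream.backward("forecast")))
# ['model_run', 'sensor_a', 'sensor_b']
```

    A forward query, by contrast, could be evaluated as a filter applied to each edge as it streams past, with no index retained, which is why the two query classes call for different processing machinery.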

    Towards Interoperable Research Infrastructures for Environmental and Earth Sciences

    This open access book summarises the latest developments in data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and of how a ‘reference model guided’ engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and Earth sciences. The 20 contributions in this book are structured in five parts covering the design, development, deployment, operation, and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructures and relevant e-Infrastructure technologies, part two discusses the reference-model-guided engineering approach, part three presents the software and tools developed for common data management challenges, part four demonstrates the software via several use cases, and the last part discusses sustainability and future directions.