
    A unified framework for managing provenance information in translational research

    Background: A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to the patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate the scientific process, and associate trust values with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle, and they do not incorporate "domain semantics," which is essential to support domain-specific querying and analysis by scientists.

    Results: We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata:

    (a) Provenance collection - during data generation
    (b) Provenance representation - to support interoperability and reasoning, and to incorporate domain semantics
    (c) Provenance storage and propagation - to allow efficient storage and seamless propagation of provenance as the data is transferred across applications
    (d) Provenance query - to support queries of increasing complexity over large data sizes, and to support knowledge discovery applications

    We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T. cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness.

    Conclusions: The SPF provides a unified framework to effectively manage the provenance of translational research data during the pre- and post-publication phases. The framework is underpinned by an upper-level provenance ontology, Provenir, which is extended to create domain-specific provenance ontologies that facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis.
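
    As a rough illustration of stage (b), representing provenance with domain semantics, the sketch below builds a tiny RDF graph in which a hypothetical domain class specializes an upper-level provenance class, then answers a domain-specific query with SPARQL. The namespaces, class names, and properties are invented stand-ins, not the actual Provenir ontology.

        # Minimal sketch: domain provenance as an extension of an upper-level
        # ontology, queried with SPARQL (rdflib). All vocabulary below is a
        # hypothetical stand-in, not the real Provenir terms.
        from rdflib import Graph, Namespace, RDF, RDFS, Literal

        UPPER = Namespace("http://example.org/upper#")  # upper-level ontology (assumed)
        DOM = Namespace("http://example.org/tcruzi#")   # domain extension (assumed)

        g = Graph()
        # The domain class specializes an upper-level provenance class.
        g.add((DOM.SequencingExperiment, RDFS.subClassOf, UPPER.Process))

        # A concrete data item and the process that generated it.
        g.add((DOM.exp42, RDF.type, DOM.SequencingExperiment))
        g.add((DOM.dataset7, RDF.type, UPPER.Data))
        g.add((DOM.dataset7, UPPER.derivedFrom, DOM.exp42))
        g.add((DOM.exp42, DOM.usesStrain, Literal("T. cruzi CL Brener")))

        # Domain-specific query: which datasets were derived from experiments
        # on a given strain? Upper-level reasoning (subClassOf*) and domain
        # properties are used together.
        q = """
        PREFIX up: <http://example.org/upper#>
        PREFIX dom: <http://example.org/tcruzi#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?data ?strain WHERE {
          ?data up:derivedFrom ?proc .
          ?proc a/rdfs:subClassOf* up:Process ;
                dom:usesStrain ?strain .
        }
        """
        for row in g.query(q):
            print(row.data, row.strain)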

    A policy language definition for provenance in pervasive computing

    Recent advances in computing technology have led to the paradigm of pervasive computing, which simplifies daily life by integrating information processing into the everyday physical world. Pervasive computing draws its power from knowing its surroundings and creates an environment that combines computing and communication capabilities. Sensors that provide high-resolution spatial and instant measurements are most commonly used for forecasting, monitoring, and real-time environmental modelling. The data generated by a sensor network depends on several influences, such as the configuration and location of the sensors or the processing performed on the raw measurements. Storing sufficient metadata to give meaning to each recorded observation is important in order to draw accurate conclusions and to enhance the reliability of any dataset built from this automatically collected data. Such metadata is called provenance data, as it records the origin of the data and the process by which it arrived from its origin. Provenance is still an exploratory field in pervasive computing, and many open research questions have yet to emerge. The context information and the distinctive characteristics of the pervasive environment call for different approaches to a provenance support system. This work implements a policy language definition that specifies the collection model for provenance management systems and addresses the challenges that arise with stream data and sensor environments. The structure graph of the proposed model is mapped to the Open Provenance Model in order to facilitate the sharing of provenance data and interoperability with other systems. As provenance security has been recognized as one of the most important components of any provenance system, an access control language has also been developed that is tailored to the special requirements of provenance: fine-grained policies, privacy policies, and preferences. Experimental evaluation shows a reasonable overhead for provenance collection and reasonable provenance query times, while a numerical analysis was used to evaluate the storage overhead.
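
    The abstract does not reproduce the policy language's syntax, so the fragment below is only a loose sketch of what a collection policy plus a provenance-specific access rule might look like, modeled as plain data structures and evaluated against an incoming sensor reading. Every field name here is invented for illustration.

        # Hypothetical sketch of a provenance collection policy and a
        # fine-grained, privacy-aware access rule for sensor streams.
        from dataclasses import dataclass

        @dataclass
        class CollectionPolicy:
            sensor_type: str         # which sensors the policy applies to
            record_location: bool    # capture sensor placement?
            record_processing: bool  # capture operations on raw readings?
            sample_every_n: int = 1  # keep provenance for 1 of every n readings

        @dataclass
        class AccessRule:
            role: str                # who may query the provenance
            hide_fields: tuple = ()  # privacy preference: fields to redact

        def provenance_view(reading, policy, rule, seq):
            """Build the provenance record one requester is allowed to see."""
            if seq % policy.sample_every_n != 0:
                return None  # the collection policy skips this reading
            record = {"value": reading["value"], "sensor": reading["sensor"]}
            if policy.record_location:
                record["location"] = reading.get("location")
            if policy.record_processing:
                record["processing"] = reading.get("processing", [])
            for f in rule.hide_fields:  # apply privacy preferences
                record.pop(f, None)
            return record

        policy = CollectionPolicy("temperature", record_location=True,
                                  record_processing=True, sample_every_n=2)
        rule = AccessRule(role="analyst", hide_fields=("location",))
        reading = {"sensor": "t-17", "value": 21.4,
                   "location": "lab-3", "processing": ["calibrate"]}
        print(provenance_view(reading, policy, rule, seq=4))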

    The DAMES Metadata Approach

    The DAMES project will provide high-quality data management services to the social science research community, based on an e-social science infrastructure. The infrastructure is supported by the collection and use of metadata to describe datasets and other social science resources. This report reviews the metadata requirements of the DAMES services, reviews a number of metadata standards, and discusses how the selected standards can be used to support the DAMES services. The kinds of metadata focussed upon in this report include metadata for describing social science microdatasets and other resources such as data analysis processing instruction files, metadata for grouping and linking datasets, and metadata for describing the provenance of data as it is transformed through analytical procedures. The social science metadata standards reviewed include:

    • The Common Warehouse Metamodel (CWM)
    • The Data Documentation Initiative (DDI) versions 2 and 3
    • Dublin Core
    • Encoded Archival Description (EAD)
    • e-Government Metadata Standard (e-GMS)
    • ELSST and HASSET
    • MAchine-Readable Cataloging (MARC)
    • Metadata Encoding and Transmission Standard (METS)
    • MetaDater
    • Open Archives Initiative (OAI)
    • Open Archival Information System (OAIS)
    • Statistical Data and Metadata Exchange (SDMX)
    • Text Encoding Initiative (TEI)

    The review concludes that DDI version 3.0 is the most appropriate standard for the DAMES project and explains how best to integrate it into the project. This includes a description of how to capture metadata upon resource registration, upgrade the metadata from accessible resources available through the GEODE project, use the metadata for resource discovery, and generate provenance metadata during data transformation procedures. In addition, a "metadata wizard" is described to help with data management activities.
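
    As a concrete, heavily simplified illustration of the kind of dataset description the selected standard provides, the sketch below emits a DDI-like XML record for a microdataset, including a provenance pointer to a source dataset. The element names are abbreviated stand-ins, not the real DDI 3.0 schema.

        # Simplified sketch of a DDI-like XML description for a microdataset.
        # Element names are illustrative; the actual DDI 3.0 schema is far
        # richer and uses different, namespaced elements.
        import xml.etree.ElementTree as ET

        def describe_dataset(title, pid, variables, derived_from=None):
            study = ET.Element("StudyUnit", attrib={"id": pid})
            ET.SubElement(study, "Title").text = title
            logical = ET.SubElement(study, "LogicalProduct")
            for name, label in variables:
                var = ET.SubElement(logical, "Variable", attrib={"name": name})
                ET.SubElement(var, "Label").text = label
            if derived_from:  # provenance: the dataset this one came from
                prov = ET.SubElement(study, "Provenance")
                ET.SubElement(prov, "SourceDataset").text = derived_from
            return ET.tostring(study, encoding="unicode")

        print(describe_dataset(
            title="Household survey extract",
            pid="dames-ex-001",
            variables=[("age", "Age in years"), ("occ", "Occupation code")],
            derived_from="dames-raw-000",  # produced by a transformation step
        ))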

    Embedding Analytics within the Curation of Scientific Workflows

    This paper reports on the ongoing activities and curation practices of the National Center for Biomolecular NMR Data Processing and Analysis. Over the past several years, the Center has been developing and extending computational workflow management software for use by a community of biomolecular NMR spectroscopists. Previous work refactored the workflow system to use the PREMIS framework for reporting retrospective provenance, for sharing workflows between scientists, and for supporting data reuse. In this paper, we report on our recent efforts to embed analytics within workflow execution and provenance tracking. Important metrics for each intermediate dataset are included within the corresponding PREMIS intellectual object, which allows both inspection of the operation of individual actors and visualization of the changes throughout a full processing workflow. These metrics can be viewed within the workflow management system or through standalone metadata widgets. We support a hybrid approach of automated workflow execution alongside manual intervention and metadata management. In this combination, the workflow system and metadata widgets encourage the domain experts to be avid curators of the data they create, fostering both computational reproducibility and scientific data reuse.
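
    The sketch below mimics the idea of recording metrics for each intermediate dataset as a workflow runs, so that individual actors can be inspected afterwards. It uses plain dictionaries rather than the PREMIS XML schema, and the actor and field names are invented.

        # Rough sketch: record metrics for every intermediate dataset a
        # workflow actor produces, in the spirit of attaching metrics to a
        # PREMIS object record. Not the actual PREMIS schema.
        import statistics

        def run_step(name, func, data, provenance):
            """Run one workflow actor and log metrics for its output."""
            result = func(data)
            provenance.append({
                "object": f"{name}-output",  # intermediate dataset id
                "event": name,               # the actor that produced it
                "metrics": {
                    "n_points": len(result),
                    "mean": statistics.fmean(result),
                    "stdev": statistics.stdev(result) if len(result) > 1 else 0.0,
                },
            })
            return result

        provenance = []
        signal = [3.0, 4.5, 2.2, 5.1]
        corrected = run_step("baseline-correct",
                             lambda xs: [x - min(xs) for x in xs],
                             signal, provenance)
        scaled = run_step("normalize", lambda xs: [x / max(xs) for x in xs],
                          corrected, provenance)
        for record in provenance:  # metrics visible at every milestone
            print(record["event"], record["metrics"])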

    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotations and provenance as first-class objects; (2) local dependency tracking, to track the dependencies and derivations among data items; (3) update authorization, to support data curation via content-based rather than identity-based authorization; and (4) new access methods and supporting operators for pattern matching over various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms. (Published at the 3rd Biennial Conference on Innovative Data Systems Research (CIDR 2007), January 7-10, 2007, Asilomar, California, USA, under a Creative Commons license: http://creativecommons.org/licenses/by/2.5/.)
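
    The paper's SQL extensions are not reproduced in the abstract; the sketch below approximates one of the ideas, annotations and provenance as first-class queryable objects, using plain SQL over sqlite3. Table and column names are invented.

        # Plain-SQL approximation (sqlite3) of annotations and provenance as
        # first-class objects stored and queried alongside the data they
        # describe. bdbms's actual SQL extension differs.
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
        CREATE TABLE sequence (id INTEGER PRIMARY KEY, accession TEXT, seq TEXT);
        CREATE TABLE annotation (
            id INTEGER PRIMARY KEY,
            seq_id INTEGER REFERENCES sequence(id),
            kind TEXT,   -- e.g. 'curation-note' or 'provenance'
            body TEXT,
            source TEXT  -- where the annotation came from
        );
        """)
        con.execute("INSERT INTO sequence VALUES (1, 'AB12345', 'ACGTACGT')")
        con.executemany(
            "INSERT INTO annotation (seq_id, kind, body, source) VALUES (?,?,?,?)",
            [(1, "provenance", "imported from GenBank dump 2007-01", "loader"),
             (1, "curation-note", "start codon verified", "curator-7")],
        )
        # Query data and annotations together, filtering on annotation fields.
        rows = con.execute("""
            SELECT s.accession, a.kind, a.body
            FROM sequence s JOIN annotation a ON a.seq_id = s.id
            WHERE a.kind = 'provenance'
        """).fetchall()
        print(rows)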

    Querying and managing opm-compliant scientific workflow provenance

    Provenance, the metadata that records the derivation history of scientific results, is important in scientific workflows for interpreting, validating, and analyzing the results of scientific computing. Recently, to promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) has been proposed and has played an important role in the community. In this dissertation, to efficiently query and manage OPM-compliant provenance, we first propose a provenance collection framework that collects both prospective provenance, which captures an abstract workflow specification as a recipe for future data derivation, and retrospective provenance, which captures past workflow execution and data derivation information. We then propose a relational database-based provenance system, called OPMPROV, that stores, reasons over, and queries prospective and retrospective OPM-compliant provenance. We finally propose OPQL, an OPM-level provenance query language defined directly over the OPM model. An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our provenance store, provenance collection framework, and provenance query language feature native support of the OPM model.
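
    OPQL's syntax is not given in the abstract; the sketch below only illustrates the "OPM graph in, OPM graph out" property with a hypothetical query that extracts the upstream derivation subgraph of an artifact. The node and edge names follow OPM's spirit (artifacts, processes, 'used', 'wasGeneratedBy'), but the query itself is a stand-in.

        # Sketch of a graph-in, graph-out provenance query over an OPM-style
        # graph. The traversal is a hypothetical stand-in for an OPQL query.
        from collections import namedtuple

        Edge = namedtuple("Edge", "src kind dst")  # A1 -wasGeneratedBy-> P1

        graph = {
            Edge("A1", "wasGeneratedBy", "P1"),
            Edge("P1", "used", "A0"),
            Edge("A2", "wasGeneratedBy", "P2"),
            Edge("P2", "used", "A1"),
        }

        def derivation_subgraph(graph, artifact):
            """Return the subgraph reachable upstream from one artifact."""
            out, frontier = set(), {artifact}
            while frontier:
                node = frontier.pop()
                for e in graph:
                    if e.src == node and e not in out:
                        out.add(e)
                        frontier.add(e.dst)
            return out  # itself an OPM-style graph, so queries compose

        # How was A2 derived? The answer is another graph, not a table.
        for e in sorted(derivation_subgraph(graph, "A2")):
            print(f"{e.src} --{e.kind}--> {e.dst}")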

    NFDI MatWerk / Materials Data Infrastructure

    The German National Research Data Infrastructure (NFDI) aims to systematically develop, sustainably secure, and make accessible the data holdings of science and research. It is being established as a networked structure of consortia acting on their own initiative. In NFDI-MatWerk, a reliable digital platform for the materials and nanosciences is being established, which enables the digital representation of materials data and specific metadata. Within NFDI-MatWerk, the Task Area Materials Data Infrastructure will provide services to easily store, share, search, and analyze data and metadata while ensuring data integrity, provenance, and authorship. The concept of FAIR Digital Objects, developed in the Research Data Alliance and in the FAIR Data Commons of HMC, will be utilized to represent data objects. Data sets and metadata documents will be stored in research data repositories and metadata repositories, respectively. Metadata is one of the key elements in implementing both human-readable and machine-actionable representations of materials-related information. Additional services will be provided for metadata enrichment and annotation, harvesting and indexing, and documenting the provenance of the data objects. Collections of FAIR Digital Objects will be fed into a knowledge graph based on relevant Materials Science and Engineering ontologies, connecting materials information and data. Web front-ends will provide access to the data, optimized for the particular perspectives of the user groups. Support and training will be provided for the use as well as the operation of the Materials Data Infrastructure services and tools. First adopters of the research data and metadata infrastructures are participant projects providing data sets from various fields, which will be transformed into exemplary reference data sets. This research has been supported by the research program ‘Engineering Digital Futures’ of the Helmholtz Association of German Research Centers, the Helmholtz Metadata Collaboration (HMC) Platform, the German National Research Data Infrastructure (NFDI), and the German Research Foundation (DFG).
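
    As a loose sketch of the FAIR Digital Object idea mentioned above, the fragment below models an object as a persistent identifier resolving to a machine-actionable record that points at the data, its metadata document, its type, and its provenance. The field names and identifiers are illustrative, not a normative FDO profile.

        # Hypothetical FAIR Digital Object record; field names and URLs are
        # invented for illustration only.
        from dataclasses import dataclass, asdict
        import json

        @dataclass
        class FairDigitalObject:
            pid: str             # persistent identifier (e.g. a Handle)
            data_ref: str        # where the bytes live (data repository)
            metadata_ref: str    # where the metadata document lives
            type_ref: str        # machine-actionable type of the object
            provenance_ref: str  # how the object was produced

        fdo = FairDigitalObject(
            pid="21.T11148/0000-demo-0001",  # hypothetical Handle
            data_ref="https://data.example.org/tensile-test-042.csv",
            metadata_ref="https://meta.example.org/tensile-test-042.json",
            type_ref="https://types.example.org/TensileTestResult",
            provenance_ref="https://meta.example.org/tensile-test-042.prov.json",
        )
        # A harvester or knowledge-graph loader would consume this record.
        print(json.dumps(asdict(fdo), indent=2))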

    Modeling domain metadata beyond metadata standards

    The Laser Interferometer Gravitational-wave Observatory (LIGO) project to detect gravitational waves represents a complex, distributed scientific endeavor posing specific challenges for reproducibility and data management. The integration of provenance and other metadata into the workflow stands as one means of addressing such challenges. The goal of a metadata model for the LIGO workflow is the provision of metadata describing all the data products at each significant milestone in the data analysis pipeline. Given both the highly specific domain and the need to support current analysis tools, the development of such a model demands a more complex, comprehensive approach than existing metadata standards provide. For this reason, we pursued a multipronged approach to metadata modeling, gathering users’ conceptions, system information, research artifacts, and other organizational documents, and worked to combine the findings into one final model. This approach provided a thorough understanding of the overall research lifecycle and insight into scientific workflow metadata modeling.
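
    The paper's final model is not reproduced in the abstract; purely as an illustration of milestone-level metadata for pipeline data products, the sketch below ties each product to its inputs, software versions, and domain-specific fields. All names are invented and this is not LIGO's actual model.

        # Hypothetical milestone metadata record for a pipeline data product.
        from dataclasses import dataclass, field
        from datetime import datetime, timezone

        @dataclass
        class MilestoneRecord:
            product: str    # data product produced at this milestone
            milestone: str  # pipeline stage name
            inputs: list    # upstream products (provenance)
            software: dict  # tool name -> version
            domain: dict = field(default_factory=dict)  # instrument fields
            created: str = ""

            def __post_init__(self):
                if not self.created:
                    self.created = datetime.now(timezone.utc).isoformat()

        rec = MilestoneRecord(
            product="strain-segment-0007.calibrated",
            milestone="calibration",
            inputs=["strain-segment-0007.raw"],
            software={"calpipe": "2.3.1"},  # hypothetical tool
            domain={"detector": "H1", "gps_start": 1126259446},
        )
        print(rec.product, "<-", rec.inputs, "at", rec.milestone)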