8 research outputs found

    Towards capturing data curation provenance using Frictionless Data Package Pipelines [poster]

    Get PDF
Presented at FORCE2018 Conference, Montreal, Canada, October 10-12, 2018. FORCE: Future of Research Communications and e-Scholarship. At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve its data curation process efficiency. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions to final data products. Because these pipelines are defined using a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transforms that bridge the original data submission to its published state in machine-actionable ways that benefit the research community through transparency in the data curation process. NSF #143557
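The declarative pipeline definitions mentioned above take roughly this shape. This is a sketch of a `pipeline-spec.yaml` file for the datapackage-pipelines library; the pipeline name, processor names, and file paths here are illustrative, not BCO-DMO's actual configuration:

```yaml
# pipeline-spec.yaml -- illustrative sketch of a declarative pipeline
example-curation:
  pipeline:
    # Load the original data submission as a tabular resource
    - run: load
      parameters:
        from: submissions/original.csv
        name: observations
    # Declare column types so values are validated and normalized
    - run: set_types
      parameters:
        resources: observations
        types:
          temperature:
            type: number
    # Write the final data package to its published location
    - run: dump_to_path
      parameters:
        out-path: published/
```

Because each step is named and parameterized in data rather than in imperative code, the same file can be re-run to reproduce the transform or walked programmatically to emit a provenance record.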

    Towards Capturing Provenance of the Data Curation Process at Domain-specific Repositories

    Get PDF
Presented at AGU Fall Meeting, American Geophysical Union, Washington, D.C., 10 – 14 Dec 2018. Data repositories often transform submissions to improve understanding and reuse of data by researchers other than the original submitter. However, scientific workflows built by the data submitters often depend on the original data format. In some cases, this makes the repository’s final data product less useful to the submitter. As a result, these two workable but different versions of the data provide value to two disparate, non-interoperable research communities around what should be a single dataset. Data repositories could bridge these two communities by exposing provenance explaining the transform from original submission to final product. A subsequent benefit of this provenance would be the transparent value-add of domain repository data curation. To improve its data management process efficiency, the Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification defined by the Frictionless Data project (https://frictionlessdata.io). Recently, BCO-DMO has been using the Frictionless Data Package Pipelines Python library (https://github.com/frictionlessdata/datapackage-pipelines) to capture the data curation processing steps that transform original submissions to final data products. Because these processing steps are stored using a declarative language, they can be converted to a structured provenance record using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). PROV-O abstracts the Frictionless Data elements of BCO-DMO’s workflow for capturing necessary curation provenance and enables interoperability with other external provenance sources and tools. Users who are familiar with PROV-O or the Frictionless Data Pipelines can use either record to reproduce the final data product in a machine-actionable way. While there may still be some curation steps that cannot be easily automated, this process is a step towards end-to-end reproducible transforms throughout the data curation process. In this presentation, BCO-DMO will demonstrate how Frictionless Data Package Pipelines can be used to capture data curation provenance from original submission to final data product, exposing the concrete value-add of domain-specific repositories. NSF #143557
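The conversion described above — turning a declarative list of processing steps into a PROV-O record — can be sketched in a few lines of Python. The step names, version labels, and the `ex:` namespace below are illustrative placeholders, not BCO-DMO's actual identifiers or tooling:

```python
# Minimal sketch: serialize a declarative list of curation steps into a
# PROV-O provenance record in Turtle syntax. Each processing step becomes
# a prov:Activity that used the previous version of the data (a prov:Entity)
# and generated the next version.

def steps_to_prov(dataset_id, steps):
    """Return a Turtle string chaining entities and activities per PROV-O."""
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix ex:   <http://example.org/curation/> .",
        "",
        f"ex:{dataset_id}-v0 a prov:Entity .  # the original submission",
    ]
    prev = f"ex:{dataset_id}-v0"
    for i, step in enumerate(steps, start=1):
        activity = f"ex:{dataset_id}-step{i}"
        entity = f"ex:{dataset_id}-v{i}"
        lines += [
            f"{activity} a prov:Activity ;",
            f'    rdfs:label "{step}" ;',
            f"    prov:used {prev} .",
            f"{entity} a prov:Entity ;",
            f"    prov:wasGeneratedBy {activity} ;",
            f"    prov:wasDerivedFrom {prev} .",
        ]
        prev = entity
    return "\n".join(lines)

print(steps_to_prov("dataset123", ["set_types", "rename_fields"]))
```

Because PROV-O is step-agnostic, a record like this can be consumed by generic provenance tools with no knowledge of the Frictionless Data processors that produced it.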

    Capturing Provenance of Data Curation at BCO-DMO

    Get PDF
Presented at Data Curation Network, May 15, 2020. At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve its data curation process efficiency. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions to final data products. Because these pipelines are defined using a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transforms that bridge the original data submission to its published state in machine-actionable ways that benefit the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools, making it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation. NSF #192461

    Capturing Provenance of Data Curation at BCO-DMO

    Get PDF
Presented at USGS Data Management Working Group, 9 November 2020. At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve its data curation process efficiency. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions to final data products. Because these pipelines are defined using a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transforms that bridge the original data submission to its published state in machine-actionable ways that benefit the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools, making it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation. NSF #192461

    Lowering uncertainty in crude oil measurement by selecting optimized envelope color of a pipeline

    Full text link
Presented at OceanObs’19, Honolulu, HI, September 16-20, 2019. Oceanographic data, when well-documented and stewarded toward preservation, have the potential to accelerate new science and facilitate our understanding of complex natural systems. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) is funded by the NSF to document and manage marine biological, chemical, physical, and biogeochemical data, ensuring their discovery and access, and facilitating their reuse. The task of curating and providing access to research data is a collaborative process, with associated actors and critical activities occurring throughout the data’s life cycle. BCO-DMO supports all phases of the data life cycle and works closely with investigators to ensure open access of well-documented project data and information. Supporting this curation process is a flexible cyberinfrastructure that provides the means for data submission, discovery, and access, ultimately enabling reuse. Based upon community feedback, this infrastructure is undergoing evaluation and improvement to better meet oceanographic research needs. This poster will introduce the repository, describe some of the strategic enhancements coming to BCO-DMO, and present an opportunity for you to provide feedback on enhancements yet to come. We invite you to think about your own research workflow of searching for and accessing new data, and to provide your feedback through the poster’s interactive sections. Your input can help BCO-DMO improve its service to the research community. Award(s): NSF #192461

    Data Help Desk BCO-DMO Lightning Talk

    No full text
Presented at Ocean Sciences Meeting (OSM), San Diego, CA, 16 - 21 February 2020. BCO-DMO is the Biological and Chemical Oceanography Data Management Office. We help oceanography researchers who are funded by the National Science Foundation’s (NSF's) Division of Ocean Sciences' (OCE) Biological or Chemical Oceanography Sections or the Division of Polar Programs' Antarctic Organisms & Ecosystems Program manage their data, making them accessible over the internet. This lightning talk gives a brief overview of who we are, who we work with, and the types of data we manage. Award(s): NSF #192461

    Biological and Chemical Oceanography Data Management Office: Supporting a New Vision for Adaptive Management of Oceanographic Data [poster]

    Get PDF
Presented at 2022 OCB Summer Workshop, Woods Hole, MA, 20 - 23 June 2022. An unparalleled data catalog of well-documented, interoperable oceanographic data and information, openly accessible to all end-users through an intuitive web-based interface for the purposes of advancing marine research, education, and policy. Conference Website: https://web.whoi.edu/ocb-workshop/ NSF #192461