8 research outputs found

    Towards capturing data curation provenance using Frictionless Data Package Pipelines [poster]

    Get PDF
Presented at FORCE2018 Conference, Montreal, Canada, October 10-12, 2018. FORCE: Future of Research Communications and e-Scholarship. At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve its data curation process efficiency. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions to final data products. Because these pipelines are defined using a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transforms that bridge the original data submission to its published state in machine-actionable ways that benefit the research community through transparency in the data curation process. NSF #143557
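The declarative pipeline definitions mentioned above take roughly this shape. This is a sketch of a `pipeline-spec.yaml` file for the datapackage-pipelines library; the pipeline name, processor names, and file paths here are illustrative, not BCO-DMO's actual configuration:

```yaml
# pipeline-spec.yaml -- illustrative sketch of a declarative pipeline
example-curation:
  pipeline:
    # Load the original data submission as a tabular resource
    - run: load
      parameters:
        from: submissions/original.csv
        name: observations
    # Declare column types so values are validated and normalized
    - run: set_types
      parameters:
        resources: observations
        types:
          temperature:
            type: number
    # Write the final data package to its published location
    - run: dump_to_path
      parameters:
        out-path: published/
```

Because each step is named and parameterized in data rather than in imperative code, the same file can be re-run to reproduce the transform or walked programmatically to emit a provenance record.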

    Towards Capturing Provenance of the Data Curation Process at Domain-specific Repositories

    Get PDF
Presented at AGU Fall Meeting, American Geophysical Union, Washington, D.C., 10 – 14 Dec 2018. Data repositories often transform submissions to improve understanding and reuse of data by researchers other than the original submitter. However, scientific workflows built by the data submitters often depend on the original data format. In some cases, this makes the repository’s final data product less useful to the submitter. As a result, these two workable but different versions of the data provide value to two disparate, non-interoperable research communities around what should be a single dataset. Data repositories could bridge these two communities by exposing provenance explaining the transform from original submission to final product. A subsequent benefit of this provenance would be the transparent value-add of domain repository data curation. To improve its data management process efficiency, the Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification defined by the Frictionless Data project (https://frictionlessdata.io). Recently, BCO-DMO has been using the Frictionless Data Package Pipelines Python library (https://github.com/frictionlessdata/datapackage-pipelines) to capture the data curation processing steps that transform original submissions to final data products. Because these processing steps are stored using a declarative language, they can be converted to a structured provenance record using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). PROV-O abstracts the Frictionless Data elements of BCO-DMO’s workflow for capturing necessary curation provenance and enables interoperability with other external provenance sources and tools. Users who are familiar with PROV-O or the Frictionless Data Pipelines can use either record to reproduce the final data product in a machine-actionable way. While there may still be some curation steps that cannot be easily automated, this process is a step towards end-to-end reproducible transforms throughout the data curation process. In this presentation, BCO-DMO will demonstrate how Frictionless Data Package Pipelines can be used to capture data curation provenance from original submission to final data product, exposing the concrete value-add of domain-specific repositories. NSF #143557
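The conversion described above — turning a declarative list of processing steps into a PROV-O record — can be sketched in a few lines of Python. The step names, version labels, and the `ex:` namespace below are illustrative placeholders, not BCO-DMO's actual identifiers or tooling:

```python
# Minimal sketch: serialize a declarative list of curation steps into a
# PROV-O provenance record in Turtle syntax. Each processing step becomes
# a prov:Activity that used the previous version of the data (a prov:Entity)
# and generated the next version.

def steps_to_prov(dataset_id, steps):
    """Return a Turtle string chaining entities and activities per PROV-O."""
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix ex:   <http://example.org/curation/> .",
        "",
        f"ex:{dataset_id}-v0 a prov:Entity .  # the original submission",
    ]
    prev = f"ex:{dataset_id}-v0"
    for i, step in enumerate(steps, start=1):
        activity = f"ex:{dataset_id}-step{i}"
        entity = f"ex:{dataset_id}-v{i}"
        lines += [
            f"{activity} a prov:Activity ;",
            f'    rdfs:label "{step}" ;',
            f"    prov:used {prev} .",
            f"{entity} a prov:Entity ;",
            f"    prov:wasGeneratedBy {activity} ;",
            f"    prov:wasDerivedFrom {prev} .",
        ]
        prev = entity
    return "\n".join(lines)

print(steps_to_prov("dataset123", ["set_types", "rename_fields"]))
```

Because PROV-O is step-agnostic, a record like this can be consumed by generic provenance tools with no knowledge of the Frictionless Data processors that produced it.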

    Capturing Provenance of Data Curation at BCO-DMO

    Get PDF
Presented at Data Curation Network, May 15, 2020. At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve its data curation process efficiency. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions to final data products. Because these pipelines are defined using a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transforms that bridge the original data submission to its published state in machine-actionable ways that benefit the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools, making it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation. NSF #192461

    Capturing Provenance of Data Curation at BCO-DMO

    Get PDF
Presented at USGS Data Management Working Group, 9 November 2020. At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve its data curation process efficiency. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions to final data products. Because these pipelines are defined using a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transforms that bridge the original data submission to its published state in machine-actionable ways that benefit the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools, making it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation. NSF #192461

    Lowering uncertainty in crude oil measurement by selecting optimized envelope color of a pipeline

    Full text link
Presented at OceanObs’19, Honolulu, HI, September 16-20, 2019. Oceanographic data, when well-documented and stewarded toward preservation, have the potential to accelerate new science and facilitate our understanding of complex natural systems. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) is funded by the NSF to document and manage marine biological, chemical, physical, and biogeochemical data, ensuring their discovery and access, and facilitating their reuse. The task of curating and providing access to research data is a collaborative process, with associated actors and critical activities occurring throughout the data’s life cycle. BCO-DMO supports all phases of the data life cycle and works closely with investigators to ensure open access of well-documented project data and information. Supporting this curation process is a flexible cyberinfrastructure that provides the means for data submission, discovery, and access, ultimately enabling reuse. Based upon community feedback, this infrastructure is undergoing evaluation and improvement to better meet oceanographic research needs. This poster will introduce the repository, describe some of the strategic enhancements coming to BCO-DMO, and present an opportunity for you to provide feedback on enhancements yet to come. We invite you to think about your own research workflow of searching for and accessing new data, and to provide your feedback through the poster’s interactive sections. Your input can help BCO-DMO improve its service to the research community. Award(s): NSF #192461

    Data Help Desk BCO-DMO Lightning Talk

    No full text
Presented at Ocean Sciences Meeting (OSM), San Diego, CA, 16 - 21 February 2020. BCO-DMO is the Biological and Chemical Oceanography Data Management Office. We help oceanography researchers who are funded by the National Science Foundation’s (NSF's) Division of Ocean Sciences' (OCE) Biological or Chemical Oceanography Sections or the Division of Polar Programs' Antarctic Organisms & Ecosystems Program manage their data, making them accessible over the internet. This lightning talk gives a brief overview of who we are, who we work with, and the types of data we manage. Award(s): NSF #192461

    Biological and Chemical Oceanography Data Management Office: Supporting a New Vision for Adaptive Management of Oceanographic Data [poster]

    Get PDF
Presented at 2022 OCB Summer Workshop, Woods Hole, MA, 20 - 23 June 2022. An unparalleled data catalog of well-documented, interoperable oceanographic data and information, openly accessible to all end-users through an intuitive web-based interface for the purposes of advancing marine research, education, and policy. Conference Website: https://web.whoi.edu/ocb-workshop/ NSF #192461