
    BCO-DMO Quick Guide

    BCO-DMO, a repository funded by the National Science Foundation (NSF), supports the oceanographic research community’s data needs throughout the entire data life cycle. This guide describes the services available from BCO-DMO from proposal to preservation and highlights phases where researchers engage significantly with the office. Curating and providing open access to research data is a collaborative process. This process may be thought of as a life cycle with data passing through various phases. Each phase has its own associated actors, roles, and critical activities. Good data management practices are necessary for all phases, from proposal to preservation.

    NSF #143557

    Towards Capturing Provenance of the Data Curation Process at Domain-specific Repositories

    Presented at AGU Fall Meeting, American Geophysical Union, Washington, D.C., 10 – 14 Dec 2018

    Data repositories often transform submissions to improve understanding and reuse of data by researchers other than the original submitter. However, scientific workflows built by data submitters often depend on the original data format. In some cases, this makes the repository’s final data product less useful to the submitter. As a result, these two workable but different versions of the data provide value to two disparate, non-interoperable research communities around what should be a single dataset. Data repositories could bridge these two communities by exposing provenance that explains the transformation from original submission to final product. A further benefit of this provenance would be making transparent the value added by domain repository data curation. To improve the efficiency of its data management process, the Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification defined by the Frictionless Data project (https://frictionlessdata.io). Recently, BCO-DMO has been using the Frictionless Data Package Pipelines Python library (https://github.com/frictionlessdata/datapackage-pipelines) to capture the data curation processing steps that transform original submissions into final data products. Because these processing steps are stored in a declarative language, they can be converted to a structured provenance record using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). PROV-O abstracts the Frictionless Data elements of BCO-DMO’s workflow for capturing the necessary curation provenance and enables interoperability with other external provenance sources and tools. Users who are familiar with PROV-O or the Frictionless Data Pipelines can use either record to reproduce the final data product in a machine-actionable way. While there may still be some curation steps that cannot be easily automated, this process is a step towards end-to-end reproducible transformations throughout the data curation process. In this presentation, BCO-DMO will demonstrate how Frictionless Data Package Pipelines can be used to capture data curation provenance from original submission to final data product, exposing the concrete value added by domain-specific repositories.

    NSF #143557
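
    To make the declarative-pipeline-to-PROV-O conversion concrete, here is a minimal sketch in Python using rdflib. The processor names, parameters, and IRIs are hypothetical illustrations, not BCO-DMO's actual pipeline vocabulary; only the PROV-O terms (prov:Activity, prov:Entity, prov:used, prov:wasGeneratedBy, prov:wasDerivedFrom) come from the ontology itself.

    ```python
    from rdflib import Graph, Namespace, RDF

    PROV = Namespace("http://www.w3.org/ns/prov#")
    EX = Namespace("http://example.org/bcodmo/")  # hypothetical base IRI

    # A declarative pipeline: an ordered list of processing steps, each
    # naming a processor and its parameters (step names are hypothetical).
    pipeline = [
        {"run": "rename_fields", "parameters": {"lat": "latitude"}},
        {"run": "convert_units", "parameters": {"depth": "m"}},
    ]

    g = Graph()
    g.bind("prov", PROV)

    prev = EX["original-submission"]
    g.add((prev, RDF.type, PROV.Entity))

    for i, step in enumerate(pipeline):
        activity = EX[f"step-{i}-{step['run']}"]
        output = EX[f"version-{i + 1}"]
        g.add((activity, RDF.type, PROV.Activity))
        g.add((activity, PROV.used, prev))           # step consumed the prior version
        g.add((output, RDF.type, PROV.Entity))
        g.add((output, PROV.wasGeneratedBy, activity))
        g.add((output, PROV.wasDerivedFrom, prev))
        prev = output                                # chain to the next step

    print(g.serialize(format="turtle"))
    ```

    Each processing step becomes a prov:Activity that consumes the previous version of the dataset and generates the next, so the final data product carries a machine-readable chain back to the original submission.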

    Biological & Chemical Oceanography Data Management Office: a domain-specific repository for oceanographic data from around the world [poster]

    Presented at AGU Ocean Sciences, 11 - 16 February 2018, Portland, OR

    The Biological and Chemical Oceanography Data Management Office (BCO-DMO) is a domain-specific digital data repository that works with investigators funded under the National Science Foundation’s Division of Ocean Sciences and Office of Polar Programs to manage their data free of charge. Data managers work closely with investigators to satisfy their data-sharing requirements, to develop comprehensive Data Management Plans, and to ensure that their data are well described with extensive metadata. Additionally, BCO-DMO offers tools to find and reuse these high-quality data and metadata packages, and services such as DOI generation for publication and attribution. These resources are free for all to discover, access, and utilize. As a repository embedded in our research community, BCO-DMO is well positioned to offer knowledge and expertise from both domain-trained data managers and the scientific community at large. BCO-DMO is currently home to more than 9000 datasets and 900 projects, all of which are or will be submitted for archive at the National Centers for Environmental Information (NCEI). Our data holdings continue to grow and encompass a wide range of oceanographic research areas, including biological, chemical, physical, and ecological. These data represent cruises and experiments from around the world and are managed using community best practices, standards, and technologies to ensure accuracy and promote reuse. BCO-DMO is a repository and tool for investigators, offering both ocean science data and resources for data dissemination and publication.

    NSF #143557

    Capturing Provenance of Data Curation at BCO-DMO

    Presented at Data Curation Network, May 15, 2020

    At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) in an effort to improve the efficiency of its data curation process. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions into final data products. Because these pipelines are defined in a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While there may still be some curation steps that cannot be easily automated, this method is a step towards reproducible transformations that bridge the original data submission to its published state in machine-actionable ways, benefiting the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools that makes it easier for data managers to process submissions, reuse existing workflows, and make transparent the added value of domain-specific data curation.

    NSF #192461
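
    Once such a provenance record has been serialized (here to a hypothetical Turtle file), the curation chain can be queried back out with SPARQL, which is one concrete sense in which the published state is machine-actionable. A minimal sketch, again with rdflib:

    ```python
    from rdflib import Graph

    # Load a previously serialized PROV-O record (filename is hypothetical).
    g = Graph().parse("curation-prov.ttl", format="turtle")

    # Ask which curation activity generated each version of the dataset.
    q = """
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT ?version ?activity WHERE {
        ?version prov:wasGeneratedBy ?activity .
    }
    """
    for version, activity in g.query(q):
        print(version, "was generated by", activity)
    ```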

    Whole-genome sequencing reveals host factors underlying critical COVID-19

    Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care [1] or hospitalization [2-4] after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19. We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence implicating multiple genes, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1), in critical disease. Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication, or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.

    BCO-DMO's migration to ERDDAP

    Presented at Marine Data Cluster, Online, October 24, 2019

    As a domain-specific repository, BCO-DMO supports data stewardship throughout the data lifecycle. One key aspect of that lifecycle is making data and metadata available online in a variety of file formats. This presentation will walk through BCO-DMO's current data-serving system, our migration to ERDDAP, and what that might mean for the future. It will focus on the nuts and bolts of our migration, the benefits of this activity, and some of the difficulties we've encountered along the way.

    NSF #143557
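
    For context on the "variety of file formats" point: ERDDAP's tabledap interface selects the output format through the URL's file-type extension (.csv, .json, .nc, and others), so one served dataset satisfies many format requests. A minimal sketch, assuming a hypothetical server and dataset ID rather than a real BCO-DMO endpoint:

    ```python
    import pandas as pd

    # The server and dataset ID below are placeholders, not a real endpoint.
    BASE = "https://erddap.example.org/erddap/tabledap"
    DATASET = "bcodmo_dataset_000000"

    # Changing ".csv" to ".json" or ".nc" requests the same data in another format.
    url = f"{BASE}/{DATASET}.csv?latitude,longitude,temperature"

    # ERDDAP's .csv output puts a units line in the second row; skip it.
    df = pd.read_csv(url, skiprows=[1])
    print(df.head())
    ```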

    The Data Management Process and Lessons Learned From U.S. GEOTRACES

    Presented at BioGEOTRACES-like program planning workshop, National Academies of Sciences, Woods Hole, MA, 8 - 10 November 2018

    In an effort to explore and develop international community interest in a potential future "BioGEOTRACES-like" program, a working group of 28 scientists from 9 nations met in Woods Hole in November 2018. The result of this workshop is a new research effort termed "Biogeoscapes". This presentation highlighted data management lessons and recommendations based on past experience handling data from a similarly scaled global research project, GEOTRACES.

    NSF #143557