Software Citation Implementation Challenges
The main output of the FORCE11 Software Citation working group
(https://www.force11.org/group/software-citation-working-group) was a paper on
software citation principles (https://doi.org/10.7717/peerj-cs.86) published in
September 2016. This paper laid out a set of six high-level principles for
software citation (importance, credit and attribution, unique identification,
persistence, accessibility, and specificity) and discussed how they could be
used to implement software citation in the scholarly community. In a series of
talks and other activities, we have promoted software citation using these
increasingly accepted principles. At the time the initial paper was published,
we also provided guidance and examples on how to make software citable, though
we now realize there are unresolved problems with that guidance. The purpose of
this document is to explain the current issues affecting scholarly attribution of
research software, to organize updated implementation guidance, and to identify
where best practices and solutions are still needed.
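One practical way to act on these principles, shown here only as a hedged illustration rather than the working group's prescribed method, is to publish machine-readable citation metadata alongside the code. The sketch below writes a minimal CodeMeta-style record; the project name, DOI, repository URL, and author are placeholders invented for this example.

```python
# Minimal sketch: emit CodeMeta-style citation metadata for a software project.
# All values below are placeholders, not a real project or DOI.
import json

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",                   # hypothetical project
    "version": "1.2.0",
    "identifier": "https://doi.org/10.xxxx/example",   # placeholder DOI
    "codeRepository": "https://example.org/example/analysis-tool",
    "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Example"}],
    "license": "https://spdx.org/licenses/MIT",
}

# Writing this file to the repository root lets indexers and citation tools
# pick up the metadata.
with open("codemeta.json", "w") as fh:
    json.dump(codemeta, fh, indent=2)
```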
FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows
Modern epidemiological analyses to understand and combat the spread of
disease depend critically on access to, and use of, data. Rapidly evolving
data, such as data streams changing during a disease outbreak, are particularly
challenging. Data management is further complicated by data being imprecisely
identified when used. Public trust in policy decisions resulting from such
analyses is easily damaged and is often low, with cynicism arising where claims
of "following the science" are made without accompanying evidence. Tracing the
provenance of such decisions back through open software to primary data would
clarify this evidence, enhancing the transparency of the decision-making
process. Here, we demonstrate a Findable, Accessible, Interoperable and
Reusable (FAIR) data pipeline developed during the COVID-19 pandemic that
allows easy annotation of data as they are consumed by analyses, while tracing
the provenance of scientific outputs back through the analytical source code to
data sources. Such a tool provides a mechanism for the public, and fellow
scientists, to better assess the trust that should be placed in scientific
evidence, while allowing scientists to support policy-makers in openly
justifying their decisions. We believe that tools such as this should be
promoted for use across all areas of policy-facing research.
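The FAIR Data Pipeline defines its own registry and language APIs; the snippet below is only a minimal sketch of the underlying idea, with hypothetical helper names and record format: when an analysis consumes data, record enough provenance (input and output hashes, code version, timestamp) that outputs can be traced back through the code to the primary data.

```python
# Sketch of provenance annotation around an analysis run. Hypothetical helper
# names; not the FAIR Data Pipeline's actual interface.
import hashlib
import json
import subprocess
from datetime import datetime, timezone

def sha256(path: str) -> str:
    """Content hash used to identify a data file precisely."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def record_provenance(inputs: list[str], outputs: list[str], record_path: str) -> None:
    """Write a JSON provenance record linking outputs to inputs and code version."""
    code_version = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip()
    record = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,
        "inputs": {p: sha256(p) for p in inputs},
        "outputs": {p: sha256(p) for p in outputs},
    }
    with open(record_path, "w") as fh:
        json.dump(record, fh, indent=2)

# Example (file names are hypothetical):
# record_provenance(["cases.csv"], ["figure1.png"], "provenance.json")
```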
The FAIR Guiding Principles for scientific data management and stewardship
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders (representing academia, industry, funding agencies, and scholarly publishers) have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles and includes the rationale behind them and some exemplar implementations in the community.
Multinational prospective cohort study of rates and risk factors for ventilator-associated pneumonia over 24 years in 42 countries of Asia, Africa, Eastern Europe, Latin America, and the Middle East: Findings of the International Nosocomial Infection Control Consortium (INICC)
Objective: Rates of ventilator-associated pneumonia (VAP) in low- and middle-income countries (LMIC) are several times above those of high-income countries. The objective of this study was to identify risk factors (RFs) for VAP cases in ICUs of LMICs. Design: Prospective cohort study. Setting: This study was conducted across 743 ICUs of 282 hospitals in 144 cities in 42 Asian, African, European, Latin American, and Middle Eastern countries. Participants: The study included patients admitted to ICUs across 24 years. Results: In total, 289,643 patients were followed during 1,951,405 patient days and acquired 8,236 VAPs. We analyzed 10 independent variables. Multiple logistic regression identified the following independent VAP RFs: male sex (adjusted odds ratio [aOR], 1.22; 95% confidence interval [CI], 1.16-1.28; P < .0001); longer length of stay (LOS), which increased the risk 7% per day (aOR, 1.07; 95% CI, 1.07-1.08; P < .0001); mechanical ventilation (MV) utilization ratio (aOR, 1.27; 95% CI, 1.23-1.31; P < .0001); and continuous positive airway pressure (CPAP), which was associated with the highest risk (aOR, 13.38; 95% CI, 11.57-15.48; P < .0001).
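For orientation only: adjusted odds ratios of the kind quoted above come from exponentiating the coefficients of a multiple logistic regression. The sketch below uses synthetic data and illustrative variable names, not the INICC dataset, to show that relationship (aOR = exp(coefficient), with confidence bounds exponentiated likewise).

```python
# Illustrative only: deriving adjusted odds ratios (aOR) and 95% CIs from a
# multiple logistic regression fit. Data and variable names are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "male_sex": rng.integers(0, 2, n),
    "length_of_stay": rng.gamma(2.0, 5.0, n),
    "mv_ratio": rng.uniform(0.0, 1.0, n),
    "cpap": rng.integers(0, 2, n),
})
# Synthetic outcome loosely tied to the covariates.
linpred = (-4 + 0.2 * df["male_sex"] + 0.07 * df["length_of_stay"]
           + 0.25 * df["mv_ratio"] + 2.0 * df["cpap"])
df["vap"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

X = sm.add_constant(df[["male_sex", "length_of_stay", "mv_ratio", "cpap"]])
fit = sm.Logit(df["vap"], X).fit(disp=False)

ci = fit.conf_int()  # columns 0 and 1 hold lower/upper coefficient bounds
table = pd.DataFrame({
    "aOR": np.exp(fit.params),   # adjusted odds ratio = exp(coefficient)
    "CI_low": np.exp(ci[0]),
    "CI_high": np.exp(ci[1]),
    "p": fit.pvalues,
})
print(table.round(3))
```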
The health care and life sciences community profile for dataset descriptions
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
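As a non-normative sketch of what such a machine-readable dataset description can look like in RDF, the snippet below builds a tiny graph with common DCAT and Dublin Core terms of the sort the profile draws on. The dataset IRI, DOI, ORCID, and dates are placeholders, and the exact element set is defined by the HCLS profile itself.

```python
# Sketch of a dataset description in RDF using rdflib. Placeholder values;
# the normative element set is given by the HCLS community profile.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF, XSD

g = Graph()
dataset = URIRef("https://example.org/dataset/immune-profiles/v2")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example immune profiling dataset", lang="en")))
g.add((dataset, DCTERMS.description, Literal("Illustrative dataset description.", lang="en")))
g.add((dataset, DCTERMS.identifier, Literal("https://doi.org/10.xxxx/example")))   # placeholder DOI
g.add((dataset, DCTERMS.creator, URIRef("https://orcid.org/0000-0000-0000-0000"))) # placeholder ORCID
g.add((dataset, DCTERMS.issued, Literal("2016-01-01", datatype=XSD.date)))
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
# Versioning: relate this release to the unversioned dataset it belongs to.
g.add((dataset, DCTERMS.isVersionOf, URIRef("https://example.org/dataset/immune-profiles")))

print(g.serialize(format="turtle"))
```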
The Ontology for Biomedical Investigations
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl
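A brief illustration of programmatic access, not an official OBI client: the sketch below loads the released OWL file with rdflib and lists a few classes with their labels. The full ontology is large, so parsing it over the network can take a while; a locally downloaded copy of obi.owl works the same way.

```python
# Sketch: list a handful of OBI classes and their labels with rdflib/SPARQL.
# Parsing the full release is slow; this is illustrative, not an OBI tool.
from rdflib import Graph
from rdflib.namespace import OWL, RDFS

g = Graph()
g.parse("http://purl.obolibrary.org/obo/obi.owl")  # or a local copy of obi.owl

query = """
SELECT ?cls ?label WHERE {
  ?cls a owl:Class ;
       rdfs:label ?label .
}
LIMIT 10
"""
for cls, label in g.query(query, initNs={"owl": OWL, "rdfs": RDFS}):
    print(cls, label)
```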
The center for expanded data annotation and retrieval
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators assemble templates and fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository will not only capture annotations specified when experimental datasets are initially created, but will also incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.
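As a loose illustration of the template-plus-predictive-entry idea, not CEDAR's actual API or data model, the sketch below defines a small typed metadata template and suggests a value for one of its fields from previously entered records, the kind of pattern-driven pre-filling described above.

```python
# Sketch: a typed metadata template with a naive "predictive" default drawn
# from previously entered records. Field names are hypothetical.
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class SampleMetadata:
    organism: str
    tissue: str
    assay_type: str
    contact_email: Optional[str] = None

def suggest(field_name: str, previous: list["SampleMetadata"]) -> Optional[str]:
    """Suggest the most frequently used previous value for a template field."""
    values = [getattr(m, field_name) for m in previous if getattr(m, field_name)]
    return Counter(values).most_common(1)[0][0] if values else None

repository = [
    SampleMetadata("Homo sapiens", "PBMC", "flow cytometry"),
    SampleMetadata("Homo sapiens", "whole blood", "flow cytometry"),
]
print(suggest("assay_type", repository))  # -> "flow cytometry"
```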