
    FAIR Computational Workflows

    Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
    Accepted for the Data Intelligence special issue: FAIR best practices 2019. Carole Goble acknowledges funding by BioExcel2 (H2020 823830), IBISBA1.0 (H2020 730976) and EOSCLife (H2020 824087). Daniel Schober's work was financed by Phenomenal (H2020 654241) at the initiation phase of this effort; his current work is an in-kind contribution. Kristian Peters is funded by the German Network for Bioinformatics Infrastructure (de.NBI) and acknowledges BMBF funding under grant number 031L0107. Stian Soiland-Reyes is funded by BioExcel2 (H2020 823830). Daniel Garijo and Yolanda Gil gratefully acknowledge support from DARPA award W911NF-18-1-0027, NIH award 1R01AG059874-01, and NSF award ICER-1740683.
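    Illustrative sketch (not from the paper): the abstract's point that workflow steps can both consume established metadata and create provenance metadata of their own can be pictured in a few lines of Python. Every name and field below is hypothetical and only the standard library is assumed.

        # Hypothetical sketch: one workflow step that processes a file and
        # records provenance (inputs, outputs, checksums, timestamp) as metadata.
        import hashlib
        import json
        from datetime import datetime, timezone

        def run_step(step_name, func, input_path, output_path):
            """Run one workflow step and return a provenance record."""
            with open(input_path, "rb") as f:
                data = f.read()
            result = func(data)                          # the actual processing
            with open(output_path, "wb") as f:
                f.write(result)
            return {                                     # metadata created by the workflow itself
                "step": step_name,
                "input": input_path,
                "input_sha256": hashlib.sha256(data).hexdigest(),
                "output": output_path,
                "output_sha256": hashlib.sha256(result).hexdigest(),
                "executed_at": datetime.now(timezone.utc).isoformat(),
            }

        if __name__ == "__main__":
            prov = run_step("uppercase", lambda b: b.upper(), "raw.txt", "clean.txt")
            with open("provenance.json", "w") as f:
                json.dump(prov, f, indent=2)             # provenance travels with the data product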

    U.S. Department of the Interior: Sharing FAIR Data Fairly

    Government-produced data are consumed by thousands of scientists, researchers, industries, and students around the world daily, but are often difficult to locate because they are collected and stored in a duplicative state at varying levels of quality, inhibiting their usefulness for data science investigations and analysis. To address these challenges, the United States Department of the Interior bureaus have been implementing the FAIR Data Principles in their data sharing strategies since 2016. Differing interpretations of the FAIR Data Principles are leading to data that are not documented uniformly and are not properly integrated for reuse. In order to establish a FAIR baseline, analysis of select datasets is being performed with peer-reviewed FAIR assessment tools. Delphi panels are being conducted with DOI Chief Data Officers and DOI Federal data consumers to gain insights into how to affordably deliver these data according to the FAIR Data Principles.

    FAIRsoft - A practical implementation of FAIR principles for research software

    Computational tools are increasingly becoming constitutive parts of scientific research, from experimentation and data collection to the dissemination and storage of results. Unfortunately, however, research software is not subjected to the same requirements as other methods of scientific research: being peer-reviewed, being reproducible, and allowing one to build upon another’s work. This situation is detrimental to the integrity and advancement of scientific research, leading to computational methods frequently being impossible to reproduce and/or verify [1]. Moreover, they are often opaque, directly unavailable, or impossible to use by others [2]. One step to address this problem could be formulating a set of principles that research software should meet to ensure its quality and sustainability, resembling the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles [3]. The FAIR Data Principles were created to solve similar issues affecting scholarly data, namely great difficulty of sharing and accessibility, and are currently widely recognized across fields. We present here FAIRsoft, our initial effort to assess the quality of research software using a FAIR-like framework, as a first step towards its implementation in OpenEBench [4], the ELIXIR benchmarking platform.

    Unique, Persistent, Resolvable: Identifiers as the foundation of FAIR

    The FAIR Principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers, resolvable on the Web, and associated with a set of additional descriptive metadata, are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use, and orchestration with other system elements to achieve FAIRness for digital research objects.
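    Illustrative sketch (not from the paper): the snippet below mints a globally unique identifier and resolves an existing persistent identifier (a DOI) to descriptive metadata over the Web via content negotiation on doi.org. It assumes the third-party requests package; the DOI used is that of the FAIR Guiding Principles article.

        # Illustrative only: globally unique identifiers and Web-resolvable
        # persistent identifiers with associated descriptive metadata.
        import uuid
        import requests

        # A globally unique (though not yet registered or persistent) identifier:
        local_id = f"urn:uuid:{uuid.uuid4()}"
        print("minted identifier:", local_id)

        # Resolve a DOI to machine-readable metadata via HTTP content negotiation.
        doi = "10.1038/sdata.2016.18"        # the FAIR Guiding Principles paper
        resp = requests.get(
            f"https://doi.org/{doi}",
            headers={"Accept": "application/vnd.citationstyles.csl+json"},
            timeout=30,
        )
        resp.raise_for_status()
        metadata = resp.json()               # descriptive metadata for the identified object
        print(metadata.get("title"), "-", metadata.get("publisher"))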

    F*** workflows: when parts of FAIR are missing

    The FAIR principles for scientific data (Findable, Accessible, Interoperable, Reusable) are also relevant to other digital objects such as research software and scientific workflows that operate on scientific data. The FAIR principles can be applied to the data being handled by a scientific workflow as well as to the processes, software, and other infrastructure that are necessary to specify and execute a workflow. The FAIR principles were designed as guidelines, rather than rules, that would allow for differences in standards for different communities and for different degrees of compliance. There are many practical considerations which impact the level of FAIRness that can actually be achieved, including policies, traditions, and technologies. Because of these considerations, obstacles are often encountered during the workflow lifecycle that trace directly to shortcomings in the implementation of the FAIR principles. Here, we detail some cases, without naming names, in which data and workflows were Findable but otherwise lacking in areas commonly needed and expected by modern FAIR methods, tools, and users. We describe how some of these problems, all of which were overcome successfully, have motivated us to push on systems and approaches for fully FAIR workflows.
    Comment: 6 pages, 0 figures, accepted to the ERROR 2022 workshop (see https://error-workshop.org/ for more information), to be published in the proceedings of IEEE eScience 2022.

    A Maturity Model for Operations in Neuroscience Research

    Scientists are adopting new approaches to scale up their activities and goals. Progress in neurotechnologies, artificial intelligence, automation, and tools for collaboration promises new bursts of discoveries. However, compared to other disciplines and to industry, neuroscience laboratories have been slow to adopt key technologies to support collaboration, reproducibility, and automation. Drawing on progress in other fields, we define a roadmap for implementing automated research workflows for diverse research teams. We propose establishing a five-level capability maturity model for operations in neuroscience research. Achieving higher levels of operational maturity requires new technology-enabled methodologies, which we describe as "SciOps". The maturity model provides guidelines for evaluating and upgrading operations in multidisciplinary neuroscience teams.
    Comment: 10 pages, one figure.

    Normal Tissue Complication Probability (NTCP) Prediction Model for Osteoradionecrosis of the Mandible in Patients With Head and Neck Cancer After Radiation Therapy: Large-Scale Observational Cohort

    Purpose: Osteoradionecrosis (ORN) of the mandible represents a severe, debilitating complication of radiation therapy (RT) for head and neck cancer (HNC). At present, no normal tissue complication probability (NTCP) models for risk of ORN exist. The aim of this study was to develop a multivariable clinical/dose-based NTCP model for the prediction of any grade ORN (ORN I-IV) and grade IV ORN (ORN IV) after RT (+/- chemotherapy) in patients with HNC. Methods and Materials: Included patients with HNC were treated with (chemo-)RT between 2005 and 2015. Mandible bone radiation dose-volume parameters and clinical variables (ie, age, sex, tumor site, pre-RT dental extractions, chemotherapy history, postoperative RT, and smoking status) were considered as potential predictors. The patient cohort was randomly divided into a training (70%) and an independent test (30%) cohort. Bootstrapped forward variable selection was performed in the training cohort to select the predictors for the NTCP models. The final NTCP model(s) were validated on the holdback test subset. Results: Of 1259 included patients with HNC, 13.7% (n = 173 patients) developed any grade ORN (ORN I-IV; primary endpoint) and 5% (n = 65) developed grade IV ORN (ORN IV; secondary endpoint). All dose and volume parameters of the mandible bone were significantly associated with the development of ORN in univariable models. Multivariable analyses identified mandible D30% and pre-RT dental extraction as independent predictors in the best-performing NTCP models for both ORN I-IV and ORN IV, with areas under the curve (AUC) of 0.78 (validation AUC = 0.75 [0.69-0.82]) and 0.81 (validation AUC = 0.82 [0.74-0.89]), respectively. Conclusions: This study presented NTCP models based on mandible bone D30% and pre-RT dental extraction that predict ORN I-IV and ORN IV (ie, requiring invasive surgical intervention) after RT for HNC. Our results suggest that less than 30% of the mandible should receive a dose of 35 Gy or more to keep the ORN I-IV risk below 5%. These NTCP models can improve ORN prevention and management by identifying patients at risk of ORN. © 2021 The Author(s). Published by Elsevier Inc.
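    Illustrative sketch (not from the paper): the abstract reports a multivariable NTCP model with mandible D30% and pre-RT dental extraction as predictors but does not give the fitted coefficients, so the sketch below only shows the general logistic form of such a model, with placeholder coefficients.

        # Hypothetical sketch of a logistic NTCP model with the two predictors
        # named in the abstract. Coefficients are PLACEHOLDERS, not the values
        # fitted in the study (which are not reported in the abstract).
        import math

        B0 = -4.0            # intercept (placeholder)
        B_DOSE = 0.08        # per-Gy coefficient for mandible D30% (placeholder)
        B_EXTRACTION = 1.0   # coefficient for pre-RT dental extraction (placeholder)

        def ntcp(d30_gy: float, pre_rt_extraction: bool) -> float:
            """NTCP = 1 / (1 + exp(-S)), with S a linear combination of predictors."""
            s = B0 + B_DOSE * d30_gy + B_EXTRACTION * float(pre_rt_extraction)
            return 1.0 / (1.0 + math.exp(-s))

        if __name__ == "__main__":
            # Compare candidate plans for a patient without pre-RT extractions.
            for d30 in (20.0, 35.0, 50.0):
                print(f"D30% = {d30:4.1f} Gy -> predicted ORN risk = {ntcp(d30, False):.1%}")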

    Julia as a unifying end-to-end workflow language on the Frontier exascale system

    We evaluate Julia as a single language and ecosystem paradigm powered by LLVM to develop workflow components for high-performance computing. We run a Gray-Scott, 2-variable diffusion-reaction application using a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy's first exascale supercomputer. We evaluate the performance, scaling, and trade-offs of (i) the computational kernel on AMD's MI250x GPUs, (ii) weak scaling up to 4,096 MPI processes/GPUs or 512 nodes, (iii) parallel I/O writes using the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive analysis. Results suggest that although Julia generates reasonable LLVM IR, a nearly 50% performance difference exists versus native AMD HIP stencil codes when running on the GPUs. As expected, we observed near-zero overhead when using MPI and parallel I/O bindings for system-wide installed implementations. Consequently, Julia emerges as a compelling high-performance and high-productivity workflow composition language, as measured on the fastest supercomputer in the world.
    Comment: 11 pages, 8 figures, accepted at the 18th Workshop on Workflows in Support of Large-Scale Science (WORKS23), held with SC23, the IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis.
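    Illustrative sketch (not from the paper): the Gray-Scott two-variable diffusion-reaction system the authors run in Julia can be sketched in a few lines of NumPy. The version below uses a 2D, 5-point Laplacian for brevity, whereas the paper uses a 3D, 7-point stencil on AMD GPUs; the parameter values are common textbook choices, not the paper's.

        # Minimal NumPy sketch of an explicit Gray-Scott reaction-diffusion update.
        import numpy as np

        DU, DV = 0.16, 0.08      # diffusion coefficients (textbook values)
        F, K = 0.035, 0.065      # feed and kill rates (textbook values)
        DT = 1.0                 # time step

        def laplacian(a):
            """5-point stencil Laplacian with periodic boundaries."""
            return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                    np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

        def step(u, v):
            """One explicit update of the two coupled fields u and v."""
            uvv = u * v * v
            u = u + DT * (DU * laplacian(u) - uvv + F * (1.0 - u))
            v = v + DT * (DV * laplacian(v) + uvv - (F + K) * v)
            return u, v

        if __name__ == "__main__":
            n = 128
            u, v = np.ones((n, n)), np.zeros((n, n))
            u[n//2-5:n//2+5, n//2-5:n//2+5] = 0.50   # seed a perturbation
            v[n//2-5:n//2+5, n//2-5:n//2+5] = 0.25
            for _ in range(1000):
                u, v = step(u, v)
            print("u range:", float(u.min()), float(u.max()))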