FAIR Computational Workflows
Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products.
They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance.
These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right.
This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
Accepted for Data Intelligence special issue: FAIR best practices, 2019.
Carole Goble acknowledges funding by BioExcel2 (H2020 823830), IBISBA1.0 (H2020 730976) and EOSCLife (H2020 824087). Daniel Schober's work was financed by Phenomenal (H2020 654241) during the initiation phase of this effort; his current work is an in-kind contribution. Kristian Peters is funded by the German Network for Bioinformatics Infrastructure (de.NBI)
and acknowledges BMBF funding under grant number 031L0107. Stian Soiland-Reyes is funded by BioExcel2 (H2020 823830). Daniel Garijo and Yolanda Gil gratefully acknowledge support from DARPA award W911NF-18-1-0027, NIH award 1R01AG059874-01, and NSF award ICER-1740683.
U.S. Department of the Interior: Sharing FAIR Data Fairly
Government-produced data are consumed daily by thousands of scientists, researchers, industries, and students around the world, but are often difficult to locate because they are collected and stored in a duplicative state at varying levels of quality, inhibiting their usefulness for data science investigations and analysis. To address these challenges, the United States Department of the Interior (DOI) bureaus have been implementing the FAIR Data Principles in their data sharing strategies since 2016. Differing interpretations of the FAIR Data Principles are leading to data that are not documented uniformly and are not properly integrated for reuse. In order to establish a FAIR baseline, analysis of select datasets is being performed with peer-reviewed FAIR assessment tools. Delphi panels are being conducted with DOI Chief Data Officers and DOI Federal data consumers to gain insights into how to affordably deliver these data according to the FAIR Data Principles.
FAIRsoft - A practical implementation of FAIR principles for research software
Computational tools are increasingly becoming constitutive parts of scientific research, from experimentation and data collection to the dissemination and storage of results. Unfortunately, however, research software is not subject to the same requirements as other methods of scientific research: being peer-reviewed, being reproducible, and allowing one to build upon another's work. This situation is detrimental to the integrity and advancement of scientific research, leading to computational methods frequently being impossible to reproduce and/or verify [1]. Moreover, they are often opaque, directly unavailable, or impossible for others to use [2]. One step to address this problem could be formulating a set of principles that research software should meet to ensure its quality and sustainability, resembling the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles [3]. The FAIR Data Principles were created to solve similar issues affecting scholarly data, namely the great difficulty of sharing and accessing them, and are now widely recognized across fields. We present here FAIRsoft, our initial effort to assess the quality of research software using a FAIR-like framework, as a first step towards its implementation in OpenEBench [4], the ELIXIR benchmarking platform.
Unique, Persistent, Resolvable: Identifiers as the foundation of FAIR
The FAIR Principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers, resolvable on the Web, and associated with a set of additional descriptive metadata, are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use and orchestration with other system elements to achieve FAIRness for digital research objects
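To make this concrete, here is a minimal Python sketch of the pattern the abstract describes: a globally unique identifier paired with a Web-resolvable URL and descriptive metadata. The resolver base URL and function name are hypothetical; a real system would use a registered scheme such as DOIs or Handles backed by a resolution service.

```python
import uuid

# Hypothetical illustration: mint a globally unique identifier and pair it
# with descriptive metadata, as FAIR identifier practice requires.
# "https://example.org/id/" is a placeholder resolver, not a real service.
def mint_identifier(metadata: dict, resolver: str = "https://example.org/id/") -> dict:
    """Return an identifier record: a globally unique ID, a resolvable URL
    formed from it, and the descriptive metadata it identifies."""
    pid = uuid.uuid4().hex  # globally unique (UUID version 4)
    return {
        "id": pid,
        "url": resolver + pid,   # resolvable on the Web via the resolver
        "metadata": metadata,    # descriptive metadata travels with the ID
    }

record = mint_identifier({"title": "Example dataset", "creator": "A. Researcher"})
```

The key design point the abstract emphasizes is that the identifier alone is not enough: resolution plus associated metadata is what makes the digital object findable and reusable.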
F*** workflows: when parts of FAIR are missing
The FAIR principles for scientific data (Findable, Accessible, Interoperable,
Reusable) are also relevant to other digital objects such as research software
and scientific workflows that operate on scientific data. The FAIR principles
can be applied to the data being handled by a scientific workflow as well as
the processes, software, and other infrastructure which are necessary to
specify and execute a workflow. The FAIR principles were designed as
guidelines, rather than rules, that would allow for differences in standards
for different communities and for different degrees of compliance. There are
many practical considerations which impact the level of FAIR-ness that can
actually be achieved, including policies, traditions, and technologies. Because
of these considerations, obstacles are often encountered during the workflow
lifecycle that trace directly to shortcomings in the implementation of the FAIR
principles. Here, we detail some cases, without naming names, in which data and
workflows were Findable but otherwise lacking in areas commonly needed and
expected by modern FAIR methods, tools, and users. We describe how some of
these problems, all of which were overcome successfully, have motivated us to
push on systems and approaches for fully FAIR workflows.
Comment: 6 pages, 0 figures, accepted to the ERROR 2022 workshop (see https://error-workshop.org/ for more information), to be published in the proceedings of IEEE eScience 202
A Maturity Model for Operations in Neuroscience Research
Scientists are adopting new approaches to scale up their activities and
goals. Progress in neurotechnologies, artificial intelligence, automation, and
tools for collaboration promises new bursts of discoveries. However, compared
to other disciplines and the industry, neuroscience laboratories have been slow
to adopt key technologies to support collaboration, reproducibility, and
automation. Drawing on progress in other fields, we define a roadmap for
implementing automated research workflows for diverse research teams. We
propose establishing a five-level capability maturity model for operations in
neuroscience research. Achieving higher levels of operational maturity requires
new technology-enabled methodologies, which we describe as ``SciOps''. The
maturity model provides guidelines for evaluating and upgrading operations in
multidisciplinary neuroscience teams.
Comment: 10 pages, one figure
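As a rough illustration of what a five-level capability maturity model looks like as a data structure, here is a Python sketch; the level names below are placeholder assumptions for illustration, not the paper's actual SciOps level definitions.

```python
from enum import IntEnum

# Illustrative only: a five-level capability maturity scale in the spirit of
# the proposed "SciOps" model. Level names are PLACEHOLDERS, not the paper's.
class MaturityLevel(IntEnum):
    INITIAL = 1      # ad hoc, manual processes
    MANAGED = 2      # repeatable procedures within a single lab
    DEFINED = 3      # documented, shared workflows
    QUANTIFIED = 4   # measured, automated pipelines
    OPTIMIZING = 5   # continuous improvement across teams

def can_upgrade(current: MaturityLevel) -> bool:
    """Maturity models are climbed one level at a time; the top level
    has no further upgrade."""
    return current < MaturityLevel.OPTIMIZING
```

Encoding the levels as an ordered enum reflects the model's central claim: operational maturity is cumulative, and each level builds on the capabilities of the one below it.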
Normal Tissue Complication Probability (NTCP) Prediction Model for Osteoradionecrosis of the Mandible in Patients With Head and Neck Cancer After Radiation Therapy: Large-Scale Observational Cohort
Purpose: Osteoradionecrosis (ORN) of the mandible represents a severe, debilitating complication of radiation therapy (RT) for head and neck cancer (HNC). At present, no normal tissue complication probability (NTCP) models for the risk of ORN exist. The aim of this study was to develop a multivariable clinical/dose-based NTCP model for the prediction of any-grade ORN (ORN I-IV) and grade IV ORN (ORN IV) after RT (+/- chemotherapy) in patients with HNC.
Methods and Materials: Included patients with HNC were treated with (chemo-)RT between 2005 and 2015. Mandible bone radiation dose-volume parameters and clinical variables (ie, age, sex, tumor site, pre-RT dental extractions, chemotherapy history, postoperative RT, and smoking status) were considered as potential predictors. The patient cohort was randomly divided into a training (70%) and an independent test (30%) cohort. Bootstrapped forward variable selection was performed in the training cohort to select the predictors for the NTCP models. The final NTCP models were validated on the held-out test subset.
Results: Of the 1259 included patients with HNC, 13.7% (n = 173) developed any-grade ORN (ORN I-IV, the primary endpoint) and 5% (n = 65) developed ORN IV (the secondary endpoint). All dose and volume parameters of the mandible bone were significantly associated with the development of ORN in univariable models. Multivariable analyses identified D30% and pre-RT dental extraction as independent predictors in the best-performing NTCP models for both ORN I-IV and ORN IV, with areas under the curve (AUC) of 0.78 (AUC validation = 0.75 [0.69-0.82]) and 0.81 (AUC validation = 0.82 [0.74-0.89]), respectively.
Conclusions: This study presented NTCP models based on mandible bone D30% and pre-RT dental extraction that predict ORN I-IV and ORN IV (ie, ORN needing invasive surgical intervention) after RT for HNC. Our results suggest that less than 30% of the mandible should receive a dose of 35 Gy or more to keep the risk of ORN I-IV below 5%.
These NTCP models can improve ORN prevention and management by identifying patients at risk of ORN. (C) 2021 The Author(s). Published by Elsevier Inc.
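NTCP models of this kind are typically multivariable logistic regressions. The Python sketch below shows the general shape of such a model using the two predictors the abstract reports (mandible D30% and pre-RT dental extraction); the intercept and coefficients are invented placeholders, since the fitted values are not given in the abstract.

```python
import math

# Sketch of a multivariable logistic NTCP model with the two predictors the
# abstract identifies: mandible D30% (Gy) and pre-RT dental extraction.
# The intercept and weights below are HYPOTHETICAL placeholders, NOT the
# fitted coefficients from the study.
B0, B_DOSE, B_EXTRACT = -5.0, 0.08, 1.1

def ntcp(d30_gy: float, dental_extraction: bool) -> float:
    """Complication probability via the logistic function:
    NTCP = 1 / (1 + exp(-(b0 + b1*D30% + b2*extraction)))."""
    s = B0 + B_DOSE * d30_gy + B_EXTRACT * float(dental_extraction)
    return 1.0 / (1.0 + math.exp(-s))
```

With any positive dose weight, predicted risk rises monotonically with D30% and is shifted upward by prior dental extraction, which is the qualitative behavior behind the paper's recommendation of a D30% dose constraint.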
Julia as a unifying end-to-end workflow language on the Frontier exascale system
We evaluate Julia as a single language and ecosystem paradigm powered by LLVM
to develop workflow components for high-performance computing. We run a
Gray-Scott, 2-variable diffusion-reaction application using a memory-bound,
7-point stencil kernel on Frontier, the US Department of Energy's first
exascale supercomputer. We evaluate the performance, scaling, and trade-offs of
(i) the computational kernel on AMD's MI250x GPUs, (ii) weak scaling up to
4,096 MPI processes/GPUs or 512 nodes, (iii) parallel I/O writes using the
ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive analysis.
Results suggest that although Julia generates a reasonable LLVM-IR, a nearly
50% performance difference exists vs. native AMD HIP stencil codes when running
on the GPUs. As expected, we observed near-zero overhead when using MPI and
parallel I/O bindings for system-wide installed implementations. Consequently,
Julia emerges as a compelling high-performance and high-productivity workflow
composition language, as measured on the fastest supercomputer in the world.
Comment: 11 pages, 8 figures, accepted at the 18th Workshop on Workflows in Support of Large-Scale Science (WORKS23), held with the IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis, SC2
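The memory-bound 7-point stencil the paper benchmarks is the 3-D discrete Laplacian at the core of a Gray-Scott diffusion-reaction update. A minimal NumPy sketch (Python here rather than Julia, and reduced to a single-field diffusion step with periodic boundaries) of what such a kernel computes:

```python
import numpy as np

# Minimal sketch of a 7-point (3-D Laplacian) stencil, the memory-bound
# kernel at the heart of a Gray-Scott diffusion-reaction step. Simplified:
# one field instead of two, periodic boundaries via np.roll, unit spacing.
def laplacian_7pt(u: np.ndarray) -> np.ndarray:
    """Sum of the six face neighbours minus six times the centre value."""
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
            np.roll(u, 1, 1) + np.roll(u, -1, 1) +
            np.roll(u, 1, 2) + np.roll(u, -1, 2) - 6.0 * u)

def diffuse(u: np.ndarray, D: float = 0.1, dt: float = 0.1) -> np.ndarray:
    """One explicit Euler diffusion step: u <- u + D * dt * lap(u)."""
    return u + D * dt * laplacian_7pt(u)
```

Each output point touches seven array elements but performs only a handful of flops, which is why the paper treats this kernel as memory-bound and why code generation quality (Julia's LLVM-IR vs. native HIP) shows up directly in GPU performance.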