Computational reproducibility of Jupyter notebooks from biomedical publications
Jupyter notebooks facilitate the bundling of executable code with its
documentation and output in one interactive environment, and they represent a
popular mechanism to document and share computational workflows. The
reproducibility of computational aspects of research is a key component of
scientific reproducibility but has not yet been assessed at scale for Jupyter
notebooks associated with biomedical publications. We address computational
reproducibility at two levels: First, using fully automated workflows, we
analyzed the computational reproducibility of Jupyter notebooks related to
publications indexed in PubMed Central. We identified such notebooks by mining
the articles' full text, locating them on GitHub and re-running them in an
environment as close to the original as possible. We documented reproduction
success and exceptions and explored relationships between notebook
reproducibility and variables related to the notebooks or publications. Second,
this study represents a reproducibility attempt in and of itself, using
essentially the same methodology twice on PubMed Central over two years. Out of
27271 notebooks from 2660 GitHub repositories associated with 3467 articles,
22578 notebooks were written in Python, including 15817 that had their
dependencies declared in standard requirement files and that we attempted to
re-run automatically. For 10388 of these, all declared dependencies could be
installed successfully, and we re-ran them to assess reproducibility. Of these,
1203 notebooks ran through without any errors, including 879 that produced
results identical to those reported in the original notebook and 324 for which
our results differed from the originally reported ones. Running the other
notebooks resulted in exceptions. We zoom in on common problems, highlight
trends and discuss potential improvements to Jupyter-related workflows
associated with biomedical publications.
Comment: arXiv admin note: substantial text overlap with arXiv:2209.0430
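The final step described above, deciding whether a successful re-run produced results identical to the originally reported ones, can be sketched as a comparison of code-cell outputs in the notebook's JSON representation (nbformat v4). This is an illustrative sketch, not the study's actual pipeline; the function names are ours.

```python
import json

def load_notebook(path):
    """Load a .ipynb file; notebooks are plain JSON (nbformat v4)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def code_cell_outputs(nb):
    """Collect the recorded outputs of every code cell, in order."""
    return [cell.get("outputs", []) for cell in nb["cells"]
            if cell.get("cell_type") == "code"]

def classify_rerun(original, rerun):
    """Label an error-free re-run 'identical' or 'different' by its outputs."""
    same = code_cell_outputs(original) == code_cell_outputs(rerun)
    return "identical" if same else "different"
```

In practice the comparison would need to normalize volatile output fields (timestamps, memory addresses, execution counts) before declaring a difference; the sketch compares outputs verbatim.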
Notebook articles: towards a transformative publishing experience in nonlinear science
Open Science, Reproducible Research, Findable, Accessible, Interoperable and
Reusable (FAIR) data principles are long term goals for scientific
dissemination. However, the implementation of these principles calls for a
reinspection of our means of dissemination. In our viewpoint, we discuss and
advocate, in the context of nonlinear science, how a notebook article
represents an essential step toward this objective by fully embracing cloud
computing solutions. Notebook articles, as scholarly articles, offer an
alternative, efficient and more ethical way to disseminate research through
their versatile environment. This format invites the readers to delve deeper
into the reported research. Through the interactivity of the notebook articles,
research results, such as equations and figures, are reproducible
even for non-expert readers. The codes and methods are available, in a
transparent manner, to interested readers. The methods can be reused and
adapted to answer additional questions in related topics. The codes run on
cloud computing services, which provide easy access, even to low-income
countries and research groups. The versatility of this environment provides the
stakeholders - from the researchers to the publishers - with opportunities to
disseminate the research results in innovative ways.
Comment: This article is an editorial viewpoint.
Ten simple rules for writing Dockerfiles for reproducible data science.
Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow's reproducibility can be greatly affected by the choices made when building containers. In many cases, the build process for the container's image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
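A few of the recurring recommendations in this genre of guidance (pin the base image, pin dependency versions, order layers for caching, declare an explicit default command) can be illustrated with a minimal Dockerfile sketch. The image tag, file names, and label values below are ours, not taken from the article.

```dockerfile
# Pin the base image to an exact version for deterministic builds
FROM python:3.11.4-slim

# Document provenance with labels
LABEL maintainer="researcher@example.org"

# Pin dependency versions in requirements.txt and install them in one layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the analysis code last, so the dependency layer stays cached
COPY analysis/ /app/analysis/
WORKDIR /app

# Declare the default command explicitly
CMD ["python", "analysis/run.py"]
```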
HPC-oriented Canonical Workflows for Machine Learning Applications in Climate and Weather Prediction
Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in high-performance computing (HPC) power are paving the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principles are well known to many scientists, research communities have been slow to adopt them. The Canonical Workflow Framework for Research (CWFR) provides a platform to ensure the FAIRness and reproducibility of these practices without overwhelming researchers. This conceptual paper envisions a holistic CWFR approach towards ML applications in weather and climate, focusing on HPC and big data. Specifically, we discuss the FAIR Digital Object (FDO) and Research Object (RO) in the DeepRain project to achieve granular reproducibility. DeepRain is a project that aims to improve precipitation forecasts in Germany using ML. Our concept envisages the raster datacube to provide data harmonization and fast, scalable data access. We suggest the Jupyter notebook as a single reproducible experiment. In addition, we envision JupyterHub as a scalable and distributed central platform that connects all these elements and the HPC resources to the researchers via an easy-to-use graphical interface.
Introducing Reproducibility to Citation Analysis: a Case Study in the Earth Sciences
Objectives: Replicate methods from a 2019 study of Earth Science researcher citation practices. Calculate programmatically whether researchers in Earth Science rely on a smaller subset of literature than estimated by the 80/20 rule. Determine whether these reproducible citation analysis methods can be used to analyze open access uptake.
Methods: Replicated methods of a prior citation study provide an updated transparent, reproducible citation analysis protocol that can be replicated with Jupyter Notebooks.
Results: This study replicated the prior citation study’s conclusions, and also adapted the author’s methods to analyze the citation practices of Earth Scientists at four institutions. We found that 80% of the citations could be accounted for by only 7.88% of journals, a key metric to help identify a core collection of titles in this discipline. We then demonstrated programmatically that 36% of these cited references were available as open access.
Conclusions: Jupyter Notebooks are a viable platform for disseminating replicable processes for citation analysis. A completely open methodology is emerging, and we consider this a step forward. Adherence to the 80/20 rule aligned with institutional research output, but citation preferences are evident. Reproducible citation analysis methods may be used to analyze open access uptake; however, results are inconclusive. It is difficult to determine whether an article was open access at the time of citation, or became open access after an embargo.
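The 80/20-style metric reported above (the fraction of journals accounting for 80% of citations) reduces to a simple cumulative-sum calculation: sort journals by citation count, accumulate from the top, and report the fraction of journals needed to reach the coverage threshold. A minimal sketch of that computation, not the study's own code; the function name and parameter are ours.

```python
from collections import Counter

def core_journal_fraction(cited_journals, coverage=0.80):
    """Smallest fraction of journals whose citations cover `coverage`
    of all citations, counting from the most-cited journal down."""
    counts = sorted(Counter(cited_journals).values(), reverse=True)
    total = sum(counts)
    running, needed = 0, 0
    for c in counts:
        running += c
        needed += 1
        if running >= coverage * total:
            break
    return needed / len(counts)
```

On the study's data this kind of calculation yielded 7.88% of journals covering 80% of citations; a small toy list behaves the same way.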