11,950 research outputs found

Ten Simple Rules for Reproducible Research in Jupyter Notebooks

Author: Altintas Ilkay
Birmingham Amanda
Huang Shih-Cheng
Knight Rob
Moshiri Niema
Nguyen Mai H.
Pérez Fernando
Rose Peter W.
Rosenthal Sara Brin
Rule Adam
Zuniga Cristal
Publication date: 13/10/2018

Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific progress. Since many experimental studies rely on computational analyses, biologists need guidance on how to set up and document reproducible data analyses or simulations. In this paper, we address several questions about reproducibility. For example, what are the technical and non-technical barriers to reproducible computational studies? What opportunities and challenges do computational notebooks offer to overcome some of these barriers? What tools are available and how can they be used effectively? We have developed a set of rules to serve as a guide to scientists with a specific focus on computational notebook systems, such as Jupyter Notebooks, which have become a tool of choice for many applications. Notebooks combine detailed workflows with narrative text and visualization of results. Combined with software repositories and open source licensing, notebooks are powerful tools for transparent, collaborative, reproducible, and reusable data analyses.

arXiv.org e-Print Archive

eScholarship - University of California

Bioconductor: open software development for computational biology and bioinformatics.

Author: Bates Douglas
Bolstad Ben
Carey Vincent
Dettling Marcel
Dudoit Sandrine
Ellis Byron
Gautier Laurent
Ge Yongchao
Gentleman Robert
Gentry Jeff
Hornik Kurt
Hothorn Torsten
Huber Wolfgang
Iacus Stefano
Irizarry Rafael
Leisch Friedrich
Li Cheng
Maechler Martin
Rossini Anthony
Sawitzki Gunther
Smith Colin
Smyth Gordon
Tierney Luke
Yang Jean
Zhang Jianhua
Publication venue: eScholarship, University of California
Publication date: 01/01/2004

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

Repository for Publications and Research Data

AIR Universita degli studi di Milano

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

ZHAW digitalcollection

Collection Of Biostatistics Research Archive

Online Research Database In Technology

University of Melbourne Institutional Repository

Open science in archaeology

Author: Barton C. Michael
Bates Lynsey A.
Baxter Michael
Bevan Andrew
Bocinsky R. Kyle
Bollwerk Elizabeth A.
Brughmans Tom
Carter Alison K.
Conrad Cyler
Contreras Daniel A.
Costa Stefano
Crema Enrico R.
Daggett Adrianne
Davies Benjamin
Drake B. Lee
Dye Thomas S.
d’Alpoim Guedes Jade
France Phoebe
Fullagar Richard
Giusti Domenico
Graham Shawn
Harris Matthew D.
Hawks John
Heath Sebastian
Huffer Damien
Kansa Eric C.
Kansa Sarah Whitcher
Madsen Mark E.
Marwick Ben
Melcher Jennifer
Negre Joan
Neiman Fraser D.
Opitz Rachel
Orton David C.
Przystupa Paulina
Raviele Maria
Riel-Salvatore Julien
Riris Philip
Romanowska Iza
Smith Jolene
Strupler Néhémie
Ullah Isaac I.
Van Vlack Hannah G.
VanValkenburgh Nathaniel
Watrall Ethan C.
Webster Chris
Wells Joshua
Winters Judith
Wren Colin D.
Publication venue: Society for American Archaeology
Publication date: 01/09/2017

No abstract available

Enlighten

A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data

Author: Baumer Benjamin S.
Publication date: 23/05/2018

Many interesting data sets available on the Internet are of a medium size: too big to fit into a personal computer's memory, but not so large that they won't fit comfortably on its hard disk. In the coming years, data sets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer-review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) that leverages SQL (the venerable database architecture and query language) to make reproducible research on medium data a painless reality. Comment: 30 pages, plus supplementary material.

arXiv.org e-Print Archive

FigShare

Smith College: Smith ScholarWorks

Sound Software: Towards Software Reuse in Audio and Music Research

Author: Cannam C
Figueira LA
IEEE
Plumbley MD
Publication date: 01/01/2012

© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Queen Mary Research Online

Skills and Knowledge for Data-Intensive Environmental Research.

Author: Aukema Juliann
Boettiger Carl
Brun Julien
Budden Amber
Collins Scott
Fernández Denny
Gross Louis
Hampton Stephanie
Hernandez Rebecca
Jones Matthew
Labou Stephanie
Schildhauer Mark
Supp Sarah
Teal Tracy
Wasser Leah
White Ethan
Publication venue: eScholarship, University of California
Publication date: 03/05/2017

The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap.

Crossref

eScholarship - University of California

Report on the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2)

Author: Adams
Boettiger
Bourque
Chamberlain
Chue Hong
Chue Hong
Clune
Crusoe
Downs
Dubey
Dubey
Gil
Habermann
Hanwell
Hook
Hook
Howison
Howison
Katz
Kelley
Lenhardt
Marker
Patra
Piccolo
Pierce
Rosado de Souza
Slaughter
Venters
Publication venue: Ubiquity Press, Ltd.
Publication date: 08/07/2015

This technical report records and discusses the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2). The report includes a description of the alternative, experimental submission and review process, two workshop keynote presentations, a series of lightning talks, a discussion on sustainability, and five discussions from the topic areas of exploring sustainability; software development experiences; credit & incentives; reproducibility & reuse & sharing; and code testing & code review. For each topic, the report includes a list of tangible actions that were proposed and that would lead to potential change. The workshop recognized that reliance on scientific software is pervasive in all areas of world-leading research today. The workshop participants then proceeded to explore different perspectives on the concept of sustainability. Key enablers and barriers of sustainable scientific software were identified from their experiences. In addition, recommendations with new requirements such as software credit files and software prize frameworks were outlined for improving practices in sustainable software engineering. There was also broad consensus that formal training in software development or engineering was rare among the practitioners. Significant strides need to be made in building a sense of community via training in software and technical practices, on increasing their size and scope, and on better integrating them directly into graduate education programs. Finally, journals can define and publish policies to improve reproducibility, whereas reviewers can insist that authors provide sufficient information and access to data and software to allow them to reproduce the results in the paper. Hence a list of criteria is compiled for journals to provide to reviewers so as to make it easier to review software submitted for publication as a “Software Paper.”

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

University of Huddersfield Repository

BEAT: An Open-Source Web-Based Open-Science Platform

Author: Anjos André
El-Shafey Laurent
Marcel Sébastien
Publication date: 19/04/2017

With the increased interest in computational sciences, machine learning (ML), pattern recognition (PR) and big data, governmental agencies, academia and manufacturers are overwhelmed by the constant influx of new algorithms and techniques promising improved performance, generalization and robustness. Sadly, result reproducibility is often an overlooked feature accompanying original research publications, competitions and benchmark evaluations. The main reasons behind such a gap arise from natural complications in research and development in this area: the distribution of data may be a sensitive issue; software frameworks are difficult to install and maintain; test protocols may involve a potentially large set of intricate steps which are difficult to handle. Given the rising complexity of research challenges and the constant increase in data volume, the conditions for achieving reproducible research in the domain are also increasingly difficult to meet. To bridge this gap, we built an open platform for research in computational sciences related to pattern recognition and machine learning, to help with the development, reproducibility and certification of results obtained in the field. By making use of such a system, academic, governmental or industrial organizations enable users to easily and socially develop processing toolchains, re-use data, algorithms, workflows and compare results from distinct algorithms and/or parameterizations with minimal effort. This article presents such a platform and discusses some of its key features, uses and limitations. We overview a currently operational prototype and provide design insights. Comment: References to papers published on the platform incorporate

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne