Ten Simple Rules for Reproducible Research in Jupyter Notebooks
Reproducibility of computational studies is a hallmark of scientific
methodology. It enables researchers to build with confidence on the methods and
findings of others, reuse and extend computational pipelines, and thereby drive
scientific progress. Since many experimental studies rely on computational
analyses, biologists need guidance on how to set up and document reproducible
data analyses or simulations.
In this paper, we address several questions about reproducibility. For
example, what are the technical and non-technical barriers to reproducible
computational studies? What opportunities and challenges do computational
notebooks offer to overcome some of these barriers? What tools are available
and how can they be used effectively?
We have developed a set of rules to serve as a guide to scientists with a
specific focus on computational notebook systems, such as Jupyter Notebooks,
which have become a tool of choice for many applications. Notebooks combine
detailed workflows with narrative text and visualization of results. Combined
with software repositories and open source licensing, notebooks are powerful
tools for transparent, collaborative, reproducible, and reusable data analyses.
Bioconductor: open software development for computational biology and bioinformatics.
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.
A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data
Many interesting data sets available on the Internet are of a medium
size---too big to fit into a personal computer's memory, but not so large that
they won't fit comfortably on its hard disk. In the coming years, data sets of
this magnitude will inform vital research in a wide array of application
domains. However, due to a variety of constraints they are cumbersome to
ingest, wrangle, analyze, and share in a reproducible fashion. These
obstructions hamper thorough peer-review and thus disrupt the forward progress
of science. We propose a predictable and pipeable framework for R (the
state-of-the-art statistical computing environment) that leverages SQL (the
venerable database architecture and query language) to make reproducible
research on medium data a painless reality.
Comment: 30 pages, plus supplementary material
SOUND SOFTWARE: TOWARDS SOFTWARE REUSE IN AUDIO AND MUSIC RESEARCH
© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Skills and Knowledge for Data-Intensive Environmental Research.
The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap.
Report on the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2)
This technical report records and discusses the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2). The report includes a description of the alternative, experimental submission and review process, two workshop keynote presentations, a series of lightning talks, a discussion on sustainability, and five discussions from the topic areas of exploring sustainability; software development experiences; credit & incentives; reproducibility & reuse & sharing; and code testing & code review. For each topic, the report includes a list of tangible actions that were proposed and that would lead to potential change. The workshop recognized that reliance on scientific software is pervasive in all areas of world-leading research today. The workshop participants then proceeded to explore different perspectives on the concept of sustainability. Key enablers and barriers of sustainable scientific software were identified from their experiences. In addition, recommendations with new requirements such as software credit files and software prize frameworks were outlined for improving practices in sustainable software engineering. There was also broad consensus that formal training in software development or engineering was rare among the practitioners. Significant strides need to be made in building a sense of community via training in software and technical practices, in increasing their size and scope, and in better integrating them directly into graduate education programs. Finally, journals can define and publish policies to improve reproducibility, whereas reviewers can insist that authors provide sufficient information and access to data and software to allow them to reproduce the results in the paper. Hence a list of criteria is compiled for journals to provide to reviewers so as to make it easier to review software submitted for publication as a "Software Paper."
BEAT: An Open-Source Web-Based Open-Science Platform
With the increased interest in computational sciences, machine learning (ML),
pattern recognition (PR) and big data, governmental agencies, academia and
manufacturers are overwhelmed by the constant influx of new algorithms and
techniques promising improved performance, generalization and robustness.
Sadly, result reproducibility is often an overlooked feature accompanying
original research publications, competitions and benchmark evaluations. The
main reasons behind such a gap arise from natural complications in research and
development in this area: the distribution of data may be a sensitive issue;
software frameworks are difficult to install and maintain; test protocols may
involve a potentially large set of intricate steps which are difficult to
handle. Given the rising complexity of research challenges and the constant
increase in data volume, the conditions for achieving reproducible research in
the domain are also increasingly difficult to meet.
To bridge this gap, we built an open platform for research in computational
sciences related to pattern recognition and machine learning, to support the
development, reproducibility and certification of results obtained in the
field. By making use of such a system, academic, governmental or industrial
organizations enable users to easily and socially develop processing
toolchains, re-use data, algorithms, workflows and compare results from
distinct algorithms and/or parameterizations with minimal effort. This article
presents such a platform and discusses some of its key features, uses and
limitations. We overview a currently operational prototype and provide design
insights.
Comment: References to papers published on the platform incorporate