A Collaborative Approach to Computational Reproducibility
Although a standard in natural science, reproducibility has been only
episodically applied in experimental computer science. Scientific papers often
present a large number of tables, plots and pictures that summarize the
obtained results, but then loosely describe the steps taken to derive them. Not
only can the methods and the implementation be complex, but also their
configuration may require setting many parameters and/or depend on particular
system configurations. While many researchers recognize the importance of
reproducibility, the challenge of making it happen often outweighs the benefits.
Fortunately, a plethora of reproducibility solutions have been recently
designed and implemented by the community. In particular, packaging tools
(e.g., ReproZip) and virtualization tools (e.g., Docker) are promising
solutions towards facilitating reproducibility for both authors and reviewers.
To address the incentive problem, we have implemented a new publication model
for the Reproducibility Section of Information Systems Journal. In this
section, authors submit a reproducibility paper that explains in detail the
computational assets from a previously published manuscript in Information
Systems
Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering
Evaluating the computational reproducibility of data analysis pipelines has
become a critical issue. It is, however, a cumbersome process for analyses that
involve data from large populations of subjects, due to their computational and
storage requirements. We present a method to predict the computational
reproducibility of data analysis pipelines in large population studies. We
formulate the problem as a collaborative filtering process, with constraints on
the construction of the training set. We propose 6 different strategies to
build the training set, which we evaluate on 2 datasets, a synthetic one
modeling a population with a growing number of subject types, and a real one
obtained with neuroinformatics pipelines. Results show that one sampling
method, "Random File Numbers (Uniform)", is able to predict computational
reproducibility with good accuracy. We also analyze the relevance of
including file and subject biases in the collaborative filtering model. We
conclude that the proposed method is able to speed up reproducibility
evaluations substantially, with a reduced accuracy loss
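
The abstract above mentions file and subject biases in a collaborative filtering model. As a rough illustration only, and not the authors' implementation, the sketch below fits a bias-only predictor (global mean plus one bias per subject and one per file) on the observed cells of a subject-by-file reproducibility matrix, then predicts the unobserved cells. The function names, the hyperparameters, and the toy data are all assumptions made for this example.

```python
# Hypothetical sketch: bias-only collaborative filtering for a
# subject x file reproducibility matrix (1.0 = file reproduced, 0.0 = not).
# Names, hyperparameters, and data are illustrative assumptions.

def fit_bias_model(observed, n_subjects, n_files, epochs=200, lr=0.05, reg=0.02):
    """Fit score(s, f) = mu + b_subject[s] + b_file[f] by stochastic gradient descent."""
    mu = sum(v for _, _, v in observed) / len(observed)  # global mean of observed cells
    b_subject = [0.0] * n_subjects
    b_file = [0.0] * n_files
    for _ in range(epochs):
        for s, f, v in observed:
            err = v - (mu + b_subject[s] + b_file[f])
            # Regularized updates pull each bias toward the residual error.
            b_subject[s] += lr * (err - reg * b_subject[s])
            b_file[f] += lr * (err - reg * b_file[f])
    return mu, b_subject, b_file


def predict(model, s, f):
    """Return 1 if the file is predicted reproducible for this subject, else 0."""
    mu, b_subject, b_file = model
    return 1 if mu + b_subject[s] + b_file[f] >= 0.5 else 0


# Toy training set: files 0 and 1 reproduce for every subject, file 2 never does.
# Subject 3 was only evaluated on file 1; its other two cells are unobserved.
observed = [(s, f, 0.0 if f == 2 else 1.0) for s in range(3) for f in range(3)]
observed.append((3, 1, 1.0))

model = fit_bias_model(observed, n_subjects=4, n_files=3)
print(predict(model, 3, 0), predict(model, 3, 2))
```

In this toy setting the file bias carries almost all of the signal, so the unobserved cells for subject 3 are predicted from how the same files behaved for the other subjects. A fuller model would add latent factors on top of the biases.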
Ten Simple Rules for Reproducible Research in Jupyter Notebooks
Reproducibility of computational studies is a hallmark of scientific
methodology. It enables researchers to build with confidence on the methods and
findings of others, reuse and extend computational pipelines, and thereby drive
scientific progress. Since many experimental studies rely on computational
analyses, biologists need guidance on how to set up and document reproducible
data analyses or simulations.
In this paper, we address several questions about reproducibility. For
example, what are the technical and non-technical barriers to reproducible
computational studies? What opportunities and challenges do computational
notebooks offer to overcome some of these barriers? What tools are available
and how can they be used effectively?
We have developed a set of rules to serve as a guide to scientists with a
specific focus on computational notebook systems, such as Jupyter Notebooks,
which have become a tool of choice for many applications. Notebooks combine
detailed workflows with narrative text and visualization of results. Combined
with software repositories and open source licensing, notebooks are powerful
tools for transparent, collaborative, reproducible, and reusable data analyses
Hack Weeks as a model for Data Science Education and Collaboration
Across almost all scientific disciplines, the instruments that record our
experimental data and the methods required for storage and data analysis are
rapidly increasing in complexity. This gives rise to the need for scientific
communities to adapt on shorter time scales than traditional university
curricula allow for, and therefore requires new modes of knowledge transfer.
The universal applicability of data science tools to a broad range of problems
has generated new opportunities to foster exchange of ideas and computational
workflows across disciplines. In recent years, hack weeks have emerged as an
effective tool for fostering these exchanges by providing training in modern
data analysis workflows. While there are variations in hack week
implementation, all events consist of a common core of three components:
tutorials in state-of-the-art methodology, peer-learning and project work in a
collaborative environment. In this paper, we present the concept of a hack week
in the larger context of scientific meetings and point out similarities and
differences to traditional conferences. We motivate the need for such an event
and present in detail its strengths and challenges. We find that hack weeks are
successful at cultivating collaboration and the exchange of knowledge.
Participants self-report that these events help them both in their day-to-day
research and in their careers. Based on our results, we conclude that hack
weeks present an effective, easy-to-implement, fairly low-cost tool to
positively impact data analysis literacy in academic disciplines, foster
collaboration and cultivate best practices.
Comment: 15 pages, 2 figures, submitted to PNAS, all relevant code available
at https://github.com/uwescience/HackWeek-Writeu
Sound Software: Towards Software Reuse in Audio and Music Research
© 2012 IEEE.