20,427 research outputs found

A Collaborative Approach to Computational Reproducibility

Author: Capone Rebecca
Chirigati Fernando
Freire Juliana
Rampin Remi
Shasha Dennis
Publication date: 01/01/2016

Although a standard in natural science, reproducibility has been only episodically applied in experimental computer science. Scientific papers often present a large number of tables, plots and pictures that summarize the obtained results, but then loosely describe the steps taken to derive them. Not only can the methods and the implementation be complex, but their configuration may also require setting many parameters and/or depend on particular system configurations. While many researchers recognize the importance of reproducibility, the challenge of making it happen often outweighs the benefits. Fortunately, a plethora of reproducibility solutions have been recently designed and implemented by the community. In particular, packaging tools (e.g., ReproZip) and virtualization tools (e.g., Docker) are promising solutions towards facilitating reproducibility for both authors and reviewers. To address the incentive problem, we have implemented a new publication model for the Reproducibility Section of Information Systems Journal. In this section, authors submit a reproducibility paper that explains in detail the computational assets from a previously published manuscript in Information Systems.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

FigShare

Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

Author: Barghi Soudabeh
Glatard Tristan
Salari Ali
Scaria Lalet
Publication date: 25/09/2018

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)", is able to predict computational reproducibility with good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speed up reproducibility evaluations substantially, with a reduced accuracy loss.

arXiv.org e-Print Archive

Crossref

Scipedia

Ten Simple Rules for Reproducible Research in Jupyter Notebooks

Author: Altintas Ilkay
Birmingham Amanda
Huang Shih-Cheng
Knight Rob
Moshiri Niema
Nguyen Mai H.
Pérez Fernando
Rose Peter W.
Rosenthal Sara Brin
Rule Adam
Zuniga Cristal
Publication date: 13/10/2018

Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific progress. Since many experimental studies rely on computational analyses, biologists need guidance on how to set up and document reproducible data analyses or simulations. In this paper, we address several questions about reproducibility. For example, what are the technical and non-technical barriers to reproducible computational studies? What opportunities and challenges do computational notebooks offer to overcome some of these barriers? What tools are available and how can they be used effectively? We have developed a set of rules to serve as a guide to scientists with a specific focus on computational notebook systems, such as Jupyter Notebooks, which have become a tool of choice for many applications. Notebooks combine detailed workflows with narrative text and visualization of results. Combined with software repositories and open source licensing, notebooks are powerful tools for transparent, collaborative, reproducible, and reusable data analyses.

arXiv.org e-Print Archive

eScholarship - University of California

Hack Weeks as a model for Data Science Education and Collaboration

Author: Arendt Anthony
Hogg David W.
Huppenkothen Daniela
Ram Karthik
Rokem Ariel
VanderPlas Jake
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 31/10/2017

Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This gives rise to the need for scientific communities to adapt on shorter time scales than traditional university curricula allow for, and therefore requires new modes of knowledge transfer. The universal applicability of data science tools to a broad range of problems has generated new opportunities to foster exchange of ideas and computational workflows across disciplines. In recent years, hack weeks have emerged as an effective tool for fostering these exchanges by providing training in modern data analysis workflows. While there are variations in hack week implementation, all events consist of a common core of three components: tutorials in state-of-the-art methodology, peer-learning and project work in a collaborative environment. In this paper, we present the concept of a hack week in the larger context of scientific meetings and point out similarities and differences to traditional conferences. We motivate the need for such an event and present in detail its strengths and challenges. We find that hack weeks are successful at cultivating collaboration and the exchange of knowledge. Participants self-report that these events help them both in their day-to-day research and in their careers. Based on our results, we conclude that hack weeks present an effective, easy-to-implement, fairly low-cost tool to positively impact data analysis literacy in academic disciplines, foster collaboration and cultivate best practices. Comment: 15 pages, 2 figures, submitted to PNAS, all relevant code available at https://github.com/uwescience/HackWeek-Writeu

arXiv.org e-Print Archive

MPG.PuRe

Open science in archaeology

Author: Barton C. Michael
Bates Lynsey A.
Baxter Michael
Bevan Andrew
Bocinsky R. Kyle
Bollwerk Elizabeth A.
Brughmans Tom
Carter Alison K.
Conrad Cyler
Contreras Daniel A.
Costa Stefano
Crema Enrico R.
Daggett Adrianne
Davies Benjamin
Drake B. Lee
Dye Thomas S.
d’Alpoim Guedes Jade
France Phoebe
Fullagar Richard
Giusti Domenico
Graham Shawn
Harris Matthew D.
Hawks John
Heath Sebastian
Huffer Damien
Kansa Eric C.
Kansa Sarah Whitcher
Madsen Mark E.
Marwick Ben
Melcher Jennifer
Negre Joan
Neiman Fraser D.
Opitz Rachel
Orton David C.
Przystupa Paulina
Raviele Maria
Riel-Salvatore Julien
Riris Philip
Romanowska Iza
Smith Jolene
Strupler Néhémie
Ullah Isaac I.
Van Vlack Hannah G.
VanValkenburgh Nathaniel
Watrall Ethan C.
Webster Chris
Wells Joshua
Winters Judith
Wren Colin D.
Publication venue: 'Society for American Archaeology'
Publication date: 01/09/2017

No abstract available

Enlighten

SOUND SOFTWARE: TOWARDS SOFTWARE REUSE IN AUDIO AND MUSIC RESEARCH

Author: Cannam C
Figueira LA
IEEE
Plumbley MD
Publication date: 01/01/2012

© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Queen Mary Research Online