20,427 research outputs found

    A Collaborative Approach to Computational Reproducibility

    Full text link
    Although a standard in natural science, reproducibility has been only episodically applied in experimental computer science. Scientific papers often present a large number of tables, plots and pictures that summarize the obtained results, but then loosely describe the steps taken to derive them. Not only can the methods and the implementation be complex, but also their configuration may require setting many parameters and/or depend on particular system configurations. While many researchers recognize the importance of reproducibility, the challenge of making it happen often outweigh the benefits. Fortunately, a plethora of reproducibility solutions have been recently designed and implemented by the community. In particular, packaging tools (e.g., ReproZip) and virtualization tools (e.g., Docker) are promising solutions towards facilitating reproducibility for both authors and reviewers. To address the incentive problem, we have implemented a new publication model for the Reproducibility Section of Information Systems Journal. In this section, authors submit a reproducibility paper that explains in detail the computational assets from a previous published manuscript in Information Systems

    Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

    Full text link
    Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets, a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)" is able to predict computational reproducibility with a good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speedup reproducibility evaluations substantially, with a reduced accuracy loss

    Ten Simple Rules for Reproducible Research in Jupyter Notebooks

    Full text link
    Reproducibility of computational studies is a hallmark of scientific methodology. It enables researchers to build with confidence on the methods and findings of others, reuse and extend computational pipelines, and thereby drive scientific progress. Since many experimental studies rely on computational analyses, biologists need guidance on how to set up and document reproducible data analyses or simulations. In this paper, we address several questions about reproducibility. For example, what are the technical and non-technical barriers to reproducible computational studies? What opportunities and challenges do computational notebooks offer to overcome some of these barriers? What tools are available and how can they be used effectively? We have developed a set of rules to serve as a guide to scientists with a specific focus on computational notebook systems, such as Jupyter Notebooks, which have become a tool of choice for many applications. Notebooks combine detailed workflows with narrative text and visualization of results. Combined with software repositories and open source licensing, notebooks are powerful tools for transparent, collaborative, reproducible, and reusable data analyses

    Hack Weeks as a model for Data Science Education and Collaboration

    Full text link
    Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This gives rise to the need for scientific communities to adapt on shorter time scales than traditional university curricula allow for, and therefore requires new modes of knowledge transfer. The universal applicability of data science tools to a broad range of problems has generated new opportunities to foster exchange of ideas and computational workflows across disciplines. In recent years, hack weeks have emerged as an effective tool for fostering these exchanges by providing training in modern data analysis workflows. While there are variations in hack week implementation, all events consist of a common core of three components: tutorials in state-of-the-art methodology, peer-learning and project work in a collaborative environment. In this paper, we present the concept of a hack week in the larger context of scientific meetings and point out similarities and differences to traditional conferences. We motivate the need for such an event and present in detail its strengths and challenges. We find that hack weeks are successful at cultivating collaboration and the exchange of knowledge. Participants self-report that these events help them both in their day-to-day research as well as their careers. Based on our results, we conclude that hack weeks present an effective, easy-to-implement, fairly low-cost tool to positively impact data analysis literacy in academic disciplines, foster collaboration and cultivate best practices.Comment: 15 pages, 2 figures, submitted to PNAS, all relevant code available at https://github.com/uwescience/HackWeek-Writeu

    SOUND SOFTWARE: TOWARDS SOFTWARE REUSE IN AUDIO AND MUSIC RESEARCH

    Get PDF
    © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
    • …
    corecore