    Principles for data analysis workflows

    Traditional data science education often omits training on research workflows: the process that moves a scientific investigation from raw data to coherent research question to insightful contribution. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining three phases: the Exploratory, Refinement, and Polishing Phases. Each workflow phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between principles for data-intensive research workflows and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students and current professionals.

    Guix-HPC Activity Report 2017–2018

    Guix-HPC is a collaborative effort to bring reproducible software deployment to scientific workflows and high-performance computing (HPC). Guix-HPC builds upon the GNU Guix software deployment tool and aims to make it a better tool for HPC practitioners and scientists concerned with reproducible research. Guix-HPC was launched in September 2017 as a joint software development project involving three research institutes: Inria, the Max Delbrück Center for Molecular Medicine (MDC), and the Utrecht Bioinformatics Center (UBC). GNU Guix for HPC and reproducible science has received contributions from additional individuals and organizations, including Cray, Inc. and Tourbillion Technology. This report highlights key achievements of Guix-HPC between its launch date in September 2017 and today, February 2019.
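
    As a concrete illustration of the kind of deployment Guix-HPC targets, the sketch below declares a software environment in a manifest and rebuilds it with standard Guix commands; the package names are illustrative and not taken from the report.

        ;; manifest.scm -- declarative description of the environment (illustrative package names)
        (specifications->manifest
         (list "gcc-toolchain" "openmpi" "hwloc"))

        # Shell: pin the Guix revision, build the environment, and export it for container users.
        guix describe --format=channels > channels.scm      # record the exact Guix commit in use
        guix package --manifest=manifest.scm                 # install exactly the packages listed above
        guix pack --format=docker --manifest=manifest.scm    # bundle the same environment as a Docker image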

    Citations for Software: Providing Identification, Access and Recognition for Research Software

    Software plays a significant role in modern academic research, yet lacks a similarly significant presence in the scholarly record. With increasing interest in promoting reproducible research, curating software as a scholarly resource not only promotes access to these tools, but also provides recognition for the intellectual efforts that go into their development. This work reviews existing standards for identifying, promoting discovery of, and providing credit for software development work. In addition, it shows how these guidelines have been integrated into existing tools and community cultures, and provides recommendations for future software curation efforts.
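
    One concrete convention for machine-readable software citation metadata, whether or not it is among the standards reviewed here, is the Citation File Format: a small plain-text file kept at the root of a code repository. A minimal, hypothetical example:

        # CITATION.cff -- all values below are placeholders
        cff-version: 1.2.0
        message: "If you use this software, please cite it as below."
        title: example-analysis-toolkit
        version: 1.0.0
        doi: 10.5281/zenodo.0000000
        date-released: 2020-01-01
        authors:
          - family-names: Doe
            given-names: Jane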

    The Rockerverse: Packages and Applications for Containerisation with R

    The Rocker Project provides widely used Docker images for R across different application scenarios. This article surveys downstream projects that build upon the Rocker Project images and presents the current state of R packages for managing Docker images and controlling containers. These use cases cover diverse topics such as package development, reproducible research, collaborative work, cloud-based data processing, and production deployment of services. The variety of applications demonstrates the power of the Rocker Project specifically and containerisation in general. Across the diverse ways to use containers, we identified common themes: reproducible environments, scalability and efficiency, and portability across clouds. We conclude that the current growth and diversification of use cases is likely to continue its positive impact, but see the need for consolidating the Rockerverse ecosystem of packages, developing common practices for applications, and exploring alternative containerisation software.
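
    To make the containerisation idea concrete, the following sketch runs a local analysis script inside a version-pinned Rocker container instead of against the host R installation; the image tag and script name are illustrative.

        # rocker/r-ver provides version-stable R base images from the Rocker Project.
        # Mount the project directory, set it as the working directory, and run the script
        # with the containerised R so that collaborators share identical R and system libraries.
        docker run --rm -v "$PWD":/work -w /work rocker/r-ver:4.0.3 Rscript analysis.R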

    Reproducible Econometric Research. A Critical Review of the State of the Art.

    Recent software developments are reviewed from the vantage point of reproducible econometric research. We argue that the emergence of new tools, particularly in the open-source community, has greatly eased the burden of documenting and archiving both empirical and simulation work in econometrics. Some of these tools are highlighted in the discussion of three small replication exercises.
    Series: Research Report Series / Department of Statistics and Mathematics