An introduction to Docker for reproducible research, with examples from the R environment
As computational work becomes more and more integral to many aspects of
scientific research, computational reproducibility has become an issue of
increasing importance to computer systems researchers and domain scientists
alike. Though computational reproducibility seems more straightforward than
replicating physical experiments, the complex and rapidly changing nature of
computer environments makes being able to reproduce and extend such work a
serious challenge. In this paper, I explore common reasons that code developed
for one research project cannot be successfully executed or extended by
subsequent researchers. I review current approaches to these issues, including
virtual machines and workflow systems, and their limitations. I then examine
how the popular emerging technology Docker combines several areas from systems
research, such as operating system virtualization, cross-platform portability,
modular re-usable elements, versioning, and a `DevOps' philosophy, to address
these challenges. I illustrate this with several examples of Docker use with a
focus on the R statistical environment.
Urban Street Network Analysis in a Computational Notebook
Computational notebooks offer researchers, practitioners, students, and educators the ability to interactively conduct analytics and disseminate reproducible workflows that weave together code, visuals, and narratives. This article explores the potential of computational notebooks in urban analytics and planning, demonstrating their utility through a case study of OSMnx and its tutorials repository. OSMnx is a Python package for working with OpenStreetMap data and modeling, analyzing, and visualizing street networks anywhere in the world. Its official demos and tutorials are distributed as open-source Jupyter notebooks on GitHub. This article showcases this resource by documenting the repository and demonstrating OSMnx interactively through a synoptic tutorial adapted from the repository. It illustrates how to download urban data and model street networks for various study sites, compute network indicators, visualize street centrality, calculate routes, and work with other spatial data such as building footprints and points of interest. Computational notebooks help introduce methods to new users and help researchers reach broader audiences interested in learning from, adapting, and remixing their work. Due to their utility and versatility, the ongoing adoption of computational notebooks in urban planning, analytics, and related geocomputation disciplines should continue into the future.
A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data
Many interesting data sets available on the Internet are of a medium
size---too big to fit into a personal computer's memory, but not so large that
they won't fit comfortably on its hard disk. In the coming years, data sets of
this magnitude will inform vital research in a wide array of application
domains. However, due to a variety of constraints they are cumbersome to
ingest, wrangle, analyze, and share in a reproducible fashion. These
obstructions hamper thorough peer-review and thus disrupt the forward progress
of science. We propose a predictable and pipeable framework for R (the
state-of-the-art statistical computing environment) that leverages SQL (the
venerable database architecture and query language) to make reproducible
research on medium data a painless reality.
Comment: 30 pages, plus supplementary material