Toward a containerized pipeline for longitudinal analysis of open-source software projects

Amobi, Emmanuel; Higgins, Sean; Laufer, Konstantin; Maliakal, Linette; Meister, Emily; Miller, Allan; Putter, Jean-Luc; Rose, Alex; Synovic, Nicholas; Thiruvathukal, George; Von Hatten, Sophie; Warkentin, Jonathan; Zugschwert, Martin

Publication date: 1 June 2020
Publisher: Loyola eCommons

Abstract

Trust in open-source software is a cornerstone of scientific progress and a foundation of high-quality public services. Just as standards are integral to judging the efficacy of a novel pharmaceutical compound or determining the spread of a new disease, the software used to make those determinations should be useful, error-free, reliable, performant, and secure. A small bug in an application, library, or framework can lead to economic loss and even loss of life. We rely on software developers to be dynamic and responsive to user reviews and bug reports. Our team developed an open-source, modular pipeline to perform empirical investigations of software quality. A key innovation of our approach is to look at projects “from a distance”, much as climate scientists use satellite images to observe environmental impacts on air quality and rain forests. Instead of examining language-specific source-code features, our pipeline takes a language-agnostic, high-level approach that tracks software quality by focusing on the development process itself, which yields insight into how programmers write and maintain their software. Our distributed, modular approach to analytics allows the pipeline to be extended easily with additional metrics in future work. We store extracted data in an embedded SQLite database, so analysis can proceed without complex server setup or dedicated hosting. Our analytical modules are designed for efficiency: subsequent runs collect only missing data, supporting incremental analysis of known, important open-source projects.
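The abstract names two concrete design points: an embedded SQLite store and incremental runs that collect only missing data. The following is a minimal sketch of how those two ideas could fit together, using Python's standard-library sqlite3 module. The `commits` table layout, the `open_db` and `collect_missing` functions, and the `fetch_since` data source are hypothetical illustrations, not the pipeline's actual schema or API.

```python
import sqlite3
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical row shape: (commit hash, author, ISO-8601 UTC timestamp).
Commit = Tuple[str, str, str]

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS commits (
    sha          TEXT PRIMARY KEY,
    author       TEXT NOT NULL,
    committed_at TEXT NOT NULL  -- ISO-8601, so string order matches time order
)
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) an embedded SQLite file; no database server is needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def collect_missing(
    conn: sqlite3.Connection,
    fetch_since: Callable[[Optional[str]], Iterable[Commit]],
) -> int:
    """Fetch and store only commits newer than the newest one already stored.

    `fetch_since` is a hypothetical data source (for example, a wrapper around
    a version-control or hosting API) that returns commits strictly newer than
    the given timestamp, or every commit when given None.
    """
    # MAX over an empty table yields NULL, i.e. None: the first run fetches everything.
    (since,) = conn.execute("SELECT MAX(committed_at) FROM commits").fetchone()
    new_rows = list(fetch_since(since))
    with conn:  # one transaction: all new rows are committed together
        conn.executemany(
            "INSERT OR IGNORE INTO commits (sha, author, committed_at) VALUES (?, ?, ?)",
            new_rows,
        )
    return len(new_rows)


if __name__ == "__main__":
    history = [
        ("a1", "alice", "2020-01-01T00:00:00Z"),
        ("b2", "bob", "2020-02-01T00:00:00Z"),
    ]

    def fake_fetch(since: Optional[str]) -> Iterable[Commit]:
        # Stand-in for a remote source: return only commits newer than `since`.
        return [c for c in history if since is None or c[2] > since]

    conn = open_db(":memory:")
    print(collect_missing(conn, fake_fetch))  # 2: first run collects everything
    history.append(("c3", "alice", "2020-03-01T00:00:00Z"))
    print(collect_missing(conn, fake_fetch))  # 1: only the new commit
    print(collect_missing(conn, fake_fetch))  # 0: nothing is missing
```

Keying the query on the newest stored timestamp means the data source is asked only for what the database lacks, rather than re-downloading history and filtering it locally; `INSERT OR IGNORE` guards against a source that occasionally repeats a row.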

Available version: Loyola eCommons (oai:ecommons.luc.edu:grs-1013)