Science is facing a reproducibility crisis. Previous work has proposed
incorporating data analysis replications into classrooms as a potential
solution. However, despite the potential benefits, it is unclear whether this
approach is feasible, and if so, what the involved stakeholders (students,
educators, and scientists) should expect from it. Can students perform a data
analysis replication over the course of a class? What are the costs and
benefits for educators? And how can this solution help benchmark and improve
the state of science?
In the present study, we incorporated data analysis replications in the
project component of the Applied Data Analysis course (CS-401) taught at EPFL
(N=354 students). Here we report pre-registered findings based on surveys
administered throughout the course. First, we demonstrate that students can
replicate previously published scientific papers, most of them qualitatively
and some exactly. We also find discrepancies between what students expect of
data analysis replications and what they experience in doing them, along with
changes in their expectations about reproducibility. Together, these serve as
evidence of attitude shifts that foster students' critical thinking. Second, we provide
information for educators about how much overhead is needed to incorporate
replications into the classroom and identify concerns that replications bring
as compared to more traditional assignments. Third, we identify tangible
benefits of the in-class data analysis replications for scientific communities,
such as a collection of replication reports and insights about replication
barriers in scientific work that should be avoided going forward.
Overall, we demonstrate that incorporating replication tasks into a large
data science class can increase the reproducibility of scientific work as a
by-product of data science instruction, thus benefiting both science and
students.