3 research outputs found

    Making data accessible: lessons learned from computational reproducibility of impact evaluations

    Our descriptive study assesses the computational reproducibility of impact evaluation data by verifying the results presented in published 3ie reports. Using the original data and statistical code submitted by researchers, we apply the push button replication protocol developed at 3ie to determine how closely the reproduced results match the original findings. Our sample includes completed 3ie-funded impact evaluations commissioned between 2008 and 2020. We find that about three-fourths of the 133 studies in our sample are reproducible. This high rate of reproducibility is largely attributable to the stringent payment-linked measures that 3ie adopted during this study. In our view, donor organizations, which often commission evaluations, can play a key role in ensuring confidence in evaluation studies. To this end, we describe our experience of the reproducibility process and offer lessons learned.

    Making data reusable: lessons learned from replications of impact evaluations

    The study assesses the reusability of impact evaluation data by verifying the results presented in published 3ie reports. To verify results, we conduct push button replications on the original data and code submitted by the authors, using the push button replication protocol developed at 3ie to determine how closely the replication results match the original findings. Our sample includes closed 3ie-funded impact evaluations commissioned between 2008 and 2018. Of the 74 studies in our sample, we successfully reproduced the results of 38 (51%); 24 (32%) were categorized as incomplete, and 12 (16%) as having major differences. The cumulative replication rate rose to 51% in 2018, compared with below 40% in previous years. On average, replicating a single impact evaluation took about three hours. Evidence from impact evaluations is credible when it is verifiable. Our findings suggest that greater attention is needed to ensure the reliability and reusability of evidence. We recommend push button replications as a tested method to ascertain the credibility of findings.

    How Many Replicators Does It Take to Achieve Reliability? Investigating Researcher Variability in a Crowdsourced Replication

    The paper reports findings from a crowdsourced replication. Eighty-four replicator teams attempted to verify results reported in an original study by running the same models with the same data. The replication involved an experimental condition. A “transparent” group received the original study and code, and an “opaque” group received the same underlying study but with only a methods section and a description of the regression coefficients without size or significance, and no code. The transparent group mostly verified the original study (95.5%), while the opaque group had less success (89.4%). Qualitative investigation of the replicators’ workflows reveals many causes of non-verification. Two categories of these causes are hypothesized: routine and non-routine. After correcting non-routine errors in the research process to ensure that the results reflect a level of quality that should be present in ‘real-world’ research, the verification rate was 96.1% in the transparent group and 92.4% in the opaque group. Two conclusions follow: (1) although high, the verification rate suggests that it would take a minimum of three replicators per study to achieve replication reliability of at least 95% confidence, assuming ecological validity of this controlled setting, and (2) like any type of scientific research, replication is prone to errors that derive from routine and undeliberate actions in the research process. The latter suggests that idiosyncratic researcher variability might provide a key to understanding part of the “reliability crisis” in social and behavioral science and is a reminder of the importance of transparent and well documented workflows.
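    To make the “minimum of three replicators” figure concrete, here is a minimal illustrative sketch, not the authors’ actual procedure: it assumes each replicator independently reaches the correct verification verdict at the per-group rates quoted above (96.1% and 92.4%) and asks how many replicators a majority vote needs before its verdict is at least 95% reliable. The independence and majority-vote assumptions, and the function names, are illustrative only.

        # Illustrative only: independent replicators, strict-majority vote.
        # Per-replicator accuracies are taken from the abstract above.
        from math import comb

        def majority_reliability(p: float, n: int) -> float:
            """Probability that a strict majority of n independent replicators
            reaches the correct verdict, each with accuracy p."""
            needed = n // 2 + 1
            return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                       for k in range(needed, n + 1))

        def min_replicators(p: float, target: float = 0.95, max_n: int = 15) -> int:
            """Smallest odd number of replicators whose majority verdict
            meets the target reliability (odd n avoids tied votes)."""
            for n in range(1, max_n + 1, 2):
                if majority_reliability(p, n) >= target:
                    return n
            raise ValueError("target not reached within max_n replicators")

        for label, p in [("transparent", 0.961), ("opaque", 0.924)]:
            print(f"{label}: minimum replicators = {min_replicators(p)}")
        # Under these assumptions, a 92.4% per-replicator rate falls short of 95%
        # on its own, but a 2-of-3 majority exceeds it, which is one way to
        # arrive at a minimum of three replicators per study.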