3 research outputs found

    Making data accessible: lessons learned from computational reproducibility of impact evaluations

    Our descriptive study assesses the computational reproducibility of impact evaluation data by verifying the results presented in published 3ie reports. Using the original data and statistical code submitted by researchers, we apply the push button replication protocol developed at 3ie to determine how closely the reproduced results match the original findings. Our sample includes completed 3ie-funded impact evaluations commissioned between 2008 and 2020. We find that about three-fourths of the 133 studies in our sample are reproducible. This high rate of reproducibility is largely attributable to the stringent payment-linked measures that 3ie adopted during this study. In our view, donor organizations, which often commission evaluations, can play a key role in ensuring confidence in evaluation studies. To this end, we describe our experience of the reproducibility process and offer lessons learned.

    Making data reusable: lessons learned from replications of impact evaluations

    The study assesses the reusability of impact evaluation data by verifying the results presented in published 3ie reports. To verify results, we conduct push button replications on the original data and code submitted by the authors, using the push button replication protocol developed at 3ie to determine how closely the replication results match the original findings. Our sample includes closed 3ie-funded impact evaluations commissioned between 2008 and 2018. Of the 74 studies in our sample, we successfully reproduced the results of 38 (51%); 24 (32%) were categorized as incomplete, and 12 (16%) as having major differences. The cumulative replication rate rose to 51% in 2018, compared with below 40% in previous years. On average, replicating a single impact evaluation took about three hours. Evidence from impact evaluations is credible when it is verifiable. Our findings suggest that greater attention is needed to ensure the reliability and reusability of evidence. We recommend push button replications as a tested method to ascertain the credibility of findings.

    How Many Replicators Does It Take to Achieve Reliability? Investigating Researcher Variability in a Crowdsourced Replication

    The paper reports findings from a crowdsourced replication. Eighty-four replicator teams attempted to verify results reported in an original study by running the same models with the same data. The replication involved an experimental condition. A “transparent” group received the original study and code, and an “opaque” group received the same underlying study but with only a methods section and a description of the regression coefficients without size or significance, and no code. The transparent group mostly verified the original study (95.5%), while the opaque group had less success (89.4%). Qualitative investigation of the replicators’ workflows reveals many causes of non-verification. Two categories of these causes are hypothesized: routine and non-routine. After correcting non-routine errors in the research process to ensure that the results reflect a level of quality that should be present in ‘real-world’ research, the verification rate was 96.1% in the transparent group and 92.4% in the opaque group. Two conclusions follow: (1) although high, the verification rate suggests that it would take a minimum of three replicators per study to achieve replication reliability of at least 95% confidence, assuming ecological validity of this controlled setting, and (2) like any type of scientific research, replication is prone to errors that derive from routine and undeliberate actions in the research process. The latter suggests that idiosyncratic researcher variability might provide a key to understanding part of the “reliability crisis” in social and behavioral science and is a reminder of the importance of transparent and well documented workflows.
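    To make the “minimum of three replicators” figure concrete, here is a minimal illustrative sketch, not the authors’ actual procedure: it assumes each replicator independently reaches the correct verification verdict at the per-group rates quoted above (96.1% and 92.4%) and asks how many replicators a majority vote needs before its verdict is at least 95% reliable. The independence and majority-vote assumptions, and the function names, are illustrative only.

        # Illustrative only: independent replicators, strict-majority vote.
        # Per-replicator accuracies are taken from the abstract above.
        from math import comb

        def majority_reliability(p: float, n: int) -> float:
            """Probability that a strict majority of n independent replicators
            reaches the correct verdict, each with accuracy p."""
            needed = n // 2 + 1
            return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                       for k in range(needed, n + 1))

        def min_replicators(p: float, target: float = 0.95, max_n: int = 15) -> int:
            """Smallest odd number of replicators whose majority verdict
            meets the target reliability (odd n avoids tied votes)."""
            for n in range(1, max_n + 1, 2):
                if majority_reliability(p, n) >= target:
                    return n
            raise ValueError("target not reached within max_n replicators")

        for label, p in [("transparent", 0.961), ("opaque", 0.924)]:
            print(f"{label}: minimum replicators = {min_replicators(p)}")
        # Under these assumptions, a 92.4% per-replicator rate falls short of 95%
        # on its own, but a 2-of-3 majority exceeds it, which is one way to
        # arrive at a minimum of three replicators per study.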