40 research outputs found

    Eliciting group judgements about replicability: a technical implementation of the IDEA Protocol

    In recent years there has been increased interest in replicating prior research. One of the biggest challenges to assessing replicability is the cost, in resources and time, of repeating studies. There is therefore an impetus to develop rapid elicitation protocols that can, in a practical manner, estimate the likelihood that research findings will successfully replicate. We employ a novel implementation of the IDEA (‘Investigate’, ‘Discuss’, ‘Estimate’, and ‘Aggregate’) protocol, realised through the repliCATS platform. The repliCATS platform is designed to scalably elicit expert opinion about the replicability of social and behavioural science research. The IDEA protocol provides a structured methodology for eliciting judgements and reasoning from groups. This paper describes the repliCATS platform, a multi-user cloud-based software platform featuring (1) a technical implementation of the IDEA protocol for eliciting expert opinion on research replicability, (2) capture of consent and demographic data, (3) online training on replication concepts, and (4) export of completed judgements. The platform has, to date, evaluated 3432 social and behavioural science research claims from 637 participants.
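
    As a rough illustration of the IDEA workflow's final ‘Aggregate’ step, the following sketch averages hypothetical post-discussion three-point probability estimates into a single group judgement. The participant values and the simple unweighted mean are assumptions for illustration only; the repliCATS platform's actual aggregation rule may differ.

```python
from statistics import mean

# Hypothetical second-round (post-discussion) estimates from three participants.
# Each triple is (lower bound, best estimate, upper bound) for the probability
# that a research claim will successfully replicate ('Estimate' step).
estimates = [
    (0.40, 0.55, 0.70),
    (0.50, 0.60, 0.75),
    (0.35, 0.50, 0.65),
]

# 'Aggregate' step: an unweighted average of each bound across the group.
# This simple rule is illustrative only; it is not necessarily the rule
# implemented in the repliCATS platform.
group_lower = mean(e[0] for e in estimates)
group_best = mean(e[1] for e in estimates)
group_upper = mean(e[2] for e in estimates)

print(f"Group judgement: {group_best:.2f} (interval {group_lower:.2f}-{group_upper:.2f})")
```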

    Better together: reliable application of the post-9/11 and post-Iraq US intelligence tradecraft standards requires collective analysis

    Background: The events of 9/11 and the October 2002 National Intelligence Estimate on Iraq's Continuing Programs for Weapons of Mass Destruction precipitated fundamental changes within the US Intelligence Community. As part of the reform, analytic tradecraft standards were revised and codified into a policy document – Intelligence Community Directive (ICD) 203 – and an analytic ombudsman was appointed in the newly created Office for the Director of National Intelligence to ensure compliance across the intelligence community. In this paper we investigate the untested assumption that the ICD203 criteria can facilitate reliable evaluations of analytic products. Method: Fifteen independent raters used a rubric based on the ICD203 criteria to assess the quality of reasoning of 64 analytical reports generated in response to hypothetical intelligence problems. We calculated the intra-class correlation coefficients for single and group-aggregated assessments. Results: Despite general training and rater calibration, the reliability of individual assessments was poor. However, aggregate ratings showed good to excellent reliability. Conclusions: Given that real problems will be more difficult and complex than our hypothetical case studies, we advise that groups of at least three raters are required to obtain reliable quality control procedures for intelligence products. Our study sets limits on assessment reliability and provides a basis for further evaluation of the predictive validity of intelligence reports generated in compliance with the tradecraft standards.
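
    The comparison between single-rater and group-aggregated reliability can be illustrated with intra-class correlation coefficients. Below is a minimal sketch assuming a one-way random-effects model, in which ICC(1,1) estimates the reliability of a single rater and ICC(1,k) the reliability of the mean of k raters. The synthetic 64 × 15 ratings matrix and the choice of ICC model are assumptions; the study's rubric, scores, and exact ICC form may differ.

```python
import numpy as np

def icc_oneway(ratings: np.ndarray):
    """One-way random-effects intra-class correlations.

    ratings: n_targets x n_raters array of quality-of-reasoning scores.
    Returns (ICC for a single rater, ICC for the mean of all raters).
    """
    n, k = ratings.shape
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)

    # Between-target and within-target mean squares from a one-way ANOVA.
    ms_between = k * np.sum((target_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - target_means[:, None]) ** 2) / (n * (k - 1))

    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average

# Synthetic example: 64 reports, 15 raters, scores on a 1-10 scale.
# Random scores give ICCs near zero; real rubric scores would reflect
# whatever agreement structure exists among raters.
rng = np.random.default_rng(0)
scores = rng.integers(1, 11, size=(64, 15)).astype(float)
print(icc_oneway(scores))
```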

    A horizon scan of global conservation issues for 2014

    This paper presents the output of our fifth annual horizon-scanning exercise, which aims to identify topics that may increasingly affect the conservation of biological diversity but have yet to be widely considered. A team of professional horizon scanners, researchers, practitioners, and a journalist identified 15 topics via an iterative, Delphi-like process. The 15 topics include a carbon-market-induced financial crash, rapid geographic expansion of macroalgal cultivation, genetic control of invasive species, probiotic therapy for amphibians, and an emerging snake fungal disease. © 2013 Elsevier Ltd

    Predicting reliability through structured expert elicitation with the repliCATS (Collaborative Assessments for Trustworthy Science) process

    As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. This process is a structured expert elicitation approach based on a modified Delphi technique, applied to the evaluation of research claims in the social and behavioural sciences. The utility of processes that predict replicability lies in their capacity to test scientific claims without the costs of full replication. Experimental data support the validity of this process: a validation study produced a classification accuracy of 84% and an Area Under the Curve of 0.94, meeting or exceeding the accuracy of other techniques used to predict replicability. The repliCATS process provides other benefits. It is highly scalable, able to be deployed both for rapid assessment of small numbers of claims and for assessment of high volumes of claims over an extended period through an online elicitation platform, having been used to assess 3000 research claims over an 18-month period. It can be implemented in a range of ways, and we describe one such implementation. An important advantage of the repliCATS process is that it collects qualitative data with the potential to provide insight into the limits of generalizability of scientific claims. The primary limitation of the repliCATS process is its reliance on human-derived predictions, with consequent costs in terms of participant fatigue, although careful design can minimise these costs. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.
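
    For intuition about how such a validation is scored, the sketch below computes classification accuracy and AUC from hypothetical replication outcomes and aggregated group probabilities. The data are invented for illustration; the reported 84% accuracy and 0.94 AUC come from the study's own validation set.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical validation data: known replication outcomes (1 = replicated)
# and the group's aggregated probability that each claim would replicate.
outcomes = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
group_probabilities = np.array([0.81, 0.35, 0.66, 0.72, 0.48, 0.90, 0.22, 0.55, 0.77, 0.63])

# Binarise at 0.5 for classification accuracy; AUC uses the raw probabilities.
predicted = (group_probabilities >= 0.5).astype(int)
print("accuracy:", accuracy_score(outcomes, predicted))
print("AUC:", roc_auc_score(outcomes, group_probabilities))
```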