Can groups improve expert economic and financial forecasts?
Economic and financial forecasts are important for business planning and government policy but are notoriously challenging. We take advantage of recent advances in individual and group judgement, and a data set of economic and financial forecasts compiled over 25 years, consisting of multiple individual and institutional estimates, to test the claim that nominal groups will make more accurate economic and financial forecasts than individuals. We validate the forecasts using the subsequent published (real) outcomes, explore the performance of nominal groups against institutions, identify potential superforecasters and discuss the benefits of implementing structured judgment techniques to improve economic and financial forecasts
Improving expert forecasts in reliability: Application and evidence for structured elicitation protocols
Quantitative expert judgements are used in reliability assessments to inform critically important decisions. Structured elicitation protocols have been advocated to improve expert judgements, yet their application in reliability is challenged by a lack of examples or evidence that they improve judgements. This paper aims to overcome these barriers. We present a case study where two world-leading protocols, the IDEA protocol and the Classical Model, were combined and applied by the Australian Department of Defence for a reliability assessment. We assess the practicality of the methods, and the extent to which they improve judgements. The average expert was extremely overconfident, with 90% credible intervals containing the true realisation 36% of the time. However, steps contained in the protocols substantially improved judgements. In particular, an equal-weighted aggregation of individual judgements, and the inclusion of a discussion phase and revised estimates, helped to improve calibration, statistical accuracy and the Classical Model score. Further improvements in precision and information were made via performance-weighted aggregation. This paper provides useful insights into the application of structured elicitation protocols for reliability and the extent to which judgements are improved. The findings raise concerns about existing practices for utilising experts in reliability assessments and suggest greater adoption of structured protocols is warranted. We encourage the reliability community to further develop examples and insights
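The overconfidence finding above rests on a simple check: how often do an expert's 90% credible intervals actually contain the realised value? A minimal sketch of that coverage calculation, using made-up illustrative numbers rather than the study's data:

```python
# Hypothetical sketch of the calibration check described above: the fraction
# of 90% credible intervals (5th-95th percentile) that contain the realised
# value. A well-calibrated expert would score near 0.9; the study's average
# expert scored 0.36. All numbers below are illustrative, not the study's.

def coverage(intervals, realisations):
    """Fraction of (low, high) intervals that contain the realised value."""
    hits = sum(lo <= x <= hi for (lo, hi), x in zip(intervals, realisations))
    return hits / len(intervals)

def equal_weight_pool(estimates):
    """Equal-weighted linear pool of point estimates from several experts."""
    return sum(estimates) / len(estimates)

# One expert's 90% intervals over five quantities, and the true outcomes.
expert_intervals = [(0, 10), (5, 8), (20, 30), (1, 2), (50, 60)]
truths = [12, 6, 25, 5, 40]
print(coverage(expert_intervals, truths))  # 0.4: two of five intervals hit
```

In practice the same coverage statistic would be computed per expert across all calibration questions, before and after the discussion-and-revision phase, to measure the improvement the protocols deliver.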
The value of performance weights and discussion in aggregated expert judgements
In risky situations characterized by imminent decisions, scarce resources, and insufficient data, policymakers rely on experts to estimate model parameters and their associated uncertainties. Different elicitation and aggregation methods can vary substantially in their efficacy and robustness. While it is generally agreed that biases in expert judgments can be mitigated using structured elicitations involving groups rather than individuals, there is still some disagreement about how to best elicit and aggregate judgments. This mostly concerns the merits of using performance-based weighting schemes to combine judgments of different individuals (rather than assigning equal weights to individual experts), and the way that interaction between experts should be handled. This article aims to contribute to, and complement, the ongoing discussion on these topics
Online training courses on Expert Knowledge Elicitation (EKE)
This report summarises the training courses delivered under the contract OC/EFSA/AMU/2021/02 EKE: "Develop and conduct online training courses on Expert Knowledge Elicitation (EKE)". The objective was to develop and conduct online training courses on applying the methodology described in the "EFSA Guidance on Expert Knowledge Elicitation in Food and Feed Safety Risk Assessment" for EFSA staff and experts, as well as corresponding experts from EU member states. In addition to the three standard EKE methods (Sheffield, Delphi and Cooke), the training included a semi-formal method of EKE. All these methods may be used when EKE is performed within an existing EFSA working group to support uncertainty analysis as outlined in "The principles and methods behind EFSA's Guidance on Uncertainty Analysis in Scientific Assessment". In total, 12 courses were organised: two on "Steering an Expert Knowledge Elicitation", two on "Conduct of the Sheffield protocol for an EKE", one on "Conduct of the Cooke protocol for an EKE", one on "Conduct of the Delphi protocol for an EKE", two on "Conduct of a Semi-formal EKE", two on "Reporting an Expert Knowledge Elicitation" and two on "Writing an Evidence Dossier for an Expert Knowledge Elicitation". The courses had 149 participants in total and received very good feedback, with a mean rating of 4.2 out of a possible 5 across all numerical questions in the feedback questionnaire. Recommendations for future activities on training EKE methodologies are provided
Mathematically aggregating experts' predictions of possible futures
Structured protocols offer a transparent and systematic way to elicit and aggregate probabilistic predictions from multiple experts. These judgements can be aggregated behaviourally or mathematically to derive a final group prediction. Mathematical rules (e.g., weighted linear combinations of judgments) provide an objective approach to aggregation. The quality of this aggregation can be defined in terms of accuracy, calibration and informativeness. These measures can be used to compare different aggregation approaches and help decide which aggregation produces the "best" final prediction. When experts' performance can be scored on similar questions ahead of time, these scores can be translated into performance-based weights, and a performance-based weighted aggregation can then be used. When this is not possible, several other aggregation methods, informed by measurable proxies for good performance, can be formulated and compared. Here, we develop a suite of aggregation methods, informed by previous experience and the available literature. We differentially weight our experts' estimates by measures of reasoning, engagement, openness to changing their mind, informativeness, prior knowledge, and extremity, asymmetry or granularity of estimates. Next, we investigate the relative performance of these aggregation methods using three datasets. The main goal of this research is to explore how measures of knowledge and behaviour of individuals can be leveraged to produce a better-performing combined group judgment. Although the accuracy, calibration, and informativeness of the majority of methods are very similar, a couple of the aggregation methods consistently distinguish themselves as among the best or worst. Moreover, the majority of methods outperform the usual benchmarks provided by the simple average or the median of estimates
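The weighted linear combination at the core of the mathematical aggregation described above can be sketched in a few lines. The proxy weights and probabilities here are illustrative assumptions (e.g. standing in for engagement or reasoning scores), not values from the paper:

```python
# Minimal sketch of mathematical aggregation by weighted linear combination
# (a linear opinion pool). Equal weights give the simple-average benchmark;
# unequal weights stand in for proxy-based schemes such as weighting by
# engagement or reasoning scores. All numbers are illustrative assumptions.

def weighted_pool(probabilities, weights):
    """Weighted linear opinion pool of experts' probability estimates."""
    total = sum(weights)
    return sum(w * p for w, p in zip(probabilities, weights)) / total

# Three experts' probabilities that some event occurs.
probs = [0.2, 0.6, 0.7]

equal = weighted_pool(probs, [1, 1, 1])        # simple-average benchmark
proxy = weighted_pool(probs, [0.5, 2.0, 1.5])  # hypothetical proxy weights
print(round(equal, 4), round(proxy, 4))
```

Comparing such pools against the realised outcomes, scored for accuracy, calibration and informativeness, is how the relative performance of the aggregation methods is assessed.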
Assessment of the response of pollinator abundance to environmental pressures using structured expert elicitation
Policy-makers often need to rely on experts with disparate fields of expertise when making policy choices in complex, multi-faceted, dynamic environments such as those dealing with ecosystem services. For policy-makers wishing to make evidence-based decisions which will best support pollinator abundance and pollination services, one of the problems faced is how to access the information and evidence they need, and how to combine it to formulate and evaluate candidate policies. This is even more complex when multiple factors provide influence in combination. The pressures affecting the survival and pollination capabilities of honey bees (Apis mellifera), wild bees, and other pollinators are well documented, but incomplete. In order to estimate the potential effectiveness of various candidate policy choices, there is an urgent need to quantify the effect of various combinations of factors on the pollination ecosystem service. Using high-quality experimental evidence is the most robust approach, but key aspects of the system may not be amenable to experimentation or may be prohibitive based on cost, time and effort. In such cases, it is possible to obtain the required evidence by using structured expert elicitation, a method for quantitatively characterizing the state of knowledge about an uncertain quantity. Here we report and discuss the outputs of the novel use of a structured expert elicitation, designed to quantify the probability of good pollinator abundance given a variety of weather, disease, and habitat scenarios
What is a Good Calibration Question?
Weighted aggregation of expert judgements based on their performance on calibration questions may improve mathematically aggregated judgements relative to equal weights. However, obtaining validated, relevant calibration questions can be difficult. If so, should analysts settle for equal weights? Or should they use calibration questions that are easier to obtain but less relevant? In this paper, we examine what happens to the out-of-sample performance of weighted aggregations of the Classical Model compared to equal weighted aggregations when the set of calibration questions includes many so-called "irrelevant" questions, those that might ordinarily be considered to be outside the domain of the questions of interest. We find that performance weighted aggregations outperform equal weights on the combined Classical Model (CM) Score, but not on Statistical Accuracy (i.e., calibration). Importantly, there was no appreciable difference in performance when weights were developed on relevant versus irrelevant questions. Experts were unable to adapt their knowledge across vastly different domains, and in-sample validation did not accurately predict out-of-sample performance on irrelevant questions. We suggest that if relevant calibration questions cannot be found, then analysts should use equal weights, and draw on alternative techniques to improve judgements. Our study also indicates limits to the predictive accuracy of performance weighted aggregation, and the degree to which expertise can be adapted across domains. We note limitations in our study and urge further research into the effect of question type on the reliability of performance weighted aggregations
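The Statistical Accuracy component mentioned above is, in the standard formulation of Cooke's Classical Model, a chi-squared p-value comparing where realisations fall relative to each expert's stated quantiles against the proportions a perfectly calibrated expert would produce. A hedged sketch of that score, with illustrative bin counts rather than data from the paper:

```python
# Hedged sketch of the Statistical Accuracy (calibration) component of
# Cooke's Classical Model. With 5th, 50th and 95th percentiles elicited per
# question, realisations fall into four bins with expected proportions
# (0.05, 0.45, 0.45, 0.05); the score is the p-value of a chi-squared test
# on the empirical proportions. Bin counts below are illustrative only.
import math

def chi2_cdf_df3(x):
    """Closed-form chi-squared CDF for 3 degrees of freedom."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def statistical_accuracy(counts, expected=(0.05, 0.45, 0.45, 0.05)):
    """Cooke-style calibration score: p-value of 2N * KL(empirical || expected)."""
    n = sum(counts)
    s = [c / n for c in counts]
    kl = sum(si * math.log(si / pi) for si, pi in zip(s, expected) if si > 0)
    return 1 - chi2_cdf_df3(2 * n * kl)

# A perfectly calibrated expert over 20 questions scores 1.0; an
# overconfident one, with most realisations outside the central 90%
# interval, scores near 0 and receives little weight.
print(statistical_accuracy((1, 9, 9, 1)))  # 1.0
print(statistical_accuracy((8, 2, 2, 8)))  # near 0
```

The finding that weights built from irrelevant questions performed no differently suggests this score, computed in one domain, carries little signal about calibration in another.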