
    Cross validation for the classical model of structured expert judgment

    We update the 2008 TU Delft structured expert judgment database with data from 33 professionally contracted Classical Model studies conducted between 2006 and March 2015 to evaluate its performance relative to other expert aggregation models. We briefly review alternative mathematical aggregation schemes, including harmonic weighting, before focusing on linear pooling of expert judgments with equal weights and performance-based weights. Performance weighting outperforms equal weighting in all but 1 of the 33 studies in-sample. True out-of-sample validation is rarely possible for Classical Model studies, and cross validation techniques that split calibration questions into a training and test set are used instead. Performance weighting incurs an “out-of-sample penalty” and its statistical accuracy out-of-sample is lower than that of equal weighting. However, as a function of training set size, the statistical accuracy of performance-based combinations reaches 75% of the equal weight value when the training set includes 80% of calibration variables. At this point the training set is sufficiently powerful to resolve differences in individual expert performance. The information of performance-based combinations is double that of equal weighting when the training set is at least 50% of the set of calibration variables. Previous out-of-sample validation work used a Total Out-of-Sample Validity Index based on all splits of the calibration questions into training and test subsets, which is expensive to compute and includes small training sets of dubious value. As an alternative, we propose an Out-of-Sample Validity Index based on averaging the product of statistical accuracy and information over all training sets sized at 80% of the calibration set. Performance weighting outperforms equal weighting on this Out-of-Sample Validity Index in 26 of the 33 post-2006 studies; the probability of 26 or more successes on 33 trials if there were no difference between performance weighting and equal weighting is 0.001.
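    The abstract's two quantitative claims are easy to make concrete. Below is a minimal Python sketch, assuming a hypothetical `score_fn(train, test)` that returns the (statistical accuracy, information) pair of a performance-weighted combination fitted on the training questions and scored on the test questions; an actual Classical Model scorer is not reimplemented here. The second half checks the sign-test probability quoted at the end.

```python
from itertools import combinations
from math import comb

def oos_validity_index(calibration_ids, score_fn, train_frac=0.8):
    """Average statistical accuracy x information over all training sets
    holding train_frac of the calibration variables (here 80%)."""
    n = len(calibration_ids)
    k = round(train_frac * n)
    products = []
    for train in combinations(calibration_ids, k):
        test = [q for q in calibration_ids if q not in train]
        accuracy, information = score_fn(set(train), test)  # hypothetical scorer
        products.append(accuracy * information)
    return sum(products) / len(products)

# For a typical study with 10 calibration variables, k = 8 and there are
# C(10, 8) = 45 splits, far fewer than the all-splits Total index requires.

# Sign test for the headline result: if performance and equal weighting were
# equally likely to win a study, the chance of 26+ wins in 33 studies is:
p = sum(comb(33, k) for k in range(26, 34)) / 2**33
print(f"P(X >= 26 | n=33, p=1/2) = {p:.3f}")  # 0.001, as quoted
```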

    Evaluation of a Performance-Based Expert Elicitation: WHO Global Attribution of Foodborne Diseases

    For many societally important science-based decisions, data are inadequate, unreliable or non-existent, and expert advice is sought. In such cases, procedures for eliciting structured expert judgments (SEJ) are increasingly used. This raises questions regarding validity and reproducibility. This paper presents new findings from a large-scale international SEJ study intended to estimate the global burden of foodborne disease on behalf of WHO. The study involved 72 experts distributed over 134 expert panels, with panels comprising thirteen experts on average. Elicitations were conducted in five languages. Performance-based weighted solutions for target questions of interest were formed for each panel. These weights were based on individual experts' statistical accuracy and informativeness, determined using between ten and fifteen calibration variables from the experts' field with known values. Equal weights combinations were also calculated. The main conclusions on expert performance are: (1) SEJ does provide a science-based method for attribution of the global burden of foodborne diseases; (2) equal weighting of experts per panel increased statistical accuracy to acceptable levels, but at the cost of informativeness; (3) performance-based weighting increased informativeness, while retaining accuracy; (4) due to study constraints individual experts' accuracies were generally lower than in other SEJ studies, and (5) there was a negative correlation between experts' informativeness and statistical accuracy which attenuated as accuracy improved, revealing that the least accurate experts drive the negative correlation. It is shown, however, that performance-based weighting has the ability to yield statistically accurate and informative combinations of experts' judgments, thereby offsetting this contrary influence. The present findings suggest that application of SEJ on a large scale is feasible, and motivate the development of enhanced training and tools for remote elicitation of multiple, internationally-dispersed panels.
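    The statistical accuracy underlying these performance weights is Cooke's calibration score. A minimal sketch follows, assuming each expert gives 5%, 50% and 95% quantiles for each calibration variable (the quantile format is an assumption for illustration; the study's information score against a background measure is omitted here).

```python
import numpy as np
from scipy.stats import chi2

# Bin probabilities implied by eliciting 5%, 50% and 95% quantiles.
P_THEORY = np.array([0.05, 0.45, 0.45, 0.05])

def calibration_score(quantiles, realizations):
    """Cooke-style statistical accuracy for one expert.

    quantiles: (N, 3) array of elicited 5/50/95% quantiles for N
    calibration variables; realizations: length-N true values.
    Returns the chi-square tail probability of the observed bin counts.
    """
    q = np.asarray(quantiles)
    x = np.asarray(realizations)
    bins = np.sum(x[:, None] > q, axis=1)        # which inter-quantile bin
    s = np.bincount(bins, minlength=4) / len(x)  # empirical bin frequencies
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(s > 0, s * np.log(s / P_THEORY), 0.0)
    relative_info = terms.sum()                  # KL divergence I(s, p)
    return chi2.sf(2 * len(x) * relative_info, df=len(P_THEORY) - 1)
```

    Experts whose realizations land in bins at roughly the theoretical rates score near 1; systematic overconfidence (too many realizations in the 5% tails) drives the score toward 0 and hence toward negligible weight.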

    A commentary on “how to interpret expert judgment assessments of twenty-first century sea-level rise” by Hylke de Vries and Roderik SW van de Wal

    We clarify key aspects of the evaluation by de Vries and van de Wal (2015) of our expert elicitation paper on the contributions of ice sheet melting to sea level rise under future global temperature rise scenarios (Bamber and Aspinall 2013), and extend the conversation with further analysis of their proposed approach for combining expert uncertainty judgments.

    The value of performance weights and discussion in aggregated expert judgements

    In risky situations characterized by imminent decisions, scarce resources, and insufficient data, policymakers rely on experts to estimate model parameters and their associated uncertainties. Different elicitation and aggregation methods can vary substantially in their efficacy and robustness. While it is generally agreed that biases in expert judgments can be mitigated using structured elicitations involving groups rather than individuals, there is still some disagreement about how to best elicit and aggregate judgments. This mostly concerns the merits of using performance-based weighting schemes to combine judgments of different individuals (rather than assigning equal weights to individual experts), and the way that interaction between experts should be handled. This article aims to contribute to, and complement, the ongoing discussion on these topics.
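    The aggregation at issue in this debate is linear opinion pooling: a weighted mixture of the experts' distributions, with weights either equal or derived from performance on calibration questions. A minimal sketch with illustrative numbers (the grid values and weights below are made up):

```python
import numpy as np

def linear_pool(cdfs, weights=None):
    """Combine expert CDF values on a common grid by linear pooling.

    cdfs: (n_experts, n_grid) array of each expert's CDF evaluated on a
    shared grid; weights: per-expert weights (equal weights if None).
    """
    cdfs = np.asarray(cdfs, dtype=float)
    if weights is None:                  # equal-weight combination
        weights = np.full(cdfs.shape[0], 1.0 / cdfs.shape[0])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise, e.g. raw performance scores
    return w @ cdfs                      # weighted mixture of CDFs

# Example: three experts; performance-based weights favour expert 0.
grid_cdfs = [[0.1, 0.6, 0.9], [0.2, 0.5, 0.8], [0.0, 0.3, 0.7]]
print(linear_pool(grid_cdfs))                           # equal weights
print(linear_pool(grid_cdfs, weights=[0.7, 0.2, 0.1]))  # performance weights
```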

    Improving expert forecasts in reliability: application and evidence for structured elicitation protocols

    Quantitative expert judgements are used in reliability assessments to inform critically important decisions. Structured elicitation protocols have been advocated to improve expert judgements, yet their application in reliability is challenged by a lack of examples or evidence that they improve judgements. This paper aims to overcome these barriers. We present a case study where two world-leading protocols, the IDEA protocol and the Classical Model, were combined and applied by the Australian Department of Defence for a reliability assessment. We assess the practicality of the methods, and the extent to which they improve judgements. The average expert was extremely overconfident, with 90% credible intervals containing the true realisation 36% of the time. However, steps contained in the protocols substantially improved judgements. In particular, an equal weighted aggregation of individual judgements, and the inclusion of a discussion phase and revised estimate, helped to improve calibration, statistical accuracy and the Classical Model score. Further improvements in precision and information were made via performance weighted aggregation. This paper provides useful insights into the application of structured elicitation protocols for reliability and the extent to which judgements are improved. The findings raise concerns about existing practices for utilising experts in reliability assessments and suggest greater adoption of structured protocols is warranted. We encourage the reliability community to further develop examples and insights.
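    The 36% figure is a plain coverage (hit-rate) check: the fraction of calibration realisations falling inside the experts' 90% credible intervals, which should be near 0.90 when judgements are well calibrated. A small sketch with simulated data; the interval below is deliberately too narrow, reproducing overconfidence of roughly the reported magnitude:

```python
import numpy as np

def interval_hit_rate(lower, upper, realizations):
    """Fraction of realizations inside the elicited credible intervals.

    For well-calibrated 90% intervals this should be close to 0.90; the
    case study reports 36%, i.e. strong overconfidence.
    """
    lo, hi, x = map(np.asarray, (lower, upper, realizations))
    return np.mean((x >= lo) & (x <= hi))

# Toy illustration: overly narrow "90%" intervals around the truth.
rng = np.random.default_rng(0)
truth = rng.normal(size=200)
lo, hi = np.full(200, -0.5), np.full(200, 0.5)  # far too narrow for 90%
print(interval_hit_rate(lo, hi, truth))         # ~0.38, not the nominal 0.90
```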

    Identification of Causal Paths and Prediction of Runway Incursion Risk using Bayesian Belief Networks

    In the U.S. and worldwide, runway incursions are widely acknowledged as a critical concern for aviation safety. However, despite widespread attempts to reduce the frequency of runway incursions, the rate at which these events occur in the U.S. has steadily risen over the past several years. Attempts to analyze runway incursion causation have been made, but these methods are often limited to investigations of discrete events and do not address the dynamic interactions that lead to breaches of runway safety. While the generally static nature of runway incursion research is understandable given that data are often sparsely available, the unmitigated rate at which runway incursions take place indicates a need for more comprehensive risk models that extend currently available research. This dissertation summarizes the existing literature, emphasizing the need for cross-domain methods of causation analysis applied to runway incursions in the U.S. and reviewing probabilistic methodologies for reasoning under uncertainty. A holistic modeling technique using Bayesian Belief Networks as a means of interpreting causation even in the presence of sparse data is outlined in three phases: causal factor identification, model development, and expert elicitation, with intended application at the systems or regulatory agency level. Further, the importance of investigating runway incursions probabilistically and incorporating information from human factors, technological, and organizational perspectives is supported. A method for structuring a Bayesian network using quantitative and qualitative event analysis in conjunction with structured expert probability estimation is outlined, and results are presented for propagation of evidence through the model as well as for causal analysis. In this research, advances in the aggregation of runway incursion data are outlined, and a means of combining quantitative and qualitative information is developed. Building upon these data, a method for developing and validating a Bayesian network while maintaining operational transferability is also presented. Further, the body of knowledge is extended with respect to structured expert judgment, as operationalization is combined with elicitation of expert data to create a technique for gathering expert assessments of probability in a computationally compact manner while preserving mathematical accuracy in rank correlation and dependence structure. The model developed in this study is shown to produce accurate results within the U.S. aviation system, and to provide a dynamic, inferential platform for future evaluation of runway incursion causation. These results in part confirm what is known about runway incursion causation, but more importantly they shed more light on multifaceted causal interactions and do so in a modeling space that allows for causal inference and evaluation of changes to the system in a dynamic setting. Suggestions for future research are also discussed, most prominent of which is that this model allows for robust and flexible assessment of mitigation strategies within a holistic model of runway safety.
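    The evidence-propagation step described here can be illustrated with a toy discrete network, computed by brute-force enumeration rather than the junction-tree or sampling machinery a real BBN tool would use. The structure and all probabilities below are hypothetical, not values elicited in the dissertation: low visibility (V) and a communication error (C) each raise the probability of an incursion (I).

```python
from itertools import product

# Hypothetical priors and conditional probability table (illustrative only).
P_V = {0: 0.8, 1: 0.2}                  # P(low visibility)
P_C = {0: 0.9, 1: 0.1}                  # P(communication error)
P_I_GIVEN = {(0, 0): 0.01, (0, 1): 0.10,
             (1, 0): 0.05, (1, 1): 0.40}  # P(incursion | V, C)

def joint(v, c, i):
    """Joint probability of one full assignment (V=v, C=c, I=i)."""
    p_i1 = P_I_GIVEN[(v, c)]
    return P_V[v] * P_C[c] * (p_i1 if i == 1 else 1.0 - p_i1)

def query_incursion(evidence):
    """P(I=1 | evidence), e.g. evidence={'C': 1}, by enumeration."""
    num = den = 0.0
    for v, c, i in product((0, 1), repeat=3):
        assign = {"V": v, "C": c, "I": i}
        if any(assign[k] != val for k, val in evidence.items()):
            continue
        p = joint(v, c, i)
        den += p
        if i == 1:
            num += p
    return num / den

print(query_incursion({}))        # prior incursion probability, ~0.032
print(query_incursion({"C": 1}))  # posterior given a communication error, 0.16
```

    Conditioning on observed evidence (here, a communication error) raises the incursion probability from about 0.032 to 0.16; in the dissertation's model the same propagation runs over expert-elicited conditional probability tables with rank-correlation and dependence structure preserved.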

    An empirical learning-based validation procedure for simulation workflow

    Simulation workflow is a top-level model for the design and control of the simulation process. It connects multiple simulation components with time and interaction restrictions to form a complete simulation system. Before the construction and evaluation of the component models, validating the upper-layer simulation workflow is of the utmost importance in a simulation system. However, methods specifically for validating simulation workflows are very limited. Many of the existing validation techniques are domain-dependent, with cumbersome questionnaire design and expert scoring. Therefore, this paper presents an empirical learning-based validation procedure that implements a semi-automated evaluation of simulation workflows. First, representative features of general simulation workflows and their relations with validation indices are proposed. The calculation of workflow credibility based on the Analytic Hierarchy Process (AHP) is then introduced. To make full use of historical data and implement more efficient validation, four learning algorithms, including back propagation neural network (BPNN), extreme learning machine (ELM), evolving neo-fuzzy neuron (eNFN) and fast incremental Gaussian mixture model (FIGMN), are introduced for constructing the empirical relation between workflow credibility and its features. A case study on a landing-process simulation workflow is established to test the feasibility of the proposed procedure. The experimental results also provide a useful overview of the state-of-the-art learning algorithms on the credibility evaluation of simulation models.
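    The AHP step turns expert pairwise comparisons of validation indices into priority weights via the principal eigenvector of the comparison matrix. A minimal sketch follows, with a hypothetical 3x3 comparison matrix; the actual indices and judgements in the paper differ.

```python
import numpy as np

# Hypothetical pairwise-comparison matrix over three validation indices
# (e.g. timing correctness, interaction consistency, component coverage);
# A[i][j] says how much more important index i is than index j.
A = np.array([[1.0,   3.0,   5.0],
              [1/3.0, 1.0,   3.0],
              [1/5.0, 1/3.0, 1.0]])

def ahp_weights(a, tol=1e-10, max_iter=1000):
    """Priority weights as the principal eigenvector, via power iteration."""
    w = np.full(a.shape[0], 1.0 / a.shape[0])
    for _ in range(max_iter):
        w_new = a @ w
        w_new /= w_new.sum()            # keep weights normalised to 1
        if np.abs(w_new - w).max() < tol:
            break
        w = w_new
    return w

w = ahp_weights(A)
lam = (A @ w / w).mean()                # principal eigenvalue estimate
ci = (lam - len(A)) / (len(A) - 1)      # consistency index: near 0 = consistent
print(w, ci)
```

    The resulting weights feed a weighted sum of feature scores to yield a credibility value; the learning algorithms compared in the paper are then trained to map workflow features directly to that credibility.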