
    Reliability estimation of ensemble model predictions

    The reliability of a prediction is increasingly important, especially in areas such as health and finance, where we do not want to act on predictions that are not sufficiently reliable. To address this in the context of machine learning, methods that assess the reliability of individual predictions are being researched. These methods are of two types: those specialized for a specific model and those that make no assumptions about the model type. The former can take additional information into account when determining reliability, because they can exploit model-specific parameters; the latter, however, are applicable to all models. In this work, we present methods that operate on ensemble models and therefore belong to the model-specific type. The methods handle both classification and regression datasets. Their performance is evaluated with the Pearson correlation coefficient for regression problems and the Wilcoxon-Mann-Whitney statistic for classification. The developed methods are compared with existing ones, and we also present the results using critical distance diagrams.
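    As a rough illustration of the evaluation protocol described above (a minimal sketch, not the authors' code; all names are assumptions), the snippet below scores a reliability estimate against observed outcomes. It assumes higher scores mean less reliable predictions: for regression it computes the Pearson correlation between the scores and the absolute prediction errors, and for classification the Wilcoxon-Mann-Whitney statistic, normalized to the ROC-AUC scale.

        import numpy as np
        from scipy.stats import mannwhitneyu, pearsonr

        def eval_regression(unreliability, abs_errors):
            # Pearson correlation between the reliability estimate and the
            # observed absolute prediction error; a strong positive value
            # means the estimate tracks the true error well.
            r, _ = pearsonr(unreliability, abs_errors)
            return r

        def eval_classification(unreliability, misclassified):
            # Wilcoxon-Mann-Whitney statistic: the probability that a randomly
            # chosen misclassified example gets a higher unreliability score
            # than a randomly chosen correct one (equivalent to ROC AUC).
            unreliability = np.asarray(unreliability, dtype=float)
            misclassified = np.asarray(misclassified, dtype=bool)
            pos = unreliability[misclassified]     # scores on erroneous predictions
            neg = unreliability[~misclassified]    # scores on correct predictions
            u, _ = mannwhitneyu(pos, neg, alternative="greater")
            return u / (len(pos) * len(neg))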

    Detecting Misclassification Errors in Neural Networks with a Gaussian Process Model

    As neural network classifiers are deployed in real-world applications, it is crucial that their failures can be detected reliably. One practical solution is to assign confidence scores to each prediction and then use these scores to filter out possible misclassifications. However, existing confidence metrics are not yet sufficiently reliable for this role. This paper presents a new framework that produces a quantitative metric for detecting misclassification errors. The framework, RED, builds an error detector on top of the base classifier and estimates the uncertainty of the detection scores using Gaussian Processes. Experimental comparisons with other error detection methods on 125 UCI datasets demonstrate that this approach is effective. Implementations on two probabilistic base classifiers and two large deep learning architectures in vision tasks further confirm that the method is robust and scalable. Finally, an empirical analysis of RED with out-of-distribution and adversarial samples shows that the method can be used not only to detect errors but also to understand where they come from. RED can thereby be used to improve the trustworthiness of neural network classifiers more broadly in the future.
    Comment: 32 pages, 3 figures, 15 tables
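    As a loose sketch of the general recipe described in the abstract (not the RED implementation; the function names and the choice of the top-class probability as the only feature are assumptions), one can fit a Gaussian Process on held-out data to predict whether the base classifier's prediction is wrong; the GP's predictive standard deviation then quantifies the uncertainty of the detection score.

        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        def fit_error_detector(base_clf, X_val, y_val):
            # Top-class probability of the base classifier as a raw
            # confidence feature (an assumed choice for this sketch).
            conf = base_clf.predict_proba(X_val).max(axis=1).reshape(-1, 1)
            # Target: 1.0 where the base prediction is wrong, else 0.0.
            wrong = (base_clf.predict(X_val) != y_val).astype(float)
            # GP regression from confidence to the error indicator; the
            # WhiteKernel term absorbs label noise in the binary targets.
            gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                          normalize_y=True)
            gp.fit(conf, wrong)
            return gp

        def detection_scores(gp, base_clf, X):
            conf = base_clf.predict_proba(X).max(axis=1).reshape(-1, 1)
            mean, std = gp.predict(conf, return_std=True)
            # High mean suggests a likely misclassification; std is the
            # GP's uncertainty about that detection score.
            return mean, std

        # Hypothetical usage with any scikit-learn style classifier:
        #   gp = fit_error_detector(base, X_val, y_val)   # held-out split
        #   score, uncert = detection_scores(gp, base, X_test)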