    Calibrating Ensembles for Scalable Uncertainty Quantification in Deep Learning-based Medical Segmentation

    Uncertainty quantification in automated image analysis is highly desired in many applications. Typically, machine learning models for classification or segmentation are developed only to provide binary answers; however, quantifying the uncertainty of the models can play a critical role, for example in active learning or human-machine interaction. Uncertainty quantification is especially difficult with deep learning-based models, which are the state of the art in many imaging applications. Current uncertainty quantification approaches do not scale well to high-dimensional real-world problems. Scalable solutions often rely on classical techniques, such as dropout during inference or training ensembles of identical models with different random seeds, to obtain a posterior distribution. In this paper, we show that these approaches fail to approximate the classification probability. Instead, we propose a scalable and intuitive framework to calibrate ensembles of deep learning models so that their uncertainty quantification measurements approximate the classification probability. On unseen test data, we demonstrate improved calibration, sensitivity (in two out of three cases) and precision compared with the standard approaches. We further motivate the use of our method in active learning, in creating pseudo-labels to learn from unlabeled images, and in human-machine collaboration.
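
The abstract's notion of an ensemble output that "approximates the classification probability" can be made concrete with a calibration metric. The sketch below is not the paper's framework; it only averages the members' positive-class probabilities and scores them with a binned expected calibration error, a common generic metric. The function names `ensemble_mean` and `expected_calibration_error` and the default of 10 bins are assumptions made for illustration.

```python
from typing import List, Sequence


def ensemble_mean(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probability of each sample over all members."""
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    return [sum(m[i] for m in member_probs) / n_members for i in range(n_samples)]


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Bin-weighted mean gap between predicted probability and observed frequency.

    probs are positive-class probabilities in [0, 1]; labels are 0 or 1.
    """
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins carry no weight
        mean_p = sum(p for p, _ in b) / len(b)
        freq = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(mean_p - freq)
    return ece


if __name__ == "__main__":
    # Two hypothetical ensemble members scoring two pixels (toy numbers).
    mean = ensemble_mean([[0.9, 0.1], [0.7, 0.3]])
    print(expected_calibration_error(mean, [1, 0]))
```

A value near zero means the averaged probabilities match the observed label frequencies within each bin; larger values indicate over- or under-confidence.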

    Evolving Ensembles with TPOT

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Machine learning has become popular in recent years as a solution to various problems such as fraud detection, weather prediction, improving diagnostic accuracy, and more. One of its goals is to find the model that best explains the problem. Among the several ways to accomplish that, significant attention has been paid to improving accuracy with stacking ensembles, whose objective is to produce a more accurate prediction by combining the predictions of various estimators. Such ensembles have often exhibited superior performance compared with their single-model counterparts. Because choosing the best model for a given problem can be time-consuming, a need to automate the machine learning process has emerged. Different tools allow this, including TPOT, a Python library that uses genetic programming to optimize the machine learning process, evolving randomly created pipelines until the best one is found or a previously fixed maximum number of generations is reached. Genetic programming is a field of machine learning that uses evolutionary algorithms to generate new computer programs, and it has been shown to be successful in quite a few applications. TPOT uses several machine learning algorithms from the Sklearn Python library. It also features some ensembles, such as Random Forest and AdaBoost. Stacking ensembles are not yet implemented in TPOT, and, considering its current accuracy rates, the objective of this thesis is to implement them. After successfully implementing stacking ensembles in TPOT, we performed experiments with different datasets and found that, for almost all of them, TPOT has comparable performance to TPOT with stacking ensembles. We also observed that, when using the light dictionary version of TPOT, the results of the Stacking configuration improved for two datasets, since it used weaker learners.

    Comparison of standard resampling methods for performance estimation of artificial neural network ensembles

    Estimation of the generalization performance for classification within the medical applications domain is always an important task. In this study we focus on artificial neural network (ANN) ensembles as the machine learning technique. We present a numerical comparison between five common resampling techniques: k-fold cross validation (CV), holdout with three cutoffs, and bootstrap, using five different data sets. The results show that CV together with holdout 0.25 and 0.50 are the best resampling strategies for estimating the true performance of ANN ensembles. The bootstrap, using the .632+ rule, is too optimistic, while the holdout 0.75 underestimates the true performance.
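
The three families of resampling schemes named in this abstract differ only in how they partition sample indices into training and test sets. The sketch below shows that difference with the standard library alone. It is an illustration, not the study's protocol: the function names are hypothetical, reading the holdout cutoff as the test fraction is an assumption, and the .632+ correction is not implemented.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def kfold_splits(n: int, k: int, seed: int = 0) -> List[Split]:
    """k-fold CV: every sample is tested exactly once across the k splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    splits = []
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        splits.append((train, test))
    return splits


def holdout_split(n: int, test_fraction: float, seed: int = 0) -> Split:
    """Holdout: one random split; test_fraction is an assumed reading of the cutoff."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def bootstrap_split(n: int, seed: int = 0) -> Split:
    """Bootstrap: draw n training indices with replacement; test on the unsampled rest."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


if __name__ == "__main__":
    for train, test in kfold_splits(20, 5):
        print(len(train), len(test))
    print([len(part) for part in holdout_split(20, 0.25)])
    print([len(part) for part in bootstrap_split(20)])
```

Because bootstrap training sets contain repeated samples, the model sees fewer distinct cases than in CV, which is one reason bootstrap estimates need a correction such as the .632+ rule mentioned above.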

    The Future of Human-AI Collaboration: A Taxonomy of Design Knowledge for Hybrid Intelligence Systems

    Recent technological advances, especially in the field of machine learning, provide astonishing progress on the road towards artificial general intelligence. However, tasks in current real-world business applications cannot yet be solved by machines alone. We therefore identify the need for developing socio-technological ensembles of humans and machines. Such systems can accomplish complex goals by combining human and artificial intelligence to collectively achieve superior results and continuously improve by learning from each other. Thus, the need for structured design knowledge for those systems arises. Following a taxonomy development method, this article provides three main contributions: First, we present a structured overview of interdisciplinary research on the role of humans in the machine learning pipeline. Second, we envision hybrid intelligence systems and conceptualize the relevant dimensions for system design for the first time. Finally, we offer useful guidance for system developers during the implementation of such applications.

    Simple Regularisation for Uncertainty-Aware Knowledge Distillation

    Considering the uncertainty estimation of modern neural networks (NNs) is one of the most important steps towards deploying machine learning systems in meaningful real-world applications such as medicine, finance or autonomous systems. At the moment, ensembles of different NNs constitute the state of the art in both accuracy and uncertainty estimation in different tasks. However, ensembles of NNs are impractical under real-world constraints, since their computation and memory consumption scale linearly with the size of the ensemble, which increases their latency and deployment cost. In this work, we examine a simple regularisation approach for distribution-free knowledge distillation of an ensemble of machine learning models into a single NN. The aim of the regularisation is to preserve the diversity, accuracy and uncertainty estimation characteristics of the original ensemble without any intricacies, such as fine-tuning. We demonstrate the generality of the approach on combinations of toy data, SVHN/CIFAR-10, simple to complex NN architectures and different tasks.
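
For orientation, distilling an ensemble into a single network is commonly framed as matching the student's predictive distribution to the ensemble's averaged one. The sketch below shows that generic baseline objective only. It does not reproduce the regularisation examined in the paper, which the abstract does not specify, and the names `ensemble_target` and `distillation_loss` and the `temperature` parameter are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_target(
    member_logits: Sequence[Sequence[float]], temperature: float = 1.0
) -> List[float]:
    """Teacher distribution: mean of the members' softmax outputs for one input."""
    probs = [softmax(logits, temperature) for logits in member_logits]
    n_members = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_members for c in range(n_classes)]


def distillation_loss(
    student_logits: Sequence[float],
    member_logits: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """KL divergence from the ensemble mean (teacher) to the student, for one input."""
    teacher = ensemble_target(member_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


if __name__ == "__main__":
    # Two hypothetical ensemble members that disagree on a 3-class input.
    members = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
    print(distillation_loss([1.0, 1.0, 0.0], members))
```

Matching only the averaged distribution discards how much the members disagree with each other, which is the kind of diversity information the abstract says the regularisation aims to preserve.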