
    Upgrading the Fusion of Imprecise Classifiers

    Imprecise classification is a relatively new task within Machine Learning. It differs from standard classification in that, rather than determining a single state of the variable under study, it also determines the set of states that do not have enough information against them and therefore cannot be ruled out. For imprecise classification, a model called the Imprecise Credal Decision Tree (ICDT), which uses imprecise probabilities and maximum entropy as the information measure, has been presented. A difficult and interesting task is how to combine imprecise classifiers of this type. A procedure based on the minimum level of dominance has been presented; although it is a very strong method of combination, it carries a considerable risk of erroneous prediction. In this research, we draw on second-best theory to argue that this type of combination can be improved through a new procedure built by relaxing the constraints. The new procedure is compared with the original one in an experimental study on a large set of datasets, and shows improvement. This work was supported by UGR-FEDER funds under Project A-TIC-344-UGR20 and by FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades under Project P20_0015.
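
    As a rough illustration of the two fusion rules discussed above, the Python sketch below treats each base credal classifier as returning the set of class states it could not rule out. The function names, the set representation, and the relaxed support threshold are illustrative assumptions, not the paper's exact dominance-based procedure.

    ```python
    # Hypothetical sketch: fusing imprecise classifiers that each output a
    # set of non-ruled-out class states. The threshold rule is an assumption
    # for illustration, not the paper's minimum-dominance formulation.

    def fuse_strict(state_sets, classes):
        """Strict fusion: keep only the states no base classifier ruled out."""
        return [c for c in classes if all(c in s for s in state_sets)]

    def fuse_relaxed(state_sets, classes, min_support=0.6):
        """Relaxed fusion: keep states that at least a fraction `min_support`
        of the base classifiers could not rule out."""
        n = len(state_sets)
        return [c for c in classes
                if sum(c in s for s in state_sets) >= min_support * n]

    # Three credal classifiers over the states {a, b, c}:
    sets = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
    print(fuse_strict(sets, ["a", "b", "c"]))        # ['a']
    print(fuse_relaxed(sets, ["a", "b", "c"]))       # ['a', 'b']
    ```

    Relaxing the unanimity requirement yields somewhat larger predicted sets, lowering the risk of wrongly ruling out the true state, which is the kind of improvement the abstract describes.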

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single/multiple imputation methods to recover the missing values; we then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate MIE performance by comparing classification accuracy on the complete and imputed data, and we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, which combines multiple imputation with ensemble techniques, outperforms the alternatives, particularly as the amount of missing data increases.
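
    A minimal sketch of the multiple-imputation-plus-ensemble idea, assuming scikit-learn; the imputer, the base learner, and the majority vote below are illustrative choices, not the authors' exact MIE set-up (which also compares stacking).

    ```python
    # Illustrative MIE-style pipeline: impute several times, fit one
    # classifier per imputation, and combine predictions by majority vote.
    # Assumes integer-coded class labels and NaN-encoded missing values.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.tree import DecisionTreeClassifier

    def mie_predict(X_train, y_train, X_test, n_imputations=5, seed=0):
        votes = []
        for m in range(n_imputations):
            # Drawing from the posterior gives a different completed
            # dataset on every round, as in multiple imputation.
            imputer = IterativeImputer(sample_posterior=True,
                                       random_state=seed + m)
            Xtr = imputer.fit_transform(X_train)
            Xte = imputer.transform(X_test)
            clf = DecisionTreeClassifier(random_state=seed + m)
            votes.append(clf.fit(Xtr, y_train).predict(Xte))
        votes = np.asarray(votes, dtype=int)
        # Majority vote across the n_imputations base classifiers.
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                                   axis=0, arr=votes)
    ```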

    Small margin ensembles can be robust to class-label noise

    This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, Vol. 160 (2015), DOI 10.1016/j.neucom.2014.12.086.

    Subsampling is used to generate bagging ensembles that are accurate and robust to class-label noise. The effect of using smaller bootstrap samples to train the base learners is to make the ensemble more diverse; as a result, the classification margins tend to decrease. In spite of having small margins, these ensembles can be robust to class-label noise. The validity of these observations is illustrated in a wide range of synthetic and real-world classification tasks. In the problems investigated, subsampling significantly outperforms standard bagging for different amounts of class-label noise. By contrast, the effectiveness of subsampling in random forest is problem dependent: in this type of ensemble, the best overall accuracy is obtained when the random trees are built on bootstrap samples of the same size as the original training data. Nevertheless, subsampling becomes more effective as the amount of class-label noise increases.

    The authors acknowledge financial support from Spanish Plan Nacional I+D+i Grant TIN2013-42351-P and from Comunidad de Madrid Grant S2013/ICE-2845 CASI-CAM-CM.
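
    The central comparison is straightforward to reproduce with recent versions of scikit-learn; the configuration below is a sketch of the set-up described in the abstract, not the paper's full experimental protocol.

    ```python
    # Standard bagging versus subsampled bagging (scikit-learn >= 1.2,
    # where the base learner argument is named `estimator`).
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Standard bagging: bootstrap samples the size of the training set.
    standard_bagging = BaggingClassifier(
        estimator=DecisionTreeClassifier(), n_estimators=100,
        max_samples=1.0, bootstrap=True, random_state=0)

    # Subsampled bagging: each tree sees only 20% of the data, which
    # increases diversity and dilutes the influence of mislabeled examples.
    subsampled_bagging = BaggingClassifier(
        estimator=DecisionTreeClassifier(), n_estimators=100,
        max_samples=0.2, bootstrap=True, random_state=0)
    ```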

    On the Calibration of Probabilistic Classifier Sets

    Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes error, and epistemic uncertainty via the size of the set. In this paper, we extend the notion of calibration, which is commonly used to evaluate the validity of the aleatoric uncertainty representation of a single probabilistic classifier, to assess the validity of an epistemic uncertainty representation obtained by sets of probabilistic classifiers. Broadly speaking, we call a set of probabilistic classifiers calibrated if one can find a calibrated convex combination of these classifiers. To evaluate this notion of calibration, we propose a novel nonparametric calibration test that generalizes an existing test for single probabilistic classifiers to the case of sets of probabilistic classifiers. Making use of this test, we empirically show that ensembles of deep neural networks are often not well calibrated.
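
    As a crude numerical stand-in for this notion (not the paper's nonparametric test), one can search the convex combinations of two ensemble members for a mixture with low expected calibration error; the binned ECE estimator and the weight grid below are illustrative.

    ```python
    import numpy as np

    def ece(probs, labels, n_bins=10):
        """Binned expected calibration error for a binary problem: `probs`
        are predicted positive-class probabilities, `labels` are 0/1."""
        idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)
        total = 0.0
        for b in range(n_bins):
            mask = idx == b
            if mask.any():
                gap = abs(probs[mask].mean() - labels[mask].mean())
                total += mask.mean() * gap
        return total

    def best_convex_ece(p1, p2, labels, grid=np.linspace(0.0, 1.0, 101)):
        """Smallest ECE over mixtures lam*p1 + (1-lam)*p2; a low value
        suggests some convex combination of the pair is calibrated."""
        return min(ece(lam * p1 + (1 - lam) * p2, labels) for lam in grid)
    ```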

    Binary credal classification under sparsity constraints

    Binary classification is a well-known problem in statistics. Besides classical methods, several techniques such as the naive credal classifier (for categorical data) and imprecise logistic regression (for continuous data) have been proposed to handle sparse data. However, a convincing approach to the classification problem in high-dimensional settings (i.e., when the number of attributes is larger than the number of observations) is yet to be explored in the context of imprecise probability. In this article, we propose a sensitivity analysis based on a penalised logistic regression scheme that works as a binary classifier for high-dimensional cases. We use an approach based on a set of likelihood functions (in effect, an imprecise likelihood) that assigns a set of weights to the attributes, ensuring a robust selection of the important attributes while simultaneously training the model. We perform a sensitivity analysis on the weights of the penalty term, resulting in a set of sparse constraints that helps to identify imprecision in the dataset.
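
    A minimal sketch of the sensitivity-analysis idea, assuming scikit-learn; the penalty grid and the use of plain L1-penalised logistic regression are illustrative stand-ins for the article's imprecise-likelihood scheme.

    ```python
    # For each penalty weight, record which attributes keep non-zero
    # coefficients; agreement across weights separates robustly selected
    # attributes from those whose selection is imprecise.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def selected_attribute_sets(X, y, penalty_grid=(0.01, 0.1, 1.0, 10.0)):
        sets = []
        for c in penalty_grid:
            clf = LogisticRegression(penalty="l1", C=c, solver="liblinear")
            clf.fit(X, y)
            sets.append(set(np.flatnonzero(clf.coef_[0])))
        robust = set.intersection(*sets)   # selected under every weight
        possible = set.union(*sets)        # selected under at least one
        return robust, possible
    ```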

    Ensemble methods for classification trees under imprecise probabilities

    • …