
    Variable Selection Bias in Classification Trees Based on Imprecise Probabilities

    Classification trees based on imprecise probabilities provide an advancement of classical classification trees. The Gini Index is the default splitting criterion in classical classification trees, while in classification trees based on imprecise probabilities, an extension of the Shannon entropy has been introduced as the splitting criterion. However, the use of these empirical entropy measures as split selection criteria can lead to a bias in variable selection, such that variables are preferred for reasons other than their information content. This bias is not eliminated by the imprecise probability approach. The source of variable selection bias for the estimated Shannon entropy, as well as possible corrections, are outlined. The variable selection performance of the biased and corrected estimators is evaluated in a simulation study. Additional results from research on variable selection bias in classical classification trees are incorporated, suggesting further investigation of alternative split selection criteria in classification trees based on imprecise probabilities.
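    A minimal sketch of the plug-in (maximum-likelihood) entropy estimator behind such split criteria, with a toy demonstration of the bias the abstract describes: both candidate features below are pure noise, yet the one with more categories tends to receive a higher estimated information gain. Function names and demo data are illustrative assumptions, not the paper's code.

    import numpy as np

    def empirical_entropy(labels):
        # Plug-in (maximum-likelihood) estimate of Shannon entropy, in bits.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def info_gain(labels, feature):
        # Estimated reduction in entropy of `labels` after splitting on `feature`.
        total = empirical_entropy(labels)
        cond = 0.0
        for v in np.unique(feature):
            mask = feature == v
            cond += mask.mean() * empirical_entropy(labels[mask])
        return total - cond

    # Bias demo: both features are independent of y, but the feature with
    # more categories usually gets the larger estimated gain.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=100)
    x_binary = rng.integers(0, 2, size=100)
    x_many = rng.integers(0, 20, size=100)
    print(info_gain(y, x_binary), info_gain(y, x_many))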

    Completing an uncertainty criterion of classification

    We present a variation of a classification method based on uncertainty measured on credal sets. As in the original method, it uses the imprecise Dirichlet model to build the credal set and employs the same uncertainty measures. It considers pairs of variables in order to reduce the uncertainty and to find direct relations between the variables in the dataset and the variable to be classified. The success rates are equivalent to those of the first method, except on datasets in which some variables have a direct relation that determines the value of the variable to be classified, where we obtain a notable improvement.
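    The credal set in question can be built from observed counts with the imprecise Dirichlet model. A minimal sketch of the standard IDM probability intervals; the hyperparameter s and the sample counts are assumptions for illustration.

    import numpy as np

    def idm_credal_set(counts, s=1.0):
        # Lower/upper probability bounds for each state under the imprecise
        # Dirichlet model with hyperparameter s (s=1 is a common choice).
        counts = np.asarray(counts, dtype=float)
        n = counts.sum()
        lower = counts / (n + s)
        upper = (counts + s) / (n + s)
        return lower, upper

    lo, up = idm_credal_set([12, 5, 3])
    print(lo, up)   # every distribution with lo <= p <= up is in the credal set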

    CreINNs: Credal-Set Interval Neural Networks for Uncertainty Estimation in Classification Tasks

    Uncertainty estimation is increasingly attractive for improving the reliability of neural networks. In this work, we present novel credal-set interval neural networks (CreINNs) designed for classification tasks. CreINNs preserve the traditional interval neural network structure, capturing weight uncertainty through deterministic intervals, while forecasting credal sets using the mathematical framework of probability intervals. Experimental validations on an out-of-distribution detection benchmark (CIFAR10 vs SVHN) show that CreINNs provide better epistemic uncertainty estimation than variational Bayesian neural networks (BNNs) and deep ensembles (DEs). Furthermore, CreINNs exhibit a notable reduction in computational complexity compared to variational BNNs and demonstrate smaller model sizes than DEs.
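    A generic sketch of how a dense layer propagates interval-valued weights and inputs using plain interval arithmetic over the four "corner" products. This illustrates interval neural networks in general, not the authors' exact CreINN architecture; all names and the toy weights are assumptions.

    import numpy as np

    def interval_linear(x_lo, x_hi, W_lo, W_hi, b_lo, b_hi):
        # Dense layer with interval inputs and interval weights: each pairwise
        # product is bounded by the min/max of the four corner products.
        cands = np.stack([x_lo[:, None] * W_lo, x_lo[:, None] * W_hi,
                          x_hi[:, None] * W_lo, x_hi[:, None] * W_hi])
        y_lo = cands.min(axis=0).sum(axis=0) + b_lo
        y_hi = cands.max(axis=0).sum(axis=0) + b_hi
        return y_lo, y_hi

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 3)); r = 0.05      # midpoint weights, interval radius
    x = rng.normal(size=4)                     # a point input: lo == hi
    y_lo, y_hi = interval_linear(x, x, W - r, W + r,
                                 -r * np.ones(3), r * np.ones(3))
    print(y_lo, y_hi)                          # y_lo <= y_hi elementwise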

    Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy

    Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a very simple and popular classification method that yields good results in a short processing time. Hence, it is a very appropriate classifier for very large datasets. The method has a high dependence on the relationships between the variables. The Info-Gain (IG) measure, which is based on general entropy, can be used as a quick variable selection method. This measure ranks the importance of the attribute variables on a variable under study via the information obtained from a dataset. Its main drawback is that it is always non-negative, so an information threshold must be set for each dataset in order to select the set of most important variables. We introduce here a new quick variable selection method that generalizes the one based on the Info-Gain measure. It uses imprecise probabilities and the maximum entropy measure to select the most informative variables without setting a threshold. This new variable selection method, combined with the Naive Bayes classifier, improves the original method and provides a valuable tool for handling datasets with a very large number of features and a huge amount of data, where more complex methods are not computationally feasible. This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R.
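    A minimal sketch of the classical IG filter that the abstract generalizes, including the per-dataset threshold that the max-entropy variant is designed to remove. The threshold value and toy data are assumptions.

    import numpy as np

    def info_gain(y, x):
        # H(Y) - H(Y|X), estimated from counts.
        def H(a):
            _, c = np.unique(a, return_counts=True)
            p = c / c.sum()
            return -np.sum(p * np.log2(p))
        return H(y) - sum((x == v).mean() * H(y[x == v]) for v in np.unique(x))

    def select_by_threshold(X, y, threshold=0.01):
        # Classical IG filter: keep columns whose gain exceeds a hand-tuned,
        # per-dataset threshold -- the step the max-entropy variant removes.
        gains = np.array([info_gain(y, X[:, j]) for j in range(X.shape[1])])
        return np.where(gains > threshold)[0], gains

    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(200, 5))
    y = (X[:, 0] + rng.integers(0, 2, size=200)) % 3   # y depends on column 0 only
    keep, gains = select_by_threshold(X, y)
    print(keep, np.round(gains, 3))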

    Maximum of entropy for belief intervals under Evidence Theory

    The Dempster-Shafer Theory (DST), or Evidence Theory, has been commonly used to deal with uncertainty. It is based on the concept of a basic probability assignment (BPA). The upper entropy on the credal set associated with a BPA is the only uncertainty measure in DST that verifies all the necessary mathematical properties and behaviors. Nonetheless, its computation is notably complex. For this reason, many alternatives to this measure have been proposed recently, but they do not satisfy most of the mathematical requirements and present some undesirable behaviors. Belief intervals have frequently been employed to quantify uncertainty in DST in recent years, and they can represent the uncertainty-based information better than a BPA. In this research, we develop a new uncertainty measure that consists of the maximum of entropy on the credal set corresponding to the belief intervals for singletons. It verifies all the crucial mathematical requirements and presents good behavior, solving most of the shortcomings found in recently proposed uncertainty measures. Moreover, its calculation is notably easier than that of the upper entropy on the credal set associated with the BPA. Therefore, our proposed uncertainty measure is more suitable for practical applications. Spanish Ministerio de Economía y Competitividad TIN2016-77902-C3-2-P. European Union (EU) TEC2015-69496-
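    A minimal sketch of the singleton belief intervals [Bel({x}), Pl({x})] computed from a BPA; these intervals define the credal set over which the proposed measure maximizes entropy. The toy mass function is an assumption.

    def belief_intervals(bpa, frame):
        # [Bel({x}), Pl({x})] for each singleton, from a basic probability
        # assignment given as a dict {frozenset_of_states: mass}.
        intervals = {}
        for x in frame:
            bel = sum(m for A, m in bpa.items() if A == frozenset([x]))
            pl = sum(m for A, m in bpa.items() if x in A)
            intervals[x] = (bel, pl)
        return intervals

    bpa = {frozenset('a'): 0.5, frozenset('ab'): 0.3, frozenset('abc'): 0.2}
    print(belief_intervals(bpa, 'abc'))
    # {'a': (0.5, 1.0), 'b': (0.0, 0.5), 'c': (0.0, 0.2)}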

    Upgrading the Fusion of Imprecise Classifiers

    Imprecise classification is a relatively new task within Machine Learning. The difference with standard classification is that not only is one state of the variable under study determined; a set of states that do not have enough information against them, and so cannot be ruled out, is determined as well. For imprecise classification, a model called the Imprecise Credal Decision Tree (ICDT), which uses imprecise probabilities and maximum of entropy as the information measure, has been presented. A difficult and interesting task is to show how to combine this type of imprecise classifiers. A procedure based on the minimum level of dominance has been presented; although it is a very strong combination method, it has the drawback of a considerable risk of erroneous predictions. In this research, we use the second-best theory to argue that this type of combination can be improved through a new procedure built by relaxing the constraints. The new procedure is compared with the original one in an experimental study on a large set of datasets, and shows improvement. UGR-FEDER funds under Project A-TIC-344-UGR20. FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades under Project P20_0015.
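    An illustrative sketch of set-valued combination by exclusion counts: the strict rule keeps only the states excluded least often across the ensemble, while a slack parameter relaxes the constraint in the spirit of the abstract. The exact counting rule here is an assumption, not the paper's procedure.

    from collections import Counter

    def combine_imprecise(predictions, states, slack=0):
        # Each base classifier returns a *set* of non-dominated states.
        # Keep states whose exclusion count is within `slack` of the minimum;
        # slack=0 is the strict rule, slack>0 a relaxed variant.
        excluded = Counter({s: 0 for s in states})
        for pred in predictions:
            for s in states:
                if s not in pred:
                    excluded[s] += 1
        best = min(excluded.values())
        return {s for s in states if excluded[s] <= best + slack}

    preds = [{'a', 'b'}, {'a'}, {'a', 'c'}, {'b'}]
    print(combine_imprecise(preds, {'a', 'b', 'c'}))            # {'a'}
    print(combine_imprecise(preds, {'a', 'b', 'c'}, slack=1))   # {'a', 'b'}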

    Conformalized Credal Set Predictors

    Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.
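    A generic split-conformal sketch of the idea: calibrate a nonconformity score between predicted and annotated label distributions, then return as the credal set all distributions within the calibrated radius. The total variation distance, the stand-in model outputs, and alpha=0.1 are assumptions; the paper's construction may differ.

    import numpy as np

    def conformal_quantile(scores, alpha):
        # Finite-sample-adjusted (1 - alpha) quantile used by split conformal.
        n = len(scores)
        k = int(np.ceil((n + 1) * (1 - alpha)))
        return np.sort(scores)[min(k, n) - 1]

    def tv(p, q):
        # Total variation distance between two discrete distributions.
        return 0.5 * np.abs(p - q).sum()

    rng = np.random.default_rng(0)
    cal_pred = rng.dirichlet(np.ones(3), size=200)   # stand-in model outputs
    cal_true = rng.dirichlet(np.ones(3), size=200)   # annotated distributions
    q_hat = conformal_quantile([tv(p, t) for p, t in zip(cal_pred, cal_true)],
                               alpha=0.1)

    def in_credal_set(candidate, prediction):
        # The conformal credal set is the TV ball of radius q_hat around the
        # new prediction; it covers the true distribution w.p. >= 1 - alpha
        # under exchangeability.
        return tv(candidate, prediction) <= q_hat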

    Bagging of Credal Decision Trees for Imprecise Classification

    Credal Decision Trees (CDTs) have been adapted for Imprecise Classification (ICDTs). However, no ensembles of imprecise classifiers have been proposed so far. The reason might be that combining the predictions made by multiple imprecise classifiers is not a trivial question. In fact, if the combination method used is not appropriate, the ensemble method could even worsen the performance of a single classifier. On the other hand, the Bagging scheme has been shown to provide satisfactory results in precise classification, especially when it is used with CDTs, which are known to be very weak and unstable classifiers. For these reasons, in this research a new Bagging scheme with ICDTs is proposed, along with a new technique for combining predictions made by imprecise classifiers that tries to maximize the precision of the bagged classifier. If the procedure for such a combination is too conservative, it is easy to obtain little information and to worsen the results of a single classifier. Our proposal considers only the states with the minimum level of non-dominance. An exhaustive experimental study carried out in this work shows that Bagging of ICDTs, with our proposed combination technique, performs clearly better than a single ICDT. This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R.
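    A minimal sketch of the bagging side: bootstrap resampling plus a minimum-exclusion combination of set-valued predictions, in the spirit of the minimum level of non-dominance the abstract mentions. The toy ensemble members are stand-ins for trained ICDTs; all names are assumptions.

    import numpy as np

    def bootstrap_indices(n, n_estimators, rng):
        # Standard bagging resamples: n draws with replacement per estimator.
        return [rng.integers(0, n, size=n) for _ in range(n_estimators)]

    def bagging_predict(classifiers, x, states):
        # Combine set-valued predictions by keeping only the states excluded
        # least often across the ensemble.
        excluded = {s: 0 for s in states}
        for clf in classifiers:
            pred = clf(x)
            for s in states:
                if s not in pred:
                    excluded[s] += 1
        best = min(excluded.values())
        return {s for s in states if excluded[s] == best}

    # Toy ensemble: stand-ins for ICDTs trained on different bootstrap samples.
    ensemble = [lambda x: {'a', 'b'}, lambda x: {'a'}, lambda x: {'a', 'c'}]
    print(bagging_predict(ensemble, None, {'a', 'b', 'c'}))   # {'a'}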
    • …