
    Variable Selection Bias in Classification Trees Based on Imprecise Probabilities

    Classification trees based on imprecise probabilities are an advancement of classical classification trees. The Gini Index is the default splitting criterion in classical classification trees, while in classification trees based on imprecise probabilities an extension of the Shannon entropy has been introduced as the splitting criterion. However, the use of these empirical entropy measures as split selection criteria can lead to a bias in variable selection, such that variables are preferred for reasons other than their information content. This bias is not eliminated by the imprecise probability approach. The source of the variable selection bias for the estimated Shannon entropy is outlined, along with possible corrections. The variable selection performance of the biased and corrected estimators is evaluated in a simulation study. Additional results from research on variable selection bias in classical classification trees are incorporated, suggesting further investigation of alternative split selection criteria in classification trees based on imprecise probabilities.
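
    A minimal sketch (in Python, not the paper's implementation) of the bias described above: with empirical information gain as the split score, an uninformative predictor with many categories typically outranks an equally uninformative binary one purely through sampling noise. All names and the simulated data are illustrative.

```python
import numpy as np

def entropy(labels):
    """Empirical Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(x, y):
    """Estimated reduction in entropy of y when splitting on x."""
    gain = entropy(y)
    for v in np.unique(x):
        mask = x == v
        gain -= mask.mean() * entropy(y[mask])
    return gain

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)          # binary class, independent of both predictors
x_binary = rng.integers(0, 2, 200)   # uninformative predictor with 2 categories
x_many = rng.integers(0, 20, 200)    # uninformative predictor with 20 categories
# The many-valued predictor typically receives the higher estimated gain.
print(info_gain(x_binary, y), info_gain(x_many, y))
```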

    Reliable Uncertain Evidence Modeling in Bayesian Networks by Credal Networks

    A reliable modeling of uncertain evidence in Bayesian networks based on a set-valued quantification is proposed. Both soft and virtual evidence are considered. We show that evidence propagation in this setup can be reduced to standard updating in an augmented credal network, equivalent to a set of consistent Bayesian networks. A characterization of the computational complexity of this task is derived, together with an efficient exact procedure for a subclass of instances. In the case of multiple uncertain observations of the same variable, the proposed procedure can provide a set-valued version of the geometric approach to opinion pooling.
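
    A toy illustration (not the paper's general credal-network procedure) of set-valued virtual evidence in the simplest case: for a single binary variable, virtual evidence enters Bayes' rule as a likelihood ratio, and when that ratio is only known to lie in an interval, the lower and upper posteriors are attained at the interval endpoints. The numbers and names below are made up.

```python
def posterior(prior, likelihood_ratio):
    """Posterior P(X=1 | obs) for a binary X given a virtual-evidence likelihood ratio."""
    odds = likelihood_ratio * prior / (1 - prior)
    return odds / (1 + odds)

prior = 0.3
l_lo, l_hi = 2.0, 5.0   # set-valued quantification of the uncertain evidence
# Posterior is monotone in the likelihood ratio, so the bounds sit at the endpoints.
print(posterior(prior, l_lo), posterior(prior, l_hi))
```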

    Maximum of entropy for belief intervals under Evidence Theory

    The Dempster-Shafer Theory (DST), or Evidence Theory, has been commonly used to deal with uncertainty. It is based on the concept of a basic probability assignment (BPA). The upper entropy on the credal set associated with a BPA is the only uncertainty measure in DST that verifies all the necessary mathematical properties and behaviors. Nonetheless, its computation is notably complex. For this reason, many alternatives to this measure have recently been proposed, but they do not satisfy most of the mathematical requirements and present some undesirable behaviors. Belief intervals have frequently been employed to quantify uncertainty in DST in recent years, and they can represent uncertainty-based information better than a BPA. In this research, we develop a new uncertainty measure that consists of the maximum entropy on the credal set corresponding to the belief intervals for singletons. It verifies all the crucial mathematical requirements and presents good behavior, solving most of the shortcomings found in recently proposed uncertainty measures. Moreover, its calculation is notably easier than that of the upper entropy on the credal set associated with the BPA. Therefore, the proposed uncertainty measure is more suitable for practical applications. This work was supported by the Spanish Ministerio de Economía y Competitividad under project TIN2016-77902-C3-2-P and by the European Union (EU) under project TEC2015-69496-R.
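
    A minimal numerical sketch of the measure described above, assuming the belief intervals [Bel({x}), Pl({x})] for the singletons have already been computed from the BPA: the maximum Shannon entropy over the credal set they define is approximated with a generic constrained optimizer. The function name and the example intervals are illustrative; the paper's own computation may proceed differently.

```python
import numpy as np
from scipy.optimize import minimize

def max_entropy_belief_intervals(bel, pl):
    """Maximize Shannon entropy (bits) over {p : bel <= p <= pl, sum(p) = 1}."""
    bel, pl = np.asarray(bel, float), np.asarray(pl, float)
    n = len(bel)
    neg_entropy = lambda p: np.sum(p * np.log2(np.clip(p, 1e-12, None)))
    res = minimize(neg_entropy,
                   x0=np.full(n, 1.0 / n),
                   method="SLSQP",
                   bounds=list(zip(bel, pl)),
                   constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}])
    return -res.fun, res.x

# Hypothetical belief intervals for three singletons:
h, p = max_entropy_belief_intervals([0.1, 0.2, 0.3], [0.5, 0.6, 0.7])
print(h, p)
```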

    Completing an uncertainty criterion of classification

    We present a variation of a classification method based on uncertainty measures on credal sets. As in the original method, it uses the imprecise Dirichlet model to create the credal sets and the same uncertainty measures. It takes into account sets of two variables in order to reduce the uncertainty and to find direct relations between the variables in the database and the variable to be classified. The accuracy is equivalent to that of the original method, except on datasets where direct relations between some variables determine the value of the variable to be classified, where we obtain a notable improvement.
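
    For context, a small sketch of the probability intervals that the imprecise Dirichlet model (IDM) assigns to each class value, which is how the credal sets in this method and its predecessor are built: with hyperparameter s, a value observed n_v times out of N gets the interval [n_v/(N+s), (n_v+s)/(N+s)]. The helper name is hypothetical.

```python
from collections import Counter

def idm_intervals(labels, s=1.0):
    """IDM probability intervals [n_v/(N+s), (n_v+s)/(N+s)] for each observed value."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {v: (c / (total + s), (c + s) / (total + s)) for v, c in counts.items()}

print(idm_intervals(["a", "a", "b", "c", "c", "c"], s=1.0))
```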

    Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy

    Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a very simple and popular classification method that yields good results in a short processing time. Hence, it is a very appropriate classifier for very large datasets. The method is highly dependent on the relationships between the variables. The Info-Gain (IG) measure, which is based on general entropy, can be used as a quick variable selection method. This measure ranks the importance of the attribute variables for a variable under study via the information obtained from a dataset. Its main drawback is that it is always non-negative, so an information threshold must be set for each dataset in order to select the set of most important variables. We introduce here a new quick variable selection method that generalizes the one based on the Info-Gain measure. It uses imprecise probabilities and the maximum entropy measure to select the most informative variables without setting a threshold. This new variable selection method, combined with the Naive Bayes classifier, improves the original method and provides a valuable tool for handling datasets with a very large number of features and a huge amount of data, where more complex methods are not computationally feasible. This work has been supported by the Spanish "Ministerio de Economía y Competitividad" and by "Fondo Europeo de Desarrollo Regional" (FEDER) under project TEC2015-69496-R.
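
    A hedged sketch of a score in the spirit described above, assuming the credal set comes from the imprecise Dirichlet model (IDM) with hyperparameter s: the precise entropy inside Info-Gain is replaced by the maximum entropy over the IDM credal set, computed here by water-filling the extra mass s onto the smallest class counts. Because of the added imprecision the score can turn negative, so attributes can be kept or dropped by sign alone, with no tuned threshold. This illustrates the idea rather than reproducing the authors' exact procedure.

```python
import numpy as np

def max_entropy_idm(labels, s=1.0):
    """Maximum Shannon entropy (bits) over the IDM credal set, by water-filling s."""
    _, counts = np.unique(labels, return_counts=True)
    n = np.sort(counts.astype(float))
    remaining, level, k = s, n[0], 1
    while k < len(n):
        gap = (n[k] - level) * k          # mass needed to lift the k lowest counts to the next one
        if remaining <= gap:
            break
        remaining, level, k = remaining - gap, n[k], k + 1
    level += remaining / k                # spread what is left over the k lowest counts
    p = np.maximum(n, level)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def imprecise_info_gain(x, y, s=1.0):
    """Max-entropy analogue of Info-Gain for attribute x with respect to class y."""
    score = max_entropy_idm(y, s)
    for v in np.unique(x):
        mask = x == v
        score -= mask.mean() * max_entropy_idm(y[mask], s)
    return score

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
x = rng.integers(0, 8, 500)               # uninformative attribute
# The sign of the score decides whether the attribute is kept; no threshold is tuned.
print(imprecise_info_gain(x, y))
```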