
    Learning extended tree augmented naive structures

    This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier, in which the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures considered by TAN, yet achieves global optimality of the learning score function very efficiently (quadratic in the number of features, the same complexity as learning TANs). We enhance our procedure with a new score function that only takes into account arcs that are relevant to predicting the class, as well as an optimization over the equivalent sample size during learning. These ideas may be useful for structure learning of Bayesian networks in general. A range of experiments shows that we obtain models with better prediction accuracy than naive Bayes and TAN, and comparable to that of the state-of-the-art averaged one-dependence estimator (AODE) classifier. We release our implementation of ETAN so that it can be easily installed and run within Weka.
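    The baseline that ETAN generalises can be sketched concretely. Standard TAN structure learning (Chow-Liu style) weights each feature pair by conditional mutual information given the class and keeps a maximum spanning tree over the features; ETAN's actual procedure modifies Edmonds' algorithm to drop the requirement that every feature connect to the class, which this sketch does not implement. All names below are illustrative.

```python
# Sketch of standard TAN skeleton learning: weight feature pairs by
# I(X_i; X_j | C) and keep a maximum spanning tree (Prim's algorithm).
# This is the classical baseline only, not ETAN's Edmonds-based variant.
from collections import Counter
from itertools import combinations
import math

def cond_mutual_info(xs, ys, cs):
    """Estimate I(X; Y | C) from three parallel lists of symbols."""
    n = len(cs)
    pxyc = Counter(zip(xs, ys, cs))
    pxc = Counter(zip(xs, cs))
    pyc = Counter(zip(ys, cs))
    pc = Counter(cs)
    mi = 0.0
    for (x, y, c), nxyc in pxyc.items():
        # p(x,y|c) / (p(x|c) p(y|c)) simplifies to n_xyc * n_c / (n_xc * n_yc)
        mi += (nxyc / n) * math.log((nxyc * pc[c]) / (pxc[(x, c)] * pyc[(y, c)]))
    return mi

def tan_skeleton(features, cs):
    """Edges of a maximum spanning tree over the features,
    weighted by conditional mutual information given the class."""
    k = len(features)
    w = {(i, j): cond_mutual_info(features[i], features[j], cs)
         for i, j in combinations(range(k), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < k:
        i, j = max(((i, j) for i, j in w
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: w[e])
        edges.append((i, j))
        in_tree |= {i, j}
    return edges
```

    In a full TAN the chosen tree is then rooted, each feature additionally receives the class as a parent; ETAN relaxes exactly that last step.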

    NCC-EM: A hybrid framework for decision making with missing information

    Title from PDF of title page, viewed January 30, 2018. Thesis advisor: Chen ZhiQiang. Vita. Includes bibliographical references (pages 43-46). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2017.
    Accounting for uncertainty is important in any data-driven decision making. The popular treatment of uncertainty is to employ classical probability theory, expressing variables as random variables or processes with given distributions. This precise approach runs into difficulty, and yields misleading predictions, when the sources of uncertainty are epistemic: incomplete (missing), conflicting, or erroneous information arising from a lack of knowledge. Many frameworks have been developed as alternatives to the precise probability formalism; one of them is Imprecise Probability (IP) based modeling. In this thesis, we develop a novel hybrid framework, the Naïve Credal Classifier with Expectation-Maximization data imputation, for decision making with missing information. The IP-based credal set concept is first introduced to model uncertainty in data with missing information. The Naïve Credal Classifier (NCC) is then employed, as provided by the latest JNCC2 package. The key idea of this research is to model missing data using advanced imputation techniques so as to minimize the performance (accuracy) loss in NCC. The resulting NCC-EM framework is a hybrid in which EM imputation is used as a preprocessing step. To verify and validate this framework, NCC-EM is extensively tested on open machine learning datasets with simulated missing values; it is shown that NCC-EM outperforms the existing NCC framework and traditional supervised classification methods.
    Contents: Introduction -- Introduction to imprecise probability -- Naïve Bayes Classifier and Naïve Credal Classifier -- NCC-EM: a novel credal-based framework -- Conclusion and future work
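    The preprocessing idea can be illustrated with a minimal EM-style imputation sketch: missing numeric entries are filled with their conditional expectation given the observed column under a bivariate Gaussian assumption, parameters are re-estimated, and the two steps alternate. This mirrors the thesis' "EM imputation as a preprocessing step" only in spirit; the Gaussian model, the two-column restriction, and all names are illustrative assumptions, not the thesis' actual pipeline.

```python
# EM-style (iterative conditional-mean) imputation for a two-column
# numeric table. Assumes at most one missing entry per row; a full EM
# for Gaussian data would also carry conditional variances.

def em_impute(rows, iters=50):
    """rows: list of [x, y] pairs where either entry may be None."""
    data = [r[:] for r in rows]
    # initialise each missing entry with its column mean
    for j in (0, 1):
        obs = [r[j] for r in data if r[j] is not None]
        m = sum(obs) / len(obs)
        for r in data:
            if r[j] is None:
                r[j] = m
    for _ in range(iters):
        # M-step: re-estimate means, variances, covariance from the
        # completed table
        n = len(data)
        mx = sum(r[0] for r in data) / n
        my = sum(r[1] for r in data) / n
        vx = sum((r[0] - mx) ** 2 for r in data) / n
        vy = sum((r[1] - my) ** 2 for r in data) / n
        cxy = sum((r[0] - mx) * (r[1] - my) for r in data) / n
        # E-step: refill each originally-missing entry with its
        # conditional mean given the observed entry
        for r, orig in zip(data, rows):
            if orig[0] is None:
                r[0] = mx + cxy / vy * (r[1] - my)
            if orig[1] is None:
                r[1] = my + cxy / vx * (r[0] - mx)
    return data
```

    The completed table would then be handed to the classifier, keeping imputation strictly a preprocessing stage as in NCC-EM.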

    A tree augmented classifier based on Extreme Imprecise Dirichlet Model

    We present TANC, a TAN (tree-augmented naive) classifier based on imprecise probabilities. TANC models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM). A first contribution of this paper is the experimental comparison between the EDM and the global Imprecise Dirichlet Model (IDM) using the naive credal classifier (NCC), with the aim of showing that the EDM is a sensible approximation of the global IDM. TANC deals with missing data in a conservative manner by considering all possible completions (without assuming them to be missing-at-random), while avoiding an exponential increase in computational time. Through experiments on real data sets, we show that TANC is more reliable than the Bayesian TAN and that it performs better than previous TANs based on imprecise probabilities. Yet TANC is sometimes outperformed by NCC because the learned TAN structures are too complex; this calls for novel structure-learning algorithms better suited to an imprecise probability classifier.
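    The imprecise-probability machinery behind such classifiers can be sketched for the simplest case. Under the IDM with equivalent sample size s, a class observed n_c times out of N gets the probability interval [n_c/(N+s), (n_c+s)/(N+s)]; a credal classifier then returns only classes not dominated by another. The sketch below uses interval dominance, a conservative sufficient criterion rather than the maximality criterion credal classifiers typically use; all names are illustrative.

```python
# Hedged sketch of IDM class-probability intervals and a simple
# dominance test, as used (in spirit) by credal classifiers.

def idm_interval(counts, c, s=1.0):
    """Lower and upper probability of class c under the IDM."""
    n = sum(counts.values())
    return counts[c] / (n + s), (counts[c] + s) / (n + s)

def interval_dominates(counts, a, b, s=1.0):
    """a dominates b if a's lower probability exceeds b's upper
    probability (interval dominance; sufficient, not necessary)."""
    return idm_interval(counts, a, s)[0] > idm_interval(counts, b, s)[1]
```

    With few observations the intervals are wide, no class dominates, and the classifier abstains to a set of classes; this is the source of the reliability gains the abstract reports.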

    A tree-augmented classifier based on Extreme Imprecise Dirichlet Model

    No full text
    In this paper we present TANC, a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) [1] and deals conservatively with missing data in the training set, without assuming them to be missing-at-random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM) which considerably simplifies the computation of upper and lower probabilities; yet, having been introduced only recently, the quality of the approximation it provides still needs to be verified. As a first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be implemented exactly) when learned with the EDM and with the global IDM; the outputs are identical in the vast majority of cases, supporting the adoption of the EDM in real classification problems. Then, through experiments, we show that TANC is more reliable than the precise TAN (learned with a uniform prior) and that it performs better than a previous TAN model based on imprecise probabilities [13]. TANC treats missing data by considering all possible completions of the training set while avoiding an exponential increase in computational time; finally, we present some preliminary results with missing data.