2 research outputs found

    Toward Optimal Feature Selection in Naive Bayes for Text Categorization

    Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, the Kullback-Leibler divergence and the Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called the Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH divergence, we develop two efficient feature selection methods, termed maximum discrimination (MD) and MD-χ² methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches. Comment: This paper has been submitted to the IEEE Transactions on Knowledge and Data Engineering. 14 pages, 5 figures.
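    The abstract does not give the closed form of the JMH divergence or of the MD and MD-χ² criteria, so the sketch below only illustrates the general idea the abstract describes: score each binary term feature by the Jeffreys (symmetrized Kullback-Leibler) divergence between its class-conditional distributions and keep the highest-scoring terms. Function and variable names here are illustrative, not from the paper.

```python
# Minimal sketch of divergence-based feature ranking for text categorization.
# This is NOT the paper's JMH/MD algorithm; it only ranks binary term features
# by the Jeffreys divergence between their class-conditional Bernoulli parameters.
import numpy as np

def jeffreys_divergence(p, q, eps=1e-12):
    """J(p, q) = KL(p || q) + KL(q || p) for Bernoulli parameters p, q (vectorized)."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    kl_pq = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    kl_qp = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return kl_pq + kl_qp

def rank_features(X, y):
    """Rank columns of a binary document-term matrix X (n_docs x n_terms)
    by the Jeffreys divergence between the two class-conditional term distributions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p_pos = X[y == 1].mean(axis=0)   # P(term present | class 1)
    p_neg = X[y == 0].mean(axis=0)   # P(term present | class 0)
    scores = jeffreys_divergence(p_pos, p_neg)
    return np.argsort(scores)[::-1]  # most discriminative terms first

# Toy usage: 6 documents, 4 binary term features, two classes.
X = np.array([[1, 0, 1, 0],
              [1, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 1]])
y = np.array([1, 1, 1, 0, 0, 0])
print(rank_features(X, y))  # terms 0 and 1 should rank highest
```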

    Hybrid classification with partial models

    Parametric classifiers trained with the Bayes rule are usually more accurate than non-parametric classifiers such as nearest neighbors, neural networks, and support vector machines when the class-conditional densities of the distribution models are known except for some of their parameters and the training data is abundant. However, parametric classifiers perform poorly if these class-conditional densities are unknown and the assumed distribution models are inaccurate. In this paper, we propose a hybrid classification method for data with partially known distribution models, where only the distribution models of some classes are known. In this partial-models case, the proposed hybrid classifier makes the best use of the known distribution models through Bayesian inference, whereas both purely parametric and purely non-parametric classifiers lose predictive capacity. Theoretical proofs and experimental results show that the proposed hybrid classifier performs much better than purely parametric and non-parametric classifiers on data with partial models.
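    The abstract does not specify how the known parametric models and the non-parametric estimates are combined, so the following is only a minimal sketch of the hybrid idea under stated assumptions: classes with a known Gaussian model are scored with that density, classes without a known model are scored with a kernel density estimate fitted to the training data, and prediction picks the class with the largest likelihood assuming equal priors. The HybridClassifier class and its interface are hypothetical, not taken from the paper.

```python
# Hedged sketch of a hybrid parametric / non-parametric classifier.
# Known classes: Gaussian class-conditional densities given as (mean, cov).
# Unknown classes: kernel density estimates fitted from training samples.
import numpy as np
from scipy.stats import multivariate_normal, gaussian_kde

class HybridClassifier:
    def __init__(self, known_models):
        # known_models: {class_label: (mean, cov)} for classes with known Gaussian models
        self.known_models = known_models
        self.kdes = {}

    def fit(self, X, y):
        # Fit a KDE only for classes whose distribution model is unknown.
        for label in np.unique(y):
            if label not in self.known_models:
                self.kdes[label] = gaussian_kde(X[y == label].T)
        return self

    def predict(self, X):
        labels = list(self.known_models) + list(self.kdes)
        # Likelihood of each test point under each class model (equal priors assumed).
        scores = np.column_stack([
            multivariate_normal(*self.known_models[l]).pdf(X) if l in self.known_models
            else self.kdes[l](X.T)
            for l in labels
        ])
        return np.array(labels)[np.argmax(scores, axis=1)]

# Toy usage: class 0 has a known Gaussian model; class 1 is modeled non-parametrically.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(50, 2))
X1 = rng.normal(3.0, 1.0, size=(50, 2))
X, y = np.vstack([X0, X1]), np.array([0] * 50 + [1] * 50)
clf = HybridClassifier({0: (np.zeros(2), np.eye(2))}).fit(X, y)
print(clf.predict(np.array([[0.1, -0.2], [3.2, 2.9]])))  # expect [0 1]
```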