Search CORE

4,855 research outputs found

Robust Feature Selection by Mutual Information Distributions

Author: Hutter Marcus
Zaffalon Marco
Publication venue
Publication date: 01/01/2002
Field of study

Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.Comment: 8 two-column page

arXiv.org e-Print Archive

CiteSeerX

Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Author: He Haibo
Kay Steven
Tang Bo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/02/2016
Field of study

Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the Information Theory, which aims to rank the features with their discriminative capacity for classification. We first revisit two information measures: Kullback-Leibler divergence and Jeffreys divergence for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination (

MD

) and

MD-\chi^2

methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figure

arXiv.org e-Print Archive

Crossref

DigitalCommons@URI