Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Automated feature selection is important for text categorization to reduce
the feature size and to speed up the learning process of classifiers. In this
paper, we present a novel and efficient feature selection framework based on
information theory, which ranks features by their discriminative capacity for
classification. We first revisit two information
measures: Kullback-Leibler divergence and Jeffreys divergence for binary
hypothesis testing, and analyze their asymptotic properties relating to type I
and type II errors of a Bayesian classifier. We then introduce a new divergence
measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure
multi-distribution divergence for multi-class classification. Based on the
JMH-divergence, we develop two efficient feature selection methods, termed
maximum discrimination (MD) and MD-χ² methods, for text categorization.
The promising results of extensive experiments demonstrate the effectiveness of
the proposed approaches.

Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data
Engineering. 14 pages, 5 figures.
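The divergence measures named in the abstract can be sketched in a few lines. This is a minimal illustration of KL and Jeffreys divergence plus a simple two-class feature-ranking helper; the function names and the ranking scheme are assumptions for illustration, not the paper's exact MD formulation or its multi-class JMH-divergence.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions.
    A small eps avoids log(0); inputs are renormalized, so raw counts work too."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys_divergence(p, q):
    """Jeffreys divergence: the symmetrized KL, J(p, q) = D(p||q) + D(q||p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def rank_features(X, y):
    """Hypothetical helper: rank binary features of a 2-class problem by the
    Jeffreys divergence between their per-class presence/absence counts."""
    scores = []
    for j in range(X.shape[1]):
        p = np.bincount(X[y == 0, j], minlength=2)  # counts for class 0
        q = np.bincount(X[y == 1, j], minlength=2)  # counts for class 1
        scores.append(jeffreys_divergence(p, q))
    return np.argsort(scores)[::-1]  # most discriminative feature first
```

A feature that perfectly separates the two classes receives a large Jeffreys score, while a feature distributed identically in both classes scores near zero, which is the ranking intuition the abstract describes.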
Hybrid classification with partial models
Parametric classifiers trained with the Bayes rule are usually more accurate than non-parametric classifiers such as nearest neighbors, neural networks, and support vector machines when the class-conditional densities are known except for some of their parameters and training data is abundant. However, parametric classifiers perform poorly when these class-conditional densities are unknown and the assumed distribution models are inaccurate. In this paper, we propose a hybrid classification method for data with partially known distribution models, where only the distribution models of some classes are known. For this partial-models case, the proposed hybrid classifier makes the best use of the known distribution models through Bayesian inference, whereas both purely parametric and purely non-parametric classifiers would lose predictive capacity. Theoretical proofs and experimental results show that the proposed hybrid classifier performs much better than purely parametric and non-parametric classifiers on data with partial models.
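The hybrid idea above can be sketched as follows. This is a minimal illustration, assuming diagonal-Gaussian class-conditional densities for the "known" classes and a Gaussian kernel density estimate for the rest; the class name `HybridClassifier`, the bandwidth parameter, and the Gaussian/kernel choices are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

class HybridClassifier:
    """Sketch: parametric likelihoods for classes with known density models,
    a non-parametric kernel density estimate for the remaining classes."""

    def __init__(self, known_models, bandwidth=1.0):
        # known_models: {class_label: (mean_vector, variance_vector)}
        self.known = known_models
        self.h = bandwidth

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {c: float(np.mean(y == c)) for c in self.classes_}
        # keep training samples only for classes without a parametric model
        self.samples_ = {c: X[y == c] for c in self.classes_ if c not in self.known}
        return self

    def _log_likelihood(self, x, c):
        if c in self.known:
            # parametric branch: diagonal-Gaussian log-density
            mu, var = self.known[c]
            return -0.5 * float(np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var)))
        # non-parametric branch: Gaussian (Parzen) kernel density estimate
        d2 = np.sum((self.samples_[c] - x) ** 2, axis=1)
        dens = np.mean(np.exp(-0.5 * d2 / self.h ** 2))
        return float(np.log(dens + 1e-300))  # guard against log(0)

    def predict(self, X):
        # MAP decision: maximize log prior + log likelihood over all classes
        preds = []
        for x in X:
            scores = {c: np.log(self.priors_[c]) + self._log_likelihood(x, c)
                      for c in self.classes_}
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```

The design point is that the two branches share one MAP decision rule: classes with known models contribute exact parametric likelihoods, while the remaining classes fall back to a density estimated from data, so neither side's knowledge is discarded.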