
1,649 research outputs found

Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Author: He Haibo
Kay Steven
Tang Bo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/02/2016

Automated feature selection is important for text categorization because it reduces the feature size and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which ranks features by their discriminative capacity for classification. We first revisit two information measures for binary hypothesis testing, the Kullback-Leibler divergence and the Jeffreys divergence, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called the Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH divergence, we develop two efficient feature selection methods for text categorization, termed the maximum discrimination (MD) and MD-$\chi^2$ methods. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.

Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figure

arXiv.org e-Print Archive

DigitalCommons@URI
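As a rough illustration of the divergence-based ranking idea in the abstract above, the sketch below scores each term in a two-class problem by the Jeffreys divergence (the symmetrised Kullback-Leibler divergence) between its per-class Bernoulli presence distributions. It is not the paper's JMH divergence or its MD / MD-$\chi^2$ methods, which extend to multiple classes and are not reproduced here; the function names, the Laplace smoothing and the toy documents are assumptions made for illustration.

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """Kullback-Leibler divergence D(P || Q) for Bernoulli(p) and Bernoulli(q).

    Assumes 0 < q < 1 so every log ratio is finite.
    """
    return sum(a * math.log(a / b) for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)


def jeffreys_bernoulli(p: float, q: float) -> float:
    """Jeffreys divergence: the symmetrised KL divergence D(P||Q) + D(Q||P)."""
    return kl_bernoulli(p, q) + kl_bernoulli(q, p)


def rank_terms(docs_pos, docs_neg, vocabulary):
    """Rank terms by the Jeffreys divergence of their per-class presence probabilities.

    docs_pos / docs_neg: documents of the two classes, each given as a set of terms.
    Returns (term, score) pairs sorted from most to least discriminative.
    """
    scored = []
    for term in vocabulary:
        # Laplace-smoothed probability that a document of each class contains the term;
        # smoothing keeps both probabilities strictly inside (0, 1).
        p = (sum(term in d for d in docs_pos) + 1) / (len(docs_pos) + 2)
        q = (sum(term in d for d in docs_neg) + 1) / (len(docs_neg) + 2)
        scored.append((term, jeffreys_bernoulli(p, q)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy two-class corpus (hypothetical data)
sport = [{"goal", "match"}, {"goal", "team"}, {"team", "score"}]
politics = [{"vote", "party"}, {"vote", "team"}, {"party", "election"}]
ranking = rank_terms(sport, politics, ["goal", "vote", "team", "party"])
```

Here "team" occurs in both classes and therefore ranks last, while "goal", "vote" and "party" are each concentrated in one class and tie for the top score.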

A novel two stage scheme utilizing the test set for model selection in text classification

Author: Mayo Michael
Pfahringer Bernhard
Reutemann Peter
Publication venue: 'University of Technology, Sydney (UTS)'
Publication date: 01/01/2005

Text classification is a natural application domain for semi-supervised learning: labeling documents is expensive, whereas an abundance of unlabeled documents is usually available. We describe a novel, simple two-stage scheme based on dagging that allows the test set to be used in model selection. The dagging ensemble can also be used on its own in place of the original classifier. We evaluate the performance of a meta-classifier that chooses between various base learners and their respective dagging ensembles. The selection process appears to perform robustly, especially when labels are available for only a small percentage of the training data.

Research Commons@Waikato
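The abstract does not spell out dagging. As it is commonly described, it splits the training data into disjoint, stratified partitions, trains one copy of the base learner on each partition, and combines their predictions by majority vote. The sketch below follows that description under stated assumptions: the nearest-centroid base learner, the round-robin fold assignment and all names are hypothetical, and the paper's second stage (using the test set for model selection) is not shown.

```python
import math
from collections import Counter, defaultdict


class NearestCentroid:
    """Toy base learner (hypothetical stand-in for any classifier): predicts the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


def stratified_disjoint_folds(y, k):
    """Split example indices into k disjoint folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds


class Dagging:
    """Train one base model per disjoint fold and combine them by majority vote.

    Assumes every class has at least k examples, so no fold is empty.
    """

    def __init__(self, base_factory, k=3):
        self.base_factory = base_factory
        self.k = k

    def fit(self, X, y):
        self.members = []
        for fold in stratified_disjoint_folds(y, self.k):
            model = self.base_factory()
            model.fit([X[i] for i in fold], [y[i] for i in fold])
            self.members.append(model)
        return self

    def predict(self, x):
        votes = Counter(member.predict(x) for member in self.members)
        return votes.most_common(1)[0][0]


# Two well-separated toy classes (hypothetical data)
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0], [5.5, 5.5], [5.2, 5.8]]
y = ["a"] * 6 + ["b"] * 6
model = Dagging(NearestCentroid, k=3).fit(X, y)
```

Because each member sees only a disjoint slice of the data, training cost per member shrinks with k, which is one reason dagging is attractive for learners whose cost grows quickly with training-set size.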

EEF: Exponentially Embedded Families with Class-Specific Features for Classification

Author: Baggenstoss Paul M.
He Haibo
Kay Steven
Tang Bo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

In this letter, we present a novel classification method based on exponentially embedded families (EEF), in which the probability density function (PDF) of the raw data is estimated from the PDF of the features. With this PDF construction, we show that class-specific features can be used in the proposed classification method, instead of the single feature subset shared by all classes in conventional approaches. We apply the proposed EEF classifier to text categorization as a case study and derive an optimal Bayesian classification rule with class-specific feature selection based on the Information Gain (IG) score. The promising performance on real-life data sets demonstrates the effectiveness of the proposed approach and indicates its potential for a wide range of applications.

Comment: 9 pages, 3 figures, to be published in IEEE Signal Processing Letters. IEEE Signal Processing Letters, 201

arXiv.org e-Print Archive

Fraunhofer-ePrints

DigitalCommons@URI
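The abstract relies on the Information Gain (IG) score for feature selection. The sketch below computes the standard IG of a binary term-presence feature, IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t), from document counts. It does not reproduce the letter's class-specific EEF selection rule; the function names and toy corpus are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of a list of labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - sum over v in {t present, t absent} of P(v) * H(C | v).

    docs: documents as sets of terms; labels: the class label of each document.
    """
    present = [lab for d, lab in zip(docs, labels) if term in d]
    absent = [lab for d, lab in zip(docs, labels) if term not in d]
    n = len(labels)
    # Expected entropy of the class after observing whether the term occurs
    conditional = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - conditional


# Toy corpus (hypothetical data)
docs = [{"goal"}, {"goal", "team"}, {"vote"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
ig_goal = information_gain(docs, labels, "goal")  # perfectly separates the classes
ig_team = information_gain(docs, labels, "team")  # appears equally in both classes
```

On this corpus "goal" earns the full 1 bit of class entropy, while "team" earns none, so an IG-based selector would keep the former and discard the latter.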

Stochastic Discriminative EM

Author: Masegosa Andres R.
Publication date: 02/10/2014

Stochastic discriminative EM (sdEM) is an online-EM-type algorithm for discriminative training of probabilistic generative models belonging to the exponential family. In this work, we introduce and justify this algorithm as a stochastic natural gradient descent method, i.e., a method that accounts for the information geometry of the statistical model's parameter space. We show how this learning algorithm can be used to train probabilistic generative models by minimizing different discriminative loss functions, such as the negative conditional log-likelihood and the hinge loss. The models trained by sdEM are always generative (i.e., they define a joint probability distribution) and can consequently handle missing data and latent variables in a principled way, both during learning and at prediction time. The performance of this method is illustrated on several text classification problems, for which a multinomial naive Bayes classifier and a latent Dirichlet allocation based classifier are learned using different discriminative loss functions.

Comment: UAI 2014 paper + Supplementary Material. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI 2014), edited by Nevin L. Zhang and Jian Tian. AUAI Pres

arXiv.org e-Print Archive

CiteSeerX
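The abstract names the negative conditional log-likelihood as one of the discriminative losses sdEM can minimize for a multinomial naive Bayes model. The sketch below only evaluates that loss for a single document, -log p(c | x) = -(log p(x, c) - log sum over c' of p(x, c')); it does not implement sdEM's stochastic natural-gradient updates. The parameter layout (dictionaries of log-probabilities) and the function names are assumptions.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def nb_joint_log_probs(doc_counts, log_prior, log_word_probs):
    """log p(x, c) for each class c under a multinomial naive Bayes model.

    The multinomial coefficient is omitted: it is the same for every class and
    cancels in p(c | x). Assumes every word in doc_counts is in the vocabulary.
    """
    return {
        c: log_prior[c] + sum(n * log_word_probs[c][w] for w, n in doc_counts.items())
        for c in log_prior
    }


def negative_conditional_log_likelihood(doc_counts, label, log_prior, log_word_probs):
    """-log p(label | x), the discriminative loss for one labelled document."""
    joint = nb_joint_log_probs(doc_counts, log_prior, log_word_probs)
    return -(joint[label] - log_sum_exp(list(joint.values())))


# Hypothetical two-class model over a two-word vocabulary
log_prior = {"sport": math.log(0.5), "politics": math.log(0.5)}
log_word_probs = {
    "sport": {"goal": math.log(0.9), "vote": math.log(0.1)},
    "politics": {"goal": math.log(0.1), "vote": math.log(0.9)},
}
loss_right = negative_conditional_log_likelihood({"goal": 2}, "sport", log_prior, log_word_probs)
loss_wrong = negative_conditional_log_likelihood({"goal": 2}, "politics", log_prior, log_word_probs)
```

The loss is small when the model assigns the true label high posterior probability and large otherwise; a discriminative trainer adjusts the generative parameters to lower it rather than to maximize the joint likelihood.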

Scalable Text Mining with Sparse Generative Models

Author: Puurula Antti
Publication venue: 'University of Waikato'
Publication date: 22/06/2015

The information age has brought a deluge of data. Much of it is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, creating a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models but ignored each other's parallel developments. This framework allows methods developed for different processing tasks, such as retrieval and classification, to be shared, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of common text mining operations so that their cost scales with the sparsity of the data, yielding probabilistic models with the scalability of modern search engines. The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experiments on text classification and ranked retrieval datasets show that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order-of-magnitude decrease in classification time for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places.

arXiv.org e-Print Archive

Research Commons@Waikato
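The abstract proposes sparse computation with inverted indices for inference on generative text models. The sketch below illustrates one way that idea can work for a multinomial naive Bayes classifier, assuming Jelinek-Mercer smoothing p(w|c) = (1 - alpha) p_ml(w|c) + alpha p_bg(w); the smoothing choice, the index layout and the function names are illustrative assumptions, not taken from the thesis. Because log p(w|c) = log(alpha p_bg(w)) + log(1 + (1 - alpha) p_ml(w|c) / (alpha p_bg(w))), the first term is shared by all classes and the second is zero for any word never seen in class c, so scoring only has to visit the postings of the document's words.

```python
import math
from collections import Counter, defaultdict


def build_index(class_docs, alpha):
    """class_docs: {class: list of token lists}. Returns (log_prior, p_bg, index).

    index maps word -> [(class, log(1 + (1 - alpha) * p_ml(w|c) / (alpha * p_bg(w))))],
    with postings only for classes in which the word actually occurs.
    """
    class_counts = {c: Counter(t for doc in docs for t in doc) for c, docs in class_docs.items()}
    bg = Counter()
    for counts in class_counts.values():
        bg.update(counts)
    bg_total = sum(bg.values())
    p_bg = {w: n / bg_total for w, n in bg.items()}  # background (collection) model
    n_docs = sum(len(docs) for docs in class_docs.values())
    log_prior = {c: math.log(len(docs) / n_docs) for c, docs in class_docs.items()}
    index = defaultdict(list)
    for c, counts in class_counts.items():
        total = sum(counts.values())
        for w, n in counts.items():
            index[w].append((c, math.log(1 + (1 - alpha) * (n / total) / (alpha * p_bg[w]))))
    return log_prior, p_bg, index


def sparse_scores(doc, log_prior, p_bg, index, alpha):
    """log p(doc, c) for every class, touching only the postings of the document's words."""
    counts = Counter(w for w in doc if w in p_bg)  # out-of-vocabulary words are ignored
    shared = sum(n * math.log(alpha * p_bg[w]) for w, n in counts.items())  # class-independent
    scores = {c: lp + shared for c, lp in log_prior.items()}
    for w, n in counts.items():
        for c, weight in index[w]:
            scores[c] += n * weight
    return scores


# Hypothetical training data and query
class_docs = {"sport": [["goal", "team"], ["goal", "match"]],
              "politics": [["vote", "party"], ["vote", "team"]]}
alpha = 0.5
log_prior, p_bg, index = build_index(class_docs, alpha)
query = ["goal", "goal", "team"]
scores = sparse_scores(query, log_prior, p_bg, index, alpha)
best = max(scores, key=scores.get)
```

The sparse scores equal the dense multinomial log-likelihoods exactly; the saving is that classes whose postings the document never touches keep only the shared term, which matters when there are very many classes.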