Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Automated feature selection is important for text categorization to reduce
the feature size and to speed up the learning process of classifiers. In this
paper, we present a novel and efficient feature selection framework based on
information theory, which aims to rank features by their discriminative
capacity for classification. We first revisit two information
measures: Kullback-Leibler divergence and Jeffreys divergence for binary
hypothesis testing, and analyze their asymptotic properties relating to type I
and type II errors of a Bayesian classifier. We then introduce a new divergence
measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure
multi-distribution divergence for multi-class classification. Based on the
JMH-divergence, we develop two efficient feature selection methods for text
categorization.
The promising results of extensive experiments demonstrate the effectiveness of
the proposed approaches. Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data
Engineering. 14 pages, 5 figures
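As context for the divergence measures discussed in this abstract, ranking binary features by the Jeffreys divergence between their class-conditional distributions can be sketched as follows. This is a minimal illustration only: the function names are assumptions, and the score here is a plain Jeffreys divergence, not the paper's maximum-discrimination criterion.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between discrete distributions.
    A small eps guards against zero probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def jeffreys_divergence(p, q):
    """Jeffreys divergence: the symmetrised KL, J(p, q) = D(p||q) + D(q||p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def rank_features(class_dists):
    """Rank binary features by the Jeffreys divergence between their
    class-conditional Bernoulli distributions for two classes.
    class_dists maps feature -> (P(term | class 1), P(term | class 2))."""
    scores = {
        f: jeffreys_divergence([p1, 1 - p1], [p2, 1 - p2])
        for f, (p1, p2) in class_dists.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With this score, a discriminative term (frequent in one class, rare in the other) ranks above a stopword-like term whose presence probability is nearly identical across classes.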
FSMJ: Feature Selection with Maximum Jensen-Shannon Divergence for Text Categorization
In this paper, we present a new wrapper feature selection approach based on
Jensen-Shannon (JS) divergence, termed feature selection with maximum
JS-divergence (FSMJ), for text categorization. Unlike most existing feature
selection approaches, the proposed FSMJ approach is based on real-valued
features which provide more information for discrimination than binary-valued
features used in conventional approaches. We show that the FSMJ is a greedy
approach and the JS-divergence monotonically increases when more features are
selected. We conduct several experiments on real-life data sets, comparing
against state-of-the-art feature selection approaches for text categorization. The
superior performance of the proposed FSMJ approach demonstrates its
effectiveness and further indicates its wide potential for applications in data
mining. Comment: 8 pages, 6 figures, World Congress on Intelligent Control and
Automation, 201
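The Jensen-Shannon divergence and the greedy selection described in this abstract can be sketched as follows. This is a generic illustration under assumed names: the `score` callable stands in for whatever JS-divergence objective is computed over the selected feature subset, not the paper's exact FSMJ formulation.

```python
import math

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def js_divergence(p, q):
    """Jensen-Shannon divergence: H(m) - (H(p) + H(q)) / 2,
    where m is the midpoint distribution of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def greedy_select(candidates, score, k):
    """Greedy forward selection: at each step, add the candidate feature
    that maximises the score of the selected subset."""
    selected = []
    for _ in range(k):
        best = max((f for f in candidates if f not in selected),
                   key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected
```

The JS divergence is bounded (at most ln 2 for fully disjoint distributions, 0 for identical ones), which makes it a convenient monotone objective for forward selection.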
EEF: Exponentially Embedded Families with Class-Specific Features for Classification
In this letter, we present a novel exponentially embedded families (EEF)
based classification method, in which the probability density function (PDF) on
raw data is estimated from the PDF on features. With the PDF construction, we
show that class-specific features can be used in the proposed classification
method, instead of a common feature subset for all classes as used in
conventional approaches. We apply the proposed EEF classifier for text
categorization as a case study and derive an optimal Bayesian classification
rule with class-specific feature selection based on the Information Gain (IG)
score. The promising performance on real-life data sets demonstrates the
effectiveness of the proposed approach and indicates its wide potential for
applications. Comment: 9 pages, 3 figures, to be published in IEEE Signal Processing Letters.
IEEE Signal Processing Letters, 201
Text Categorization Using Hybrid Naïve Bayes Algorithm
Automated text categorization and class prediction are important for reducing the feature size and speeding up the learning process of classifiers. Text classification is a growing interest in text mining research. Correctly assigning a text to a particular category is still challenging because of the large number of features in the dataset. Among current classification approaches, Naïve Bayes serves well as a document classification model due to its simplicity. The aim of this project is to highlight the performance of Naïve Bayes for text categorization and class prediction.
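As background for the Naïve Bayes model discussed in this abstract, a minimal multinomial Naïve Bayes text classifier with Laplace smoothing might look as follows. This is a sketch, not the project's hybrid variant; the class and method names are assumptions.

```python
import math
from collections import Counter

class MultinomialNB:
    """Minimal multinomial Naive Bayes text classifier with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.split())
        self.vocab = {w for cnt in self.counts.values() for w in cnt}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_posterior(c):
            # Laplace (add-one) smoothing over the shared vocabulary.
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in doc.split() if w in self.vocab)
        return max(self.classes, key=log_posterior)
```

Working in log space avoids numeric underflow, and ignoring out-of-vocabulary words keeps the per-class scores comparable.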
An Overview on Implementation Using Hybrid Naïve Bayes Algorithm for Text Categorization
Automated text categorization and class prediction are important for reducing the feature size and speeding up the learning process of classifiers. Text classification is a growing interest in text mining research. Correctly assigning a text to a particular category is still challenging because of the large number of features in the dataset. Among current classification approaches, Naïve Bayes serves well as a document classification model due to its simplicity. The aim of this project is to highlight the performance of Naïve Bayes for text categorization and class prediction.
An improved switching hybrid recommender system using naive Bayes classifier and collaborative filtering
Recommender systems apply machine learning and data mining techniques to filter unseen information and can predict whether a user would like a given resource. To date, a number of recommendation algorithms have been proposed, with collaborative filtering and content-based filtering being the two most widely adopted techniques. Collaborative filtering recommender systems recommend items by identifying other users with similar taste and using their opinions for recommendation, whereas content-based recommender systems recommend items based on the content information of the items. These systems suffer from scalability, data sparsity, over-specialization, and cold-start problems, resulting in poor-quality recommendations and reduced coverage. Hybrid recommender systems combine individual systems to avoid certain of these limitations. In this paper, we propose a switching hybrid recommendation approach that combines a Naive Bayes classification approach with collaborative filtering. Experimental results on two different data sets show that the proposed algorithm is scalable and provides better performance, in terms of accuracy and coverage, than other algorithms, while eliminating some recorded problems with recommender systems.
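The switching idea described in this abstract can be sketched as a simple rule: use collaborative filtering when a user has enough ratings, otherwise fall back to the content-based Naive Bayes predictor, which mitigates the cold-start problem. All names and the threshold below are illustrative assumptions, not the paper's exact switching criterion.

```python
def switching_recommend(user, item, cf_predict, nb_predict, n_ratings,
                        threshold=10):
    """Switching hybrid sketch: trust collaborative filtering only when the
    user has rated at least `threshold` items; otherwise fall back to the
    content-based Naive Bayes predictor (helps with cold-start users).
    cf_predict and nb_predict are callables (user, item) -> prediction."""
    if n_ratings(user) >= threshold:
        return cf_predict(user, item)
    return nb_predict(user, item)
```

A switching hybrid keeps each component model intact and only decides, per request, which one to consult, so the two predictors can be tuned independently.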
Text Clustering and Classification Techniques- A Review
Text classification is the task of automatically sorting a set of documents into categories from a predefined set. It is a data mining technique used to predict group membership for data instances within a given dataset, classifying data into different classes under given constraints. Unlike approaches that rely on traditional feature selection techniques for text document classification, a Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets. Automated text categorization and class prediction are important for reducing the feature size and speeding up the learning process of classifiers.
Text Clustering and Classification Techniques using Data Mining
Text classification is the task of automatically sorting a set of documents into categories from a predefined set. It is a data mining technique used to predict group membership for data instances within a given dataset, classifying data into different classes under given constraints. Unlike approaches that rely on traditional feature selection techniques for text document classification, a Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets. Automated text categorization and class prediction are important for reducing the feature size and speeding up the learning process of classifiers.