Search CORE

226,505 research outputs found

Combined optimization of feature selection and algorithm parameters in machine learning of language

Author: Daelemans Walter
De Meulder Fien
Hoste Veronique
Naudts Bart
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

Comparative machine learning experiments have become an important methodology in empirical approaches to natural language processing (i) to investigate which machine learning algorithms have the 'right bias' to solve specific natural language processing tasks, and (ii) to investigate which sources of information add to accuracy in a learning approach. Using automatic word sense disambiguation as an example task, we show that with the methodology currently used in comparative machine learning experiments, the results may often not be reliable because of the role of and interaction between feature selection and algorithm parameter optimization. We propose genetic algorithms as a practical approach to achieve both higher accuracy within a single approach, and more reliable comparisons

CiteSeerX

Ghent University Academic Bibliography

FSMJ: Feature Selection with Maximum Jensen-Shannon Divergence for Text Categorization

Author: He Haibo
Tang Bo
Publication venue
Publication date: 20/06/2016
Field of study

In this paper, we present a new wrapper feature selection approach based on Jensen-Shannon (JS) divergence, termed feature selection with maximum JS-divergence (FSMJ), for text categorization. Unlike most existing feature selection approaches, the proposed FSMJ approach is based on real-valued features which provide more information for discrimination than binary-valued features used in conventional approaches. We show that the FSMJ is a greedy approach and the JS-divergence monotonically increases when more features are selected. We conduct several experiments on real-life data sets, compared with the state-of-the-art feature selection approaches for text categorization. The superior performance of the proposed FSMJ approach demonstrates its effectiveness and further indicates its wide potential applications on data mining.Comment: 8 pages, 6 figures, World Congress on Intelligent Control and Automation, 201

arXiv.org e-Print Archive

DigitalCommons@URI