Are screening methods useful in feature selection? An empirical study
Filter or screening methods are often used as a preprocessing step for
reducing the number of variables used by a learning algorithm in obtaining a
classification or regression model. While there are many such filter methods,
there is a need for an objective evaluation of these methods. Such an
evaluation is needed to compare them with each other and also to answer whether
they are at all useful, or a learning algorithm could do a better job without
them. For this purpose, many popular screening methods are partnered in this
paper with three regression learners and five classification learners and
evaluated on ten real datasets to obtain accuracy criteria such as R-square and
area under the ROC curve (AUC). The obtained results are compared through curve
plots and comparison tables in order to find out whether screening methods help
improve the performance of learning algorithms and how they fare against each
other. Our findings revealed that the screening methods were useful in
improving the prediction of the best learner on two regression and two
classification datasets out of the ten datasets evaluated.
Comment: 29 pages, 4 figures, 21 tables
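As an illustration of the kind of filter step the abstract describes, the following is a minimal sketch of one generic screening method: ranking features by absolute Pearson correlation with the target and keeping the top k before handing the data to a learner. The abstract does not say which screening methods the paper evaluates, so the correlation criterion, the function names `pearson` and `screen_top_k`, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def screen_top_k(X, y, k):
    """Rank features by |correlation with y| and return the top-k indices.

    Hypothetical helper: one possible filter, not the paper's procedure.
    """
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Synthetic regression data (assumed): only features 0 and 1 drive the target.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]

selected, scores = screen_top_k(X, y, k=2)

# The reduced matrix is what a downstream learner would be trained on.
X_screened = [[row[j] for j in selected] for row in X]
```

In a comparison like the one the abstract outlines, the same learner would be fitted once on `X` and once on `X_screened`, and the two fits compared on held-out data with a criterion such as R-square or AUC.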
A scalable saliency-based Feature selection method with instance level information
Classic feature selection techniques remove features that are either
irrelevant or redundant, yielding a subset of relevant features that supports
better knowledge extraction. This allows the creation of compact
models that are easier to interpret. Most of these techniques work over the
whole dataset, but they are unable to provide the user with useful
information when only instance-level information is needed. In short, given any
example, classic feature selection algorithms do not indicate which features
are most relevant for that particular sample. This work aims
to overcome this handicap by developing a novel feature selection method,
called Saliency-based Feature Selection (SFS), based on deep-learning saliency
techniques. Our experimental results show that this algorithm can be
successfully used not only in Neural Networks, but also under any given
architecture trained by using Gradient Descent techniques.
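To make the idea of instance-level relevance concrete, here is a minimal sketch of generic gradient saliency: for a single input, each feature is scored by the magnitude of the model output's partial derivative with respect to that feature. This is not the SFS algorithm itself, whose details the abstract does not give. The toy `model`, the finite-difference gradient, and the helper names `saliency` and `top_features` are assumptions chosen to keep the example dependency-free.

```python
import math


def model(x):
    """Stand-in for any trained differentiable model (assumed, not from the paper).

    The product term makes each feature's influence depend on the instance.
    """
    z = x[0] * x[1] + 0.5 * x[2]
    return 1.0 / (1.0 + math.exp(-z))


def saliency(f, x, eps=1e-5):
    """Approximate |df/dx_j| at the instance x by central finite differences."""
    grads = []
    for j in range(len(x)):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        grads.append(abs((f(hi) - f(lo)) / (2 * eps)))
    return grads


def top_features(f, x, k=1):
    """Indices of the k most salient features for this one instance."""
    s = saliency(f, x)
    return sorted(range(len(x)), key=lambda j: s[j], reverse=True)[:k]


# The same model ranks features differently for different instances.
first = top_features(model, [2.0, 0.1, 1.0])   # gradient is dominated by feature 1
second = top_features(model, [0.1, 2.0, 1.0])  # gradient is dominated by feature 0
third = top_features(model, [0.1, 0.1, 1.0])   # gradient is dominated by feature 2
```

A dataset-level filter would return one ranking for all three inputs, whereas the per-instance scores here change with the input, which is the gap the abstract points to. With an automatic-differentiation framework, the finite-difference loop would normally be replaced by a single backward pass.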
Nomenclature and Benchmarking Models of Text Classification Models: Contemporary Affirmation of the Recent Literature
In this paper we present automated text classification in text mining, which is gaining greater relevance in various fields every day. Text mining primarily focuses on developing text classification systems able to automatically classify huge volumes of documents comprising unstructured and semi-structured data. The process of retrieval, classification, and summarization simplifies the extraction of information by the user. The search for the ideal text classifier, feature generator, and dominant feature selection technique that outperforms all previous research has received attention from researchers in diverse areas such as information retrieval, machine learning, and the theory of algorithms. To automatically classify and discover patterns from the different types of documents [1], techniques like Machine Learning, Natural Language Processing (NLP), and Data Mining are applied together. In this paper we review some effective feature selection research and show the results in a table for
- …