Simple stopping criteria for information theoretic feature selection
Feature selection aims to select the smallest feature subset that yields the
minimum generalization error. In the rich literature on feature selection,
information theory-based approaches seek a subset of features such that the
mutual information between the selected features and the class labels is
maximized. Despite the simplicity of this objective, there still remain several
open problems in optimization. These include, for example, the automatic
determination of the optimal subset size (i.e., the number of features) or a
stopping criterion when a greedy search strategy is adopted. In this paper,
we propose two stopping criteria that only require monitoring the conditional
mutual information (CMI) among groups of variables. Using the recently developed
multivariate matrix-based Rényi's α-entropy functional, which can be
directly estimated from data samples, we show that the CMI among groups of
variables can be easily computed without any decomposition or approximation,
making our criteria easy to implement and to integrate seamlessly into any
existing information-theoretic feature selection method that uses a greedy
search strategy.
Comment: Paper published in the journal of Entrop
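
The abstract above describes computing CMI directly from a matrix-based Rényi α-entropy and using it to decide when a greedy search should stop. A minimal sketch of that idea follows, assuming a Gaussian kernel with a hand-picked width `sigma`, α = 1.01, and a fixed threshold on the best candidate's CMI. The function names, the kernel width, and the thresholding rule are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np

def gram(x, sigma=1.0):
    """Gaussian Gram matrix of the samples in x (one row per sample)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def renyi_entropy(*grams, alpha=1.01):
    """Matrix-based Renyi alpha-entropy (bits) of one or more variables.

    The joint entropy uses the Hadamard product of the Gram matrices,
    normalized to unit trace.
    """
    prod = grams[0].copy()
    for g in grams[1:]:
        prod = prod * g
    a = prod / np.trace(prod)
    eig = np.clip(np.linalg.eigvalsh(a), 0.0, None)  # drop tiny negatives
    return float(np.log2(np.sum(eig ** alpha)) / (1.0 - alpha))

def conditional_mi(kx, ky, kz, alpha=1.01):
    """I(X; Y | Z) = S(X,Z) + S(Y,Z) - S(Z) - S(X,Y,Z)."""
    return (renyi_entropy(kx, kz, alpha=alpha)
            + renyi_entropy(ky, kz, alpha=alpha)
            - renyi_entropy(kz, alpha=alpha)
            - renyi_entropy(kx, ky, kz, alpha=alpha))

def best_candidate(features, y, selected, sigma=1.0, alpha=1.01):
    """Return (index, CMI) of the unselected feature with the largest
    I(feature; y | selected features). Hypothetical helper."""
    n = len(y)
    ky = gram(y, sigma)
    kz = np.ones((n, n))  # identity for the Hadamard product (empty set)
    for j in selected:
        kz = kz * gram(features[:, j], sigma)
    scores = {j: conditional_mi(gram(features[:, j], sigma), ky, kz, alpha)
              for j in range(features.shape[1]) if j not in selected}
    j_best = max(scores, key=scores.get)
    return j_best, scores[j_best]
```

A greedy loop would call `best_candidate` repeatedly, append the returned index to `selected`, and halt once the returned CMI drops below a chosen threshold. How to set that threshold is left open here.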
Resampling methods for parameter-free and robust feature selection with mutual information
Combining the mutual information criterion with a forward feature selection
strategy offers a good trade-off between optimality of the selected feature
subset and computation time. However, it requires setting the parameter(s) of
the mutual information estimator and to determine when to halt the forward
procedure. These two choices are difficult to make because, as the
dimensionality of the subset increases, the estimation of the mutual
information becomes less and less reliable. This paper proposes to use
resampling methods, namely K-fold cross-validation and the permutation test, to
address both issues. The resampling methods provide information about the
variance of the estimator, which can then be used to set the parameter
automatically and to calculate a threshold for stopping the forward procedure.
The procedure is illustrated on a synthetic dataset as well as on real-world
examples.
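
The permutation-test idea in the abstract above can be sketched as follows: shuffle the labels to break any dependence, re-estimate the mutual information many times, and use a high quantile of those values as the stopping threshold. This sketch swaps in a simple plug-in estimator for discrete variables, which is an assumption made for brevity and not the estimator used in the paper. The function names, the number of permutations, and the 0.95 quantile are likewise illustrative.

```python
import numpy as np

def discrete_mi(x, y):
    """Plug-in mutual information (bits) between two discrete 1-D arrays."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)       # contingency table
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def permutation_threshold(x, y, n_perm=200, quantile=0.95, seed=0):
    """Quantile of the MI values obtained after shuffling y."""
    rng = np.random.default_rng(seed)
    null = [discrete_mi(x, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.quantile(null, quantile))

def is_significant(x, y, **kwargs):
    """True if the observed MI exceeds the permutation threshold."""
    return discrete_mi(x, y) > permutation_threshold(x, y, **kwargs)
```

In a forward procedure, one would stop adding features as soon as the best remaining candidate fails this check.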
- …