Search CORE

928 research outputs found

Mutual information for the selection of relevant variables in spectrometric nonlinear modelling

Authors: A. Lendasse, Battiti, Benoudjit, Blanco, Chen, Cover, D. François, Dobrushin, Efron, F. Rossi, Grassberger, Gray, Harald, Kozachenko, Kraskov, Lee, Lendasse, M. Verleysen, Ozaki, Powell, Rossi, Scott, Sekulic, Shannon, Somorjai, Stone, Suykens, V. Wertz, Vasicek
Publication venue: Elsevier BV
Publication date: 01/01/2006

Data from spectrophotometers form vectors of a large number of exploitable variables. Building quantitative models from these variables most often requires a smaller set of variables than the initial one. Indeed, too many input variables lead to too many model parameters, which results in overfitting and poor generalization. In this paper, we suggest using the mutual information measure to select variables from the initial set. Mutual information measures the information content of the input variables with respect to the model output without making any assumption about the model that will be used; it is thus suitable for nonlinear modelling. In addition, it selects variables from the initial set rather than linear or nonlinear combinations of them. It therefore allows greater interpretability of the results without decreasing model performance compared with variable projection methods.
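A minimal sketch of how a mutual information criterion can rank and select variables without assuming any prediction model. The abstract names neither the estimator nor the search procedure. This sketch assumes the Kraskov-Stögbauer-Grassberger k-nearest-neighbour estimator and a greedy forward search. The helpers `ksg_mi` and `forward_select` are hypothetical names, not the authors' code.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def max_norm(a, b):
    """Chebyshev (max-norm) distance between two equal-length tuples."""
    return max(abs(u - v) for u, v in zip(a, b))


def ksg_mi(xs, ys, k=3):
    """Assumed estimator: KSG (algorithm 1) estimate of I(X; Y) in nats.

    xs, ys: lists of tuples, one tuple per sample. Brute force, O(n^2).
    """
    n = len(xs)
    acc = 0.0
    for i in range(n):
        dx = [max_norm(xs[i], xs[j]) for j in range(n)]
        dy = [max_norm(ys[i], ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbour in the joint (X, Y) space.
        eps = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)[k - 1]
        # Count marginal neighbours strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


def forward_select(columns, y, n_select, k=3):
    """Hypothetical greedy search: add the variable maximizing I(selected + candidate; y)."""
    n = len(y)
    ys = [(v,) for v in y]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < n_select:
        def score(c):
            cols = selected + [c]
            return ksg_mi([tuple(columns[j][i] for j in cols) for i in range(n)], ys, k)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: only x1 is related to y, and the relation is nonlinear.
rng = random.Random(0)
n = 150
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [math.sin(a) + 0.1 * rng.gauss(0, 1) for a in x1]
print(forward_select([x0, x1, x2], y, n_select=1))
```

The estimator works on nearest-neighbour distances rather than on a fitted model. It can therefore pick up the nonlinear `sin` relation, and the result is a subset of original variables rather than a projection.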

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

DIAL UCLouvain

Resampling methods for parameter-free and robust feature selection with mutual information

Authors: Andreas Hahn, Battiti, Bellmann, Benoudjit, Bonnlander, Conrad, Craddock, D. François, Dijck, Diks, F. Rossi, Fleuret, Frank, François, Friedman, Fung, Good, Guyon, Hammer, Hild, Hoffman, Hummel, Kraskov, Kwak, M. Verleysen, Nicolaou, Opdyke, Purushothaman, Rossi, Scott, Stefansson, V. Wertz, Verikas, Zhou
Publication venue: Elsevier BV
Publication date: 01/01/2007

Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between the optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and deciding when to halt the forward procedure. These two choices are difficult to make because the estimation of mutual information becomes less and less reliable as the dimensionality of the subset increases. This paper proposes two resampling methods, K-fold cross-validation and the permutation test, to address both issues. The resampling methods provide information about the variance of the estimator, which can then be used to set the parameter automatically and to compute a threshold for stopping the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples.
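The permutation-test part of this idea can be sketched as follows. The sketch rests on several assumptions:

- A simple histogram (plug-in) estimator stands in for whatever estimator the paper uses.
- The number of bins plays the role of the estimator parameter that the K-fold step would tune. That step is not shown.
- `permutation_test` and the other helpers are hypothetical names.

Shuffling the candidate column breaks its relation with the output while keeping its marginal distribution. The (1 - alpha) quantile of the shuffled scores then serves as the stopping threshold.

```python
import math
import random
from collections import Counter


def discretize(col, bins):
    """Equal-width binning of a list of floats into integer codes 0..bins-1."""
    lo, hi = min(col), max(col)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in col]


def plugin_mi(x_codes, y_codes):
    """Plug-in (histogram) estimate of I(X; Y) in nats from paired discrete symbols."""
    n = len(x_codes)
    cxy = Counter(zip(x_codes, y_codes))
    cx, cy = Counter(x_codes), Counter(y_codes)
    return sum(c / n * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def subset_mi(columns, y, bins):
    """MI between the jointly discretized columns and the discretized target."""
    codes = list(zip(*(discretize(c, bins) for c in columns)))
    return plugin_mi(codes, discretize(y, bins))


def permutation_test(selected, candidate, y, bins=4, n_perm=100, alpha=0.05, seed=0):
    """Hypothetical helper: (observed MI, null threshold) for adding `candidate`."""
    rng = random.Random(seed)
    observed = subset_mi(selected + [candidate], y, bins)
    null = []
    for _ in range(n_perm):
        shuffled = candidate[:]
        rng.shuffle(shuffled)  # destroys the candidate's link with y
        null.append(subset_mi(selected + [shuffled], y, bins))
    null.sort()
    threshold = null[math.ceil((1 - alpha) * n_perm) - 1]
    return observed, threshold


rng = random.Random(1)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
y = [a + 0.3 * rng.gauss(0, 1) for a in x]
obs, thr = permutation_test([], x, y)
print(obs, thr)  # a relevant variable is expected to clear the threshold
```

In a forward loop, one would stop as soon as the best remaining candidate's observed value no longer exceeds its permutation threshold.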

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

DIAL UCLouvain

Fast Selection of Spectral Variables with B-Spline Compression

Authors: François Damien, Meurens Marc, Rossi Fabrice, Verleysen Michel, Wertz Vincent
Publication venue: Elsevier BV
Publication date: 01/01/2007

The large number of spectral variables in most data sets encountered in spectral chemometrics often makes predicting a dependent variable difficult. Fortunately, the number of variables can be reduced using either projection techniques or selection methods; the latter allow the selected variables to be interpreted. Testing all possible subsets of variables with the prediction model would be optimal, but it is intractable. An incremental selection approach based on a nonparametric statistic is therefore a good option, as it avoids the computationally intensive use of the model itself. However, it has two drawbacks: the number of groups of variables to test is still huge, and collinearities can make the results unstable. To overcome these limitations, this paper presents a method to select groups of spectral variables. It consists of a forward-backward procedure applied to the coefficients of a B-spline representation of the spectra. The criterion used in the forward-backward procedure is the mutual information, which can detect nonlinear dependencies between variables, unlike the commonly used correlation. The spline representation makes the results interpretable, since groups of consecutive spectral variables are selected. Experiments on NIR spectra from fescue grass and diesel fuels show that the method provides clearly identified groups of selected variables, making interpretation easy, while keeping the computational load low. The prediction performance obtained with the selected coefficients is higher than that obtained by the same method applied directly to the original variables, and similar to that of traditional models, while using significantly fewer spectral variables.
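The forward-backward search can be sketched independently of the criterion it optimizes. In this hypothetical sketch, `score` maps a set of B-spline coefficient indices to a value to maximize, for instance a mutual information estimate with the dependent variable. The B-spline compression itself is not shown, and the paper's exact add/remove rules may differ.

```python
from functools import lru_cache


def forward_backward(n_features, score, max_iter=100):
    """Hypothetical forward-backward search maximizing `score` over subsets.

    score: callable taking a frozenset of feature indices, returning a float.
    Every accepted move strictly increases the score, so the search terminates.
    """
    score = lru_cache(maxsize=None)(score)  # subsets are re-scored often
    selected = frozenset()
    current = score(selected)
    for _ in range(max_iter):
        improved = False
        # Forward step: try adding each unselected feature.
        additions = [selected | {f} for f in range(n_features) if f not in selected]
        if additions:
            best = max(additions, key=score)
            if score(best) > current:
                selected, current, improved = best, score(best), True
        # Backward step: try removing each selected feature.
        if len(selected) > 1:
            best = max((selected - {f} for f in selected), key=score)
            if score(best) > current:
                selected, current, improved = best, score(best), True
        if not improved:
            break
    return sorted(selected)


# Hypothetical scores: feature 1 looks best alone but is redundant once 0 and 2 are in.
table = {
    (): 0.0, (0,): 0.6, (1,): 1.0, (2,): 0.6,
    (0, 1): 1.2, (1, 2): 1.2, (0, 2): 2.0, (0, 1, 2): 1.9,
}
print(forward_backward(3, lambda s: table[tuple(sorted(s))]))  # [0, 2]
```

The backward step is what lets the search drop feature 1 after it was added early. A purely forward search would stop at {0, 1, 2} with a score of 1.9.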

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

DIAL UCLouvain

Advances in Feature Selection with Mutual Information

Authors: A. Kraskov, C. Borggaard, C. Krier, D. François, D. Scott, F. Rossi, L.F. Kozachenko, M.N. Goria, T. Cover
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Selecting features that are relevant for a prediction or classification problem is an important task in many domains involving high-dimensional data. Selecting features helps fight the curse of dimensionality, improve the performance of prediction or classification methods, and interpret the application. In a nonlinear context, mutual information is widely used as a relevance criterion for features and sets of features. Nevertheless, it suffers from at least three major limitations:

- mutual information estimators depend on smoothing parameters;
- there is no theoretically justified stopping criterion for the greedy feature selection procedure;
- the estimation itself suffers from the curse of dimensionality.

This chapter shows how to deal with these problems. The first two are addressed using resampling techniques that provide a statistical basis for selecting the estimator parameters and stopping the search procedure. The third is addressed by modifying the mutual information criterion into a measure of how complementary (and not only informative) features are for the problem at hand.

arXiv.org e-Print Archive

Crossref

HAL Descartes

DIAL UCLouvain

Feature Selection for Interpatient Supervised Heart Beat Classification

Authors: de Lannoy G., Doquire G., François D., Verleysen M.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2011

Supervised and interpatient classification of heart beats is essential in many applications requiring long-term monitoring of cardiac function. Several classification models able to cope with the strong class imbalance and a large variety of feature sets have been proposed for this task. In practice, over 200 features are often considered. The features retained in the final model are chosen either from domain knowledge or by an exhaustive search over the feature sets, without evaluating the relevance of each individual feature included in the classifier. As a consequence, the results obtained by these models can be suboptimal and difficult to interpret. In this work, feature selection techniques are used to extract optimal feature subsets for state-of-the-art ECG classification models. Performance is evaluated on real ambulatory recordings and compared with previously reported feature choices using the same models. Results indicate that only a small number of individual features actually contribute to the classification, and that better performance can be achieved by removing useless features.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

DIAL UCLouvain

Hopfield Networks in Relevance and Redundancy Feature Selection Applied to Classification of Biomedical High-Resolution Micro-CT Images

Authors: Auffarth B., Cerquides J., Lopez M.
Publication venue: Springer Heidelberg
Publication date: 17/07/2008

We study filter-based feature selection methods for the classification of biomedical images. For feature selection, we use two filters: a relevance filter, which measures the usefulness of individual features for target prediction, and a redundancy filter, which measures similarity between features. As a selection method that combines relevance and redundancy, we try out a Hopfield network. We experimentally compare the following selection methods:

- unitary redundancy and relevance filters;
- a greedy algorithm with redundancy thresholds [9];
- the min-redundancy max-relevance integration [8,23,36];
- our Hopfield network selection.

We conclude that, on the whole, Hopfield selection was one of the most successful methods, outperforming min-redundancy max-relevance when more features are selected.
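For comparison with the relevance and redundancy filters described above, the min-redundancy max-relevance baseline can be sketched in its common difference form. The sketch assumes the relevance vector and pairwise redundancy matrix are precomputed, for example as mutual information estimates. `mrmr` is a hypothetical name and the toy values are made up. It does not reproduce the Hopfield network selection proposed in the paper.

```python
def mrmr(relevance, redundancy, n_select):
    """Greedy min-redundancy max-relevance selection (difference form).

    relevance[f]: relevance of feature f to the target, e.g. I(f; y).
    redundancy[f][g]: similarity between features f and g, e.g. I(f; g).
    """
    n = len(relevance)
    if n == 0 or n_select <= 0:
        return []
    # Start from the single most relevant feature.
    selected = [max(range(n), key=lambda f: relevance[f])]
    while len(selected) < min(n_select, n):
        remaining = [f for f in range(n) if f not in selected]

        # Trade relevance against mean redundancy with the features already chosen.
        def criterion(f):
            return relevance[f] - sum(redundancy[f][s] for s in selected) / len(selected)

        selected.append(max(remaining, key=criterion))
    return selected


# Hypothetical filter outputs: features 0 and 1 are near-duplicates, feature 2 is distinct.
relevance = [0.90, 0.85, 0.50]
redundancy = [
    [0.0, 0.80, 0.0],
    [0.80, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
print(mrmr(relevance, redundancy, 2))  # [0, 2]; ranking by relevance alone gives [0, 1]
```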

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

EEG Signal Processing for Epilepsy

Authors: Angel Navia-Vazquez, Armando Malanda Trigueros, Carlos Guerrero-Mosquera
Publication venue: IntechOpen
Publication date: 01/01/2012

IntechOpen

CiteSeerX

Crossref

Mutual information-based feature selection enhances fMRI brain activity classification

Authors: Damon Cécilia, Michel Vincent, Thirion Bertrand
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/05/2008

In this paper, we address the question of decoding cognitive information from functional Magnetic Resonance (MR) images using classification techniques. The main bottleneck for accurate prediction is the selection of informative features (voxels). We develop a multivariate approach based on a mutual information criterion estimated by nearest neighbors. This method can handle a large number of dimensions and can detect nonlinear correlations between the features and the label. We show that MI-based feature selection achieves better performance with sparse feature selection. It thus gives a better understanding of information coding within the brain than the reference method, a mass-univariate selection (ANOVA).
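To illustrate why a univariate ANOVA can miss a dependency that mutual information detects, here is a one-feature toy example. The data are hypothetical. Both classes have mean 0, but class 1 lies far from 0 on either side, so the dependency is not a difference in means. For brevity, a binned plug-in estimator stands in for the nearest-neighbour estimator described in the abstract, and `anova_f` and `binned_mi` are hypothetical helpers.

```python
import math
import random
from collections import Counter


def anova_f(x, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = {}
    for v, c in zip(x, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(x), len(groups)
    grand = sum(x) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def binned_mi(x, labels, bins=5):
    """Plug-in MI (nats) between an equal-width-binned feature and discrete labels."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    codes = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    cxy, cx, cy = Counter(zip(codes, labels)), Counter(codes), Counter(labels)
    return sum(c / n * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


# Hypothetical voxel: class 0 near 0, class 1 near +2 or -2, both with mean exactly 0.
rng = random.Random(0)
x, labels = [], []
for _ in range(50):
    e = rng.uniform(0.0, 0.5)
    x += [e, -e]
    labels += [0, 0]
for _ in range(50):
    e = rng.uniform(0.0, 0.5)
    x += [2 + e, -(2 + e)]
    labels += [1, 1]
print(anova_f(x, labels), binned_mi(x, labels))
```

Here the F statistic is 0 because the class means coincide. The binned MI equals ln 2, its maximum for two balanced classes, because the bins separate the classes perfectly.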

Crossref

INRIA a CCSD electronic archive server

HAL-CEA