Resampling methods for parameter-free and robust feature selection with mutual information

Author: Andreas Hahn
Battiti
Bellmann
Benoudjit
Bonnlander
Conrad
Craddock
D. François
Dijck
Diks
F. Rossi
Fleuret
Frank
François
Friedman
Fung
Good
Guyon
Guyon
Hammer
Hild
Hoffman
Hummel
Kraskov
Kwak
Kwak
M. Verleysen
Nicolaou
Opdyke
Purushothaman
Rossi
Rossi
Rossi
Scott
Stefansson
V. Wertz
Verikas
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between the optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and deciding when to halt the forward procedure. Both choices are difficult because the estimate of the mutual information becomes less and less reliable as the dimensionality of the subset increases. This paper proposes to use resampling methods, namely K-fold cross-validation and the permutation test, to address both issues. The resampling methods provide information about the variance of the estimator. This information can then be used to set the parameter automatically and to compute a threshold for stopping the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples.
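The permutation-test stopping rule described above can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes discrete-valued features and swaps in a plug-in (histogram) MI estimator, because the abstract does not name the estimator. The function names `mutual_information`, `subset_mi`, and `forward_select` are hypothetical. The K-fold cross-validation step used to tune the estimator's parameter is not shown; here the permutation test alone decides when to halt.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in (histogram) MI estimate in nats between two discrete sequences.
    Assumption for illustration only; not the estimator used in the paper."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed cells of p(x,y) * log(p(x,y) / (p(x) * p(y))), written with counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def subset_mi(columns, subset, y, replace=None):
    """MI between the joint values of the columns in `subset` and y.
    `replace` optionally substitutes other values for a column (e.g. a permutation)."""
    replace = replace or {}
    rows = list(zip(*[replace.get(j, columns[j]) for j in subset]))
    return mutual_information(rows, y)


def forward_select(columns, y, n_perm=100, alpha=0.05, seed=0):
    """Greedy forward selection with a permutation-test stopping rule (hypothetical sketch)."""
    rng = random.Random(seed)
    selected, remaining = [], list(range(len(columns)))
    while remaining:
        # Candidate whose addition gives the largest estimated MI with y.
        best = max(remaining, key=lambda j: subset_mi(columns, selected + [j], y))
        observed = subset_mi(columns, selected + [best], y)
        # Null distribution: same subset size, but the candidate's values are shuffled.
        # This breaks its link to y while keeping the estimator's dimension-dependent bias.
        null = []
        for _ in range(n_perm):
            shuffled = list(columns[best])
            rng.shuffle(shuffled)
            null.append(subset_mi(columns, selected + [best], y, {best: shuffled}))
        null.sort()
        k = min(n_perm, max(1, math.ceil((1 - alpha) * n_perm))) - 1
        # Small tolerance so floating-point noise on equal MI values cannot pass the test.
        if observed <= null[k] + 1e-12:
            break  # not significantly better than adding a shuffled (irrelevant) feature
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, take y as an exact copy of the first of three random binary columns. In that case `forward_select([x0, x1, x2], y)` should return `[0]`. Once x0 is in the subset, adding any other column gives the same MI as adding a shuffled column, so the test stops the procedure.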

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

DIAL UCLouvain

Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

Author: Chen Yun-Mei
Tian Li
Wu Wen-Ming
Xu Shuang
Yang Li-Jun
Yang Xiao-Hui
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Sparse representation based classification (SRC) methods have achieved remarkable results. However, SRC methods still have three weaknesses: they require sufficient training samples, they make insufficient use of test samples, and their representations are unstable. In this paper, a stable inverse projection representation based classification (IPRC) method is presented to tackle these problems by making effective use of test samples. An inverse projection representation (IPR) is first proposed, and its feasibility and stability are analyzed. A classification criterion named category contribution rate is constructed to match the IPR and complete the classification. Moreover, a statistical measure is introduced to quantify the stability of representation-based classification methods. Based on the IPRC technique, a robust tumor recognition framework is presented for interpreting microarray gene expression data. Within this framework, a two-stage hybrid gene selection method is introduced to select informative genes. Finally, a functional analysis of the candidate pathogenicity-related genes is given. Extensive experiments on six public tumor microarray gene expression datasets demonstrate that the proposed technique is competitive with state-of-the-art methods.
Comment: 14 pages, 19 figures, 10 tables

arXiv.org e-Print Archive

Crossref

Query-dependent metric learning for adaptive, content-based image browsing and retrieval

Author: Han Junwei
McKenna Stephen
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/10/2014

University of Dundee Online Publications

Fast Selection of Spectral Variables with B-Spline Compression

Author: François Damien
Meurens Marc
Rossi Fabrice
Verleysen Michel
Wertz Vincent
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

The large number of spectral variables in most data sets encountered in spectral chemometrics often makes predicting a dependent variable difficult. The number of variables can be reduced by using either projection techniques or selection methods. Of these, only selection methods allow the selected variables to be interpreted. Testing all possible subsets of variables with the prediction model would be optimal but is intractable. An incremental selection approach using a nonparametric statistic is therefore a good option, as it avoids the computationally intensive use of the model itself. However, it has two drawbacks: the number of groups of variables to test is still huge, and collinearities can make the results unstable. To overcome these limitations, this paper presents a method to select groups of spectral variables. It consists of a forward-backward procedure applied to the coefficients of a B-spline representation of the spectra. The criterion used in the forward-backward procedure is the mutual information. Unlike the commonly used correlation, this criterion can detect nonlinear dependencies between variables. The spline representation makes the results interpretable, because groups of consecutive spectral variables are selected. Experiments on NIR spectra from fescue grass and diesel fuels show that the method yields clearly identified groups of selected variables, making interpretation easy, while keeping the computational load low. The prediction performance obtained with the selected coefficients is higher than that of the same method applied directly to the original variables. It is also similar to the performance of traditional models, while using significantly fewer spectral variables.
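The compression step can be sketched as below. Each spectrum is projected onto a cubic B-spline basis by least squares. The fitted coefficients, far fewer than the original wavelengths, then become the candidate variables for selection. This is a hedged illustration, not the paper's code. Clamped cubic splines, equally spaced interior knots, and an ordinary least-squares fit are all assumptions, and every function name is hypothetical.

```python
def clamped_knots(a, b, n_coef, order=4):
    """Clamped knot vector on [a, b] with equally spaced interior knots (assumed layout)."""
    n_inner = n_coef - order
    step = (b - a) / (n_inner + 1)
    inner = [a + step * (j + 1) for j in range(n_inner)]
    return [a] * order + inner + [b] * order


def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline of order k (degree k - 1) evaluated at t."""
    if k == 1:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty interval so that t == b is covered.
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        value += (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        value += (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return value


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def compress_spectrum(wavelengths, spectrum, n_coef, order=4):
    """Least-squares B-spline fit; the n_coef coefficients stand in for the raw variables."""
    knots = clamped_knots(wavelengths[0], wavelengths[-1], n_coef, order)
    B = [[bspline_basis(j, order, t, knots) for j in range(n_coef)] for t in wavelengths]
    # Normal equations (B^T B) c = B^T y
    BtB = [[sum(row[p] * row[q] for row in B) for q in range(n_coef)] for p in range(n_coef)]
    Bty = [sum(row[p] * s for row, s in zip(B, spectrum)) for p in range(n_coef)]
    return knots, solve(BtB, Bty)


def reconstruct(wavelengths, knots, coefs, order=4):
    """Evaluate the fitted spline back on the wavelength grid."""
    return [sum(c * bspline_basis(j, order, t, knots) for j, c in enumerate(coefs))
            for t in wavelengths]
```

With 101 wavelengths and `n_coef=8`, each spectrum is summarised by 8 coefficients. Each cubic B-spline is non-zero over at most four adjacent knot intervals. Each coefficient therefore maps back to a contiguous band of wavelengths, which is why selecting coefficients amounts to selecting groups of consecutive spectral variables.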

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

DIAL UCLouvain