Resampling methods for parameter-free and robust feature selection with mutual information
Combining the mutual information criterion with a forward feature selection
strategy offers a good trade-off between optimality of the selected feature
subset and computation time. However, it requires setting the parameter(s) of
the mutual information estimator and determining when to halt the forward
procedure. These two choices are difficult to make because, as the
dimensionality of the subset increases, the estimation of the mutual
information becomes less and less reliable. This paper proposes using
resampling methods, namely K-fold cross-validation and the permutation test, to
address both issues. The resampling methods provide information about the
variance of the estimator, information which can then be used to automatically
set the parameter and to calculate a threshold to stop the forward procedure.
The procedure is illustrated on a synthetic dataset as well as on real-world
examples.
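The stopping rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quantile-binned plug-in estimator, the function names (`discretize`, `mutual_information`, `joint_labels`, `forward_select`), and the defaults (`bins=4`, `n_perm=100`, `alpha=0.05`) are all assumptions chosen for brevity. Only the permutation-test threshold is shown. The K-fold step for setting the estimator parameter is omitted.

```python
import numpy as np

def discretize(x, bins=4):
    """Map a 1-D continuous array to integer bin labels using quantile edges."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.searchsorted(edges, x)

def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two integer label arrays."""
    _, a_idx = np.unique(np.ravel(a), return_inverse=True)
    _, b_idx = np.unique(np.ravel(b), return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)      # contingency table of counts
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

def joint_labels(columns):
    """Collapse several discretized columns into one label per sample."""
    stacked = np.column_stack(columns)
    _, labels = np.unique(stacked, axis=0, return_inverse=True)
    return np.ravel(labels)

def forward_select(X, y, bins=4, n_perm=100, alpha=0.05, seed=0):
    """Greedy forward selection, halted by a permutation test (assumed rule)."""
    rng = np.random.default_rng(seed)
    Xd = [discretize(X[:, j], bins) for j in range(X.shape[1])]
    yd = discretize(y, bins)
    selected = []
    while len(selected) < X.shape[1]:
        chosen = [Xd[k] for k in selected]
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: mutual_information(joint_labels(chosen + [Xd[j]]), yd)
                  for j in remaining}
        best = max(scores, key=scores.get)
        # Null distribution: shuffle only the candidate feature, so the
        # already-selected features keep their relation to the target.
        null = [mutual_information(
                    joint_labels(chosen + [rng.permutation(Xd[best])]), yd)
                for _ in range(n_perm)]
        if scores[best] <= np.quantile(null, 1 - alpha):
            break                             # gain is not significant: stop
        selected.append(best)
    return selected

# Synthetic data (hypothetical): only features 0 and 3 drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))
y = X[:, 0] + X[:, 3] + 0.1 * rng.normal(size=500)
selected = forward_select(X, y)
print(selected)
```

Because the null distribution is estimated with the same subset size as the candidate score, the threshold absorbs part of the estimator's growing bias as the subset dimensionality increases, which is the motivation given in the abstract.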
Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition
Sparse representation based classification (SRC) methods have achieved
remarkable results. SRC methods, however, still suffer from the need for sufficient
training samples, insufficient use of test samples, and instability of representation. In
this paper, a stable inverse projection representation based classification
(IPRC) is presented to tackle these problems by effectively using test samples.
An IPR is first proposed and its feasibility and stability are analyzed. A
classification criterion named category contribution rate is constructed to
match the IPR and complete classification. Moreover, a statistical measure is
introduced to quantify the stability of representation-based classification
methods. Based on the IPRC technique, a robust tumor recognition framework is
presented by interpreting microarray gene expression data, where a two-stage
hybrid gene selection method is introduced to select informative genes.
Finally, a functional analysis of candidate pathogenicity-related genes is
given. Extensive experiments on six public tumor microarray gene expression
datasets demonstrate that the proposed technique is competitive with
state-of-the-art methods.
Comment: 14 pages, 19 figures, 10 tables
Fast Selection of Spectral Variables with B-Spline Compression
The large number of spectral variables in most data sets encountered in
spectral chemometrics often makes the prediction of a dependent variable
difficult. Fortunately, the number of variables can be reduced by using either
projection techniques or selection methods; the latter allow for the
interpretation of the selected variables. Since the optimal approach of testing
all possible subsets of variables with the prediction model is intractable, an
incremental selection approach using a nonparametric statistic is a good
option, as it avoids the computationally intensive use of the model itself. It
has two drawbacks, however: the number of groups of variables to test is still
huge, and collinearities can make the results unstable. To overcome these
limitations, this paper presents a method to select groups of spectral
variables. It consists of a forward-backward procedure applied to the
coefficients of a B-Spline representation of the spectra. The criterion used in
the forward-backward procedure is the mutual information, which can detect
nonlinear dependencies between variables, unlike the commonly used
correlation. The spline representation is used to obtain interpretable
results, as groups of consecutive spectral variables will be selected. The
experiments conducted on NIR spectra from fescue grass and diesel fuels show
that the method provides clearly identified groups of selected variables,
making interpretation easy, while keeping a low computational load. The
prediction performances obtained using the selected coefficients are higher
than those obtained by the same method applied directly to the original
variables and similar to those obtained using traditional models, although
using significantly fewer spectral variables.
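The compression step described in this abstract, replacing each spectrum by the coefficients of a B-spline representation, can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names (`bspline_basis`, `compress_spectra`), the uniformly spaced clamped knots, the cubic degree, and the synthetic "spectra" are all assumptions. The forward-backward selection over the coefficients is not shown.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.zeros((len(x), len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    # Close the last non-empty interval so that x == knots[-1] is covered.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    B[x == knots[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((len(x), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - knots[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (knots[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B

def compress_spectra(spectra, wavelengths, n_interior=12, degree=3):
    """Least-squares B-spline coefficients for each row of `spectra`."""
    a, b = wavelengths[0], wavelengths[-1]
    interior = np.linspace(a, b, n_interior + 2)[1:-1]
    # Clamped knot vector: boundary knots repeated degree + 1 times.
    knots = np.concatenate([[a] * (degree + 1), interior, [b] * (degree + 1)])
    B = bspline_basis(wavelengths, knots, degree)
    coefs, *_ = np.linalg.lstsq(B, spectra.T, rcond=None)
    return coefs.T, B

# Hypothetical smooth "spectra": one Gaussian band per sample.
rng = np.random.default_rng(0)
wavelengths = np.linspace(1000.0, 2500.0, 300)
centers = rng.uniform(1200.0, 2300.0, size=(20, 1))
spectra = np.exp(-0.5 * ((wavelengths - centers) / 200.0) ** 2)

coefs, B = compress_spectra(spectra, wavelengths)
reconstruction = coefs @ B.T
print(spectra.shape, "->", coefs.shape)
```

Each basis function is nonzero only over a few adjacent knot intervals, so selecting one coefficient amounts to selecting a contiguous range of wavelengths. This is what makes the selected groups interpretable.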