52,388 research outputs found
A hybrid feature selection method for complex diseases SNPs
Machine learning techniques have the potential to revolutionize medical diagnosis. Single Nucleotide Polymorphisms (SNPs) are one of the most important sources of human genome variability; thus, they have been implicated in several human diseases. To separate the affected samples from the normal ones, various techniques have been applied on SNPs. Achieving high classification accuracy in such a high-dimensional space is crucial for successful diagnosis and treatment. In this work, we propose an accurate hybrid feature selection method for detecting the most informative SNPs and selecting an optimal SNP subset. The proposed method is based on the fusion of a filter and a wrapper method, i.e., the Conditional Mutual Information Maximization (CMIM) method and the support vector machine-recursive feature elimination, respectively. The performance of the proposed method was evaluated against four state-of-The-Art feature selection methods, minimum redundancy maximum relevancy, fast correlation-based feature selection, CMIM, and ReliefF, using four classifiers, support vector machine, naive Bayes, linear discriminant analysis, and k nearest neighbors on five different SNP data sets obtained from the National Center for Biotechnology Information gene expression omnibus genomics data repository. The experimental results demonstrate the efficiency of the adopted feature selection approach outperforming all of the compared feature selection algorithms and achieving up to 96% classification accuracy for the used data set. In general, from these results we conclude that SNPs of the whole genome can be efficiently employed to distinguish affected individuals with complex diseases from the healthy ones. 1 2013 IEEE.Scopu
Resampling methods for parameter-free and robust feature selection with mutual information
Combining the mutual information criterion with a forward feature selection
strategy offers a good trade-off between optimality of the selected feature
subset and computation time. However, it requires to set the parameter(s) of
the mutual information estimator and to determine when to halt the forward
procedure. These two choices are difficult to make because, as the
dimensionality of the subset increases, the estimation of the mutual
information becomes less and less reliable. This paper proposes to use
resampling methods, a K-fold cross-validation and the permutation test, to
address both issues. The resampling methods bring information about the
variance of the estimator, information which can then be used to automatically
set the parameter and to calculate a threshold to stop the forward procedure.
The procedure is illustrated on a synthetic dataset as well as on real-world
examples
Extensions of stability selection using subsamples of observations and covariates
We introduce extensions of stability selection, a method to stabilise
variable selection methods introduced by Meinshausen and B\"uhlmann (J R Stat
Soc 72:417-473, 2010). We propose to apply a base selection method repeatedly
to random observation subsamples and covariate subsets under scrutiny, and to
select covariates based on their selection frequency. We analyse the effects
and benefits of these extensions. Our analysis generalizes the theoretical
results of Meinshausen and B\"uhlmann (J R Stat Soc 72:417-473, 2010) from the
case of half-samples to subsamples of arbitrary size. We study, in a
theoretical manner, the effect of taking random covariate subsets using a
simplified score model. Finally we validate these extensions on numerical
experiments on both synthetic and real datasets, and compare the obtained
results in detail to the original stability selection method.Comment: accepted for publication in Statistics and Computin
Collaborative Summarization of Topic-Related Videos
Large collections of videos are grouped into clusters by a topic keyword,
such as Eiffel Tower or Surfing, with many important visual concepts repeating
across them. Such a topically close set of videos have mutual influence on each
other, which could be used to summarize one of them by exploiting information
from others in the set. We build on this intuition to develop a novel approach
to extract a summary that simultaneously captures both important
particularities arising in the given video, as well as, generalities identified
from the set of videos. The topic-related videos provide visual context to
identify the important parts of the video being summarized. We achieve this by
developing a collaborative sparse optimization method which can be efficiently
solved by a half-quadratic minimization algorithm. Our work builds upon the
idea of collaborative techniques from information retrieval and natural
language processing, which typically use the attributes of other similar
objects to predict the attribute of a given object. Experiments on two
challenging and diverse datasets well demonstrate the efficacy of our approach
over state-of-the-art methods.Comment: CVPR 201
Feature Selection via Coalitional Game Theory
We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the multiperturbation shapley analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data such as accuracy, balanced error rate, and area under receiver-operator-characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets
- …