52,388 research outputs found

    A hybrid feature selection method for complex diseases SNPs

    Get PDF
    Machine learning techniques have the potential to revolutionize medical diagnosis. Single Nucleotide Polymorphisms (SNPs) are one of the most important sources of human genome variability; thus, they have been implicated in several human diseases. To separate the affected samples from the normal ones, various techniques have been applied on SNPs. Achieving high classification accuracy in such a high-dimensional space is crucial for successful diagnosis and treatment. In this work, we propose an accurate hybrid feature selection method for detecting the most informative SNPs and selecting an optimal SNP subset. The proposed method is based on the fusion of a filter and a wrapper method, i.e., the Conditional Mutual Information Maximization (CMIM) method and the support vector machine-recursive feature elimination, respectively. The performance of the proposed method was evaluated against four state-of-The-Art feature selection methods, minimum redundancy maximum relevancy, fast correlation-based feature selection, CMIM, and ReliefF, using four classifiers, support vector machine, naive Bayes, linear discriminant analysis, and k nearest neighbors on five different SNP data sets obtained from the National Center for Biotechnology Information gene expression omnibus genomics data repository. The experimental results demonstrate the efficiency of the adopted feature selection approach outperforming all of the compared feature selection algorithms and achieving up to 96% classification accuracy for the used data set. In general, from these results we conclude that SNPs of the whole genome can be efficiently employed to distinguish affected individuals with complex diseases from the healthy ones. 1 2013 IEEE.Scopu

    Resampling methods for parameter-free and robust feature selection with mutual information

    Get PDF
    Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires to set the parameter(s) of the mutual information estimator and to determine when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes to use resampling methods, a K-fold cross-validation and the permutation test, to address both issues. The resampling methods bring information about the variance of the estimator, information which can then be used to automatically set the parameter and to calculate a threshold to stop the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples

    Extensions of stability selection using subsamples of observations and covariates

    Full text link
    We introduce extensions of stability selection, a method to stabilise variable selection methods introduced by Meinshausen and B\"uhlmann (J R Stat Soc 72:417-473, 2010). We propose to apply a base selection method repeatedly to random observation subsamples and covariate subsets under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalizes the theoretical results of Meinshausen and B\"uhlmann (J R Stat Soc 72:417-473, 2010) from the case of half-samples to subsamples of arbitrary size. We study, in a theoretical manner, the effect of taking random covariate subsets using a simplified score model. Finally we validate these extensions on numerical experiments on both synthetic and real datasets, and compare the obtained results in detail to the original stability selection method.Comment: accepted for publication in Statistics and Computin

    Collaborative Summarization of Topic-Related Videos

    Full text link
    Large collections of videos are grouped into clusters by a topic keyword, such as Eiffel Tower or Surfing, with many important visual concepts repeating across them. Such a topically close set of videos have mutual influence on each other, which could be used to summarize one of them by exploiting information from others in the set. We build on this intuition to develop a novel approach to extract a summary that simultaneously captures both important particularities arising in the given video, as well as, generalities identified from the set of videos. The topic-related videos provide visual context to identify the important parts of the video being summarized. We achieve this by developing a collaborative sparse optimization method which can be efficiently solved by a half-quadratic minimization algorithm. Our work builds upon the idea of collaborative techniques from information retrieval and natural language processing, which typically use the attributes of other similar objects to predict the attribute of a given object. Experiments on two challenging and diverse datasets well demonstrate the efficacy of our approach over state-of-the-art methods.Comment: CVPR 201

    Feature Selection via Coalitional Game Theory

    Get PDF
    We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the multiperturbation shapley analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data such as accuracy, balanced error rate, and area under receiver-operator-characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets
    corecore