2,726 research outputs found

    Feature selection for microarray gene expression data using simulated annealing guided by the multivariate joint entropy

    Get PDF
    In this work a new way to calculate the multivariate joint entropy is presented. This measure is the basis for a fast information-theoretic based evaluation of gene relevance in a Microarray Gene Expression data context. Its low complexity is based on the reuse of previous computations to calculate current feature relevance. The mu-TAFS algorithm --named as such to differentiate it from previous TAFS algorithms-- implements a simulated annealing technique specially designed for feature subset selection. The algorithm is applied to the maximization of gene subset relevance in several public-domain microarray data sets. The experimental results show a notoriously high classification performance and low size subsets formed by biologically meaningful genes.Postprint (published version

    Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks

    Full text link
    We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator.Comment: 18 pages, 3 figures, 1 tabl

    A Triclustering Approach for Time Evolving Graphs

    Full text link
    This paper introduces a novel technique to track structures in time evolving graphs. The method is based on a parameter free approach for three-dimensional co-clustering of the source vertices, the target vertices and the time. All these features are simultaneously segmented in order to build time segments and clusters of vertices whose edge distributions are similar and evolve in the same way over the time segments. The main novelty of this approach lies in that the time segments are directly inferred from the evolution of the edge distribution between the vertices, thus not requiring the user to make an a priori discretization. Experiments conducted on a synthetic dataset illustrate the good behaviour of the technique, and a study of a real-life dataset shows the potential of the proposed approach for exploratory data analysis

    Effect of Feature Selection on Gene Expression Datasets Classification Accurac

    Get PDF
    Feature selection attracts researchers who deal with machine learning and data mining. It consists of selecting the variables that have the greatest impact on the dataset classification, and discarding the rest. This dimentionality reduction allows classifiers to be fast and more accurate. This paper traits the effect of feature selection on the accuracy of widely used classifiers in literature. These classifiers are compared with three real datasets which are pre-processed with feature selection methods. More than 9% amelioration in classification accuracy is observed, and k-means appears to be the most sensitive classifier to feature selection
    • …
    corecore