
    Conditional-Entropy Metrics for Feature Selection

    Institute for Communicating and Collaborative Systems
    We examine the task of feature selection, which is a method of forming simplified descriptions of complex data for use in probabilistic classifiers. Feature selection typically requires a numerical measure or metric of the desirability of a given set of features. The thesis considers a number of existing metrics, with particular attention to those based on entropy and other quantities derived from information theory. A useful new perspective on feature selection is provided by the concepts of partitioning and encoding of data by a feature set. The ideas of partitioning and encoding, together with the theoretical shortcomings of existing metrics, motivate a new class of feature selection metrics based on conditional entropy. The simplest of the new metrics is referred to as expected partition entropy or EPE. Performances of the new and existing metrics are compared by experiments with a simplified form of part-of-speech tagging and with classification of Reuters news stories by topic. In order to conduct the experiments, a new class of accelerated feature selection search algorithms is introduced; a member of this class is found to provide significantly increased speed with minimal loss in performance, as measured by feature selection metrics and accuracy on test data. The comparative performance of existing metrics is also analysed, giving rise to a new general conjecture regarding the wrapper class of metrics. Each wrapper is inherently tied to a specific type of classifier. The experimental results support the idea that a wrapper selects feature sets which perform well in conjunction with its own particular classifier, but this good performance cannot be expected to carry over to other types of model.
The new metrics introduced in this thesis prove to have substantial advantages over a representative selection of other feature selection mechanisms: mutual information, frequency-based cutoff, the Koller-Sahami information loss measure, and two different types of wrapper method. Feature selection using the new metrics easily outperforms other filter-based methods such as mutual information; additionally, our approach attains comparable performance to a wrapper method, but at a fraction of the computational expense. Finally, members of the new class of metrics succeed in a case where the Koller-Sahami metric fails to provide a meaningful criterion for feature selection.
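The core idea of scoring a feature by the conditional entropy of the class given the partition the feature induces can be sketched as follows. This is a generic illustration of the partitioning view, not the thesis's exact EPE formulation:

```python
from collections import Counter
from math import log2

def conditional_entropy(feature, labels):
    """H(class | feature): the expected entropy of the class
    distribution inside each cell of the partition the feature induces."""
    n = len(labels)
    cell = Counter(feature)                 # size of each partition cell
    joint = Counter(zip(feature, labels))   # (cell, class) co-occurrence counts
    return -sum((c / n) * log2(c / cell[f]) for (f, _), c in joint.items())

# A feature that partitions the classes perfectly scores 0 bits...
perfect = conditional_entropy(['a', 'a', 'b', 'b'], [0, 0, 1, 1])
# ...while one independent of the labels leaves the full 1 bit of class entropy.
useless = conditional_entropy(['a', 'b', 'a', 'b'], [0, 0, 1, 1])
```

Lower scores are better here: the metric directly measures how much class uncertainty remains after conditioning on the feature's partition.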

    Breast Cancer Classification by Gene Expression Analysis using Hybrid Feature Selection and Hyper-heuristic Adaptive Universum Support Vector Machine

    Comprehensive assessments of the molecular characteristics of breast cancer from gene expression patterns can aid in the early identification and treatment of tumor patients. The enormous scale of gene expression data obtained through microarray sequencing increases the difficulty of training the classifier due to large-scale features. Selecting pivotal gene features can minimize high dimensionality and the classifier complexity with improved breast cancer detection accuracy. However, traditional filter and wrapper-based selection methods have scalability and adaptability issues in handling complex gene features. This paper presents a hybrid feature selection method, Mutual Information Maximization - Improved Moth Flame Optimization (MIM-IMFO), for gene selection, along with an advanced Hyper-heuristic Adaptive Universum Support Vector Machine (HH-AUSVM) classification model to improve cancer detection rates. The hybrid gene selection method is developed by performing filter-based selection using MIM in the first stage, followed by the wrapper method in the second stage, to obtain the pivotal features and remove the inappropriate ones. This method improves standard MFO with a hybrid exploration/exploitation phase to accomplish a better trade-off between the exploration and exploitation phases. The classifier HH-AUSVM is formulated by integrating the Adaptive Universum learning approach into the hyper-heuristics-based parameter-optimized SVM to tackle the class imbalance problem. Evaluated on breast cancer gene expression datasets from the Mendeley Data Repository, the proposed MIM-IMFO gene selection-based HH-AUSVM classification approach provided better breast cancer detection, with high accuracies of 95.67%, 96.52%, 97.97% and 95.5% and processing times of 4.28, 3.17, 9.45 and 6.31 seconds, respectively.
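The first, filter stage of a hybrid scheme like MIM-IMFO can be sketched as below: rank features by their mutual information with the class label and keep the top k, leaving the metaheuristic wrapper stage (here, IMFO) to refine that shortlist. The gene names and values are hypothetical, and the wrapper stage is omitted:

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mim_select(columns, labels, k):
    """Stage 1 (filter): keep the k features with the highest
    mutual information with the class label."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical discretised expression values for two genes over four samples.
labels = [0, 0, 1, 1]
columns = {
    "gene_a": [5, 5, 9, 9],   # tracks the class label exactly
    "gene_b": [1, 2, 1, 2],   # independent of the label
}
top = mim_select(columns, labels, 1)   # -> ['gene_a']
```

In the full method the surviving features would then be fed to the wrapper search, which evaluates candidate subsets with the downstream classifier.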

    A hybrid feature selection method for complex diseases SNPs

    Machine learning techniques have the potential to revolutionize medical diagnosis. Single Nucleotide Polymorphisms (SNPs) are one of the most important sources of human genome variability; thus, they have been implicated in several human diseases. To separate the affected samples from the normal ones, various techniques have been applied on SNPs. Achieving high classification accuracy in such a high-dimensional space is crucial for successful diagnosis and treatment. In this work, we propose an accurate hybrid feature selection method for detecting the most informative SNPs and selecting an optimal SNP subset. The proposed method is based on the fusion of a filter and a wrapper method, i.e., the Conditional Mutual Information Maximization (CMIM) method and support vector machine-recursive feature elimination, respectively. The performance of the proposed method was evaluated against four state-of-the-art feature selection methods, minimum redundancy maximum relevancy, fast correlation-based feature selection, CMIM, and ReliefF, using four classifiers, support vector machine, naive Bayes, linear discriminant analysis, and k-nearest neighbors, on five different SNP data sets obtained from the National Center for Biotechnology Information Gene Expression Omnibus genomics data repository. The experimental results demonstrate the efficiency of the adopted feature selection approach, which outperforms all of the compared feature selection algorithms and achieves up to 96% classification accuracy on the data sets used. In general, from these results we conclude that SNPs of the whole genome can be efficiently employed to distinguish affected individuals with complex diseases from the healthy ones.
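The CMIM half of this hybrid can be sketched as a greedy selection that, at each step, keeps the feature whose worst-case information gain, conditioned on any already-chosen feature, is largest; redundant duplicates of chosen features score zero and are skipped. This is a minimal sketch of the filter stage only (the SVM-RFE wrapper stage is not shown), with made-up SNP names and genotypes:

```python
from collections import Counter
from math import log2

def cond_mi(x, y, z):
    """I(X;Y|Z) in bits for discrete sequences."""
    n = len(x)
    pz = Counter(z)
    pxz, pyz = Counter(zip(x, z)), Counter(zip(y, z))
    pxyz = Counter(zip(x, y, z))
    return sum((c / n) * log2(c * pz[v] / (pxz[(a, v)] * pyz[(b, v)]))
               for (a, b, v), c in pxyz.items())

def cmim_select(columns, labels, k):
    """Greedy CMIM: maximise the minimum conditional information gain
    with respect to the features selected so far."""
    selected, remaining = [], dict(columns)
    while remaining and len(selected) < k:
        def score(name):
            gains = [cond_mi(remaining[name], labels, columns[s]) for s in selected]
            # Conditioning on a constant reduces I(X;Y|Z) to plain I(X;Y).
            return min(gains) if gains else cond_mi(remaining[name], labels, [0] * len(labels))
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical genotypes: snp_b duplicates snp_a, while snp_c corrects
# the one sample snp_a gets wrong, so CMIM prefers it to the copy.
labels  = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "snp_a": [0, 0, 0, 1, 1, 1, 1, 1],
    "snp_b": [0, 0, 0, 1, 1, 1, 1, 1],
    "snp_c": [0, 0, 0, 1, 0, 0, 0, 0],
}
selected = cmim_select(columns, labels, 2)   # -> ['snp_a', 'snp_c']
```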

    Robust Feature Selection by Mutual Information Distributions

    Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.
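The inferential view can be illustrated by Monte Carlo sampling: draw joint distributions from the Dirichlet posterior over the contingency-table cells and evaluate the mutual information of each draw, giving a whole posterior distribution rather than a single empirical value. This sketch uses a flat symmetric prior as a stand-in for the paper's second-order Dirichlet prior, and sampling in place of the paper's analytical expressions:

```python
import random
from math import log2

def mi_from_joint(p):
    """Mutual information (bits) of a joint distribution given as a matrix."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return sum(pij * log2(pij / (row[i] * col[j]))
               for i, r in enumerate(p) for j, pij in enumerate(r) if pij > 0)

def posterior_mi_samples(counts, draws=2000, prior=1.0, seed=0):
    """Draw joint distributions from Dirichlet(counts + prior) via
    normalised Gamma variates and return the MI of each draw."""
    rng = random.Random(seed)
    flat = [c + prior for r in counts for c in r]
    rows, cols = len(counts), len(counts[0])
    out = []
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for a in flat]
        s = sum(g)
        p = [[g[i * cols + j] / s for j in range(cols)] for i in range(rows)]
        out.append(mi_from_joint(p))
    return out

# Strongly dependent counts: the posterior MI concentrates well above zero.
# A robust selector could rank features by a lower posterior quantile
# instead of the point estimate.
samples = sorted(posterior_mi_samples([[40, 2], [3, 45]]))
lower_5pct = samples[len(samples) // 20]
```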

    Improving the Generalisability of Brain Computer Interface Applications via Machine Learning and Search-Based Heuristics

    Brain Computer Interfaces (BCI) are a domain of hardware/software in which a user can interact with a machine without the need for motor activity, communicating instead via signals generated by the nervous system. These interfaces provide life-altering benefits to users, and refinement will both allow their application to a much wider variety of disabilities, and increase their practicality. The primary method of acquiring these signals is Electroencephalography (EEG). This technique is susceptible to a variety of different sources of noise, which compounds the inherent problems in BCI training data: large dimensionality, low numbers of samples, and non-stationarity between users and recording sessions. Feature Selection and Transfer Learning have been used to overcome these problems, but they fail to account for several characteristics of BCI. This thesis extends both of these approaches by the use of search-based algorithms. Feature Selection techniques known as Wrappers use ‘black box’ evaluation of feature subsets, leading to higher classification accuracies than ranking methods known as Filters. However, Wrappers are more computationally expensive, and are prone to over-fitting to training data. In this thesis, we applied Iterated Local Search (ILS) to the BCI field for the first time in the literature, and demonstrated competitive results with state-of-the-art methods such as the Least Absolute Shrinkage and Selection Operator and Genetic Algorithms. We then developed ILS variants with guided perturbation operators. Linkage was used to develop a multivariate metric, Intrasolution Linkage. This takes into account pair-wise dependencies of features with the label, in the context of the solution. Intrasolution Linkage was then integrated into two ILS variants.
The Intrasolution Linkage Score was discovered to have a stronger correlation with a solution's predictive accuracy on unseen data than Cross Validation Error (CVE) on the training set, the typical approach to feature subset evaluation. Mutual Information was used to create Minimum Redundancy Maximum Relevance Iterated Local Search (MRMR-ILS). In this algorithm, the perturbation operator was guided using an existing Mutual Information measure, and compared with current Filter and Wrapper methods. It was found to achieve generally lower CVE rates and higher predictive accuracy on unseen data than existing algorithms. It was also noted that solutions found by the MRMR-ILS provided CVE rates that had a stronger correlation with the accuracy on unseen data than solutions found by other algorithms. We suggest that this may be due to the guided perturbation leading to solutions that are richer in Mutual Information. Feature Selection reduces computational demands and can increase the accuracy of our desired models, as evidenced in this thesis. However, limited quantities of training samples restrict these models, and greatly reduce their generalisability. For this reason, utilisation of data from a wide range of users is an ideal solution. Due to the differences in neural structures between users, creating adequate models is difficult. We adopted an existing state-of-the-art ensemble technique, Ensemble Learning Generic Information (ELGI), and developed an initial optimisation phase. This involved using search to transplant instances between user subsets to increase the generalisability of each subset, before combination in the ELGI. We termed this Evolved Ensemble Learning Generic Information (eELGI). The eELGI achieved higher accuracy than user-specific BCI models, across all eight users.
Optimisation of the training dataset allowed smaller training sets to be used, offered protection against neural drift, and created models that performed similarly across participants, regardless of neural impairment. Through the introduction and hybridisation of search-based algorithms to several problems in BCI we have been able to show improvements in modelling accuracy and efficiency. Ultimately, this represents a step towards more practical BCI systems that will provide life-altering benefits to users.
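The ILS scheme at the heart of this work can be sketched as a hill-climb over feature-subset bitmasks with a random perturbation to escape local optima. The objective function below is a hypothetical stand-in (informative features minus a sparsity penalty), not the thesis's CVE or Intrasolution Linkage evaluation:

```python
import random

def iterated_local_search(n_features, evaluate, iters=30, strength=2, seed=0):
    """Generic ILS skeleton: hill-climb by single-bit flips, perturb the
    local optimum by flipping `strength` random bits, and keep the
    perturbed solution only if it improves on the best so far."""
    rng = random.Random(seed)

    def local_search(mask):
        improved = True
        while improved:
            improved = False
            for i in range(n_features):
                cand = mask ^ (1 << i)          # toggle feature i
                if evaluate(cand) > evaluate(mask):
                    mask, improved = cand, True
        return mask

    best = local_search(0)
    for _ in range(iters):
        pert = best
        for _ in range(strength):               # perturbation step
            pert ^= 1 << rng.randrange(n_features)
        pert = local_search(pert)
        if evaluate(pert) > evaluate(best):
            best = pert
    return best

# Hypothetical objective: features 0, 3 and 5 are informative, and each
# selected feature costs a small penalty (mimicking accuracy + sparsity).
GOOD = {0, 3, 5}
def evaluate(mask):
    chosen = {i for i in range(8) if mask >> i & 1}
    return 2 * len(chosen & GOOD) - len(chosen)

best = iterated_local_search(8, evaluate)   # -> selects exactly features {0, 3, 5}
```

The thesis's guided variants replace the uniform-random perturbation with one steered by a Mutual Information measure; only the `strength` random-flip loop would change.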

    Feature selection for chemical sensor arrays using mutual information

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near-optimal features for chemical sensor arrays.
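Selecting a feature set by the joint mutual information it shares with the chemical identity can be sketched as below for pairs of features, scoring each pair exhaustively. The sensor names, discretised responses, and chemical labels are invented for illustration:

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(xs, y):
    """I(X;Y) in bits, where each element of xs may be a tuple of
    feature values (giving the joint MI of the feature set)."""
    n = len(y)
    px, py, pxy = Counter(xs), Counter(y), Counter(zip(xs, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def best_feature_pair(columns, labels):
    """Filter approach: exhaustively score every feature pair by the
    joint mutual information of the pair with the chemical identity."""
    def joint(names):
        return list(zip(*(columns[name] for name in names)))
    return max(combinations(sorted(columns), 2),
               key=lambda pair: mutual_information(joint(pair), labels))

# Hypothetical discretised sensor responses: 3 chemicals, 2 repeats each.
labels  = ['NH3', 'NH3', 'CO', 'CO', 'NO2', 'NO2']
columns = {
    's1': [1, 1, 1, 1, 0, 0],   # separates NO2 from the rest
    's2': [0, 0, 1, 1, 1, 1],   # separates NH3 from the rest
    's3': [0, 1, 0, 1, 0, 1],   # uninformative noise
}
pair = best_feature_pair(columns, labels)   # -> ('s1', 's2')
```

Because the two complementary sensors jointly identify all three chemicals, their pair attains the maximal joint MI of log2(3) bits, while any pair involving the noisy sensor does not.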

    Distribution of Mutual Information from Complete and Incomplete Data

    Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(1/n^3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
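The cheapness claim can be illustrated with the leading-order term of the mean: to first order, the Bayesian mean of MI is the plug-in estimate plus (r−1)(s−1)/(2n) in nats, for an r×s table with n samples, computable at the same cost as descriptive MI. This sketch shows only that first-order term, not the paper's exact expression or higher moments:

```python
from math import log

def plug_in_mi(counts):
    """Descriptive (plug-in) mutual information, in nats, from a
    contingency table of counts."""
    n = sum(sum(r) for r in counts)
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    return sum((nij / n) * log(nij * n / (row[i] * col[j]))
               for i, r in enumerate(counts) for j, nij in enumerate(r) if nij)

def first_order_mean_mi(counts):
    """First-order approximation to the Bayesian mean of MI:
    E[I] ~ I_hat + (r-1)(s-1)/(2n) in nats, with an O(1/n^2) remainder."""
    n = sum(sum(r) for r in counts)
    r, s = len(counts), len(counts[0])
    return plug_in_mi(counts) + (r - 1) * (s - 1) / (2 * n)

# For a uniform (independent-looking) 2x2 table of 40 samples, the
# plug-in MI is 0 and the first-order mean correction is 1/80 nats.
table = [[10, 10], [10, 10]]
descriptive = plug_in_mi(table)          # -> 0.0
inductive = first_order_mean_mi(table)   # -> 0.0125
```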