29 research outputs found

    Simple stopping criteria for information theoretic feature selection

    Feature selection aims to select the smallest feature subset that yields the minimum generalization error. In the rich literature on feature selection, information theory-based approaches seek a subset of features such that the mutual information between the selected features and the class labels is maximized. Despite the simplicity of this objective, several open problems in optimization remain. These include, for example, the automatic determination of the optimal subset size (i.e., the number of features) or a stopping criterion if a greedy search strategy is adopted. In this paper, we suggest two stopping criteria based solely on monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi's \alpha-entropy functional, which can be directly estimated from data samples, we show that the CMI among groups of variables can be easily computed without any decomposition or approximation, which makes our criteria easy to implement and to integrate seamlessly into any existing information theoretic feature selection method with a greedy search strategy. Comment: Paper published in the journal of Entrop
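The idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, the bandwidth `sigma`, the threshold `eps`, and the rule "stop when the best candidate's CMI with the labels, given the selected set, falls below `eps`" are all assumptions made for illustration, and the function names are hypothetical. Entropies are computed from the eigenvalues of trace-normalised Gram matrices, and joint entropies from their Hadamard products.

```python
import numpy as np

def gram(x, sigma=1.0):
    """Trace-normalised Gaussian Gram matrix of n samples (assumed kernel)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2.0 * sigma ** 2))
    return k / np.trace(k)

def renyi_entropy(a, alpha=2.0):
    """Matrix-based Renyi alpha-entropy (bits) of a trace-one PSD matrix."""
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

def joint(*mats):
    """Joint matrix: Hadamard product renormalised to unit trace."""
    h = mats[0].copy()
    for m in mats[1:]:
        h = h * m
    return h / np.trace(h)

def cmi(a_x, a_y, a_z, alpha=2.0):
    """I(X; Y | Z) = S(X,Z) + S(Y,Z) - S(Z) - S(X,Y,Z)."""
    return (renyi_entropy(joint(a_x, a_z), alpha)
            + renyi_entropy(joint(a_y, a_z), alpha)
            - renyi_entropy(a_z, alpha)
            - renyi_entropy(joint(a_x, a_y, a_z), alpha))

def greedy_select(X, y, eps=0.01, sigma=1.0, alpha=2.0):
    """Greedy forward selection that stops once the best CMI drops below eps."""
    n, d = X.shape
    a_y = gram(y, sigma)
    mats = [gram(X[:, j], sigma) for j in range(d)]
    a_s = np.full((n, n), 1.0 / n)  # empty conditioning set has zero entropy
    selected = []
    while len(selected) < d:
        rest = [j for j in range(d) if j not in selected]
        scores = {j: cmi(mats[j], a_y, a_s, alpha) for j in rest}
        best = max(scores, key=scores.get)
        if scores[best] < eps:  # assumed stopping rule, see lead-in
            break
        selected.append(best)
        a_s = joint(a_s, mats[best])
    return selected

# Toy data: column 0 copies the label, column 1 is independent of it.
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X_demo = np.column_stack([y_demo, np.array([0, 1, 0, 1, 0, 1, 0, 1.0])])
print(greedy_select(X_demo, y_demo, eps=0.01, sigma=0.1))
```

With a small bandwidth on discrete data, the Gram matrices reduce to equality indicators, so the estimates coincide with the Rényi entropies of the empirical distributions. On continuous data, `sigma` and `eps` would need tuning.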

    Feature Selection for Interpatient Supervised Heart Beat Classification

    Supervised and interpatient classification of heart beats is essential in many applications requiring long-term monitoring of cardiac function. Several classification models able to cope with the strong class imbalance and a large variety of feature sets have been proposed for this task. In practice, over 200 features are often considered, and the features retained in the final model are chosen either using domain knowledge or by an exhaustive search over the feature sets, without evaluating the relevance of each individual feature included in the classifier. As a consequence, the results obtained by these models can be suboptimal and difficult to interpret. In this work, feature selection techniques are considered to extract optimal feature subsets for state-of-the-art ECG classification models. Performance is evaluated on real ambulatory recordings and compared to previously reported feature choices using the same models. Results indicate that only a small number of individual features actually serve the classification and that better performance can be achieved by removing useless features.

    Evolutionary Algorithm-based Feature Selection for an Intrusion Detection System

    Maintaining reliable, secure, and truthful communication of data between different enterprises is a major security issue. Information exchanged over the web or computer grids is always under threat from hackers or intruders. Many techniques have been used for intrusion detection, but all have flaws. In this paper, a new hybrid technique is proposed, which combines the Ensemble of Feature Selection (EFS) algorithm and Teaching Learning-Based Optimization (TLBO) techniques. In the proposed EFS-TLBO method, the EFS strategy is applied to rank the features for choosing the best subset of relevant information, and the TLBO is used to identify the most important features from the produced datasets. The TLBO algorithm uses the Extreme Learning Machine (ELM) to choose the most effective attributes and to enhance classification accuracy. The performance of the proposed technique is evaluated on a benchmark dataset. The experimental outcomes show that the proposed model performs well in terms of predictive accuracy, detection rate, and false-positive rate, and requires fewer significant attributes than other techniques known from the literature.

    Hyperspectral Images Classification and Dimensionality Reduction using spectral interaction and SVM classifier

    Over the past decades, the development of hyperspectral remote sensing technology has attracted growing interest among scientists in various domains. The rich and detailed spectral information provided by hyperspectral sensors has improved the monitoring and detection of substances on the earth's surface. However, the high dimensionality of hyperspectral images (HSI) is one of the main challenges for the analysis of the collected data. The presence of noisy, redundant and irrelevant bands increases the computational complexity, induces the Hughes phenomenon and decreases the classification accuracy of the targets. Hence, dimensionality reduction is an essential step to address these challenges. In this paper, we propose a novel filter approach based on the maximization of the spectral interaction measure and support vector machines for dimensionality reduction and classification of HSI. The proposed Max Relevance Max Synergy (MRMS) algorithm evaluates the relevance of every band through the combination of spectral synergy, redundancy and relevance measures. Our objective is to select the optimal subset of synergistic bands providing accurate classification of the supervised scene materials. Experiments have been performed using three different hyperspectral datasets: "Indiana Pine", "Pavia University" and "Salinas", provided by the "NASA-AVIRIS" and the "ROSIS" spectrometers. Furthermore, a comparison with state-of-the-art band selection methods has been carried out in order to demonstrate the robustness and efficiency of the proposed approach. Keywords: Hyperspectral images, remote sensing, dimensionality reduction, classification, synergic, correlation, spectral interaction information, mutual infor
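The distinction between synergy and redundancy mentioned in this abstract can be made concrete with interaction information. The sketch below is not the MRMS algorithm, whose exact scoring rule is not given in the abstract. It only shows, under the sign convention I(X;Y|C) - I(X;Y), how two discrete variables can be synergistic or redundant with respect to a class label. The function names and toy data are hypothetical, and plug-in Shannon entropies on discrete values are assumed.

```python
from collections import Counter
from math import log2

def entropy(*cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of the columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def cond_mutual_info(x, y, c):
    """I(X; Y | C) = H(X, C) + H(Y, C) - H(C) - H(X, Y, C)."""
    return entropy(x, c) + entropy(y, c) - entropy(c) - entropy(x, y, c)

def interaction(x, y, c):
    """I(X;Y|C) - I(X;Y): positive means synergy, negative means redundancy."""
    return cond_mutual_info(x, y, c) - mutual_info(x, y)

# Two toy "bands" whose XOR defines the label: neither is informative alone,
# but together they determine the class (pure synergy).
band_a = [0, 0, 1, 1]
band_b = [0, 1, 0, 1]
label_xor = [a ^ b for a, b in zip(band_a, band_b)]

synergy_score = interaction(band_a, band_b, label_xor)
# A band paired with an exact copy of itself is purely redundant.
redundancy_score = interaction(band_a, band_a, band_a)
print(synergy_score, redundancy_score)
```

A band selection filter could combine such a term with a relevance term I(X;C). How the terms are weighted is a design choice that this sketch leaves open.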

    Multiple-input multiple-output causal strategies for gene selection

    Traditional strategies for selecting variables in high-dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes. This is essentially because high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting. Journal Article; Research Support, N.I.H. Extramural; Research Support, Non-U.S. Gov't