Simple stopping criteria for information theoretic feature selection
Feature selection aims to select the smallest feature subset that yields the minimum generalization error. In the rich literature on feature selection, information theory-based approaches seek a subset of features such that the mutual information between the selected features and the class labels is maximized. Despite the simplicity of this objective, several open problems in optimization remain. These include, for example, the automatic determination of the optimal subset size (i.e., the number of features) or of a stopping criterion when a greedy search strategy is adopted. In this paper, we suggest two stopping criteria that only require monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi's α-entropy functional, which can be estimated directly from data samples, we show that the CMI among groups of variables can be computed easily without any decomposition or approximation. This makes our criteria easy to implement and easy to integrate seamlessly into any existing information theoretic feature selection method that uses a greedy search strategy.
Comment: Paper published in the journal Entropy.
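The CMI computation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian kernel with an assumed bandwidth `sigma`, fixes α = 2 so that tr(A²) equals the sum of squared entries of the symmetric Gram matrix (so no eigendecomposition is needed), forms joint entropies from the trace-normalized Hadamard product of Gram matrices, and evaluates I(X;Y|Z) = S(X,Z) + S(Y,Z) - S(X,Y,Z) - S(Z). The `should_stop` rule and its `tol` threshold are hypothetical and are not either of the paper's two criteria.

```python
import math

def gram(samples, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix A = K / n (K_ii = 1 for a Gaussian kernel).
    `samples` is a list of equal-length vectors, e.g. the values of a feature group."""
    n = len(samples)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            A[i][j] = math.exp(-d2 / (2 * sigma ** 2)) / n
    return A

def hadamard(*mats):
    """Joint Gram matrix: element-wise product of the inputs, renormalized to unit trace."""
    n = len(mats[0])
    P = [[1.0] * n for _ in range(n)]
    for M in mats:
        for i in range(n):
            for j in range(n):
                P[i][j] *= M[i][j]
    tr = sum(P[i][i] for i in range(n))
    return [[v / tr for v in row] for row in P]

def renyi2(A):
    """Matrix-based Renyi entropy for alpha = 2: S_2(A) = -log2 tr(A^2).
    A is symmetric, so tr(A^2) is simply the sum of its squared entries."""
    return -math.log2(sum(v * v for row in A for v in row))

def cmi(Ax, Ay, Az):
    """I(X;Y|Z) = S(X,Z) + S(Y,Z) - S(X,Y,Z) - S(Z), computed from Gram matrices."""
    return (renyi2(hadamard(Ax, Az)) + renyi2(hadamard(Ay, Az))
            - renyi2(hadamard(Ax, Ay, Az)) - renyi2(Az))

def should_stop(A_remaining, A_label, A_selected, tol=0.05):
    """Hypothetical rule: stop when the not-yet-selected features carry almost
    no extra information about the label given the already selected ones."""
    return cmi(A_remaining, A_label, A_selected) < tol
```

In a greedy search, `A_selected` would be built from the features chosen so far and `A_remaining` from the rest; passing multi-dimensional sample vectors to `gram` is one simple way to obtain a group Gram matrix.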
Feature Selection for Interpatient Supervised Heart Beat Classification
Supervised and interpatient classification of heart beats is essential in many applications requiring long-term monitoring of the cardiac function. Several classification models able to cope with strong class imbalance and a large variety of feature sets have been proposed for this task. In practice, over 200 features are often considered, and the features retained in the final model are chosen either using domain knowledge or through an exhaustive search in the feature sets, without evaluating the relevance of each individual feature included in the classifier. As a consequence, the results obtained by these models can be suboptimal and difficult to interpret. In this work, feature selection techniques are used to extract optimal feature subsets for state-of-the-art ECG classification models. Performance is evaluated on real ambulatory recordings and compared to previously reported feature choices using the same models. Results indicate that only a small number of individual features actually contribute to the classification and that better performance can be achieved by removing useless features.
Evolutionary Algorithm-based Feature Selection for an Intrusion Detection System
Maintaining the reliability of computer systems so that data can be exchanged reliably, securely, and truthfully between different enterprises is a major security issue. Information exchanged over the web or computer grids is constantly under threat from hackers and intruders. Many techniques have been used for intrusion detection, but all have flaws. In this paper, a new hybrid technique is proposed that combines the Ensemble of Feature Selection (EFS) algorithm and the Teaching Learning-Based Optimization (TLBO) technique. In the proposed EFS-TLBO method, the EFS strategy is applied to rank the features in order to choose the best subset of relevant information, and TLBO is used to identify the most important features from the resulting datasets. The TLBO algorithm uses an Extreme Learning Machine (ELM) to choose the most effective attributes and to improve classification accuracy. The performance of the proposed technique is evaluated on a benchmark dataset. The experimental results show that the proposed model achieves high predictive accuracy, a high detection rate, and a low false-positive rate, and requires fewer attributes than other techniques reported in the literature.
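The TLBO search mentioned above can be illustrated with a minimal binary TLBO sketch in Python. It is not the authors' EFS-TLBO implementation: the EFS ranking step is omitted, learners are continuous vectors in [0, 1] decoded into feature masks by an assumed 0.5 threshold, and the ELM classifier is replaced by a caller-supplied `fitness(mask)` function to be maximized. The names `tlbo_select`, `pop_size`, `iters`, and `seed` are hypothetical. Each iteration runs the standard teacher phase (move toward the best learner and away from the class mean scaled by a teaching factor of 1 or 2) and learner phase (move toward a better peer or away from a worse one), keeping a new position only if it improves fitness.

```python
import random

def tlbo_select(fitness, n_features, pop_size=10, iters=30, seed=0):
    """Binary TLBO sketch: returns (best_mask, history of best fitness per iteration)."""
    rng = random.Random(seed)

    def decode(x):
        # Threshold continuous positions into a 0/1 feature mask.
        return [1 if v > 0.5 else 0 for v in x]

    def clip(x):
        # Keep positions inside [0, 1].
        return [min(1.0, max(0.0, v)) for v in x]

    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    fit = [fitness(decode(x)) for x in pop]
    history = [max(fit)]
    for _ in range(iters):
        # Teacher phase: shift each learner toward the current best learner.
        teacher = pop[fit.index(max(fit))]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(n_features)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # teaching factor
            cand = clip([pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                         for d in range(n_features)])
            f = fitness(decode(cand))
            if f > fit[i]:  # greedy acceptance, so fitness never decreases
                pop[i], fit[i] = cand, f
        # Learner phase: each learner interacts with a random peer.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1 if fit[i] > fit[j] else -1  # away from worse, toward better
            cand = clip([pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d])
                         for d in range(n_features)])
            f = fitness(decode(cand))
            if f > fit[i]:
                pop[i], fit[i] = cand, f
        history.append(max(fit))
    best = pop[fit.index(max(fit))]
    return decode(best), history
```

In the setting described in the abstract, `fitness` would typically wrap the validation accuracy of a classifier such as an ELM trained on the masked features; a toy scoring function that rewards a few known-useful features and slightly penalizes subset size is enough to exercise the search.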
Hyperspectral Images Classification and Dimensionality Reduction using spectral interaction and SVM classifier
Over the past decades, the development of hyperspectral remote sensing technology has attracted growing interest among scientists in various domains. The rich and detailed spectral information provided by hyperspectral sensors has improved the monitoring and detection of Earth-surface materials. However, the high dimensionality of hyperspectral images (HSI) is one of the main challenges in analyzing the collected data. The presence of noisy, redundant, and irrelevant bands increases the computational complexity, induces the Hughes phenomenon, and decreases target classification accuracy. Hence, dimensionality reduction is an essential step in addressing these challenges. In this paper, we propose a novel filter approach, based on maximizing the spectral interaction measure, together with support vector machines (SVM) for the dimensionality reduction and classification of HSI. The proposed Max Relevance Max Synergy (MRMS) algorithm evaluates the relevance of every band through a combination of spectral synergy, redundancy, and relevance measures. Our objective is to select the optimal subset of synergistic bands that provides accurate classification of the materials in the supervised scene. Experiments were performed on three hyperspectral datasets, "Indian Pines", "Pavia University", and "Salinas", acquired by the NASA AVIRIS and ROSIS spectrometers. Furthermore, a comparison with state-of-the-art band selection methods was carried out to demonstrate the robustness and efficiency of the proposed approach.
Keywords: Hyperspectral images, remote sensing, dimensionality reduction, classification, synergic, correlation, spectral interaction information, mutual information
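The band-scoring idea behind MRMS (relevance, redundancy, and synergy) can be sketched with discrete Shannon quantities in Python. This is an illustration under stated assumptions, not the MRMS criterion itself: bands are assumed to be already discretized into integer bins, synergy is taken to be the interaction information I(Xi,Xj;Y) - I(Xi;Y) - I(Xj;Y), and the hypothetical `greedy_bands` helper scores a candidate band as its relevance plus its mean synergy minus its mean redundancy with the bands already selected.

```python
from collections import Counter
from math import log2

def entropy(*cols):
    """Shannon entropy (bits) of the joint distribution of discrete columns."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum(c / n * log2(c / n) for c in Counter(joint).values())

def mi(x, y):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def synergy(xi, xj, y):
    """Interaction information I(Xi,Xj;Y) - I(Xi;Y) - I(Xj;Y); positive means synergy."""
    pair = list(zip(xi, xj))  # treat the two bands as one joint variable
    return mi(pair, y) - mi(xi, y) - mi(xj, y)

def greedy_bands(bands, y, k):
    """Hypothetical greedy selection: relevance + mean synergy - mean redundancy."""
    selected, remaining = [], list(range(len(bands)))
    while remaining and len(selected) < k:
        def score(c):
            rel = mi(bands[c], y)
            if not selected:
                return rel
            syn = sum(synergy(bands[c], bands[s], y) for s in selected) / len(selected)
            red = sum(mi(bands[c], bands[s]) for s in selected) / len(selected)
            return rel + syn - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

An XOR-style toy case shows why synergy matters: with bands `[0, 0, 1, 1]` and `[0, 1, 0, 1]` and class `[0, 1, 1, 0]`, each band alone has zero mutual information with the class, yet together they determine it completely (1 bit of synergy), whereas a duplicate of an already selected band is penalized as redundant.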
Multiple-input multiple-output causal strategies for gene selection
Traditional strategies for selecting variables in high-dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may achieve good generalization accuracy, they often do not reveal direct causes. This is essentially because high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting.
Journal Article; Research Support, N.I.H. Extramural; Research Support, Non-U.S. Gov't