269 research outputs found

    Dimensionality Reduction and Classification feature using Mutual Information applied to Hyperspectral Images: A Filter strategy based algorithm

    Full text link
    Hyperspectral image (HSI) classification is a high-level remote sensing task. The goal is to produce a thematic map that is compared with a reference ground truth (GT) map constructed by inspecting the region. An HSI contains more than a hundred two-dimensional measurements of the same region, called bands (or simply images), taken at adjacent frequencies. Unfortunately, some bands contain redundant information, others are affected by noise, and the high dimensionality of the features lowers classification accuracy. The problem is how to find good bands for classifying the pixels of the region. Some methods use Mutual Information (MI) and a threshold to select relevant bands without treating redundancy. Others control and eliminate redundancy by selecting the band with the highest MI; if its neighbors have nearly the same MI with the GT, they are considered redundant and discarded. This is the main drawback of such methods, because it forfeits the advantage of hyperspectral imagery: precious information can be discarded. In this paper we accept useful redundancy: a band contains useful redundancy if it contributes to producing an estimated reference map with higher MI with the GT. To control redundancy, we introduce a complementary threshold added to the last retained MI value. This process is a filter strategy; it achieves good classification accuracy at low cost, though it is less performant than a wrapper strategy. Comment: 11 pages, 5 figures, journal paper
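The filter idea in this abstract (rank bands by MI with the ground truth, keep bands whose score lies within a complementary threshold of the best one) can be sketched minimally. This is an assumption-level illustration, not the paper's full redundancy-control loop: the histogram MI estimator, the `bins` parameter, and the single-threshold keep rule are all simplifications introduced here.

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Histogram estimate of mutual information between two 1-D arrays."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over y
    py = pxy.sum(axis=0, keepdims=True)   # marginal over x
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def select_bands(bands, gt, threshold=0.05):
    """Filter-style band selection: keep every band whose MI with the
    ground truth is within `threshold` of the best band's MI."""
    mi = np.array([mutual_info(np.ravel(b), np.ravel(gt)) for b in bands])
    keep = mi >= mi.max() - threshold
    return np.flatnonzero(keep), mi
```

A noisy copy of the ground truth scores far above an unrelated band, so only the informative band survives the threshold.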

    A Novel Memetic Feature Selection Algorithm

    Get PDF
    Feature selection is the problem of finding an efficient subset of all features such that the final feature set improves accuracy and reduces complexity. Search strategies are a key aspect of feature selection algorithms. Since feature selection is NP-hard, heuristic algorithms have been studied to solve it. In this paper, we propose a method based on a memetic algorithm to find an efficient feature subset for a classification problem. It incorporates a filter method into the genetic algorithm to improve classification performance and accelerate the search for core feature subsets. In particular, the method adds or deletes a feature from a candidate feature subset based on multivariate feature information. An empirical study on commonly used data sets from the University of California, Irvine (UCI) repository shows that the proposed method outperforms existing methods
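The memetic ingredient described here, adding or deleting single features from a candidate subset under a filter score, amounts to a local search step inside the genetic algorithm. A minimal sketch of that step, assuming an externally supplied `score` function (the paper's multivariate feature-information measure is not reproduced here):

```python
import numpy as np

def local_refine(mask, score, max_passes=10):
    """Local search step of a memetic algorithm: greedily flip single
    feature bits (add or delete one feature) while the filter score
    improves; stop when a full pass yields no improvement."""
    best = score(mask)
    for _ in range(max_passes):
        improved = False
        for i in range(len(mask)):
            trial = mask.copy()
            trial[i] = not trial[i]        # add or delete feature i
            s = score(trial)
            if s > best:
                mask, best, improved = trial, s, True
        if not improved:
            break
    return mask, best
```

In a full memetic algorithm this refinement would be applied to each offspring produced by crossover and mutation before the next selection round.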

    A Rank Minrelation - Majrelation Coefficient

    Full text link
    Improving the detection of relevant variables using a new bivariate measure could significantly impact variable selection and large network inference methods. In this paper, we propose a new statistical coefficient that we call the rank minrelation coefficient. We define a minrelation of X to Y (or equivalently a majrelation of Y to X) as a measure that estimates p(Y > X) when X and Y are continuous random variables. The approach is similar to Lin's concordance coefficient, which instead focuses on estimating p(X = Y). In other words, if a variable X exhibits a minrelation to Y then, as X increases, Y is likely to increase too. However, contrary to concordance or correlation, the minrelation is not symmetric: if X decreases, little can be said about the values of Y (except that the uncertainty on Y actually increases). We formally define this new kind of bivariate dependency and propose a new statistical coefficient to detect it. We show through several key examples that this coefficient has many interesting properties for selecting relevant variables, in particular compared to correlation
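The quantity p(Y > X) at the heart of the abstract can be illustrated with a naive all-pairs estimator. This is only a sketch of the underlying probability and its directionality, not the rank minrelation coefficient the paper constructs:

```python
import numpy as np

def p_y_gt_x(x, y):
    """Naive all-pairs estimate of P(Y > X) from two samples.
    Note the asymmetry: p_y_gt_x(x, y) and p_y_gt_x(y, x) carry
    different information, unlike correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(y[None, :] > x[:, None]))
```

When every value of `y` exceeds every value of `x`, the estimate is 1 in one direction and 0 in the other, which makes the lack of symmetry concrete.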

    An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data

    Full text link
    Feature selection has been studied widely in the literature. However, the efficacy of selection criteria for low sample size applications is neglected in most cases. Most existing feature selection criteria are based on sample similarity, but distance measures become insignificant for high dimensional low sample size (HDLSS) data. Moreover, the variance of a feature estimated from a few samples is meaningless unless it represents the data distribution efficiently. Instead of looking at the samples in groups, we evaluate their efficiency in a pairwise fashion. In our investigation, we noticed that considering a pair of samples at a time and selecting the features that bring them closer or push them apart is a better choice for feature selection. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method with low sample size, outperforming many other state-of-the-art feature selection methods. Comment: European Signal Processing Conference 201
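The pairwise intuition, reward features that keep same-class pairs close and push different-class pairs apart, can be sketched as a per-feature score over all sample pairs. This scoring rule is an assumption made here for illustration; the paper's exact proximity criterion is not reproduced:

```python
import numpy as np

def pairwise_proximity_scores(X, y):
    """Score each feature over all sample pairs: penalize large per-feature
    gaps on same-class pairs, reward them on different-class pairs.
    Higher score = more discriminative feature."""
    n, d = X.shape
    scores = np.zeros(d)
    n_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = np.abs(X[i] - X[j])
            if y[i] == y[j]:
                scores -= gap      # same class: small gap is good
            else:
                scores += gap      # different class: large gap is good
            n_pairs += 1
    return scores / n_pairs
```

With few samples the pair count stays small, which is exactly the HDLSS regime where this kind of criterion is meant to operate.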

    Weighted Heuristic Ensemble of Filters

    Get PDF
    Feature selection has become increasingly important in data mining in recent years due to the rapid increase in the dimensionality of big data. However, the reliability and consistency of feature selection methods (filters) vary considerably across data sets, and no single filter performs consistently well under all conditions. Feature selection ensembles have therefore been investigated recently to provide more reliable and effective results than any individual filter, but all existing ensembles treat the feature selection methods equally regardless of their performance. In this paper, we present a novel framework that applies a weighted feature selection ensemble through a systematic way of assigning different weights to the filters, and we investigate how to determine the appropriate weight for each filter in an ensemble. Experiments on ten benchmark datasets show that, although theoretically and intuitively giving more weight to β€˜good filters’ should lead to better results, in practice this is very uncertain. The assumption held for some examples in our experiments; in other situations, filters that had been assumed to perform well performed badly, leading to even worse results. Adding weights to filters might therefore not achieve much in accuracy terms, while increasing complexity and time consumption and clearly decreasing stability
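The core mechanism, combining several filters' rankings under per-filter weights, can be sketched as a weighted rank aggregation. The rank-sum aggregation rule below is one plausible choice assumed here; the paper's specific weighting scheme is not reproduced:

```python
import numpy as np

def weighted_rank_ensemble(rankings, weights):
    """Combine several filters' feature rankings into one ordering.
    `rankings[f][j]` is filter f's rank of feature j (0 = best).
    Returns feature indices sorted by weighted average rank, best first."""
    rankings = np.asarray(rankings, dtype=float)   # (n_filters, n_features)
    weights = np.asarray(weights, dtype=float)     # (n_filters,)
    agg = weights @ rankings / weights.sum()
    return np.argsort(agg)
```

Shifting weight between two disagreeing filters changes which filter's ordering dominates, which is precisely the sensitivity the experiments in the abstract probe.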

    Selection of informative geometric features of cell nuclei in fluorescent images of cancer cells

    Get PDF
    Methods for selecting informative geometric features of cell nuclei in fluorescent images of cancer cells are considered. A review of existing geometric features was carried out, covering both shape features invariant to image rotation and translation and features of spatial location. Six selection methods were used: the median method, correlation with the Pearson correlation coefficient, correlation with the Spearman correlation coefficient, a logistic regression model, a random forest with CART trees and the Gini criterion, and a random forest with CART trees and an error-minimization criterion. As a result of the investigation, 11 of the 59 features were selected as the most informative; classification quality was analyzed using a random forest, and the time costs were calculated as a function of the number of features used to describe the objects. For the random forest, using 11 features is sufficient in terms of classification accuracy and reduces time costs by a factor of 2.3
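One of the six selection methods listed, correlation filtering with the Pearson coefficient, can be sketched as ranking features by the absolute correlation with the target and keeping the top k. This is a generic sketch of that family of methods, not the authors' exact pipeline:

```python
import numpy as np

def pearson_filter(X, y, k):
    """Rank features by |Pearson correlation| with the target and
    return the indices of the k most correlated features."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(r))[:k]
```

A Spearman variant would apply the same formula to the ranks of `X` and `y` instead of the raw values.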

    An Effective Algorithm for Correlation Attribute Subset Selection by Using a Genetic Algorithm Based on a Naive Bayes Classifier

    Full text link
    In recent years, the application of feature selection methods to various datasets has greatly increased. Feature selection is an important topic in data mining, especially for high dimensional datasets. Feature selection (also known as subset selection) is a process commonly used in machine learning, wherein a subset of the features available in the data is selected for application of a learning algorithm. The main idea is to choose a subset of input variables by eliminating features with little or no predictive information. The challenging task is to obtain an optimal subset of relevant and non-redundant features that gives an optimal solution without increasing the complexity of the modeling task. By selecting the most salient features and removing irrelevant, redundant and noisy ones, feature selection addresses the high dimensionality problem and focuses learning algorithms on the most useful aspects of the data, thereby making the learning task faster and more accurate. A data warehouse is designed to consolidate and maintain all features that are relevant for the analysis processes
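In a genetic-algorithm wrapper of the kind this title describes, each chromosome is a feature mask and its fitness is the accuracy of a Naive Bayes classifier on the selected columns. A minimal sketch of that fitness evaluation, using a from-scratch Gaussian Naive Bayes and training accuracy as a simplifying assumption (a real wrapper would use held-out or cross-validated accuracy):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes, a stand-in for the paper's classifier."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.prior = np.array([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        # log-likelihood of each sample under each class's Gaussian model
        ll = (-0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                      + np.log(2 * np.pi * self.var)).sum(axis=-1)
              + np.log(self.prior))
        return self.classes_[ll.argmax(axis=1)]

def subset_fitness(mask, X, y):
    """GA fitness of a feature mask: Naive Bayes accuracy on the
    selected features (0 for an empty subset)."""
    if not mask.any():
        return 0.0
    clf = GaussianNB().fit(X[:, mask], y)
    return float(np.mean(clf.predict(X[:, mask]) == y))
```

The genetic algorithm would then evolve a population of masks with crossover and mutation, selecting parents in proportion to this fitness.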