269 research outputs found
Dimensionality Reduction and Classification feature using Mutual Information applied to Hyperspectral Images: A Filter strategy based algorithm
Hyperspectral image (HSI) classification is an advanced remote sensing
technique. The goal is to produce a thematic map that is compared with a
reference ground truth (GT) map, constructed by inspecting the region. An HSI
contains more than a hundred two-dimensional measurements of the same region,
called bands (or simply images), taken at adjacent frequencies. Unfortunately,
some bands contain redundant information, others are affected by noise, and
the high dimensionality of the features lowers classification accuracy. The
problem is how to find the right bands for classifying the pixels of the
region. Some methods use Mutual Information (MI) and a threshold to select
relevant bands, without treating redundancy. Others control and eliminate
redundancy by selecting the band that ranks highest on MI; if its neighbors
have approximately the same MI with the GT, they are considered redundant and
discarded. This is the main drawback of such methods, because it forfeits the
advantage of hyperspectral images: some precious information can be discarded.
In this paper we accept useful redundancy: a band contains useful redundancy
if it contributes to producing an estimated reference map that has higher MI
with the GT. To control redundancy, we introduce a complementary threshold
added to the last MI value. This process is a filter strategy; it achieves
good classification accuracy at low cost, although it is less performant than
a wrapper strategy.
Comment: 11 pages, 5 figures, journal paper
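A minimal sketch of the filter idea described in this abstract, not the authors' exact algorithm: bands are ranked by MI with the GT, and a candidate band is kept only if it raises the MI between an estimated reference map and the GT by more than a complementary threshold. The estimated map used here (the mean of the selected bands, discretized) and the threshold value are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(x, bins=64):
    """Quantize a continuous image into integer levels so MI can be computed."""
    edges = np.histogram_bin_edges(x, bins=bins)
    return np.digitize(x, edges[1:-1])

def select_bands(cube, gt, threshold=1e-3):
    """cube: (n_pixels, n_bands) array; gt: (n_pixels,) integer labels."""
    n_bands = cube.shape[1]
    # Filter step: rank bands by MI with the ground truth.
    scores = [mutual_info_score(gt, discretize(cube[:, b])) for b in range(n_bands)]
    order = np.argsort(scores)[::-1]
    selected, best_mi = [], -np.inf
    for b in order:
        trial = selected + [b]
        # Illustrative estimated reference map: mean of selected bands, discretized.
        estimate = discretize(cube[:, trial].mean(axis=1))
        mi = mutual_info_score(gt, estimate)
        # Keep the band only if it adds "useful redundancy": the MI of the
        # estimated map with the GT must improve by more than the threshold.
        if mi > best_mi + threshold:
            selected, best_mi = trial, mi
    return selected
```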
A Novel Memetic Feature Selection Algorithm
Feature selection is the problem of finding an efficient subset of features that improves accuracy while reducing complexity. Search strategies are a key aspect of feature selection algorithms, and since feature selection is NP-hard, heuristic algorithms have been studied to solve it.
In this paper, we propose a method based on a memetic algorithm to find an efficient feature subset for a classification problem. It incorporates a filter method into the genetic algorithm to improve classification performance and accelerate the search for core feature subsets. In particular, the method adds or deletes a feature from a candidate feature subset based on multivariate feature information. An empirical study on commonly used data sets from the University of California, Irvine (UCI) repository shows that the proposed method outperforms existing methods.
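A hedged sketch of a memetic feature selector of this general kind, not the paper's method: per-feature mutual information stands in for the multivariate feature information the authors use, and 3-NN cross-validated accuracy serves as the fitness; population size, generations, and operators are all illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Wrapper fitness: cross-validated accuracy of 3-NN on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(n_neighbors=3), X[:, mask], y, cv=3).mean()

def local_search(mask, filter_scores):
    """Memetic step: add the best-ranked excluded feature, drop the worst included one."""
    mask = mask.copy()
    out, inn = np.flatnonzero(~mask), np.flatnonzero(mask)
    if out.size:
        mask[out[np.argmax(filter_scores[out])]] = True
    if inn.size > 1:
        mask[inn[np.argmin(filter_scores[inn])]] = False
    return mask

def memetic_select(X, y, pop_size=20, generations=30):
    n = X.shape[1]
    filter_scores = mutual_info_classif(X, y, random_state=0)  # filter component
    pop = rng.random((pop_size, n)) < 0.5                      # random bitmask population
    for _ in range(generations):
        fit = np.array([fitness(m, X, y) for m in pop])
        parents = pop[np.argsort(fit)[-pop_size // 2:]]        # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]])          # one-point crossover
            child ^= rng.random(n) < 1.0 / n                    # bit-flip mutation
            children.append(local_search(child, filter_scores)) # memetic refinement
        pop = np.vstack([parents, children])
    fit = np.array([fitness(m, X, y) for m in pop])
    return pop[np.argmax(fit)]
```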
A Rank Minrelation - Majrelation Coefficient
Improving the detection of relevant variables using a new bivariate measure
could significantly impact variable selection and large network inference
methods. In this paper, we propose a new statistical coefficient that we call
the rank minrelation coefficient. We define a minrelation of X to Y (or
equivalently a majrelation of Y to X) as a measure that estimates p(Y > X)
when X and Y are continuous random variables. The approach is similar to Lin's
concordance coefficient, which instead focuses on estimating p(X = Y). In
other words, if a variable X exhibits a minrelation to Y, then as X increases,
Y is likely to increase too. However, in contrast to concordance or
correlation, the minrelation is not symmetric. More explicitly, if X
decreases, little can be said about the values of Y (except that the
uncertainty on Y actually increases). In this paper, we formally define this
new kind of bivariate dependency and propose a new statistical coefficient to
detect such dependencies. We show through several key examples that this new
coefficient has many interesting properties for selecting relevant variables,
in particular when compared to correlation.
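As a hedged illustration of the quantity this coefficient targets (not the authors' estimator), the empirical frequency of Y > X over paired samples already shows the asymmetry the abstract describes, in contrast to correlation:

```python
import numpy as np

def prob_y_exceeds_x(x, y):
    """Naive empirical estimate of p(Y > X) from paired observations."""
    x, y = np.asarray(x), np.asarray(y)
    return float(np.mean(y > x))

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = x + np.abs(rng.normal(size=10_000))  # Y tends to exceed X by construction

print(prob_y_exceeds_x(x, y))   # ~1.0: Y almost always exceeds X
print(prob_y_exceeds_x(y, x))   # ~0.0: unlike correlation, the measure is asymmetric
print(np.corrcoef(x, y)[0, 1])  # correlation gives the same value either way round
```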
An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data
Feature selection has been studied widely in the literature. However, the
efficacy of selection criteria for low-sample-size applications is neglected
in most cases. Most existing feature selection criteria are based on sample
similarity, but distance measures become insignificant for high dimensional
low sample size (HDLSS) data. Moreover, the variance of a feature computed
from a few samples is pointless unless it represents the data distribution
efficiently. Instead of looking at the samples in groups, we evaluate their
efficiency in a pairwise fashion. In our investigation, we noticed that
considering a pair of samples at a time and selecting the features that bring
them closer together or push them farther apart is a better choice for
feature selection. Experimental results on benchmark data sets demonstrate
the effectiveness of the proposed method with low sample sizes, outperforming
many other state-of-the-art feature selection methods.
Comment: European Signal Processing Conference 201
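A minimal sketch of one plausible pairwise criterion in this spirit, not the paper's exact scoring: each pair of samples rewards features that separate between-class pairs and penalizes features that separate within-class pairs. With HDLSS data the number of pairs is small, so the quadratic loop stays cheap.

```python
import numpy as np
from itertools import combinations

def pairwise_proximity_scores(X, y):
    """X: (n_samples, n_features); y: (n_samples,) class labels."""
    n, d = X.shape
    scores = np.zeros(d)
    for i, j in combinations(range(n), 2):  # every pair of samples
        diff = np.abs(X[i] - X[j])          # per-feature distance for this pair
        if y[i] == y[j]:
            scores -= diff                  # within-class pair: reward small gaps
        else:
            scores += diff                  # between-class pair: reward large gaps
    return scores

def select_top_k(X, y, k):
    """Indices of the k features with the best aggregate pairwise score."""
    return np.argsort(pairwise_proximity_scores(X, y))[::-1][:k]
```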
Weighted Heuristic Ensemble of Filters
Feature selection has become increasingly important in data mining in recent years due to the rapid increase in the dimensionality of big data. However, the reliability and consistency of feature selection methods (filters) vary considerably across data sets, and no single filter performs consistently well under various conditions. Feature selection ensembles have therefore been investigated recently to provide more reliable and effective results than any individual filter, but all existing feature selection ensembles treat the constituent methods equally regardless of their performance. In this paper, we present a novel framework that applies a weighted feature selection ensemble by proposing a systematic way of assigning different weights to the filters, and we investigate how to determine the appropriate weight for each filter in an ensemble. Experiments on ten benchmark datasets show that, although theoretically and intuitively adding more weight to "good" filters should lead to better results, in reality the outcome is very uncertain. The assumption held for some examples in our experiments; in other situations, however, filters that had been assumed to perform well performed badly, leading to even worse results. Adding weight to filters may therefore not achieve much in accuracy terms, while increasing complexity and time consumption and clearly decreasing stability.
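A minimal sketch of a weighted filter ensemble, assuming two stock scikit-learn filters (ANOVA F and mutual information) and hand-picked weights purely for illustration; the abstract's point is precisely that choosing such weights well is difficult.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

def rank_scores(scores):
    """Convert raw filter scores to ranks (higher score -> higher rank value)."""
    order = np.argsort(scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

def weighted_ensemble_select(X, y, weights=(0.6, 0.4), k=10):
    """Combine per-filter rankings with the given weights and keep the top k."""
    f_scores, _ = f_classif(X, y)
    mi_scores = mutual_info_classif(X, y, random_state=0)
    combined = weights[0] * rank_scores(f_scores) + weights[1] * rank_scores(mi_scores)
    return np.argsort(combined)[::-1][:k]
```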
Selection of informative geometric features of cell nuclei in fluorescence images of cancer cells
Methods for selecting informative geometric features of nuclei in fluorescence images of cancer cells are considered. A review of existing geometric features was carried out, covering both shape features invariant to image rotation and translation and features describing location in space. Six methods were used for feature selection: the median method, correlation using the Pearson correlation coefficient, correlation using the Spearman correlation coefficient, a logistic regression model, a random forest with CART trees and the Gini criterion, and a random forest with CART trees and an error-minimization criterion. As a result of the investigation, 11 of the 59 features were selected as the most informative; classification quality was analyzed with a random forest, and time costs were calculated as a function of the number of features used to describe the objects. For the random forest, using 11 features is sufficient in terms of classification accuracy and reduces time costs by a factor of 2.3.
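A hedged sketch of two of the six selection methods listed above, applied to a generic feature matrix: a Spearman correlation filter against the class label and Gini-based random forest importances. The top_k of 11 mirrors the abstract; everything else is illustrative.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier

def spearman_filter(X, y, top_k=11):
    """Rank features by absolute Spearman correlation with the class label."""
    corr = np.array([abs(spearmanr(X[:, j], y).correlation) for j in range(X.shape[1])])
    return np.argsort(corr)[::-1][:top_k]

def gini_importance_filter(X, y, top_k=11):
    """Rank features by Gini-based importance from a random forest of CART trees."""
    forest = RandomForestClassifier(criterion="gini", n_estimators=200, random_state=0)
    forest.fit(X, y)
    return np.argsort(forest.feature_importances_)[::-1][:top_k]
```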
An Effective Algorithm for Correlation Attribute Subset Selection by Using Genetic Algorithm Based On Naive Bayes Classifier
In recent years, the application of feature selection methods to various datasets has greatly increased. Feature selection is an important topic in data mining, especially for high-dimensional datasets. Feature selection (also known as subset selection) is a process commonly used in machine learning, wherein a subset of the features available in the data is selected for application of a learning algorithm. The main idea is to choose a subset of input variables by eliminating features with little or no predictive information. The challenging task is to obtain an optimal subset of relevant and non-redundant features that gives an optimal solution without increasing the complexity of the modeling task. By selecting the most salient features and removing irrelevant, redundant, and noisy ones, feature selection addresses the high-dimensionality problem and focuses learning algorithms on the most useful aspects of the data, thereby making the learning task faster and more accurate. A data warehouse is designed to consolidate and maintain all features that are relevant for the analysis processes.
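A hedged sketch of the subset evaluation such a method could use, assuming a CFS-style correlation merit (high feature-class, low feature-feature correlation) blended with Naive Bayes cross-validated accuracy as the genetic algorithm's fitness; the abstract does not specify the exact fitness, so this blend, the alpha weight, and treating labels as numeric for corrcoef are all assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def correlation_merit(X, y, mask):
    """CFS-style merit: k * r_cf / sqrt(k + k*(k-1)*r_ff)."""
    idx = np.flatnonzero(mask)
    k = idx.size
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation (labels treated as numeric here).
    rcf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in idx])
    if k == 1:
        return rcf
    # Mean absolute pairwise feature-feature correlation.
    rff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                   for i, a in enumerate(idx) for b in idx[i + 1:]])
    return k * rcf / np.sqrt(k + k * (k - 1) * rff)

def ga_fitness(X, y, mask, alpha=0.5):
    """Blend the filter merit with wrapper accuracy from Naive Bayes."""
    if not mask.any():
        return 0.0
    acc = cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()
    return alpha * correlation_merit(X, y, mask) + (1 - alpha) * acc
```

A genetic search over bitmask subsets (selection, crossover, mutation) would then maximize ga_fitness.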
- β¦